Arrow Research search

Author name cluster

Matthias Gerstgrasser

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
2 author rows

Possible papers (11)

RLJ Journal 2025 Journal Article

Collaboration Promotes Group Resilience in Multi-Agent RL

  • Ilai Shraga
  • Guy Azran
  • Matthias Gerstgrasser
  • Ofir Abu
  • Jeffrey Rosenschein
  • Sarah Keren

To effectively operate in various dynamic scenarios, RL agents must be resilient to unexpected changes in their environment. Previous work on this form of resilience has focused on single-agent settings. In this work, we introduce and formalize a multi-agent variant of resilience, which we term group resilience. We further hypothesize that collaboration with other agents is key to achieving group resilience; collaborating agents adapt better to environmental perturbations in multi-agent reinforcement learning (MARL) settings. We test our hypothesis empirically by evaluating different collaboration protocols and examining their effect on group resilience. Our experiments show that all the examined collaborative approaches achieve higher group resilience than their non-collaborative counterparts.

RLC Conference 2025 Conference Paper

Collaboration Promotes Group Resilience in Multi-Agent RL

  • Ilai Shraga
  • Guy Azran
  • Matthias Gerstgrasser
  • Ofir Abu
  • Jeffrey Rosenschein
  • Sarah Keren

To effectively operate in various dynamic scenarios, RL agents must be resilient to unexpected changes in their environment. Previous work on this form of resilience has focused on single-agent settings. In this work, we introduce and formalize a multi-agent variant of resilience, which we term group resilience. We further hypothesize that collaboration with other agents is key to achieving group resilience; collaborating agents adapt better to environmental perturbations in multi-agent reinforcement learning (MARL) settings. We test our hypothesis empirically by evaluating different collaboration protocols and examining their effect on group resilience. Our experiments show that all the examined collaborative approaches achieve higher group resilience than their non-collaborative counterparts.

ICML Conference 2025 Conference Paper

Collapse or Thrive: Perils and Promises of Synthetic Data in a Self-Generating World

  • Joshua Kazdan
  • Rylan Schaeffer
  • Apratim Dey
  • Matthias Gerstgrasser
  • Rafael Rafailov
  • David L. Donoho
  • Sanmi Koyejo

What happens when generative machine learning models are pretrained on web-scale datasets containing data generated by earlier models? Some prior work warns of “model collapse” as the web is overwhelmed by synthetic data; other work suggests the problem can be contained (i.e., collapse can be avoided) by managing how available data are used in pretraining. In this paper, we report experiments on three ways of using data (training-workflows), across three generative model task-settings (multivariate Gaussian estimation, kernel density estimation, and language-model fine-tuning) to further confirm the possibility of containment: (a) we confirm that the training-workflow of replacing all real data by successive generations of purely synthetic data indeed suffers model collapse in all task-settings studied; (b) we consider the training-workflow of accumulating synthetic data alongside real data and training on all data combined, and confirm that, although the proportion of real data eventually becomes zero, models remain stable and their test losses do not diverge under this training-workflow; (c) we consider a training-workflow where real and synthetic data accumulate together but successive generations of pretraining are constrained to use fixed-size data subsets each generation. In this workflow, we observe slow and gradual rather than explosive degradation of test-loss performance across generations. Our insights are particularly important when forecasting whether future frontier generative models will collapse or thrive, and our results open avenues for empirically and mathematically studying the context-dependent value of synthetic data.
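The Gaussian-estimation task-setting in the abstract is easy to reproduce in miniature. The sketch below is a 1-D simplification with made-up sample sizes and generation counts, not the paper's experimental setup; it contrasts workflow (a), replacing all data with the latest synthetic generation, against workflow (b), accumulating synthetic data alongside the real data:

```python
import numpy as np

N_PER_GEN, GENS = 50, 200   # small per-generation samples exaggerate the effect

def fit(data):
    # Maximum-likelihood Gaussian fit (1-D for simplicity).
    return data.mean(), data.std()

def run(workflow, seed):
    rng = np.random.default_rng(seed)
    pool = rng.normal(0.0, 1.0, N_PER_GEN)   # the "real" data
    mu, sigma = fit(pool)
    for _ in range(GENS):
        synth = rng.normal(mu, sigma, N_PER_GEN)  # sample from the current model
        if workflow == "replace":
            pool = synth                          # workflow (a): discard old data
        else:
            pool = np.concatenate([pool, synth])  # workflow (b): accumulate
        mu, sigma = fit(pool)
    return sigma

# Fitted scale after many generations, over several seeds:
replace_sigmas = [run("replace", s) for s in range(10)]
accumulate_sigmas = [run("accumulate", s) for s in range(10)]
```

Under replacement the fitted scale typically drifts toward zero across generations (the collapse phenomenon), while accumulation keeps the estimate stable near the true scale, mirroring the qualitative contrast the abstract describes.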

NeurIPS Conference 2025 Conference Paper

Position: Machine Learning Conferences Should Establish a "Refutations and Critiques" Track

  • Rylan Schaeffer
  • Joshua Kazdan
  • Yegor Denisov-Blanch
  • Brando Miranda
  • Matthias Gerstgrasser
  • Susan Zhang
  • Andreas Haupt
  • Isha Gupta

Science progresses by iteratively advancing and correcting humanity's understanding of the world. In machine learning (ML) research, rapid advancements have led to an explosion of publications, but have also led to misleading, incorrect, flawed or perhaps even fraudulent studies being accepted and sometimes highlighted at ML conferences due to the fallibility of peer review. While such mistakes are understandable, ML conferences do not offer robust processes to help the field systematically correct when such errors are made. This position paper argues that ML conferences should establish a dedicated "Refutations and Critiques" (R&C) Track. This R&C Track would provide a high-profile, reputable platform to support vital research that critically challenges prior research, thereby fostering a dynamic self-correcting research ecosystem. We discuss key considerations including track design, review principles, potential pitfalls, and provide an illustrative example submission concerning a recent ICLR 2025 Oral. We conclude that ML conferences should create official, reputable mechanisms to help ML research self-correct.

ICML Conference 2023 Conference Paper

Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning

  • Matthias Gerstgrasser
  • David C. Parkes

Stackelberg equilibria arise naturally in a range of popular learning problems, such as in security games or indirect mechanism design, and have received increasing attention in the reinforcement learning literature. We present a general framework for implementing Stackelberg equilibria search as a multi-agent RL problem, allowing a wide range of algorithmic design choices. We discuss how previous approaches can be seen as specific instantiations of this framework. As a key insight, we note that the design space allows for approaches not previously seen in the literature, for instance by leveraging multitask and meta-RL techniques for follower convergence. We propose one such approach using contextual policies, and evaluate it experimentally on both standard and novel benchmark domains, showing greatly improved sample efficiency compared to previous approaches. Finally, we explore the effect of adopting algorithm designs outside the borders of our framework.
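The leader-anticipates-follower structure at the heart of a Stackelberg equilibrium can be shown in a toy bimatrix game with pure-strategy commitment. The payoffs below are illustrative inventions, and this enumeration is far simpler than the paper's setting, where the follower best response must itself be learned by RL:

```python
import numpy as np

# Toy bimatrix game: rows = leader actions, cols = follower actions.
# (Illustrative payoffs, not taken from the paper.)
leader_payoff = np.array([[1.0, 3.0],
                          [2.0, 1.0]])
follower_payoff = np.array([[0.0, 2.0],
                            [2.0, 1.0]])

def follower_best_response(a):
    # Follower observes the leader's committed action and best-responds.
    return int(np.argmax(follower_payoff[a]))

def stackelberg_leader_action():
    # Leader anticipates the follower's best response (the "oracle" structure)
    # and commits to the action that maximizes its own resulting payoff.
    values = [leader_payoff[a, follower_best_response(a)]
              for a in range(leader_payoff.shape[0])]
    best = int(np.argmax(values))
    return best, float(values[best])
```

Here committing to row 0 induces the follower to play column 1, giving the leader payoff 3, more than it could guarantee without commitment; RL approaches replace both the enumeration and the exact best-response oracle with learned policies.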

NeurIPS Conference 2023 Conference Paper

Selectively Sharing Experiences Improves Multi-Agent Reinforcement Learning

  • Matthias Gerstgrasser
  • Tom Danino
  • Sarah Keren

We present a novel multi-agent RL approach, Selective Multi-Agent Prioritized Experience Relay, in which agents share with other agents a limited number of transitions they observe during training. The intuition behind this is that even a small number of relevant experiences from other agents could help each agent learn. Unlike many other multi-agent RL algorithms, this approach allows for largely decentralized training, requiring only a limited communication channel between agents. We show that our approach outperforms baseline no-sharing decentralized training and state-of-the-art multi-agent RL algorithms. Further, sharing only a small number of highly relevant experiences outperforms sharing all experiences between agents, and the performance uplift from selective experience sharing is robust across a range of hyperparameters and DQN variants.
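The selection-and-relay skeleton of the approach can be sketched in a few lines. The scalar priorities below are made-up stand-ins for the prioritization signal (e.g. TD error); the actual algorithm operates inside DQN-style training loops rather than on static buffers:

```python
import heapq
import random

def share_top_k(local_transitions, k):
    """Select the k transitions with the highest priority to relay to other agents."""
    return heapq.nlargest(k, local_transitions, key=lambda t: t["priority"])

# Two agents, each with a local replay buffer of prioritized transitions.
random.seed(0)
buffers = {i: [{"priority": random.random(), "obs": j} for j in range(100)]
           for i in (0, 1)}

# Each agent relays only its top-k transitions into the other agent's buffer,
# keeping training largely decentralized with a small communication channel.
K = 5
shared = {i: share_top_k(buffers[i], K) for i in buffers}
buffers[0].extend(shared[1])
buffers[1].extend(shared[0])
```

The point of the sketch is the bandwidth asymmetry: each agent transmits only K transitions per exchange rather than its whole buffer, which is the "selective" part the abstract credits for the robustness of the performance uplift.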

AAMAS Conference 2023 Conference Paper

Selectively Sharing Experiences Improves Multi-Agent Reinforcement Learning

  • Matthias Gerstgrasser
  • Tom Danino
  • Sarah Keren

We present a novel multi-agent RL approach, Selective Multi-Agent Prioritized Experience Relay, in which agents share with other agents a limited number of transitions they observe during training. The intuition behind this is that even a small number of relevant experiences from other agents could help each agent learn. Unlike many other multi-agent RL algorithms, this approach allows for largely decentralized training, requiring only a limited communication channel between agents. We show that our approach outperforms baseline no-sharing decentralized training and state-of-the-art multi-agent RL algorithms. Further, sharing only a small number of highly relevant experiences outperforms sharing all experiences between agents, and the performance uplift from selective experience sharing is robust across a range of hyperparameters and DQN variants. A reference implementation is available at https://github.com/mgerstgrasser/super.

ICLR Conference 2022 Conference Paper

CrowdPlay: Crowdsourcing Human Demonstrations for Offline Learning

  • Matthias Gerstgrasser
  • Rakshit S. Trivedi
  • David C. Parkes

Crowdsourcing has been instrumental for driving AI advances that rely on large-scale data. At the same time, reinforcement learning has seen rapid progress through benchmark environments that strike a balance between tractability and real-world complexity, such as ALE and OpenAI Gym. In this paper, we aim to fill a gap at the intersection of these two: the use of crowdsourcing to generate large-scale human demonstration data in support of advancing research into imitation learning and offline learning. To this end, we present CrowdPlay, a complete crowdsourcing pipeline for any standard RL environment including OpenAI Gym (made available under an open-source license); a large-scale publicly available crowdsourced dataset of human gameplay demonstrations in Atari 2600 games, including multimodal behavior and human-human and human-AI multiagent data; offline learning benchmarks with extensive human data evaluation; and a detailed study of incentives, including real-time feedback to drive high-quality data. We hope that this will drive improvements in the design of algorithms that account for the complexity of human behavioral data and thereby enable a step forward in the direction of effective learning for real-world settings. Our code and dataset are available at https://mgerstgrasser.github.io/crowdplay/.

AAAI Conference 2021 Conference Paper

Reinforcement Learning of Sequential Price Mechanisms

  • Gianluca Brero
  • Alon Eden
  • Matthias Gerstgrasser
  • David Parkes
  • Duncan Rheingans-Yoo

We introduce the use of reinforcement learning for indirect mechanisms, working with the existing class of sequential price mechanisms, which generalizes both serial dictatorship and posted price mechanisms and essentially characterizes all strongly obviously strategyproof mechanisms. Learning an optimal mechanism within this class forms a partially observable Markov decision process. We provide rigorous conditions for when this class of mechanisms is more powerful than simpler static mechanisms, for sufficiency or insufficiency of observation statistics for learning, and for the necessity of complex (deep) policies. We show that our approach can learn optimal or near-optimal mechanisms in several experimental settings.
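As a toy instance of a sequential price mechanism: one item, two buyers arriving in order with i.i.d. values uniform on {1, 2, 3}, and posted prices restricted to that grid. This is an illustrative simplification for intuition, not an experiment from the paper, and it uses brute-force enumeration where the paper uses RL:

```python
from itertools import product

VALUES = (1, 2, 3)   # each buyer's value, uniform and independent

def accept_prob(p):
    # Probability a buyer's value meets the posted price p.
    return sum(v >= p for v in VALUES) / len(VALUES)

def expected_revenue(p1, p2):
    # Offer price p1 to buyer 1; only if declined, offer p2 to buyer 2.
    return accept_prob(p1) * p1 + (1 - accept_prob(p1)) * accept_prob(p2) * p2

# Enumerate all price sequences and keep the revenue-optimal one.
best = max(product(VALUES, repeat=2), key=lambda ps: expected_revenue(*ps))
```

The enumeration favours a high first price (3) followed by a lower fallback price (2), yielding expected revenue 17/9, which beats the best single static posted price (16/9 at price 2); being able to adapt the price as the sequence unfolds is exactly what makes this class more powerful than static mechanisms.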

AAAI Conference 2019 Conference Paper

Multi-Unit Bilateral Trade

  • Matthias Gerstgrasser
  • Paul W. Goldberg
  • Bart De Keijzer
  • Philip Lazos
  • Alexander Skopalik

We characterise the set of dominant strategy incentive compatible (DSIC), strongly budget balanced (SBB), and ex-post individually rational (IR) mechanisms for the multi-unit bilateral trade setting. In such a setting there is a single buyer and a single seller who holds a finite number k of identical items. The mechanism has to decide how many units of the item are transferred from the seller to the buyer and how much money is transferred from the buyer to the seller. We consider two classes of valuation functions for the buyer and seller: Valuations that are increasing in the number of units in possession, and the more specific class of valuations that are increasing and submodular. Furthermore, we present some approximation results about the performance of certain such mechanisms, in terms of social welfare: For increasing submodular valuation functions, we show the existence of a deterministic 2-approximation mechanism and a randomised e/(e−1)-approximation mechanism, matching the best known bounds for the single-item setting.

AAMAS Conference 2018 Conference Paper

On the Complexity of Optimal Correlated Auctions and Reverse Auctions

  • Matthias Gerstgrasser

We investigate the problem of finding a revenue-optimal auction with correlated bidders. We give an algorithm for the exact solution for two bidders, and for a 5/3-approximation for many bidders, improving from O(n^6) runtime to O(n^3) for both problems by exploiting structural properties of this problem directly. We show that for correlated bidders, reverse auctions behave differently from auctions. For two bidders we discuss a constant-factor reduction in complexity. For k ≥ 3 bidders, we show that the optimal reverse auction must sometimes buy k copies of the item.