Arrow Research search

Author name cluster

Matthias Gerstgrasser

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
2 author rows

Possible papers (11)

RLJ Journal 2025 Journal Article

Collaboration Promotes Group Resilience in Multi-Agent RL

  • Ilai Shraga
  • Guy Azran
  • Matthias Gerstgrasser
  • Ofir Abu
  • Jeffrey Rosenschein
  • Sarah Keren

To effectively operate in various dynamic scenarios, RL agents must be resilient to unexpected changes in their environment. Previous work on this form of resilience has focused on single-agent settings. In this work, we introduce and formalize a multi-agent variant of resilience, which we term group resilience. We further hypothesize that collaboration with other agents is key to achieving group resilience; collaborating agents adapt better to environmental perturbations in multi-agent reinforcement learning (MARL) settings. We test our hypothesis empirically by evaluating different collaboration protocols and examining their effect on group resilience. Our experiments show that all the examined collaborative approaches achieve higher group resilience than their non-collaborative counterparts.

RLC Conference 2025 Conference Paper

Collaboration Promotes Group Resilience in Multi-Agent RL

  • Ilai Shraga
  • Guy Azran
  • Matthias Gerstgrasser
  • Ofir Abu
  • Jeffrey Rosenschein
  • Sarah Keren

To effectively operate in various dynamic scenarios, RL agents must be resilient to unexpected changes in their environment. Previous work on this form of resilience has focused on single-agent settings. In this work, we introduce and formalize a multi-agent variant of resilience, which we term group resilience. We further hypothesize that collaboration with other agents is key to achieving group resilience; collaborating agents adapt better to environmental perturbations in multi-agent reinforcement learning (MARL) settings. We test our hypothesis empirically by evaluating different collaboration protocols and examining their effect on group resilience. Our experiments show that all the examined collaborative approaches achieve higher group resilience than their non-collaborative counterparts.

ICML Conference 2025 Conference Paper

Collapse or Thrive: Perils and Promises of Synthetic Data in a Self-Generating World

  • Joshua Kazdan
  • Rylan Schaeffer
  • Apratim Dey
  • Matthias Gerstgrasser
  • Rafael Rafailov
  • David L. Donoho
  • Sanmi Koyejo

What happens when generative machine learning models are pretrained on web-scale datasets containing data generated by earlier models? Some prior work warns of “model collapse” as the web is overwhelmed by synthetic data; other work suggests the problem can be contained (i.e., collapse can be avoided) by managing how available data are used in pretraining. In this paper, we report experiments on three ways of using data (training-workflows), across three generative model task-settings (multivariate Gaussian estimation, kernel density estimation, and language-model fine-tuning) to further confirm the possibility of containment: (a) we confirm that the training-workflow of replacing all real data by successive generations of purely synthetic data indeed suffers model collapse in all task-settings studied; (b) we consider the training-workflow of accumulating synthetic data alongside real data and training on all data combined, and confirm that, although the proportion of real data eventually becomes zero, models remain stable and their test losses do not diverge under this training-workflow; (c) we consider a training-workflow where real and synthetic data accumulate together but successive generations of pretraining are constrained to use fixed-size data subsets each generation. In this workflow, we observe slow and gradual rather than explosive degradation of test-loss performance across generations. Our insights are particularly important when forecasting whether future frontier generative models will collapse or thrive, and our results open avenues for empirically and mathematically studying the context-dependent value of synthetic data.
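The Gaussian-estimation task-setting in the abstract is easy to reproduce in miniature. The sketch below is a 1-D simplification with made-up sample sizes and generation counts, not the paper's experimental setup; it contrasts workflow (a), replacing all data with the latest synthetic generation, against workflow (b), accumulating synthetic data alongside the real data:

```python
import numpy as np

N_PER_GEN, GENS = 50, 200   # small per-generation samples exaggerate the effect

def fit(data):
    # Maximum-likelihood Gaussian fit (1-D for simplicity).
    return data.mean(), data.std()

def run(workflow, seed):
    rng = np.random.default_rng(seed)
    pool = rng.normal(0.0, 1.0, N_PER_GEN)   # the "real" data
    mu, sigma = fit(pool)
    for _ in range(GENS):
        synth = rng.normal(mu, sigma, N_PER_GEN)  # sample from the current model
        if workflow == "replace":
            pool = synth                          # workflow (a): discard old data
        else:
            pool = np.concatenate([pool, synth])  # workflow (b): accumulate
        mu, sigma = fit(pool)
    return sigma

# Fitted scale after many generations, over several seeds:
replace_sigmas = [run("replace", s) for s in range(10)]
accumulate_sigmas = [run("accumulate", s) for s in range(10)]
```

Under replacement the fitted scale typically drifts toward zero across generations (the collapse phenomenon), while accumulation keeps the estimate stable near the true scale, mirroring the qualitative contrast the abstract describes.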

NeurIPS Conference 2025 Conference Paper

Position: Machine Learning Conferences Should Establish a "Refutations and Critiques" Track

  • Rylan Schaeffer
  • Joshua Kazdan
  • Yegor Denisov-Blanch
  • Brando Miranda
  • Matthias Gerstgrasser
  • Susan Zhang
  • Andreas Haupt
  • Isha Gupta

Science progresses by iteratively advancing and correcting humanity's understanding of the world. In machine learning (ML) research, rapid advancements have led to an explosion of publications, but have also led to misleading, incorrect, flawed or perhaps even fraudulent studies being accepted and sometimes highlighted at ML conferences due to the fallibility of peer review. While such mistakes are understandable, ML conferences do not offer robust processes to help the field systematically correct when such errors are made. This position paper argues that ML conferences should establish a dedicated "Refutations and Critiques" (R&C) Track. This R&C Track would provide a high-profile, reputable platform to support vital research that critically challenges prior research, thereby fostering a dynamic self-correcting research ecosystem. We discuss key considerations including track design, review principles, potential pitfalls, and provide an illustrative example submission concerning a recent ICLR 2025 Oral. We conclude that ML conferences should create official, reputable mechanisms to help ML research self-correct.

ICML Conference 2023 Conference Paper

Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning

  • Matthias Gerstgrasser
  • David C. Parkes

Stackelberg equilibria arise naturally in a range of popular learning problems, such as in security games or indirect mechanism design, and have received increasing attention in the reinforcement learning literature. We present a general framework for implementing Stackelberg equilibria search as a multi-agent RL problem, allowing a wide range of algorithmic design choices. We discuss how previous approaches can be seen as specific instantiations of this framework. As a key insight, we note that the design space allows for approaches not previously seen in the literature, for instance by leveraging multitask and meta-RL techniques for follower convergence. We propose one such approach using contextual policies, and evaluate it experimentally on both standard and novel benchmark domains, showing greatly improved sample efficiency compared to previous approaches. Finally, we explore the effect of adopting algorithm designs outside the borders of our framework.
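The leader-anticipates-follower structure at the heart of a Stackelberg equilibrium can be shown in a toy bimatrix game with pure-strategy commitment. The payoffs below are illustrative inventions, and this enumeration is far simpler than the paper's setting, where the follower best response must itself be learned by RL:

```python
import numpy as np

# Toy bimatrix game: rows = leader actions, cols = follower actions.
# (Illustrative payoffs, not taken from the paper.)
leader_payoff = np.array([[1.0, 3.0],
                          [2.0, 1.0]])
follower_payoff = np.array([[0.0, 2.0],
                            [2.0, 1.0]])

def follower_best_response(a):
    # Follower observes the leader's committed action and best-responds.
    return int(np.argmax(follower_payoff[a]))

def stackelberg_leader_action():
    # Leader anticipates the follower's best response (the "oracle" structure)
    # and commits to the action that maximizes its own resulting payoff.
    values = [leader_payoff[a, follower_best_response(a)]
              for a in range(leader_payoff.shape[0])]
    best = int(np.argmax(values))
    return best, float(values[best])
```

Here committing to row 0 induces the follower to play column 1, giving the leader payoff 3, more than it could guarantee without commitment; RL approaches replace both the enumeration and the exact best-response oracle with learned policies.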

NeurIPS Conference 2023 Conference Paper

Selectively Sharing Experiences Improves Multi-Agent Reinforcement Learning

  • Matthias Gerstgrasser
  • Tom Danino
  • Sarah Keren

We present a novel multi-agent RL approach, Selective Multi-Agent Prioritized Experience Relay, in which agents share with other agents a limited number of transitions they observe during training. The intuition behind this is that even a small number of relevant experiences from other agents could help each agent learn. Unlike many other multi-agent RL algorithms, this approach allows for largely decentralized training, requiring only a limited communication channel between agents. We show that our approach outperforms baseline no-sharing decentralized training and state-of-the-art multi-agent RL algorithms. Further, sharing only a small number of highly relevant experiences outperforms sharing all experiences between agents, and the performance uplift from selective experience sharing is robust across a range of hyperparameters and DQN variants.
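The selection-and-relay skeleton of the approach can be sketched in a few lines. The scalar priorities below are made-up stand-ins for the prioritization signal (e.g. TD error); the actual algorithm operates inside DQN-style training loops rather than on static buffers:

```python
import heapq
import random

def share_top_k(local_transitions, k):
    """Select the k transitions with the highest priority to relay to other agents."""
    return heapq.nlargest(k, local_transitions, key=lambda t: t["priority"])

# Two agents, each with a local replay buffer of prioritized transitions.
random.seed(0)
buffers = {i: [{"priority": random.random(), "obs": j} for j in range(100)]
           for i in (0, 1)}

# Each agent relays only its top-k transitions into the other agent's buffer,
# keeping training largely decentralized with a small communication channel.
K = 5
shared = {i: share_top_k(buffers[i], K) for i in buffers}
buffers[0].extend(shared[1])
buffers[1].extend(shared[0])
```

The point of the sketch is the bandwidth asymmetry: each agent transmits only K transitions per exchange rather than its whole buffer, which is the "selective" part the abstract credits for the robustness of the performance uplift.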

AAMAS Conference 2023 Conference Paper

Selectively Sharing Experiences Improves Multi-Agent Reinforcement Learning

  • Matthias Gerstgrasser
  • Tom Danino
  • Sarah Keren

We present a novel multi-agent RL approach, Selective Multi-Agent Prioritized Experience Relay, in which agents share with other agents a limited number of transitions they observe during training. The intuition behind this is that even a small number of relevant experiences from other agents could help each agent learn. Unlike many other multi-agent RL algorithms, this approach allows for largely decentralized training, requiring only a limited communication channel between agents. We show that our approach outperforms baseline no-sharing decentralized training and state-of-the-art multi-agent RL algorithms. Further, sharing only a small number of highly relevant experiences outperforms sharing all experiences between agents, and the performance uplift from selective experience sharing is robust across a range of hyperparameters and DQN variants. A reference implementation is available at https://github.com/mgerstgrasser/super.

ICLR Conference 2022 Conference Paper

CrowdPlay: Crowdsourcing Human Demonstrations for Offline Learning

  • Matthias Gerstgrasser
  • Rakshit S. Trivedi
  • David C. Parkes

Crowdsourcing has been instrumental for driving AI advances that rely on large-scale data. At the same time, reinforcement learning has seen rapid progress through benchmark environments that strike a balance between tractability and real-world complexity, such as ALE and OpenAI Gym. In this paper, we aim to fill a gap at the intersection of these two: the use of crowdsourcing to generate large-scale human demonstration data in support of advancing research into imitation learning and offline learning. To this end, we present CrowdPlay, a complete crowdsourcing pipeline for any standard RL environment including OpenAI Gym (made available under an open-source license); a large-scale publicly available crowdsourced dataset of human gameplay demonstrations in Atari 2600 games, including multimodal behavior and human-human and human-AI multiagent data; offline learning benchmarks with extensive human data evaluation; and a detailed study of incentives, including real-time feedback to drive high-quality data. We hope that this will drive improvements in the design of algorithms that account for the complexity of human behavioral data and thereby enable a step forward in the direction of effective learning for real-world settings. Our code and dataset are available at https://mgerstgrasser.github.io/crowdplay/.

AAAI Conference 2021 Conference Paper

Reinforcement Learning of Sequential Price Mechanisms

  • Gianluca Brero
  • Alon Eden
  • Matthias Gerstgrasser
  • David Parkes
  • Duncan Rheingans-Yoo

We introduce the use of reinforcement learning for indirect mechanisms, working with the existing class of sequential price mechanisms, which generalizes both serial dictatorship and posted price mechanisms and essentially characterizes all strongly obviously strategyproof mechanisms. Learning an optimal mechanism within this class forms a partially observable Markov decision process. We provide rigorous conditions for when this class of mechanisms is more powerful than simpler static mechanisms, for sufficiency or insufficiency of observation statistics for learning, and for the necessity of complex (deep) policies. We show that our approach can learn optimal or near-optimal mechanisms in several experimental settings.
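As a toy instance of a sequential price mechanism: one item, two buyers arriving in order with i.i.d. values uniform on {1, 2, 3}, and posted prices restricted to that grid. This is an illustrative simplification for intuition, not an experiment from the paper, and it uses brute-force enumeration where the paper uses RL:

```python
from itertools import product

VALUES = (1, 2, 3)   # each buyer's value, uniform and independent

def accept_prob(p):
    # Probability a buyer's value meets the posted price p.
    return sum(v >= p for v in VALUES) / len(VALUES)

def expected_revenue(p1, p2):
    # Offer price p1 to buyer 1; only if declined, offer p2 to buyer 2.
    return accept_prob(p1) * p1 + (1 - accept_prob(p1)) * accept_prob(p2) * p2

# Enumerate all price sequences and keep the revenue-optimal one.
best = max(product(VALUES, repeat=2), key=lambda ps: expected_revenue(*ps))
```

The enumeration favours a high first price (3) followed by a lower fallback price (2), yielding expected revenue 17/9, which beats the best single static posted price (16/9 at price 2); being able to adapt the price as the sequence unfolds is exactly what makes this class more powerful than static mechanisms.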

AAAI Conference 2019 Conference Paper

Multi-Unit Bilateral Trade

  • Matthias Gerstgrasser
  • Paul W. Goldberg
  • Bart De Keijzer
  • Philip Lazos
  • Alexander Skopalik

We characterise the set of dominant strategy incentive compatible (DSIC), strongly budget balanced (SBB), and ex-post individually rational (IR) mechanisms for the multi-unit bilateral trade setting. In such a setting there is a single buyer and a single seller who holds a finite number k of identical items. The mechanism has to decide how many units of the item are transferred from the seller to the buyer and how much money is transferred from the buyer to the seller. We consider two classes of valuation functions for the buyer and seller: Valuations that are increasing in the number of units in possession, and the more specific class of valuations that are increasing and submodular. Furthermore, we present some approximation results about the performance of certain such mechanisms, in terms of social welfare: For increasing submodular valuation functions, we show the existence of a deterministic 2-approximation mechanism and a randomised e/(e−1)-approximation mechanism, matching the best known bounds for the single-item setting.

AAMAS Conference 2018 Conference Paper

On the Complexity of Optimal Correlated Auctions and Reverse Auctions

  • Matthias Gerstgrasser

We investigate the problem of finding a revenue-optimal auction with correlated bidders. We give an algorithm for the exact solution for two bidders, and for a 5/3-approximation for many bidders, improving from O(n^6) runtime to O(n^3) for both problems by exploiting structural properties of this problem directly. We show that for correlated bidders, reverse auctions behave differently from auctions. For two bidders we discuss a constant-factor reduction in complexity. For k ≥ 3 bidders, we show that the optimal reverse auction must sometimes buy k copies of the item.