
EWRL 2025

Learning Equilibria from Data: Provably Efficient Multi-Agent Imitation Learning

Workshop Paper · EWRL 2025 · Poster · Artificial Intelligence · Machine Learning · Reinforcement Learning

Abstract

This paper provides the first expert sample complexity characterization for learning a Nash equilibrium from expert data in Markov Games. We show that a new quantity, named the \emph{single policy deviation concentrability coefficient}, is unavoidable in the non-interactive imitation learning setting, and we provide an upper bound for behavioral cloning (BC) featuring this coefficient. BC exhibits substantial regret in games with a high concentrability coefficient, which leads us to utilize expert queries and introduce two novel solution algorithms: MAIL-BRO and MURMAIL. The former employs a best response oracle and learns an $\varepsilon$-Nash equilibrium with $\mathcal{O}(\varepsilon^{-4})$ expert and oracle queries. The latter completely bypasses the best response oracle at the cost of a worse expert query complexity of order $\mathcal{O}(\varepsilon^{-8})$. Finally, we provide numerical evidence confirming our theoretical findings.
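To make the behavioral cloning baseline concrete, here is a minimal sketch of tabular BC for a single player of a Markov game: estimate the expert's policy by per-state maximum likelihood over observed (state, action) pairs. The function name and the uniform fallback for unvisited states are illustrative assumptions, not the paper's implementation — but the fallback is exactly where BC's regret depends on how well the expert data covers the state space, which is what the concentrability coefficient measures.

```python
import numpy as np

def behavioral_cloning(demos, n_states, n_actions):
    """Tabular behavioral cloning (illustrative sketch).

    Estimates one player's policy by per-state maximum likelihood
    over expert (state, action) pairs. States never visited by the
    expert fall back to a uniform policy; poor coverage of such
    states is what a high concentrability coefficient reflects.
    """
    # Count expert actions per state.
    counts = np.zeros((n_states, n_actions))
    for s, a in demos:
        counts[s, a] += 1

    # Default: uniform policy for states absent from the demos.
    policy = np.full((n_states, n_actions), 1.0 / n_actions)

    # Maximum-likelihood estimate wherever data exists.
    visited = counts.sum(axis=1) > 0
    policy[visited] = counts[visited] / counts[visited].sum(axis=1, keepdims=True)
    return policy
```

In a multi-agent setting this estimator would be applied to each player's demonstrations independently; the paper's point is that matching each marginal policy this way can still leave substantial equilibrium regret when coverage is poor.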

Authors

Keywords

  • Imitation Learning
  • Multi-Agent Imitation Learning
  • Multi-Agent Reinforcement Learning

Context

Venue
European Workshop on Reinforcement Learning
Archive span
2008-2025
Indexed papers
649
Paper id
88740585659479696