
AAAI 2026

Learning in Zero-Sum Markov Games: Relaxing Strong Reachability and Mixing Time Assumptions

Conference Paper AAAI Technical Track on Game Theory and Economic Paradigms Artificial Intelligence

Abstract

We address payoff-based decentralized learning in infinite-horizon zero-sum Markov games. In this setting, each player makes decisions based solely on received rewards, without observing the opponent's strategy or actions, and without sharing information. Prior works established polynomial-time convergence to an approximate Nash equilibrium under strong reachability and mixing time assumptions. We propose a convergent algorithm that significantly relaxes these assumptions, requiring only the existence of a single policy with bounded reachability and mixing time. Our key algorithmic novelty is introducing Tsallis entropy regularization to smooth the best-response policy updates. By suitably tuning this regularization, we ensure sufficient exploration, thus bypassing previously stringent assumptions on the MDP. We prove polynomial-time convergence to an approximate Nash equilibrium by establishing novel properties of the value and policy updates induced by the Tsallis entropy regularizer.
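The abstract does not spell out the regularized update, so the following is only a minimal sketch of the general idea: for Tsallis entropy with index 2, a regularized best response over Q-values can be computed in closed form via the sparsemax projection, and the regularization weight (here the hypothetical parameter `reg`) controls how much probability mass spreads to suboptimal actions, i.e. how much exploration the smoothed policy retains. None of the names below come from the paper.

```python
import numpy as np

def tsallis_smoothed_best_response(q_values, reg=1.0):
    """Sketch of a Tsallis-entropy-smoothed best response (index 2).

    Returns the maximizer of <p, q_values> + reg * S_2(p) over the
    probability simplex (up to constant factors), which is the
    sparsemax of q_values / reg. As reg -> 0 this recovers the greedy
    best response; large reg pushes the policy toward uniform, i.e.
    more exploration.
    """
    z = np.asarray(q_values, dtype=float) / reg
    # Standard sparsemax projection onto the simplex:
    z_sorted = np.sort(z)[::-1]          # descending
    cumsum = np.cumsum(z_sorted)
    ks = np.arange(1, len(z) + 1)
    support = 1 + ks * z_sorted > cumsum  # actions kept in the support
    k = ks[support][-1]
    tau = (cumsum[k - 1] - 1) / k         # threshold
    return np.maximum(z - tau, 0.0)
```

With a small `reg` the output concentrates on the best action; with a large `reg` all actions receive positive probability, illustrating how tuning the regularizer trades off greediness against exploration.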

Authors

Keywords

No keywords are indexed for this paper.

Context

Venue: AAAI Conference on Artificial Intelligence
Archive span: 1980-2026
Indexed papers: 28718
Paper id: 563186886358581617