AAMAS 2025
Taming Multi-Agent Reinforcement Learning with Estimator Variance Reduction
Abstract
Multi-agent reinforcement learning (MARL) enables systems of autonomous agents to solve complex tasks from jointly gathered experiences of the environment. Many MARL algorithms perform centralized training (CT), often in a simulated environment, where at each time-step the critic uses a single sample of the agents’ joint action for training. Yet, as agents update their policies during training, these single samples may poorly represent the agents’ joint policy, leading to high-variance gradient estimates that hinder learning. In this paper, we examine the effect on MARL estimators of allowing the number of joint-action samples taken at each time-step during training to be greater than 1. Our theoretical analysis shows that even modestly increasing the number of joint-action samples shown to the critic leads to TD updates that closely approximate the true expected value under the current joint policy. In particular, we prove that this reduces the variance of value estimates to a level similar to that of decentralized training while maintaining the learning benefits of CT. We describe how such a protocol can be seamlessly realized by sharing policy parameters between the agents during training, and we apply the technique to induce lower-variance estimates in MARL methods within a general apparatus which we call the Performance Enhancing Reinforcement Learning Apparatus (PERLA). Lastly, we demonstrate PERLA’s performance improvements and estimator variance reduction capabilities in a range of environments, including Multi-agent MuJoCo and StarCraft II.

∗Work was conducted while at Huawei R&D. †Corresponding author.

This work is licensed under a Creative Commons Attribution International 4.0 License. Proc. of the 24th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2025), Y. Vorobeychik, S. Das, A. Nowé (eds.), May 19 – 23, 2025, Detroit, Michigan, USA. © 2025 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org).
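The abstract's central idea is that averaging the critic's value over several joint-action samples drawn from the current joint policy, rather than using one, brings the TD target close to its expectation. A minimal sketch of that effect, under assumptions that are not from the paper: a toy two-agent critic for a fixed next state, independent categorical policies, and plain Monte Carlo averaging. All names and payoffs (`POLICIES`, `q_value`, `td_target`, `target_variance`) are hypothetical; this is an illustration of the estimator, not the PERLA implementation. Both the single-sample (k=1) and k-sample targets are unbiased for the exact target (here 1 + 0.99 × 3.0 = 3.97), but the variance of the k-sample target shrinks roughly as 1/k.

```python
import itertools
import random
import statistics

# Hypothetical toy setup (not from the paper): 2 agents, 3 actions each,
# each agent follows an independent categorical policy pi_i(a_i | s').
POLICIES = [
    [0.2, 0.5, 0.3],  # agent 0: probabilities of actions 0, 1, 2
    [0.6, 0.1, 0.3],  # agent 1
]


def q_value(joint_action):
    """Hypothetical critic Q(s', a) for one fixed next state s'."""
    a0, a1 = joint_action
    # Matching actions pay 10; otherwise an arbitrary asymmetric payoff.
    return 10.0 if a0 == a1 else float(a0 - a1)


def sample_joint_action(rng):
    """Draw one joint action a ~ pi(. | s') from the product of agent policies."""
    return tuple(rng.choices(range(len(p)), weights=p)[0] for p in POLICIES)


def exact_expected_q():
    """E_{a ~ pi}[Q(s', a)], computed by enumerating every joint action."""
    total = 0.0
    for joint in itertools.product(*(range(len(p)) for p in POLICIES)):
        prob = 1.0
        for agent, action in enumerate(joint):
            prob *= POLICIES[agent][action]
        total += prob * q_value(joint)
    return total


def td_target(reward, gamma, k, rng):
    """TD target that averages Q over k sampled joint actions (k=1 is the single-sample target)."""
    mean_q = sum(q_value(sample_joint_action(rng)) for _ in range(k)) / k
    return reward + gamma * mean_q


def target_variance(k, trials=20000, seed=0):
    """Empirical mean and sample variance of the k-sample TD target."""
    rng = random.Random(seed)
    targets = [td_target(1.0, 0.99, k, rng) for _ in range(trials)]
    return statistics.mean(targets), statistics.variance(targets)


if __name__ == "__main__":
    exact = 1.0 + 0.99 * exact_expected_q()
    print(f"exact target: {exact:.3f}")
    for k in (1, 4, 16):
        mean, var = target_variance(k)
        print(f"k={k:2d}  mean={mean:.3f}  variance={var:.3f}")
```

Running it should show every mean staying near 3.97 while the sample variance falls roughly in proportion to 1/k (the theoretical single-sample variance for these toy numbers is about 18.1). The sketch draws independent joint actions from a known policy table; the abstract instead describes obtaining the extra samples by sharing policy parameters between agents during training.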
Context
- Venue: International Conference on Autonomous Agents and Multiagent Systems