
ICML 2025

Agent-Centric Actor-Critic for Asynchronous Multi-Agent Reinforcement Learning

Conference Paper Accept (poster) Artificial Intelligence · Machine Learning

Abstract

Multi-Agent Reinforcement Learning (MARL) struggles with coordination in sparse reward environments. Macro-actions, sequences of actions executed as single decisions, facilitate long-term planning but introduce asynchrony, complicating Centralized Training with Decentralized Execution (CTDE). Existing CTDE methods use padding to handle asynchrony, risking misaligned asynchronous experiences and spurious correlations. We propose the Agent-Centric Actor-Critic (ACAC) algorithm to manage asynchrony without padding. ACAC uses agent-centric encoders for independent trajectory processing, with an attention-based aggregation module integrating these histories into a centralized critic for improved temporal abstractions. The proposed structure is trained via a PPO-based algorithm with a modified Generalized Advantage Estimation for asynchronous environments. Experiments show ACAC accelerates convergence and enhances performance over baselines in complex MARL tasks.
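To make the two core ideas in the abstract concrete, the sketch below illustrates (a) attention-based aggregation of per-agent history encodings into a single centralized-critic input, and (b) a GAE variant where each decision step spans a variable number of primitive steps, so discounting uses the macro-action's duration. This is a minimal illustrative sketch, not the paper's implementation: the function names, the single-query attention form, and the duration-exponent discounting are all assumptions made for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def aggregate_agent_histories(agent_encodings, W_k, W_v, query):
    """Attention-based aggregation (hypothetical form): each agent's
    trajectory encoding is projected to a key and a value, and a shared
    query weighs the agents into one pooled critic input.

    agent_encodings: (n_agents, d) per-agent history embeddings
    query:           (d,) shared query vector
    """
    keys = agent_encodings @ W_k                     # (n_agents, d)
    values = agent_encodings @ W_v                   # (n_agents, d)
    scores = keys @ query / np.sqrt(query.shape[-1]) # (n_agents,)
    weights = softmax(scores)                        # sum to 1
    return weights @ values, weights                 # pooled: (d,)

def async_gae(rewards, values, durations, gamma=0.99, lam=0.95):
    """GAE over macro-action decision points (assumed variant): decision
    step t spans durations[t] primitive steps, so both the bootstrap and
    the advantage recursion discount by gamma ** durations[t].

    rewards:   (T,) discounted return accumulated within each macro-action
    values:    (T+1,) critic values at decision points (last = bootstrap)
    durations: (T,) primitive-step length of each macro-action
    """
    T = len(rewards)
    adv = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        g = gamma ** durations[t]
        delta = rewards[t] + g * values[t + 1] - values[t]
        gae = delta + g * lam * gae
        adv[t] = gae
    return adv
```

With all durations equal to 1 the second function reduces to standard GAE, which is one way to sanity-check an asynchronous advantage estimator.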

Keywords

  • Multi-Agent Reinforcement Learning
  • Asynchronous Multi-Agent Reinforcement Learning
  • MacDec-POMDP

Context

Venue
International Conference on Machine Learning
Archive span
1993-2025
Indexed papers
16471
Paper id
717752269396758942