
IROS 2025

Transformer-Based Multi-Agent Reinforcement Learning Method With Credit-Oriented Strategy Differentiation

Conference Paper Accepted Paper Artificial Intelligence · Robotics

Abstract

Multi-Agent Reinforcement Learning (MARL) problems combine complex environments with demanding coordination among agents. To scale to large numbers of agents, neural networks for MARL are typically implemented with parameter sharing. Together, these characteristics give rise to three challenges: partial observability, credit assignment, and strategy homogenization. In this paper, a Transformer-Based Multi-Agent Reinforcement Learning Method With Credit-Oriented Strategy Differentiation (TMRC) is presented to address each of these challenges. First, we design a Temporal-Spatial Encoding module and an Attention-Based Value Decomposition module, both built on the Transformer architecture. The former leverages temporal as well as spatial observation information, compensating for the environmental perspectives missing under partial observability. The latter identifies each agent's individual contribution within complex interactions, effectively optimizing the credit assignment process. We then propose a Credit-Oriented Strategy Differentiation module that differentiates the entity representations of agents according to their current task differences, allowing agents to adopt distinct real-time strategies and effectively mitigating strategy homogenization. We evaluate the proposed method on the SMAC benchmark, where it demonstrates better final performance, faster convergence, and greater stability than the comparison methods. Additional experiments validate the effectiveness of each proposed module. Our code is available at https://github.com/Hkxuan/TMRC.git.
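The abstract describes attention-based value decomposition: per-agent Q-values are combined into a joint value using attention weights that reflect each agent's contribution. The sketch below is a minimal, hypothetical illustration of that general idea (not the paper's actual module): a global-state query attends over agent key embeddings, and the resulting non-negative softmax weights mix individual Q-values into a joint Q-value, preserving the monotonicity typically required by value-decomposition methods. All names (`attention_value_mixing`, the toy dimensions) are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_value_mixing(q_agents, agent_keys, state_query):
    """Mix per-agent Q-values into a joint value via attention.

    Attention weights come from a global-state query scored against
    agent key embeddings; softmax keeps them non-negative, so the
    joint value is a convex (monotone) combination of agent values,
    and the weights can be read as per-agent credit.
    """
    # scaled dot-product scores: one score per agent
    scores = agent_keys @ state_query / np.sqrt(len(state_query))
    weights = softmax(scores)           # non-negative, sums to 1
    return float(weights @ q_agents)    # credit-weighted joint Q-value

# toy example: 3 agents with 4-dimensional key embeddings
rng = np.random.default_rng(0)
q = np.array([1.0, 2.0, 3.0])           # per-agent Q-values
keys = rng.normal(size=(3, 4))          # agent entity embeddings
query = rng.normal(size=4)              # global-state query
q_tot = attention_value_mixing(q, keys, query)
```

Because the weights form a convex combination, `q_tot` always lies between the smallest and largest agent Q-value; the paper's actual module additionally learns these embeddings end-to-end with a Transformer.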

Authors

Keywords

  • Reinforcement learning
  • Benchmark testing
  • Transformer cores
  • Transformers
  • Encoding
  • Real-time systems
  • Graph neural networks
  • Observability
  • Intelligent robots
  • Convergence
  • Multi-agent Reinforcement Learning
  • Transformer-based Methods
  • Multi-agent Reinforcement Learning Method
  • Neural Network
  • Complex Interactions
  • Spatial Information
  • Complex Environment
  • Temporal Information
  • Final Performance
  • Partial Observation
  • Partial Credit
  • Representation Of Entities
  • Credit Assignment
  • Convergence Rate
  • Partial Differential
  • Recurrent Neural Network
  • Global Status
  • Attention Mechanism
  • Kullback-Leibler
  • Self-supervised Learning
  • Gated Recurrent Unit
  • Historical Observations
  • Multi-agent Systems
  • Unique Perspective
  • Transformer Encoder
  • Ground Targets
  • Greatest Decline
  • Global Representation

Context

Venue
IEEE/RSJ International Conference on Intelligent Robots and Systems
Archive span
1988-2025
Indexed papers
26578
Paper id
655350805499683270