
IROS 2025

Transformer-Based Multi-Agent Reinforcement Learning Method With Credit-Oriented Strategy Differentiation

Conference Paper Accepted Paper Artificial Intelligence · Robotics

Abstract

Multi-Agent Reinforcement Learning (MARL) problems combine complex environments with demanding coordination among agents. To scale to large numbers of agents, neural networks for MARL are typically implemented with parameter sharing. Together, these characteristics give rise to three challenges: partial observability, credit assignment, and strategy homogenization. In this paper, a Transformer-Based Multi-Agent Reinforcement Learning Method With Credit-Oriented Strategy Differentiation (TMRC) is presented to address each of these challenges. First, we design a Temporal-Spatial Encoding module and an Attention-Based Value Decomposition module, both built on the Transformer architecture. The former leverages temporal as well as spatial observation information, compensating for the environmental perspectives missing under partial observability. The latter identifies each agent's individual contribution within complex interactions, effectively optimizing the credit assignment process. We then propose a Credit-Oriented Strategy Differentiation module that differentiates the entity representations of agents according to their current task differences, allowing agents to adopt distinct real-time strategies and effectively mitigating strategy homogenization. We evaluate the proposed method on the SMAC benchmark, where it demonstrates better final performance, faster convergence, and greater stability than the comparison methods. Additional experiments validate the effectiveness of each proposed module. Our code is available at https://github.com/Hkxuan/TMRC.git.
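The abstract describes attention-based value decomposition: per-agent Q-values are combined into a joint value using attention weights that reflect each agent's contribution. The sketch below is a minimal, hypothetical illustration of that general idea (not the paper's actual module): a global-state query attends over agent key embeddings, and the resulting non-negative softmax weights mix individual Q-values into a joint Q-value, preserving the monotonicity typically required by value-decomposition methods. All names (`attention_value_mixing`, the toy dimensions) are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_value_mixing(q_agents, agent_keys, state_query):
    """Mix per-agent Q-values into a joint value via attention.

    Attention weights come from a global-state query scored against
    agent key embeddings; softmax keeps them non-negative, so the
    joint value is a convex (monotone) combination of agent values,
    and the weights can be read as per-agent credit.
    """
    # scaled dot-product scores: one score per agent
    scores = agent_keys @ state_query / np.sqrt(len(state_query))
    weights = softmax(scores)           # non-negative, sums to 1
    return float(weights @ q_agents)    # credit-weighted joint Q-value

# toy example: 3 agents with 4-dimensional key embeddings
rng = np.random.default_rng(0)
q = np.array([1.0, 2.0, 3.0])           # per-agent Q-values
keys = rng.normal(size=(3, 4))          # agent entity embeddings
query = rng.normal(size=4)              # global-state query
q_tot = attention_value_mixing(q, keys, query)
```

Because the weights form a convex combination, `q_tot` always lies between the smallest and largest agent Q-value; the paper's actual module additionally learns these embeddings end-to-end with a Transformer.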

Authors

Keywords

  • Reinforcement learning
  • Benchmark testing
  • Transformer cores
  • Transformers
  • Encoding
  • Real-time systems
  • Graph neural networks
  • Observability
  • Intelligent robots
  • Convergence
  • Multi-agent Reinforcement Learning
  • Transformer-based Methods
  • Multi-agent Reinforcement Learning Method
  • Neural Network
  • Complex Interactions
  • Spatial Information
  • Complex Environment
  • Temporal Information
  • Final Performance
  • Partial Observation
  • Partial Credit
  • Representation Of Entities
  • Credit Assignment
  • Convergence Rate
  • Partial Differential
  • Recurrent Neural Network
  • Global Status
  • Attention Mechanism
  • Kullback-Leibler
  • Self-supervised Learning
  • Gated Recurrent Unit
  • Historical Observations
  • Multi-agent Systems
  • Unique Perspective
  • Transformer Encoder
  • Ground Targets
  • Greatest Decline
  • Global Representation

Context

Venue
IEEE/RSJ International Conference on Intelligent Robots and Systems
Archive span
1988-2025
Indexed papers
26578
Paper id
655350805499683270