Arrow Research

Author name cluster

Bingkun Bao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

2 papers
2 author rows

Possible papers (2)

NeurIPS 2025 Conference Paper

In-Context Fully Decentralized Cooperative Multi-Agent Reinforcement Learning

  • Chao Li
  • Bingkun Bao
  • Yang Gao

In this paper, we consider fully decentralized cooperative multi-agent reinforcement learning, where each agent has access only to the states, its local actions, and the shared rewards. The absence of information about other agents' actions typically leads to the non-stationarity problem during per-agent value function updates, and to the relative overgeneralization issue during value function estimation. However, existing works fail to address both issues simultaneously, as they lack the capability to model the agents' joint policy in a fully decentralized setting. To overcome this limitation, we propose a simple yet effective method named Return-Aware Context (RAC). RAC formalizes the dynamically changing task, as locally perceived by each agent, as a contextual Markov Decision Process (MDP), and addresses both non-stationarity and relative overgeneralization through return-aware context modeling. Specifically, the contextual MDP attributes the non-stationary local dynamics of each agent to switches between contexts, each corresponding to a distinct joint policy. Then, based on the assumption that the joint policy changes only between episodes, RAC distinguishes different joint policies by the training episodic return and constructs contexts using discretized episodic return values. Accordingly, RAC learns a context-based value function for each agent to address the non-stationarity issue during value function updates. For value function estimation, an individual optimistic marginal value is constructed to encourage the selection of optimal joint actions, thereby mitigating the relative overgeneralization problem. Experimentally, we evaluate RAC on various cooperative tasks (including matrix games, predator-prey, and SMAC), where its strong performance validates its effectiveness.
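As a rough illustration of the return-aware context idea described in the abstract, the Python sketch below discretizes training episodic returns into a small number of context bins and maintains a separate per-agent tabular value function for each context. All names and hyperparameters here (ReturnAwareContextQ, num_contexts, return_range) are illustrative assumptions rather than the authors' implementation, and the optimistic marginal-value estimation step is omitted.

```python
import numpy as np

class ReturnAwareContextQ:
    """Minimal sketch: context-conditioned Q-learning for one agent,
    with contexts defined by discretized episodic returns."""

    def __init__(self, n_states, n_actions, num_contexts=4,
                 return_range=(0.0, 20.0), alpha=0.1, gamma=0.99):
        # Internal bin edges used to map episodic returns to contexts.
        self.bin_edges = np.linspace(return_range[0], return_range[1],
                                     num_contexts + 1)[1:-1]
        # One tabular value function per context: (context, state, action).
        self.q = np.zeros((num_contexts, n_states, n_actions))
        self.alpha, self.gamma = alpha, gamma

    def context(self, last_episodic_return):
        # Map the previous episode's return to a discrete context id,
        # following the assumption that the joint policy (and hence
        # the context) changes only between episodes.
        return int(np.digitize(last_episodic_return, self.bin_edges))

    def update(self, ctx, s, a, r, s_next, done):
        # Standard per-agent Q-learning step, conditioned on the context,
        # so local non-stationarity is attributed to context switches.
        bootstrap = 0.0 if done else self.gamma * self.q[ctx, s_next].max()
        td_error = r + bootstrap - self.q[ctx, s, a]
        self.q[ctx, s, a] += self.alpha * td_error
```

Conditioning the table on the context id is what separates the value estimates of different (implicit) joint policies; without it, updates from episodes generated under different teammate behaviors would be averaged into a single, non-stationary target.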

ICML 2025 Conference Paper

Test-Time Selective Adaptation for Uni-Modal Distribution Shift in Multi-Modal Data

  • Mingcai Chen
  • Baoming Zhang
  • Zongbo Han
  • Wenyu Jiang
  • Yanmeng Wang
  • Shuai Feng
  • Yuntao Du 0001
  • Bingkun Bao

Modern machine learning applications are characterized by the increasing size of deep models and the growing diversity of data modalities. This trend underscores the importance of efficiently adapting pre-trained multi-modal models to the test distribution in real time, i.e., multi-modal test-time adaptation. In practice, the magnitudes of multi-modal shifts vary because different data sources interact with the underlying impact factors in diverse ways. In this research, we investigate the under-explored practical scenario of uni-modal distribution shift, where the distribution shift influences only one modality, leaving the others unchanged. Through theoretical and empirical analyses, we demonstrate that the presence of such a shift impedes multi-modal fusion and leads to the negative transfer phenomenon in existing test-time adaptation techniques. To flexibly combat this unique shift, we propose a selective adaptation schema that incorporates multiple modality-specific adapters to accommodate potential shifts and a “router” module that determines which modality requires adaptation. Finally, we validate the effectiveness of our proposed method through extensive experimental evaluations. Code is available at https://github.com/chenmc1996/Uni-Modal-Distribution-Shift.
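The selective adaptation schema described in the abstract (modality-specific adapters plus a "router" that decides which modality to adapt) can be sketched as follows. This is a minimal PyTorch illustration under assumed feature dimensions and a simple argmax routing rule; the class and module names (SelectiveAdapter, the per-modality linear router) are hypothetical and not taken from the paper's released code.

```python
import torch
import torch.nn as nn

class SelectiveAdapter(nn.Module):
    """Sketch: per-modality adapters with a router that adapts only
    the modality flagged as distribution-shifted."""

    def __init__(self, dims=None):
        super().__init__()
        # Hypothetical per-modality feature dimensions.
        dims = dims or {"image": 512, "audio": 128}
        # One lightweight residual bottleneck adapter per modality.
        self.adapters = nn.ModuleDict({
            m: nn.Sequential(nn.Linear(d, d // 4), nn.ReLU(),
                             nn.Linear(d // 4, d))
            for m, d in dims.items()
        })
        # Router: one linear head per modality scoring how shifted
        # that modality's features appear.
        self.router = nn.ModuleDict({m: nn.Linear(d, 1)
                                     for m, d in dims.items()})

    def forward(self, feats):
        # feats: dict mapping modality name -> (batch, dim) features.
        scores = {m: self.router[m](f).mean().item()
                  for m, f in feats.items()}
        shifted = max(scores, key=scores.get)  # modality to adapt
        out = dict(feats)
        # Adapt only the flagged modality; leave the others untouched,
        # avoiding negative transfer to the unshifted modalities.
        out[shifted] = feats[shifted] + self.adapters[shifted](feats[shifted])
        return out

# Usage with random features standing in for encoder outputs:
model = SelectiveAdapter()
feats = {"image": torch.randn(8, 512), "audio": torch.randn(8, 128)}
adapted = model(feats)
```

The hard argmax routing used here is a simplification for readability; because only the selected modality's features pass through an adapter, the unshifted modalities reach the fusion stage unchanged.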