Arrow Research search

Author name cluster

Shuncheng He

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
2 author rows

Possible papers (7)

ICML 2023 · Conference Paper

Complementary Attention for Multi-Agent Reinforcement Learning

  • Jianzhun Shao
  • Hongchang Zhang
  • Yun Qu 0002
  • Chang Liu 0030
  • Shuncheng He
  • Yuhang Jiang 0001
  • Xiangyang Ji

In cooperative multi-agent reinforcement learning, centralized training with decentralized execution (CTDE) shows great promise as a trade-off between independent Q-learning and joint action learning. However, vanilla CTDE methods, which assume a fixed number of agents, can hardly adapt to real-world scenarios where dynamic team compositions typically suffer from dramatically varying partial observability. Specifically, agents with extensive sight ranges are prone to be distracted by trivial environmental substrates, dubbed the "distracted attention" issue, while those with limited observation can hardly sense their teammates, degrading cooperation quality. In this paper, we propose Complementary Attention for Multi-Agent reinforcement learning (CAMA), which applies a divide-and-conquer strategy to input entities, accompanied by the complementary attention of enhancement and replenishment. Concretely, to tackle the distracted attention issue, the attention of highly contributing entities is enhanced by an execution-related representation extracted via action prediction with an inverse model. For better out-of-sight-range cooperation, the less contributing entities are compressed into brief messages with a conditional mutual information estimator. CAMA facilitates stable and sustainable teamwork, as justified by the impressive results reported on the challenging StarCraft II, MPE, and Traffic Junction benchmarks.
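A minimal sketch of the divide-and-conquer split over input entities described above, assuming attention weights serve as contribution scores. The inverse-model enhancement and the mutual-information-based compressor are replaced here by simple placeholders (an optional additive bonus and a weighted mean); `top_k` and the function names are illustrative choices, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def complementary_attention(query, entities, top_k=4, bonus=None):
    """Split entities by attention-based contribution, enhance the top ones,
    and compress the rest into a single brief message vector.

    query    : (d,) agent embedding
    entities : (n, d) entity embeddings observed by the agent
    top_k    : number of entities treated as highly contributing (assumption)
    bonus    : optional (n,) enhancement term; in CAMA this would come from an
               execution-related representation learned via an inverse model
    """
    scores = entities @ query / np.sqrt(len(query))   # scaled dot-product scores
    attn = softmax(scores)

    order = np.argsort(-attn)
    high, low = order[:top_k], order[top_k:]

    # Enhancement branch: boost attention of highly contributing entities.
    high_scores = scores[high] + (bonus[high] if bonus is not None else 0.0)
    enhanced = softmax(high_scores) @ entities[high]   # (d,) enhanced summary

    # Replenishment branch: compress lowly contributing entities into a brief
    # message (a weighted mean stands in for the MI-regularized compressor).
    if len(low) > 0:
        message = softmax(scores[low]) @ entities[low]
    else:
        message = np.zeros_like(query)

    return np.concatenate([enhanced, message])         # input to the agent's policy head

# toy usage
rng = np.random.default_rng(0)
out = complementary_attention(rng.normal(size=8), rng.normal(size=(10, 8)))
print(out.shape)  # (16,)
```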

AAAI 2023 · Conference Paper

DARL: Distance-Aware Uncertainty Estimation for Offline Reinforcement Learning

  • Hongchang Zhang
  • Jianzhun Shao
  • Shuncheng He
  • Yuhang Jiang
  • Xiangyang Ji

To facilitate offline reinforcement learning, uncertainty estimation is commonly used to detect out-of-distribution data. Through inspection, we show that current explicit uncertainty estimators such as Monte Carlo dropout and model ensembles are not competent to provide trustworthy uncertainty estimation in offline reinforcement learning. Accordingly, we propose a non-parametric, distance-aware uncertainty estimator that is sensitive to changes in the input space for offline reinforcement learning. Based on our new estimator, adaptive truncated quantile critics are proposed to underestimate out-of-distribution samples. We show that the proposed distance-aware uncertainty estimator offers better uncertainty estimation than previous methods. Experimental results demonstrate that our proposed DARL method is competitive with state-of-the-art methods on offline evaluation tasks.
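A minimal sketch of a distance-aware, non-parametric uncertainty estimator and adaptive quantile truncation in the spirit of the abstract. The nearest-neighbor distance and the mapping from uncertainty to the number of dropped quantiles are hypothetical choices for illustration, not DARL's exact formulation.

```python
import numpy as np

class DistanceUncertainty:
    """Non-parametric, distance-aware uncertainty: the uncertainty of a query
    (state, action) pair is its distance to the nearest pair in the offline
    dataset (the exact distance/normalization used by DARL may differ)."""

    def __init__(self, dataset_sa):
        self.data = np.asarray(dataset_sa)        # (N, d) stacked state-action pairs

    def __call__(self, query_sa):
        diffs = self.data - np.asarray(query_sa)  # broadcast over the dataset
        return np.sqrt((diffs ** 2).sum(axis=1)).min()

def truncated_quantile_target(quantiles, uncertainty, scale=2.0):
    """Adaptive truncation: drop more of the upper quantiles when the input is
    far from the data, so out-of-distribution samples are underestimated
    (the uncertainty-to-truncation mapping here is an assumption)."""
    q = np.sort(quantiles)
    n_drop = min(len(q) - 1, int(scale * uncertainty))
    return q[: len(q) - n_drop].mean() if n_drop > 0 else q.mean()

# toy usage
rng = np.random.default_rng(0)
estimator = DistanceUncertainty(rng.normal(size=(1000, 6)))
u_in  = estimator(rng.normal(size=6))          # roughly in-distribution query
u_out = estimator(rng.normal(size=6) + 5.0)    # query far from the data
print(u_in, u_out, truncated_quantile_target(rng.normal(size=25), u_out))
```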

ICLR 2023 · Conference Paper

In-sample Actor Critic for Offline Reinforcement Learning

  • Hongchang Zhang
  • Yixiu Mao
  • Boyuan Wang
  • Shuncheng He
  • Yi Xu 0008
  • Xiangyang Ji

Offline reinforcement learning suffers from the out-of-distribution issue and extrapolation error. Most methods penalize out-of-distribution state-action pairs or regularize the trained policy towards the behavior policy, but cannot guarantee the elimination of extrapolation error. We propose In-sample Actor Critic (IAC), which utilizes sampling-importance resampling to execute in-sample policy evaluation. IAC uses only the target Q-values of the actions in the dataset to evaluate the trained policy, thus avoiding extrapolation error. The proposed method performs unbiased policy evaluation and has a lower variance than importance sampling in many cases. Empirical results show that IAC obtains competitive performance compared to state-of-the-art methods on Gym-MuJoCo locomotion domains and the much more challenging AntMaze domains.
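A minimal sketch of sampling-importance resampling (SIR) over dataset transitions, the in-sample evaluation idea named in the abstract. How the policy and behavior probabilities are obtained (e.g., a fitted behavior model) is outside this sketch, and the target construction below is an illustrative assumption rather than IAC's full algorithm.

```python
import numpy as np

def sir_in_sample_targets(rewards, next_q_dataset_actions, pi_probs, beta_probs,
                          gamma=0.99, rng=None):
    """Sampling-importance resampling over logged transitions.

    Only Q-values of actions that actually appear in the dataset are used
    (next_q_dataset_actions), so no out-of-distribution action is ever queried.
    pi_probs / beta_probs are the trained-policy and behavior-policy
    probabilities of those dataset actions.
    """
    rng = rng or np.random.default_rng()
    w = pi_probs / np.clip(beta_probs, 1e-8, None)     # importance ratios
    p = w / w.sum()                                     # resampling distribution
    idx = rng.choice(len(w), size=len(w), replace=True, p=p)

    # Bellman targets built purely from in-sample quantities.
    targets = rewards[idx] + gamma * next_q_dataset_actions[idx]
    return idx, targets

# toy usage
rng = np.random.default_rng(0)
n = 256
idx, tgt = sir_in_sample_targets(rewards=rng.normal(size=n),
                                 next_q_dataset_actions=rng.normal(size=n),
                                 pi_probs=rng.uniform(0.1, 1.0, size=n),
                                 beta_probs=rng.uniform(0.1, 1.0, size=n),
                                 rng=rng)
print(tgt.shape)
```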

NeurIPS 2022 · Conference Paper

Self-Organized Group for Cooperative Multi-agent Reinforcement Learning

  • Jianzhun Shao
  • Zhiqiang Lou
  • Hongchang Zhang
  • Yuhang Jiang
  • Shuncheng He
  • Xiangyang Ji

Centralized training with decentralized execution (CTDE) has achieved great success in cooperative multi-agent reinforcement learning (MARL) in practical applications. However, CTDE-based methods typically suffer from poor zero-shot generalization under dynamic team composition and varying partial observability. To tackle these issues, we propose a spontaneous grouping mechanism, termed Self-Organized Group (SOG), featuring conductor election (CE) and message summary (MS). In CE, a certain number of conductors are elected every $T$ time-steps to temporarily construct groups, each with a conductor-follower consensus in which the followers are constrained to communicate only with their conductor. In MS, each conductor summarizes and distributes the received messages to all affiliated group members to maintain unified scheduling. SOG provides zero-shot generalization to a dynamic number of agents and varying partial observability. Extensive experiments on mainstream multi-agent benchmarks demonstrate the superiority of SOG.
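A minimal sketch of the conductor election and message summary steps described above. In SOG the election is learned and communication is part of the policy network; here the score-proportional election, nearest-conductor grouping, and mean-based summary are stand-in assumptions.

```python
import numpy as np

def elect_conductors(scores, n_conductors, rng):
    """Elect conductors for the next T steps; sampling in proportion to a
    per-agent score is an illustrative stand-in for SOG's learned election."""
    p = scores / scores.sum()
    return rng.choice(len(scores), size=n_conductors, replace=False, p=p)

def assign_groups(agent_feats, conductors):
    """Each follower joins the nearest conductor (a simple grouping rule,
    not necessarily the paper's)."""
    d = np.linalg.norm(agent_feats[:, None] - agent_feats[conductors][None], axis=-1)
    return conductors[d.argmin(axis=1)]        # conductor id per agent

def message_summary(messages, group_of):
    """Each conductor averages the messages of its group and broadcasts the
    summary back, so all affiliated members share one schedule."""
    summary = {}
    for c in np.unique(group_of):
        summary[c] = messages[group_of == c].mean(axis=0)
    return np.stack([summary[group_of[i]] for i in range(len(group_of))])

# toy usage: 8 agents, 2 conductors (re-elected every T steps in a full system)
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 4))
conductors = elect_conductors(np.abs(rng.normal(size=8)) + 1e-3, 2, rng)
group_of = assign_groups(feats, conductors)
broadcast = message_summary(rng.normal(size=(8, 4)), group_of)
print(conductors, broadcast.shape)
```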

NeurIPS 2022 · Conference Paper

SPD: Synergy Pattern Diversifying Oriented Unsupervised Multi-agent Reinforcement Learning

  • Yuhang Jiang
  • Jianzhun Shao
  • Shuncheng He
  • Hongchang Zhang
  • Xiangyang Ji

Reinforcement learning typically relies heavily on a well-designed reward signal, which becomes even more challenging in cooperative multi-agent reinforcement learning. Alternatively, unsupervised reinforcement learning (URL) has recently shown promise in learning useful skills and exploring the environment without external supervised signals. These approaches mainly aim for a single agent to reach distinguishable states, which is insufficient for multi-agent systems because each agent interacts not only with the environment but also with the other agents. We propose Synergy Pattern Diversifying Oriented Unsupervised Multi-agent Reinforcement Learning (SPD) to learn generic coordination policies for agents with no extrinsic reward. Specifically, we devise the Synergy Pattern Graph (SPG), a graph depicting the relationships of agents at each time step. Furthermore, we propose an episode-wise divergence measurement to approximate the discrepancy of synergy patterns. To overcome the challenge of sparse return, we decompose the discrepancy of synergy patterns into per-time-step pseudo-rewards. Empirically, we show the capacity of SPD to acquire meaningful coordination policies, such as maintaining specific formations in the Multi-Agent Particle Environment and pass-and-shoot in Google Research Football. Furthermore, we demonstrate that the parameters of the same pretrained policy can serve as a good initialization for a series of downstream tasks' policies, achieving higher data efficiency and outperforming state-of-the-art approaches in Google Research Football.
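A minimal sketch of the two ingredients the abstract names: building a per-time-step synergy pattern graph and decomposing an episode-level discrepancy into per-step pseudo-rewards. The proximity-threshold edge rule and the per-step graph distance are hypothetical simplifications; SPD's actual graph construction and divergence measurement are more elaborate.

```python
import numpy as np

def synergy_pattern_graph(positions, radius=1.0):
    """One time step's Synergy Pattern Graph: agents are nodes and an edge
    marks a pairwise relation (here, a proximity threshold as an assumption)."""
    d = np.linalg.norm(positions[:, None] - positions[None], axis=-1)
    adj = (d < radius).astype(float)
    np.fill_diagonal(adj, 0.0)
    return adj

def per_step_pseudo_rewards(episode_a, episode_b):
    """Decompose the episode-wise discrepancy between two synergy-pattern
    sequences into per-time-step pseudo-rewards (a simple per-step graph
    distance stands in for the paper's divergence measurement)."""
    return np.array([np.abs(ga - gb).mean() for ga, gb in zip(episode_a, episode_b)])

# toy usage: two short episodes of 5 agents moving in 2-D
rng = np.random.default_rng(0)
ep_a = [synergy_pattern_graph(rng.normal(size=(5, 2))) for _ in range(10)]
ep_b = [synergy_pattern_graph(rng.normal(size=(5, 2))) for _ in range(10)]
r = per_step_pseudo_rewards(ep_a, ep_b)
print(r.shape, r.sum())   # per-step rewards that sum to an episode-level discrepancy
```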

AAAI 2022 · Conference Paper

State Deviation Correction for Offline Reinforcement Learning

  • Hongchang Zhang
  • Jianzhun Shao
  • Yuhang Jiang
  • Shuncheng He
  • Guanwen Zhang
  • Xiangyang Ji

Offline reinforcement learning aims to maximize the expected cumulative reward using a fixed collection of data. The basic principle of current offline reinforcement learning methods is to restrict the policy to the offline dataset's action space. However, they ignore the case where the dataset's trajectories fail to cover the state space completely. In particular, when the dataset's size is limited, the agent is likely to encounter unseen states at test time. Prior policy-constrained methods are incapable of correcting this state deviation and may lead the agent further into unexpected regions. In this paper, we propose the state deviation correction (SDC) method, which constrains the policy's induced state distribution by penalizing out-of-distribution states that might appear during the test period. We first perturb the states sampled from the logged dataset, then simulate noisy next states on the basis of a dynamics model and the policy. We then train the policy to minimize the distances between the noisy next states and the offline dataset. In this manner, the trained policy guides the agent back to its familiar regions. Experimental results demonstrate that our proposed method is competitive with state-of-the-art methods in a GridWorld setup, the offline MuJoCo control suite, and a modified offline MuJoCo dataset with a finite number of valuable samples.
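A minimal sketch of the penalty the abstract describes, assuming a learned dynamics model and the current policy are available as callables. The nearest-neighbor distance to the dataset stands in for the paper's distribution-matching objective, and the linear placeholder policy/dynamics in the usage example are purely illustrative.

```python
import numpy as np

def sdc_loss(states, policy, dynamics, dataset_next_states, noise_std=0.1, rng=None):
    """State-deviation penalty (illustrative form): perturb logged states,
    roll them one step through a dynamics model under the current policy, and
    penalize the distance from the predicted next states to the offline data."""
    rng = rng or np.random.default_rng()
    noisy = states + noise_std * rng.normal(size=states.shape)
    pred_next = dynamics(noisy, policy(noisy))                     # simulated next states
    d = np.linalg.norm(pred_next[:, None] - dataset_next_states[None], axis=-1)
    return d.min(axis=1).mean()                                    # mean distance to the dataset

# toy usage with linear stand-ins for the policy and dynamics model
rng = np.random.default_rng(0)
policy   = lambda s: -0.5 * s                                      # placeholder policy
dynamics = lambda s, a: s + 0.1 * a                                # placeholder model
loss = sdc_loss(rng.normal(size=(32, 3)), policy, dynamics,
                dataset_next_states=rng.normal(size=(500, 3)), rng=rng)
print(loss)
```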

AAAI 2022 · Conference Paper

Wasserstein Unsupervised Reinforcement Learning

  • Shuncheng He
  • Yuhang Jiang
  • Hongchang Zhang
  • Jianzhun Shao
  • Xiangyang Ji

Unsupervised reinforcement learning aims to train agents to learn a handful of policies or skills in environments without external reward. These pre-trained policies can accelerate learning when endowed with external reward, and can also be used as primitive options in hierarchical reinforcement learning. Conventional approaches to unsupervised skill discovery feed a latent variable to the agent and encourage its influence on the agent's behavior through mutual information (MI) maximization. However, the policies learned by MI-based methods cannot sufficiently explore the state space, even though they can be successfully distinguished from each other. We therefore propose a new framework, Wasserstein unsupervised reinforcement learning (WURL), in which we directly maximize the distance between the state distributions induced by different policies. Additionally, we overcome the difficulties of simultaneously training N (N > 2) policies and of amortizing the overall reward over each step. Experiments show that policies learned by our approach outperform MI-based methods on the metric of Wasserstein distance while keeping high discriminability. Furthermore, agents trained by WURL can sufficiently explore the state space in mazes and MuJoCo tasks, and the pre-trained policies can be applied to downstream tasks via hierarchical learning.
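A minimal sketch of the objective named above: estimating the Wasserstein distance between the state distributions visited by different policies and turning it into a per-policy pseudo-reward. The sliced approximation and the "minimum distance to any other policy" reward are illustrative assumptions, not necessarily WURL's exact estimator or amortization scheme.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def sliced_wasserstein(states_a, states_b, n_proj=50, rng=None):
    """Approximate the Wasserstein distance between two sets of visited states
    by averaging 1-D Wasserstein distances over random projections."""
    rng = rng or np.random.default_rng()
    d = states_a.shape[1]
    total = 0.0
    for _ in range(n_proj):
        v = rng.normal(size=d)
        v /= np.linalg.norm(v)
        total += wasserstein_distance(states_a @ v, states_b @ v)
    return total / n_proj

def wurl_policy_rewards(state_batches, rng=None):
    """Pseudo-reward for each policy: its smallest distance to any other
    policy's state distribution; a training loop would then amortize this
    episode-level quantity over the steps of that policy's trajectories."""
    n = len(state_batches)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = sliced_wasserstein(state_batches[i], state_batches[j], rng=rng)
    return np.array([dist[i][np.arange(n) != i].min() for i in range(n)])

# toy usage: three policies visiting different regions of a 2-D state space
rng = np.random.default_rng(0)
batches = [rng.normal(loc=c, size=(200, 2)) for c in (0.0, 2.0, 4.0)]
print(wurl_policy_rewards(batches, rng=rng))
```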