Arrow Research search

Author name cluster

Sheng Han

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers
1 author row

Possible papers


AAAI Conference 2026 Conference Paper

Think How Your Teammates Think: Active Inference Can Benefit Decentralized Execution

  • Hao Wu
  • Shoucheng Song
  • Chang Yao
  • Sheng Han
  • Huaiyu Wan
  • Youfang Lin
  • Kai Lv

In multi-agent systems, explicit cognition of teammates' decision logic is a critical factor in facilitating coordination. Communication (i.e., "Tell") can assist the cognitive development process through information dissemination, yet it is inevitably subject to real-world constraints such as noise, latency, and attacks. Building an understanding of teammates' decisions without communication therefore remains challenging. To address this, we propose a novel non-communication MARL framework that constructs cognition through local observation-based modeling (i.e., "Think"). Our framework enables agents to model teammates' active inference process. First, the proposed method produces three teammate portraits: perception, belief, and action. Specifically, we model the teammate's decision process as follows: 1) Perception: observing the environment; 2) Belief: forming beliefs; 3) Action: making decisions. Then, we selectively integrate the belief portrait into the decision process based on the accuracy and relevance of the perception portrait. This enables the selection of cooperative teammates and facilitates effective collaboration. Extensive experiments on the SMAC, SMACv2, MPE, and GRF benchmarks demonstrate the superior performance of our method.
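The perception-belief-action pipeline with accuracy-gated belief integration can be sketched roughly as follows. Everything here is illustrative: the linear "portrait" models, vector sizes, and the cosine-similarity gate are stand-ins for the paper's learned networks and its actual accuracy/relevance criterion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear "portrait" models for one teammate (real ones are learned networks).
W_perc = rng.normal(size=(4, 4))    # predicts the teammate's observation from own observation
W_belief = rng.normal(size=(3, 4))  # maps the predicted observation to a belief vector
W_act = rng.normal(size=(2, 3))     # maps the belief to predicted action logits

def think(own_obs, true_teammate_obs, threshold=0.0):
    """Build perception/belief/action portraits; gate belief use by perception accuracy."""
    pred_obs = W_perc @ own_obs            # 1) Perception portrait
    belief = np.tanh(W_belief @ pred_obs)  # 2) Belief portrait
    action_logits = W_act @ belief         # 3) Action portrait
    # Gate: integrate the belief portrait only if the perception portrait is
    # accurate enough (cosine similarity is an assumed stand-in criterion).
    cos = pred_obs @ true_teammate_obs / (
        np.linalg.norm(pred_obs) * np.linalg.norm(true_teammate_obs) + 1e-8)
    use_belief = bool(cos > threshold)
    return (belief if use_belief else np.zeros_like(belief)), action_logits, use_belief
```

An agent would run this per teammate and feed the gated belief portraits into its own policy input.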

AAAI Conference 2025 Conference Paper

CoDe: Communication Delay-Tolerant Multi-Agent Collaboration via Dual Alignment of Intent and Timeliness

  • Shoucheng Song
  • Youfang Lin
  • Sheng Han
  • Chang Yao
  • Hao Wu
  • Shuo Wang
  • Kai Lv

Communication has been widely employed to enhance multi-agent collaboration. Previous research has typically assumed delay-free communication, a strong assumption that is challenging to meet in practice. In reality, agents suffer from channel delays and receive messages sent at different time points, a setting termed Asynchronous Communication, which leads to cognitive biases and breakdowns in collaboration. This paper first defines two communication delay settings in MARL and emphasizes their harm to collaboration. To handle these delays, this paper proposes a novel framework, Communication Delay-Tolerant Multi-Agent Collaboration (CoDe). First, CoDe learns an intent representation as messages through future action inference, reflecting the stable future behavioral trends of the agents. Then, CoDe devises a dual alignment mechanism of intent and timeliness to strengthen the fusion of asynchronous messages. In this way, agents can extract the long-term intent of others even from delayed messages, and selectively utilize the most recent messages that are relevant to their intent. Experimental results demonstrate that CoDe outperforms baseline algorithms in three MARL benchmarks without delay and exhibits robustness under fixed and time-varying delays.
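A minimal sketch of a dual-alignment fusion step, under the assumption that each delayed message carries an intent vector: messages are weighted jointly by cosine similarity to the receiver's own intent (intent alignment) and by an exponential recency bonus (timeliness alignment). The scoring form and the `beta` trade-off parameter are illustrative, not CoDe's actual mechanism.

```python
import numpy as np

def fuse_async_messages(own_intent, messages, delays, beta=0.5):
    """Fuse delayed intent messages by intent alignment and timeliness.

    messages: (n, d) intent vectors received from teammates.
    delays:   (n,) steps elapsed since each message was sent.
    beta:     hypothetical trade-off between the two alignment terms.
    """
    messages = np.asarray(messages, dtype=float)
    delays = np.asarray(delays, dtype=float)
    # Intent alignment: cosine similarity with the receiver's own intent.
    sim = messages @ own_intent / (
        np.linalg.norm(messages, axis=1) * np.linalg.norm(own_intent) + 1e-8)
    # Timeliness alignment: fresher messages score higher.
    score = sim - beta * delays
    weights = np.exp(score - score.max())
    weights /= weights.sum()
    return weights @ messages, weights
```

With two identical messages at delays 0 and 5, the fresher one dominates the fused representation, which is the intended delay-tolerant behavior.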

AAMAS Conference 2025 Conference Paper

Enhancing Offline Safe Reinforcement Learning with Trajectory-Constrained Diffusion Planning

  • Hengrui Zhang
  • Youfang Lin
  • Shuo Shen
  • Hanfeng Lin
  • Peng Cheng
  • Sheng Han
  • Kai Lv

Recent approaches have utilized the RL via Supervised Learning (RvS) framework to model offline safe RL. However, these methods overlook the fundamental differences between reward maximization and constraint satisfaction, treating both identically with guidance sampling and requiring different hyperparameters for different constraint conditions. To address these limitations, we propose a novel framework, the Trajectory-Constrained Diffusion Planner (TCDP), which reframes offline safe RL as a product of trajectory conditional probabilities and energy functions. Additionally, we introduce Cost-returns-To-Go relabeling with Data Augmentation (CTGDA) and a Quantile Normalization (QN) technique, enabling adaptation to various constraints without retraining or extensive hyperparameter adjustment.
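The product form — a return-conditioned trajectory probability multiplied by a constraint energy term — can be illustrated with a toy candidate-reweighting scheme. This is not TCDP's diffusion sampler: the Gaussian proposal, quadratic return model, and hinge-style cost energy below are all assumed stand-ins that just show how the two factors combine in log space.

```python
import numpy as np

rng = np.random.default_rng(1)

def plan(n_candidates=256, cost_limit=1.0):
    """Pick a plan by combining a return score with a cost-energy penalty.

    Illustrates log p(tau | high return) - E_cost(tau), i.e. the log of the
    product of a conditional probability and an energy-based constraint factor.
    """
    taus = rng.normal(size=(n_candidates, 8))     # toy candidate "trajectories"
    returns = -np.sum((taus - 0.5) ** 2, axis=1)  # toy return model (higher is better)
    costs = np.sum(np.abs(taus), axis=1)          # toy cumulative cost
    energy = np.maximum(costs - cost_limit, 0.0)  # penalize only constraint violation
    log_w = returns - energy                      # log of the product of both factors
    best = int(np.argmax(log_w))
    return taus[best], float(costs[best])
```

Tightening `cost_limit` shifts selection toward lower-cost candidates without changing the return model, mirroring how the energy factor handles constraint satisfaction separately from reward maximization.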

IJCAI Conference 2025 Conference Paper

From General Relation Patterns to Task-Specific Decision-Making in Continual Multi-Agent Coordination

  • Chang Yao
  • Youfang Lin
  • Shoucheng Song
  • Hao Wu
  • Yuqing Ma
  • Sheng Han
  • Kai Lv

Continual Multi-Agent Reinforcement Learning (Co-MARL) requires agents to address catastrophic forgetting while learning new coordination policies with dynamic teams. In this paper, we delve into the core of Co-MARL, namely Relation Patterns, which refer to agents' general understanding of interactions. In addition to this generality, relation patterns exhibit task specificity when mapped to different action spaces. To this end, we propose a novel method called General Relation Patterns-Guided Task-Specific Decision-Maker (RPG). In RPG, agents extract relation patterns from dynamic observation spaces using a relation capturer. These task-agnostic relation patterns are then mapped to different action spaces via a task-specific decision-maker generated by a conditional hypernetwork. To combat forgetting, we further introduce regularization terms on both the relation capturer and the conditional hypernetwork. Results on SMAC and LBF demonstrate that RPG effectively prevents catastrophic forgetting when learning new tasks and achieves zero-shot generalization to unseen tasks.
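The split between a shared relation capturer and a hypernetwork-generated decision head can be sketched as below. The linear maps, dimensions, and task embedding are hypothetical; the point is only the structure: one task-agnostic encoder, and per-task head weights emitted by a hypernetwork conditioned on a task descriptor.

```python
import numpy as np

rng = np.random.default_rng(2)

D_OBS, D_REL, D_TASK, N_ACT = 10, 6, 3, 4

# Shared, task-agnostic relation capturer (stand-in for the learned module).
W_rel = rng.normal(size=(D_REL, D_OBS))
# Conditional hypernetwork: maps a task embedding to the decision head's weights.
W_hyper = rng.normal(size=(N_ACT * D_REL, D_TASK)) * 0.1

def decide(obs, task_embedding):
    """Route an observation through the shared capturer and a task-specific head."""
    pattern = np.tanh(W_rel @ obs)                           # general relation pattern
    head = (W_hyper @ task_embedding).reshape(N_ACT, D_REL)  # generated task head
    return head @ pattern                                    # action logits for this task
```

Because only `task_embedding` changes between tasks, the same relation pattern can be reused across action spaces, which is the property the regularization terms aim to preserve.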

AAAI Conference 2025 Conference Paper

Infer the Whole from a Glimpse of a Part: Keypoint-Based Knowledge Graph for Vehicle Re-Identification

  • Kai Lv
  • Yunlong Li
  • Zhuo Chen
  • Shuo Wang
  • Sheng Han
  • Youfang Lin

Vehicle re-identification aims to match vehicles across non-overlapping camera views. Many existing methods extract features from one specific image and therefore lack view invariance when comparing vehicles of different orientations. As a result, discriminative parts obscured by viewpoint changes cannot contribute effectively to matching. This work presents a novel keypoint-based framework for vehicle Re-ID. We propose to explicitly model the intrinsic structural relationships between vehicle components via a knowledge graph. By establishing connections between keypoints, our approach leverages this prior to match vehicles even when some parts are not directly comparable due to orientation inconsistencies. Specifically, given query and gallery images, we first detect visible keypoints. Then, a transformer-based model infers features for occluded keypoints by conditioning on the visible correspondences defined in the knowledge graph. The final representation integrates visible and inferred features. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches on standard benchmarks under cross-view matching scenarios. To our knowledge, this is the first work to introduce structural priors via keypoint knowledge graphs for view-invariant vehicle re-identification.
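The graph-conditioned inference of occluded keypoint features can be sketched with a single neighbor-averaging pass over the knowledge-graph adjacency. The mean aggregation here is an assumed stand-in for the paper's transformer-based inference; the graph, mask, and feature shapes are illustrative.

```python
import numpy as np

def infer_missing(features, visible, adjacency):
    """Fill features of occluded keypoints from visible graph neighbors.

    features:  (k, d) keypoint features; rows for occluded keypoints are ignored.
    visible:   (k,) boolean visibility mask.
    adjacency: (k, k) boolean knowledge-graph edges between keypoints.
    """
    features = np.asarray(features, dtype=float)
    out = features.copy()
    for i in np.flatnonzero(~visible):
        nbrs = np.flatnonzero(adjacency[i] & visible)  # visible neighbors in the graph
        if nbrs.size:
            out[i] = features[nbrs].mean(axis=0)       # aggregate their features
    return out
```

A matching score would then compare the completed feature sets of query and gallery images, so parts hidden in one view still contribute through their inferred features.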

AAAI Conference 2024 Conference Paper

Enhancing Off-Policy Constrained Reinforcement Learning through Adaptive Ensemble C Estimation

  • Hengrui Zhang
  • Youfang Lin
  • Shuo Shen
  • Sheng Han
  • Kai Lv

In the domain of real-world agents, applying Reinforcement Learning (RL) remains challenging due to the necessity of safety constraints. Constrained Reinforcement Learning (CRL) has previously focused predominantly on on-policy algorithms. Although these algorithms exhibit a degree of efficacy, their interaction efficiency in real-world settings is sub-optimal, highlighting the demand for more efficient off-policy methods. However, off-policy CRL algorithms grapple with challenges in precisely estimating the C-function, particularly due to fluctuations in the constrained Lagrange multiplier. Addressing this gap, our study focuses on the nuances of C-value estimation in off-policy CRL and introduces the Adaptive Ensemble C-learning (AEC) approach to reduce these inaccuracies. Building on state-of-the-art off-policy algorithms, we propose AEC-based CRL algorithms designed for enhanced task optimization. Extensive experiments on nine constrained robotics tasks reveal the superior interaction efficiency and performance of our algorithms in comparison to preceding methods.
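One way to picture adaptive ensemble C estimation is an aggregation rule whose pessimism scales with the Lagrange multiplier: when the constraint is binding (large multiplier), the estimate leans toward the high side of the ensemble. The sigmoid schedule below is a hypothetical stand-in for AEC's actual adaptation rule.

```python
import numpy as np

def ensemble_c_estimate(c_values, lagrange_multiplier, k_max=1.0):
    """Aggregate an ensemble of C-value estimates with adaptive pessimism.

    c_values: per-member estimates of the cost value (C-function) for one
    state-action pair. A larger multiplier pushes the aggregate toward an
    overestimate of cost; a small one keeps it near the ensemble mean.
    """
    c_values = np.asarray(c_values, dtype=float)
    k = k_max / (1.0 + np.exp(-lagrange_multiplier))  # pessimism weight in (0, k_max)
    return float(c_values.mean() + k * c_values.std())
```

Tying the pessimism weight to the multiplier is one plausible response to the multiplier fluctuations the abstract mentions: the cost estimate is conservative exactly when violations are expensive.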