Arrow Research search

Author name cluster

Kai Lv

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
1 author row

Possible papers

11

AAAI Conference 2026 Conference Paper

Think How Your Teammates Think: Active Inference Can Benefit Decentralized Execution

  • Hao Wu
  • Shoucheng Song
  • Chang Yao
  • Sheng Han
  • Huaiyu Wan
  • Youfang Lin
  • Kai Lv

In multi-agent systems, explicit cognition of teammates' decision logic serves as a critical factor in facilitating coordination. Communication (i.e., "Tell") can assist the cognitive development process through information dissemination, yet it is inevitably subject to real-world constraints such as noise, latency, and attacks. Therefore, building an understanding of teammates' decisions without communication remains challenging. To address this, we propose a novel non-communication MARL framework that builds cognition through local observation-based modeling (i.e., "Think"). Our framework enables agents to model teammates' active inference process. First, the proposed method produces three teammate portraits: perception, belief, and action. Specifically, we model the teammate's decision process as follows: 1) Perception: observing environments; 2) Belief: forming beliefs; 3) Action: making decisions. Then, we selectively integrate the belief portrait into the decision process based on the accuracy and relevance of the perception portrait. This enables the selection of cooperative teammates and facilitates effective collaboration. Extensive experiments on the SMAC, SMACv2, MPE, and GRF benchmarks demonstrate the superior performance of our method.

AAAI Conference 2025 Conference Paper

CoDe: Communication Delay-Tolerant Multi-Agent Collaboration via Dual Alignment of Intent and Timeliness

  • Shoucheng Song
  • Youfang Lin
  • Sheng Han
  • Chang Yao
  • Hao Wu
  • Shuo Wang
  • Kai Lv

Communication has been widely employed to enhance multi-agent collaboration. Previous research has typically assumed delay-free communication, a strong assumption that is challenging to meet in practice. In the real world, however, agents suffer from channel delays and receive messages sent at different time points, termed Asynchronous Communication, leading to cognitive biases and breakdowns in collaboration. This paper first defines two communication delay settings in MARL and emphasizes their harm to collaboration. To handle these delays, this paper proposes a novel framework, Communication Delay-Tolerant Multi-Agent Collaboration (CoDe). First, CoDe learns an intent representation as messages through future action inference, reflecting the stable future behavioral trends of the agents. Then, CoDe devises a dual alignment mechanism of intent and timeliness to strengthen the fusion process of asynchronous messages. In this way, agents can extract the long-term intent of others, even from delayed messages, and selectively utilize the most recent messages that are relevant to their intent. Experimental results demonstrate that CoDe outperforms baseline algorithms in three MARL benchmarks without delay and exhibits robustness under fixed and time-varying delays.

NeurIPS Conference 2025 Conference Paper

Conflict-Aware Knowledge Editing in the Wild: Semantic-Augmented Graph Representation for Unstructured Text

  • Zhange Zhang
  • Zhicheng Geng
  • Yuqing Ma
  • Tianbo Wang
  • Kai Lv
  • Xianglong Liu

Large Language Models (LLMs) have demonstrated broad applications but suffer from issues like hallucinations, erroneous outputs, and outdated knowledge. Model editing emerges as an effective solution to refine knowledge in LLMs, yet existing methods typically depend on structured knowledge representations. However, real-world knowledge is primarily embedded within complex, unstructured text. Existing structured knowledge editing approaches face significant challenges when handling the entangled and intricate knowledge present in unstructured text, resulting in issues such as representation ambiguity and editing conflicts. To address these challenges, we propose a Conflict-Aware Knowledge Editing in the Wild (CAKE) framework, the first framework explicitly designed for editing knowledge extracted from wild unstructured text. CAKE comprises two core components: a Semantic-augmented Graph Representation module and a Conflict-aware Knowledge Editing strategy. The Semantic-augmented Graph Representation module enhances knowledge encoding through structural disambiguation, relational enrichment, and semantic diversification. Meanwhile, the Conflict-aware Knowledge Editing strategy utilizes a graph-theoretic coloring algorithm to disentangle conflicted edits by allocating them to orthogonal parameter subspaces, thereby effectively mitigating editing conflicts. Experimental results on the AKEW benchmark demonstrate that CAKE significantly outperforms existing methods, achieving a 15.43% improvement in accuracy on Llama-3 editing tasks. Our framework successfully bridges the gap between unstructured textual knowledge and reliable model editing, enabling more robust and scalable updates for practical LLM applications.

IJCAI Conference 2025 Conference Paper

Continuous Diffusive Prediction Network for Multi-Station Weather Prediction

  • Chujie Xu
  • Yuqing Ma
  • Haoyuan Deng
  • Yajun Gao
  • Yudie Wang
  • Kai Lv
  • Xianglong Liu

Multi-station weather prediction provides weather forecasts for specific geographical locations, playing an important role in various aspects of daily life. Existing methods consider the relationships between individual stations discretely, making it difficult to model the continuous spatiotemporal processes of atmospheric motion, which results in suboptimal prediction outcomes. This paper proposes the Continuous Diffusive Prediction Network (CDPNet) to model the real-world continuous weather change process from discrete station observation data. CDPNet consists of two core modules: the Continuous Calibrated Initialization (CCI) and the Diffusive Difference Estimation (DDE). The CCI module interpolates data between observation stations to construct a spatially continuous physical field and ensures temporal continuity by integrating directional information from a global perspective. It accurately represents the current physical state and provides a foundation for future weather prediction. Moreover, the DDE module explicitly captures the spatial diffusion process and estimates the diffusive differences between consecutive time steps, effectively modeling spatio-temporally continuous atmospheric motion. Likewise, directional information on weather changes is introduced from the entire historical series to mitigate estimation uncertainty and improve the performance of weather prediction. Extensive experiments on the Weather2K and Global Wind/Temp datasets demonstrate that CDPNet outperforms state-of-the-art models.

AAMAS Conference 2025 Conference Paper

Enhancing Offline Safe Reinforcement Learning with Trajectory-Constrained Diffusion Planning

  • Hengrui Zhang
  • Youfang Lin
  • Shuo Shen
  • Hanfeng Lin
  • Peng Cheng
  • Sheng Han
  • Kai Lv

Recent approaches have utilized the RL via Supervised Learning (RvS) framework to model offline safe RL. However, these methods overlook the fundamental differences between reward maximization and constraint satisfaction, treating them identically with guidance sampling, and requiring different hyperparameters for different constraint conditions. To address these limitations, we propose a novel framework, the Trajectory-Constrained Diffusion Planner (TCDP), which reframes offline safe RL as a product of trajectory conditional probabilities and energy functions. Additionally, we introduce Cost-returns-To-Go relabeling with Data Augmentation (CTGDA) and the Quantile Normalization (QN) technique, enabling the adaptation to various constraints without retraining or extensive hyperparameter adjustments.
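As context for the Quantile Normalization (QN) technique mentioned in the abstract: quantile normalization in general maps raw values to their empirical quantile ranks so that conditioning signals become scale-free and comparable across settings. The sketch below is a generic illustration of that idea, not a reproduction of TCDP's actual QN procedure; the function name and interface are assumptions for illustration.

```python
import numpy as np

def quantile_normalize(values, reference):
    """Map each value to its empirical quantile rank within a reference
    sample, yielding a scale-free signal in [0, 1].

    Generic sketch only; the paper's QN technique may differ in detail.
    """
    reference = np.sort(np.asarray(reference, dtype=float))
    values = np.asarray(values, dtype=float)
    # searchsorted counts reference entries <= each value
    ranks = np.searchsorted(reference, values, side="right")
    return ranks / len(reference)

# Cost returns on very different raw scales map into the same [0, 1] range
normalized = quantile_normalize([5.0, 50.0], reference=np.linspace(0, 100, 101))
```

Because the output lives in a fixed range regardless of the raw cost scale, a single conditioned model can, in principle, serve different constraint thresholds without per-threshold hyperparameter retuning.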

IJCAI Conference 2025 Conference Paper

From General Relation Patterns to Task-Specific Decision-Making in Continual Multi-Agent Coordination

  • Chang Yao
  • Youfang Lin
  • Shoucheng Song
  • Hao Wu
  • Yuqing Ma
  • Sheng Han
  • Kai Lv

Continual Multi-Agent Reinforcement Learning (Co-MARL) requires agents to address catastrophic forgetting while learning new coordination policies with dynamic teams. In this paper, we delve into the core of Co-MARL, namely Relation Patterns, which refer to agents' general understanding of interactions. In addition to generality, relation patterns exhibit task-specificity when mapped to different action spaces. To this end, we propose a novel method called General Relation Patterns-Guided Task-specific Decision-Maker (RPG). In RPG, agents extract relation patterns from dynamic observation spaces using a relation capturer. These task-agnostic relation patterns are then mapped to different action spaces via a task-specific decision-maker generated by a conditional hypernetwork. To combat forgetting, we further introduce regularization terms on both the relation capturer and the conditional hypernetwork. Results on SMAC and LBF demonstrate that RPG effectively prevents catastrophic forgetting when learning new tasks and achieves zero-shot generalization to unseen tasks.

AAAI Conference 2025 Conference Paper

Infer the Whole from a Glimpse of a Part: Keypoint-Based Knowledge Graph for Vehicle Re-Identification

  • Kai Lv
  • Yunlong Li
  • Zhuo Chen
  • Shuo Wang
  • Sheng Han
  • Youfang Lin

Vehicle re-identification aims to match vehicles across non-overlapping camera views. Many existing methods extract features from one specific image, and these methods lack view-invariance when comparing vehicles of different orientations. As a result, discriminative parts obscured by viewpoint changes cannot contribute effectively to matching. This work presents a novel keypoint-based framework for vehicle Re-ID. We propose to explicitly model the intrinsic structural relationships between vehicle components via a knowledge graph. By establishing connections between keypoints, our approach leverages this prior to match vehicles even when some parts are not directly comparable due to orientation inconsistencies. Specifically, given query and gallery images, we first detect visible keypoints. Then, a transformer-based model infers features for non-overlapping keypoints by conditioning on visible correspondences defined in the knowledge graph. The final representation integrates visible and inferred features. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches on standard benchmarks under cross-view matching scenarios. To our knowledge, this is the first work introducing structural priors via keypoint knowledge graphs for view-invariant vehicle re-identification.

AAAI Conference 2024 Conference Paper

Enhancing Off-Policy Constrained Reinforcement Learning through Adaptive Ensemble C Estimation

  • Hengrui Zhang
  • Youfang Lin
  • Shuo Shen
  • Sheng Han
  • Kai Lv

In the domain of real-world agents, the application of Reinforcement Learning (RL) remains challenging due to the necessity for safety constraints. Previously, Constrained Reinforcement Learning (CRL) has predominantly focused on on-policy algorithms. Although these algorithms exhibit a degree of efficacy, their interaction efficiency in real-world settings is suboptimal, highlighting the demand for more efficient off-policy methods. However, off-policy CRL algorithms grapple with challenges in precise estimation of the C-function, particularly due to fluctuations in the constrained Lagrange multiplier. Addressing this gap, our study focuses on the nuances of C-value estimation in off-policy CRL and introduces the Adaptive Ensemble C-learning (AEC) approach to reduce these inaccuracies. Building on state-of-the-art off-policy algorithms, we propose AEC-based CRL algorithms designed for enhanced task optimization. Extensive experiments on nine constrained robotics tasks reveal the superior interaction efficiency and performance of our algorithms in comparison to preceding methods.
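For readers unfamiliar with ensemble value estimation, a common generic recipe is to train several independent C-value (constraint-cost) heads and combine them conservatively, e.g. mean plus a pessimism-weighted standard deviation. The sketch below shows only that baseline combination rule; AEC's adaptive mechanism is not reproduced here, and the function and parameter names are illustrative assumptions.

```python
import numpy as np

def ensemble_c_estimate(c_values, pessimism=1.0):
    """Combine an ensemble of C-value estimates into one conservative
    estimate: ensemble mean plus a pessimism-weighted ensemble std.

    c_values: array of shape (n_heads, batch). Generic sketch only.
    """
    c = np.asarray(c_values, dtype=float)
    # Higher disagreement between heads => more conservative estimate
    return c.mean(axis=0) + pessimism * c.std(axis=0)

# Two heads disagreeing on one state's cost: estimate is biased upward
conservative = ensemble_c_estimate([[1.0], [3.0]])
```

Overestimating the cost errs on the side of constraint satisfaction, which is typically the safer direction in CRL.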

IJCAI Conference 2024 Conference Paper

How to Learn Domain-Invariant Representations for Visual Reinforcement Learning: An Information-Theoretical Perspective

  • Shuo Wang
  • Zhihao Wu
  • Jinwen Wang
  • Xiaobo Hu
  • Youfang Lin
  • Kai Lv

Despite the impressive success in visual control challenges, Visual Reinforcement Learning (VRL) policies have struggled to generalize to other scenarios. Existing works attempt to improve generalization capability empirically but lack theoretical support. In this work, we explore how to learn domain-invariant representations for VRL from an information-theoretical perspective. Specifically, we identify three Mutual Information (MI) terms. These terms highlight that a robust representation should preserve domain-invariant information (return and dynamic transition) under significant observation perturbation. Furthermore, we relax the MI terms to derive three components for implementing a practical Mutual Information-based Invariant Representation (MIIR) algorithm for VRL. Extensive experiments demonstrate that MIIR achieves state-of-the-art generalization performance and the best sample efficiency in the DeepMind Control Suite, Robotic Manipulation, and CARLA.

AAAI Conference 2024 Conference Paper

What Effects the Generalization in Visual Reinforcement Learning: Policy Consistency with Truncated Return Prediction

  • Shuo Wang
  • Zhihao Wu
  • Xiaobo Hu
  • Jinwen Wang
  • Youfang Lin
  • Kai Lv

In visual Reinforcement Learning (RL), the challenge of generalization to new environments is paramount. This study pioneers a theoretical analysis of visual RL generalization, establishing an upper bound on the generalization objective that encompasses policy divergence and Bellman error components. Motivated by this analysis, we propose maintaining cross-domain consistency for each policy in the policy space, which can reduce the divergence of the learned policy at test time. In practice, we introduce the Truncated Return Prediction (TRP) task, promoting cross-domain policy consistency by predicting truncated returns of historical trajectories. Moreover, we propose a Transformer-based predictor for this auxiliary task. Extensive experiments on DeepMind Control Suite and Robotic Manipulation tasks demonstrate that TRP achieves state-of-the-art generalization performance. We further demonstrate that TRP outperforms previous methods in terms of sample efficiency during training.

NeurIPS Conference 2022 Conference Paper

CoNT: Contrastive Neural Text Generation

  • Chenxin An
  • Jiangtao Feng
  • Kai Lv
  • Lingpeng Kong
  • Xipeng Qiu
  • Xuanjing Huang

Recently, contrastive learning has attracted increasing interest in neural text generation as a new solution to alleviate the exposure bias problem. It introduces a sequence-level training signal, which is crucial for generation tasks that rely on auto-regressive decoding. However, previous methods using contrastive learning in neural text generation usually lead to inferior performance. In this paper, we analyse the underlying reasons and propose a new Contrastive Neural Text generation framework, CoNT. CoNT addresses bottlenecks that prevent contrastive learning from being widely adopted in generation tasks from three aspects -- the construction of contrastive examples, the choice of the contrastive loss, and the strategy in decoding. We validate CoNT on five generation tasks with ten benchmarks, including machine translation, summarization, code comment generation, data-to-text generation, and commonsense generation. Experimental results show that CoNT clearly outperforms its baselines on all ten benchmarks with a convincing margin. In particular, CoNT surpasses the previously most competitive contrastive learning method for text generation by 1.50 BLEU on machine translation and 1.77 ROUGE-1 on summarization, respectively. It achieves a new state-of-the-art on summarization, code comment generation (without external data), and data-to-text generation.
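As background for the sequence-level contrastive signal the abstract describes: a standard formulation contrasts an anchor representation against one positive and several negatives, as in the InfoNCE loss sketched below. This is an illustrative baseline form, not CoNT's actual objective (the paper specifically redesigns the example construction, loss choice, and decoding strategy); all names here are assumptions.

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE contrastive loss over sequence representations.

    anchor, positive: (d,) vectors; negatives: (n, d) array.
    The loss is cross-entropy with the positive as the target class.
    """
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

    pos = cos(anchor, positive) / temperature
    negs = [cos(anchor, n) / temperature for n in negatives]
    logits = np.array([pos] + negs)
    # -log softmax probability assigned to the positive
    return -pos + np.log(np.sum(np.exp(logits)))

# Loss is near zero when the anchor matches the positive and not the negatives
loss = info_nce(np.array([1.0, 0.0]), np.array([1.0, 0.0]),
                np.array([[0.0, 1.0]]))
```

A sequence-level signal of this kind complements token-level likelihood training, which is the exposure-bias motivation the abstract cites.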