Arrow Research search

Author name cluster

Donghui Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

3 papers
1 author row

Possible papers

AAAI Conference 2026 Conference Paper

Reliability-Guaranteed and Reward-Seeking Sequence Modeling for Model-Based Offline Reinforcement Learning

  • Shenghong He
  • Chao Yu
  • Qian Lin
  • Yile Liang
  • Donghui Li
  • Xuetao Ding

As a data-driven learning approach, model-based offline reinforcement learning (MORL) aims to learn a policy by exploiting a dynamics model derived from an existing dataset. Applying conservative quantification to the dynamics model, most existing works on MORL generate trajectories that approximate the real data distribution to facilitate policy learning. However, these methods typically overlook the influence of historical information on environmental dynamics, thus generating unreliable trajectories that fail to align with the true data distribution. In this paper, we propose a new MORL algorithm called Reliability-guaranteed and Reward-seeking Transformer (RT). RT can avoid generating unreliable trajectories through the calculation of cumulative reliability of the trajectories, which is a weighted variational distance between the generated trajectory distribution and the true data distribution. Moreover, by sampling candidate actions with high rewards, RT can efficiently generate high-reward trajectories from the existing offline data, thereby further facilitating policy learning. We theoretically prove the performance guarantees of RT in policy learning, and empirically demonstrate its effectiveness against state-of-the-art model-based methods on several offline benchmark tasks and a large-scale industrial dataset from an on-demand food delivery platform.

AAAI Conference 2025 Conference Paper

Offline Multi-Agent Reinforcement Learning via In-Sample Sequential Policy Optimization

  • Zongkai Liu
  • Qian Lin
  • Chao Yu
  • Xiawei Wu
  • Yile Liang
  • Donghui Li
  • Xuetao Ding

Offline Multi-Agent Reinforcement Learning (MARL) is an emerging field that aims to learn optimal multi-agent policies from pre-collected datasets. Compared to the single-agent case, the multi-agent setting involves a large joint state-action space and coupled behaviors of multiple agents, which bring extra complexity to offline policy optimization. In this work, we revisit the existing offline MARL methods and show that in certain scenarios they can be problematic, leading to uncoordinated behaviors and out-of-distribution (OOD) joint actions. To address these issues, we propose a new offline MARL algorithm, named In-Sample Sequential Policy Optimization (InSPO). InSPO sequentially updates each agent's policy in an in-sample manner, which not only avoids selecting OOD joint actions but also accounts for teammates' updated policies to enhance coordination. Additionally, by thoroughly exploring low-probability actions in the behavior policy, InSPO effectively addresses the issue of premature convergence to sub-optimal solutions. Theoretically, we prove that InSPO guarantees monotonic policy improvement and converges to a quantal response equilibrium (QRE). Experimental results demonstrate the effectiveness of our method compared to current state-of-the-art offline MARL methods.

IS Journal 2019 Journal Article

Noncooperative Target Detection of Spacecraft Objects Based on Artificial Bee Colony Algorithm

  • Xinyu Liu
  • Donghui Li
  • Na Dong
  • Wai Hung Ip
  • Kai Leung Yung

Although heuristic algorithms have achieved state-of-the-art performance for object detection, they have not been demonstrated to be sufficiently accurate and robust for multiobject detection. To address this problem, this article incorporates the concept of species into the artificial bee colony algorithm and proposes a multipeak optimization algorithm named species-based artificial bee colony (SABC). Then, we apply SABC to detect the noncooperative target (NCT) from two aspects: multicircle detection and multitemplate matching. Experiments are conducted using real cases of the “ShenZhou8” and “Apollo 9” space missions as well as the “Chang'e” camera point system developed by the Hong Kong Polytechnic University. Experimental results show that the proposed method detects NCT robustly under various kinds of noise, weak light, and in-orbit conditions, and yields accurate detection results in less time than other methods.