Author name cluster

Dengpeng Xing

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

15 papers

2 author rows

AAAI Conference 2026 Conference Paper

Enhancing Exploration and Exploitation in Hierarchical Reinforcement Learning with Subgoal Graph Learning

Yibo Zhang
Dengpeng Xing

Goal-conditioned hierarchical reinforcement learning has demonstrated effectiveness in addressing complicated decision-making tasks by providing “temporal extraction”, which decomposes tasks into smaller and more manageable “subgoals”. This enables agents to plan over a longer time scale. However, achieving optimal exploration and exploitation remains a challenge, especially for long-horizon or sparse-reward scenarios. In this paper, we introduce Active exploration and hierarchical Self-Imitation (ASI), an effective scheme to enhance exploration and exploitation based on subgoal representation learning. The key point of ASI is to utilize temporal adjacency information in the representation space. We construct and dynamically update an adjacency graph that captures the relationships between subgoals. Based on the adjacency information provided by the graph, we design two mechanisms: active “frontier-reaching” exploration that expands the explored area faster by targeting boundary regions, and hierarchical self-imitation learning that leverages historical experience to facilitate both frontier reaching and policy training. Experimental results show that our method accelerates exploration and outperforms existing baselines in challenging long-horizon continuous control tasks.


ICML Conference 2024 Conference Paper

Learning Causal Dynamics Models in Object-Oriented Environments

Zhongwei Yu
Jingqing Ruan
Dengpeng Xing

Causal dynamics models (CDMs) have demonstrated significant potential in addressing various challenges in reinforcement learning. To learn CDMs, recent studies have performed causal discovery to capture the causal dependencies among environmental variables. However, the learning of CDMs is still confined to small-scale environments due to computational complexity and sample efficiency constraints. This paper aims to extend CDMs to large-scale object-oriented environments, which consist of a multitude of objects classified into different categories. We introduce the Object-Oriented CDM (OOCDM) that shares causalities and parameters among objects belonging to the same class. Furthermore, we propose a learning method for OOCDM that enables it to adapt to a varying number of objects. Experiments on large-scale tasks indicate that OOCDM outperforms existing CDMs in terms of causal discovery, prediction accuracy, generalization, and computational efficiency.


ECAI Conference 2023 Conference Paper

Cardsformer: Grounding Language to Learn a Generalizable Policy in Hearthstone

Wannian Xia
Yiming Yang
Jingqing Ruan
Dengpeng Xing
Bo Xu 0002

Hearthstone is a widely played collectible card game that challenges players to strategize using cards with various effects described in natural language. While human players can easily comprehend card descriptions and make informed decisions, artificial agents struggle to understand the game’s inherent rules and are unable to generalize their policies through natural language. To address this issue, we propose Cardsformer, a method capable of acquiring linguistic knowledge and learning a generalizable policy in Hearthstone. Cardsformer consists of a Prediction Model trained with offline trajectories to predict state transitions based on card descriptions and a Policy Model capable of generalizing its policy on unseen cards. To our knowledge, this is the first work to consider language knowledge in a card game. Experiments show that our approach significantly improves data efficiency and outperforms the state-of-the-art in Hearthstone even when there are untrained cards in the deck, inspiring a new perspective of tackling such problems with knowledge representation from large language models. As the game constantly releases new cards along with new descriptions and new effects, the challenge in Hearthstone remains. To encourage further research, we make our code publicly available and publish PyStone, the code base of Hearthstone on which we conducted our experiments, as an open benchmark.


IJCAI Conference 2023 Conference Paper

Explainable Reinforcement Learning via a Causal World Model

Zhongwei Yu
Jingqing Ruan
Dengpeng Xing

Generating explanations for reinforcement learning (RL) is challenging as actions may produce long-term effects on the future. In this paper, we develop a novel framework for explainable RL by learning a causal world model without prior knowledge of the causal structure of the environment. The model captures the influence of actions, allowing us to interpret the long-term effects of actions through causal chains, which present how actions influence environmental variables and finally lead to rewards. Different from most explanatory models which suffer from low accuracy, our model remains accurate while improving explainability, making it applicable in model-based learning. As a result, we demonstrate that our causal model can serve as the bridge between explainability and learning.


IROS Conference 2023 Conference Paper

Generalized Robot Dynamics Learning and Gen2Real Transfer

Dengpeng Xing
Yiming Yang
Zechang Wang
Jiale Li
Bo Xu 0002

Acquiring dynamics is critical for robot learning and is fundamental to planning and control. This paper concerns two fundamental questions: How can we learn a model that covers massive, diverse robot dynamics? Can we construct a model that lifts the data-collection pain and domain expertise required for building specific robot models? We learn the dynamics involved in a dataset containing a large number of serial articulated robots and propose a new concept, “Gen2Real”, to transfer simulated, generalized models to physical, specific robots. We generate a large-scale dataset by randomizing dynamics parameters, topology configurations, and model dimensions, which, in sequence, correspond to different properties, connections, and numbers of robot links. A structure modified from the generative pre-trained transformer is applied to approximate the dynamics of massive heterogeneous robots. In Gen2Real, we transfer the pre-trained model to a target robot using distillation, for the sake of real-time computation. The results demonstrate the superiority of the proposed method in terms of its accuracy in learning a tremendous amount of robot dynamics and its generality to transfer to different robots.


AAMAS Conference 2023 Conference Paper

M3: Modularization for Multi-task and Multi-agent Offline Pre-training

Linghui Meng
Jingqing Ruan
Xuantang Xiong
Xiyun Li
Xi Zhang
Dengpeng Xing
Bo Xu

Learning a multi-task policy is crucial in multi-agent reinforcement learning (MARL). Recent work has focused on learning in the context of online multi-task reinforcement learning, where a policy is jointly trained from scratch, aiming to generalize well to few-shot or even zero-shot tasks. However, existing online methods require tremendous interactions and are therefore unsuitable for environments where interactions are expensive. In this work, we introduce the modularization for multi-task and multi-agent offline pre-training (M3) to learn high-level transferable policy representations. We claim that the discrete policy representation is critical for multi-task offline learning and accordingly leverage contexts as a task prompt to enhance the adaptability of pre-trained models to various tasks. To disentangle multiple agents of variation under heterogeneous and non-stationary properties even though they receive the same task, we employ an agent-invariant VQ-VAE to identify each of the multiple agents. We encapsulate the pretrained model as part of an online MARL algorithm and fine-tune it to improve generalization. We also theoretically analyze the generalization error of our method. We test the proposed method on the challenging StarCraft Multi-Agent Challenge (SMAC) tasks, and empirical results show that it can achieve superior performance in few-shot or even zero-shot settings across multiple tasks over state-of-the-art MARL methods.

Proc. of the 22nd International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2023), A. Ricci, W. Yeoh, N. Agmon, B. An (eds.), May 29 – June 2, 2023, London, United Kingdom.


ECAI Conference 2023 Conference Paper

Task-Prompt Generalised World Model in Multi-Environment Offline Reinforcement Learning

Xuantang Xiong
Linghui Meng 0001
Jingqing Ruan
Qingyang Zhang 0004
Guoqi Li 0002
Dengpeng Xing
Bo Xu 0002

Offline reinforcement learning (RL) circumvents costly interactions with the environment by utilising historical trajectories. Incorporating a world model into this method could substantially enhance the transfer performance of various tasks without expensive calculations from scratch. However, due to the complexity arising from different types of generalisation, previous works have focused almost exclusively on single-environment tasks. In this study, we introduce a multi-environment offline RL setting to investigate whether a generalised world model can be learned from large, diverse datasets and serve as a good surrogate for policy learning in different tasks. Inspired by the success of multi-task prompt methods, we propose the Task-prompt Generalised World Model (TGW) framework, which demonstrates notable performance in this setting. TGW comprises three modules: a task-state prompter, a generalised dynamics module, and a reward module. We implement the generalised dynamics module as a transformer-based recurrent state-space model and employ prompts to provide task-specific instructions, enabling TGW to address the internal stochasticity of the generalised world model. On the MuJoCo control benchmarks, TGW significantly outperforms previous offline RL algorithms in the multi-environment setting.


AAMAS Conference 2022 Conference Paper

GCS: Graph-Based Coordination Strategy for Multi-Agent Reinforcement Learning

Jingqing Ruan
Yali Du
Xuantang Xiong
Dengpeng Xing
Xiyun Li
Linghui Meng
Haifeng Zhang
Jun Wang

Many real-world scenarios involve a team of agents that have to coordinate their policies to achieve a shared goal. Previous studies mainly focus on decentralized control to maximize a common reward and barely consider the coordination among control policies, which is critical in dynamic and complicated environments. In this work, we propose factorizing the joint team policy into a graph generator and graph-based coordinated policy to enable coordinated behaviours among agents. The graph generator adopts an encoder-decoder framework that outputs directed acyclic graphs (DAGs) to capture the underlying dynamic decision structure. We also apply the DAGness-constrained and DAG depth-constrained optimization in the graph generator to balance efficiency and performance. The graph-based coordinated policy exploits the generated decision structure. The graph generator and coordinated policy are trained simultaneously to maximize the discounted return. Empirical evaluations on Collaborative Gaussian Squeeze, Cooperative Navigation, and Google Research Football demonstrate the superiority of the proposed method. The code is available at https://github.com/Amanda-1997/GCS_aamas337.


ICRA Conference 2022 Conference Paper

Kinematics Learning of Massive Heterogeneous Serial Robots

Dengpeng Xing
Wannian Xia
Bo Xu 0002

Kinematics and instantaneous kinematics are fundamental in many robotic tasks, such as positioning and collision avoidance. Existing learning methods mainly concern a single robot, and small-scale networks are sufficient for considerable approximation accuracy. A question is: Can we learn a kinematics model that can generalize to various robots rather than a single robot? This paper studies the kinematics learning of massive heterogeneous serial robots and the transfer of these general models to reality. We generate a dataset by randomizing dimensions, configurations, and link lengths and employ a network based on the generative pre-trained transformer to learn general kinematics mappings. We directly transfer our models for accuracy and use distillation-based transfer for computational efficiency. The results validate that our method can accurately approximate the kinematics of thousands of robot models and demonstrates generality in transfer.


ICAPS Conference 2022 Conference Paper

Learning Multi-Agent Action Coordination via Electing First-Move Agent

Jingqing Ruan
Linghui Meng 0001
Xuantang Xiong
Dengpeng Xing
Bo Xu 0002

Learning to coordinate actions among agents is essential in complicated multi-agent systems. Prior works are constrained mainly by the assumption that all agents act simultaneously, and asynchronous action coordination between agents is rarely considered. This paper introduces a bi-level multi-agent decision hierarchy for coordinated behavior planning. We propose a novel election mechanism in which we adopt a graph convolutional network to model the interaction among agents and elect a first-move agent for asynchronous guidance. We also propose a dynamically weighted mixing network to effectively reduce the misestimation of the value function during training. This work is the first to explicitly model asynchronous multi-agent action coordination, and this explicitness makes it possible to choose the optimal first-move agent. The results on Cooperative Navigation and Google Football demonstrate that the proposed algorithm can achieve superior performance in cooperative environments. Our code is available at https://github.com/Amanda-1997/EFA-DWM.


IROS Conference 2015 Conference Paper

Collision detection for blocking cylindrical objects

Dengpeng Xing
De Xu
Fangfang Liu 0006

This paper proposes two methods for collision detection between cylindrical components when mutual blocking occurs in the view of cameras. In the reconstruction approach, 3D models are built according to key features captured by cameras, and constrained optimization is employed to rapidly find possible intersections. To satisfy the computational efficiency required by real-time operation, we present another way, the projection method, which converts two planar views to contours on a projection plane and detects high-dimensional collisions by studying the projections' relationships in low dimension. In total, nine cases are categorized and eleven parameters are constructed for detection on the basis of relative postures and positions. Simulations and experiments are carried out to demonstrate the validity of both methods.


IROS Conference 2014 Conference Paper

A sequence of micro-assembly for irregular objects based on a multiple manipulator platform

Dengpeng Xing
De Xu
Haipeng Li

Difficulties arise in the micro-assembly of many irregular objects and in insertion with contact between components made of soft materials. To handle these problems, we design a micro-operational platform with multiple manipulators to facilitate a sequence of assembly. Six robot arms and three microscopes are incorporated, together with macro and micro motion systems. We also propose a hybrid control strategy to achieve high precision and protect objects. This hybrid scheme includes vision-based positioning controllers for alignment, which employ incremental PI controllers and an image Jacobian matrix; force-based controllers for insertion; and a decision mechanism that determines the assembly state. Experiments demonstrate the effectiveness of the proposed platform and control methods.


ICRA Conference 2014 Conference Paper

Active calibration and its applications on micro-operating platform with multiple manipulators

Dengpeng Xing
De Xu
Haipeng Li
Liyan Luo

The microscope provides planar vision with a small field of view and a small depth of view. For micro-operation systems with multiple manipulators, the handling of irregular objects may lead to a nonorthogonal microscopic system, which needs to focus on clearly viewing the features of interest, and it may also be hard to locate the exact position and posture of the robot arms. In view of these, this paper proposes an active calibration method to compute the image Jacobian matrix, which maps the relative motion of the manipulators to the image coordinate changes in the microscopes. We also investigate the applications in micro-operator positioning, tracking for distributed systems, and movement optimization in micro-assembly. Experiments are carried out on a micro-assembly platform equipped with three microscopes and six robot arms, and the results validate the effectiveness of the proposed method.


ICRA Conference 2012 Conference Paper

Optimal parametric controller for perturbed balance and walking

Dengpeng Xing
Jianbo Su

We present full-state feedback controllers for standing and walking balance of a humanoid robot. The robot is simulated as a two-joint inverted pendulum for standing and a five-link model for walking, and is disturbed by a horizontal push with given size and location in the sagittal plane. We optimize the parametric controllers for different push sizes, locations, and directions. For standing balance, both impulsive and constant pushes are applied to simulate the hip strategy; for bipedal walking, instantaneous pushes are used as perturbations. The performance of the optimized controllers is shown in handling different pushes for standing and walking balance.


IROS Conference 2010 Conference Paper

Gain scheduled control of perturbed standing balance

Dengpeng Xing
Christopher G. Atkeson
Jianbo Su
Benjamin J. Stephens

This paper develops full-state parametric controllers for standing balance of humanoid robots in response to impulsive and constant pushes. We also explore a hypothesis that postural feedback gains in standing balance should change with perturbation size. From an engineering point of view, this is known as gain scheduling. We use an optimization approach to see if feedback gains should scale with the perturbation for a simulated robot. We simulate models in the sagittal and lateral planes and in three dimensions, use a horizontal push of a given size, direction, and location as a perturbation, and optimize parametric controllers for different push sizes, directions, and locations. During a simulated perturbation experiment, the appropriate controller is continuously selected based on the current push. For an impulse, the simulated robot recovers back to the initial state; for a constant push, the robot moves to an equilibrium position which leans into the push and has zero joint torques. We show the performance of optimized parametric controllers in response to different external pushes.
