Author name cluster

Buqing Nie

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers

2 author rows

AAAI Conference 2026 Conference Paper

Coordinated Humanoid Robot Locomotion with Symmetry Equivariant Reinforcement Learning Policy

Buqing Nie
Yang Zhang
Rongjun Jin
Zhanxiang Cao
Huangxuan Lin
Xiaokang Yang
Yue Gao

The human nervous system exhibits bilateral symmetry, enabling coordinated and balanced movements. However, existing Deep Reinforcement Learning (DRL) methods for humanoid robots neglect the robot's morphological symmetry, leading to uncoordinated and suboptimal behaviors. Inspired by human motor control, we propose Symmetry Equivariant Policy (SE-Policy), a new DRL framework that embeds strict symmetry equivariance in the actor and symmetry invariance in the critic without additional hyperparameters. SE-Policy enforces consistent behaviors across symmetric observations, producing temporally and spatially coordinated motions with higher task performance. Extensive experiments on velocity tracking tasks, conducted in both simulation and real-world deployment with the Unitree G1 humanoid robot, show that SE-Policy improves tracking accuracy by up to 40% compared to state-of-the-art baselines while achieving superior spatial-temporal coordination. These results demonstrate the effectiveness of SE-Policy and its broad applicability to humanoid robots.
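
The equivariance and invariance properties named in this abstract can be illustrated with a minimal sketch. It uses output symmetrization, a generic construction, and is not claimed to be the SE-Policy architecture, which the abstract does not describe. The toy robot layout, the mirror matrices `M_OBS` and `M_ACT`, and the random base networks are all assumptions made for illustration.

```python
import numpy as np

def reflection_matrix(perm, signs):
    """Build a mirror map: output[i] = signs[i] * input[perm[i]]."""
    n = len(perm)
    m = np.zeros((n, n))
    m[np.arange(n), perm] = signs
    return m

# Hypothetical toy robot: observation = [left_hip, right_hip, lateral_velocity].
# Mirroring swaps the left/right joints and flips the lateral velocity sign.
M_OBS = reflection_matrix([1, 0, 2], [1.0, 1.0, -1.0])
# Hypothetical action = [left_hip_target, right_hip_target]; mirroring swaps them.
M_ACT = reflection_matrix([1, 0], [1.0, 1.0])

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))  # weights of an arbitrary, non-symmetric base actor
V = rng.normal(size=3)       # weights of an arbitrary base critic

def base_actor(obs):
    return np.tanh(W @ obs)

def base_critic(obs):
    return float(np.tanh(V @ obs))

def equivariant_actor(obs):
    # Average the base output with the mirrored output on the mirrored input.
    # Because both mirror maps are involutions, mirroring the observation
    # mirrors the resulting action exactly.
    return 0.5 * (base_actor(obs) + M_ACT @ base_actor(M_OBS @ obs))

def invariant_critic(obs):
    # Averaging over the two mirrored observations gives a mirror-invariant value.
    return 0.5 * (base_critic(obs) + base_critic(M_OBS @ obs))

obs = rng.normal(size=3)
mirrored_action = equivariant_actor(M_OBS @ obs)
```

In this sketch, `equivariant_actor(M_OBS @ obs)` equals `M_ACT @ equivariant_actor(obs)` for any observation, and `invariant_critic` returns the same value for an observation and its mirror image.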


AAAI Conference 2026 Conference Paper

Keep On Going: Learning Robust Humanoid Motion Skills via Selective Adversarial Training

Yang Zhang
Zhanxiang Cao
Buqing Nie
Haoyang Li
Zhong Jiangwei
Qiao Sun
Xiaoyi Hu
Xiaokang Yang

Humanoid robots are expected to operate reliably over long horizons while executing versatile whole-body skills. Yet Reinforcement Learning (RL) motion policies typically lose stability under prolonged operation, sensor/actuator noise, and real-world disturbances. In this work, we propose a Selective Adversarial Attack for Robust Training (SA2RT) to enhance the robustness of motion skills. The adversary learns to identify and sparsely perturb the most vulnerable states and actions under an attack-budget constraint, thereby exposing true weaknesses without inducing conservative overfitting. The resulting non-zero-sum, alternating optimization continually strengthens the motion policy against the strongest discovered attacks. We validate our approach on the Unitree G1 humanoid robot across perceptive locomotion and whole-body control tasks. Experimental results show that adversarially trained policies improve the terrain traversal success rate by 40%, reduce the trajectory tracking error by 32%, and maintain long-horizon mobility and tracking performance. Together, these results demonstrate that selective adversarial attacks are an effective driver for learning robust, long-horizon humanoid motion skills.


IROS Conference 2025 Conference Paper

Minimizing Acoustic Noise: Enhancing Quiet Locomotion for Quadruped Robots in Indoor Applications

Zhanxiang Cao
Buqing Nie
Yang Zhang
Yue Gao 0005

Recent advancements in quadruped robot research have significantly improved their ability to traverse complex and unstructured outdoor environments. However, the noise generated during locomotion is generally overlooked, even though it is critical in noise-sensitive indoor environments such as service and healthcare settings. This study aims to reduce the acoustic noise generated by quadruped robots during locomotion through the development of advanced motion control algorithms. To achieve this, we propose a novel approach that minimizes noise emissions by integrating optimized gait design with tailored control strategies. This method achieves an average noise reduction of approximately 8 dBA during movement, thereby enhancing the suitability of quadruped robots for deployment in such environments. Experimental results demonstrate the effectiveness of this approach across various indoor settings, highlighting the potential of quadruped robots for quiet operation.
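
To put the reported reduction of approximately 8 dBA in perspective, the short sketch below converts a level change in decibels into a ratio of (A-weighted) sound power using the standard definition of the decibel. The function name is hypothetical, and the calculation is a reader's aid rather than part of the paper's method.

```python
def power_ratio_from_db(delta_db):
    """Ratio of sound powers corresponding to a level change in decibels."""
    return 10.0 ** (delta_db / 10.0)

# An 8 dB drop leaves roughly 16% of the original sound power,
# i.e. about a 6.3-fold reduction.
ratio = power_ratio_from_db(-8.0)
```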


ICLR Conference 2025 Conference Paper

Select before Act: Spatially Decoupled Action Repetition for Continuous Control

Buqing Nie
Yangqing Fu
Yue Gao 0005

Reinforcement Learning (RL) has achieved remarkable success in various continuous control tasks, such as robot manipulation and locomotion. Unlike mainstream RL, which makes decisions at individual steps, recent studies have incorporated action repetition into RL, achieving enhanced action persistence with improved sample efficiency and superior performance. However, existing methods treat all action dimensions as a whole during repetition, ignoring variations among them. This constraint makes decisions inflexible, which reduces policy agility and effectiveness. In this work, we propose a novel repetition framework called SDAR, which implements Spatially Decoupled Action Repetition by performing closed-loop act-or-repeat selection for each action dimension individually. SDAR achieves more flexible repetition strategies, leading to an improved balance between action persistence and diversity. Compared to existing repetition frameworks, SDAR is more sample efficient, with higher policy performance and reduced action fluctuation. Experiments on various continuous control scenarios demonstrate the effectiveness of the spatially decoupled repetition design proposed in this work.
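
The per-dimension act-or-repeat idea can be sketched as follows. This is a minimal illustration, assuming the selection mask and the proposed action are already given; how SDAR actually learns the selection and generates new actions is not specified in the abstract, and the function name `decoupled_repeat` is hypothetical.

```python
import numpy as np

def decoupled_repeat(prev_action, new_action, act_mask):
    """Per dimension: take the new action where act_mask is True, else repeat."""
    prev_action = np.asarray(prev_action, dtype=float)
    new_action = np.asarray(new_action, dtype=float)
    act_mask = np.asarray(act_mask, dtype=bool)
    return np.where(act_mask, new_action, prev_action)

prev = [0.2, -0.5, 0.9]       # action executed at the previous step
proposed = [0.7, 0.1, -0.3]   # freshly proposed action for this step
mask = [True, False, True]    # act on dimensions 0 and 2, repeat dimension 1

action = decoupled_repeat(prev, proposed, mask)
```

Here dimension 1 keeps its previous value while dimensions 0 and 2 are updated, in contrast to repetition schemes that repeat or replace the whole action vector at once.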


AAAI Conference 2024 Conference Paper

Improve Robustness of Reinforcement Learning against Observation Perturbations via l∞ Lipschitz Policy Networks

Buqing Nie
Jingtian Ji
Yangqing Fu
Yue Gao

Deep Reinforcement Learning (DRL) has achieved remarkable advances in sequential decision tasks. However, recent works have revealed that DRL agents are susceptible to slight perturbations in observations. This vulnerability raises concerns regarding the effectiveness and robustness of deploying such agents in real-world applications. In this work, we propose a novel robust reinforcement learning method called SortRL, which improves the robustness of DRL policies against observation perturbations from the perspective of the network architecture. We employ a novel architecture for the policy network that incorporates global $l_\infty$ Lipschitz continuity and provide a convenient method to enhance policy robustness based on the output margin. In addition, we design a training framework for SortRL that solves given tasks while maintaining robustness against $l_\infty$ bounded perturbations on the observations. Several experiments are conducted to evaluate the effectiveness of our method, including classic control tasks and video games. The results demonstrate that SortRL achieves state-of-the-art robustness performance against different perturbation strengths.
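
The link between $l_\infty$ Lipschitz continuity and the output margin can be sketched as follows. If every output of a network is 1-Lipschitz with respect to the $l_\infty$ norm, a perturbation of size $\epsilon$ moves each output by at most $\epsilon$, so the greedy action cannot change while $\epsilon$ is below half the gap between the top two outputs. The layer below uses $l_\infty$-distance units, one known 1-Lipschitz construction; it is an assumption for illustration and is not claimed to be the SortRL architecture. All names, shapes, and weights are hypothetical.

```python
import numpy as np

def linf_dist_layer(x, weights, bias):
    """Each unit outputs ||x - w_j||_inf + b_j, which is 1-Lipschitz in l_inf."""
    return np.max(np.abs(x[None, :] - weights), axis=1) + bias

def certified_radius(scores):
    """Half the gap between the top two scores of a 1-Lipschitz network."""
    top2 = np.sort(scores)[-2:]
    return 0.5 * (top2[1] - top2[0])

rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 6))  # 4 hypothetical actions, 6-dim observation
bias = rng.normal(size=4)
obs = rng.normal(size=6)

scores = linf_dist_layer(obs, weights, bias)
radius = certified_radius(scores)

# Any perturbation with l_inf norm below the radius keeps the same greedy action.
delta = rng.uniform(-1.0, 1.0, size=6) * radius * 0.99
perturbed = linf_dist_layer(obs + delta, weights, bias)
```

A larger margin therefore translates directly into a larger certified perturbation radius for the chosen action.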


NeurIPS Conference 2023 Conference Paper

Accelerating Monte Carlo Tree Search with Probability Tree State Abstraction

Yangqing Fu
Ming Sun
Buqing Nie
Yue Gao

Monte Carlo Tree Search (MCTS) algorithms such as AlphaGo and MuZero have achieved superhuman performance in many challenging tasks. However, the computational complexity of MCTS-based algorithms is influenced by the size of the search space. To address this issue, we propose a novel probability tree state abstraction (PTSA) algorithm to improve the search efficiency of MCTS. We first define a general tree state abstraction with path transitivity, and then propose the probability tree state abstraction to reduce mistakes during the aggregation step. We also provide theoretical guarantees for the transitivity and the aggregation error bound. To evaluate the effectiveness of the PTSA algorithm, we integrate it with state-of-the-art MCTS-based algorithms, such as Sampled MuZero and Gumbel MuZero. Experimental results on different tasks demonstrate that our method can accelerate the training process of state-of-the-art algorithms with 10%-45% search space reduction.


ICRA Conference 2022 Conference Paper

DanceHAT: Generate Stable Dances for Humanoid Robots with Adversarial Training

Buqing Nie
Yue Gao 0005

Generating dance from music for humanoid robots is an interesting task. Robot dance generation is challenging when considering music pieces, human dancer motions, and robot stability simultaneously. Previous methods rely on human-designed motion libraries or stability constraints for robot postures. Hence, dance generation for humanoid robots requires expert design, which can be time-consuming across different humanoid platforms. In this work, we propose a novel method called DanceHAT, which generates stable humanoid dances by imitating human dancers with self-learning. DanceHAT is an adversarial training framework that incorporates a similarity loss and a stability loss simultaneously. Furthermore, DanceHAT does not require human-designed features or robot model information. Experiments in the simulation environment and on the real robot demonstrate that our model can automatically generate stable, diverse, and human-like dances for humanoid robots. In addition, DanceHAT is a general training approach for robot imitation tasks with stability constraints, and can thus be applied to other humanoid tasks, which we leave to future work.
