Arrow Research search

Author name cluster

Zhaopeng Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers
1 author row

Possible papers

18

IROS Conference 2025 Conference Paper

ContactDexNet: Multi-fingered Robotic Hand Grasping in Cluttered Environments through Hand-Object Contact Semantic Mapping

  • Lei Zhang 0035
  • Kaixin Bai
  • Guowen Huang
  • Zhenshan Bing
  • Zhaopeng Chen
  • Alois C. Knoll
  • Jianwei Zhang 0001

Deep learning models have significantly advanced dexterous manipulation techniques for multi-fingered hand grasping. However, contact-information-guided grasping in cluttered environments remains largely underexplored. To address this gap, we have developed ContactDexNet, a method for generating multi-fingered hand grasp samples in cluttered settings through contact semantic mapping. We introduce a contact semantic conditional variational autoencoder network (CoSe-CVAE) for creating comprehensive contact semantic maps from object point clouds. We utilize a grasp detection method to estimate hand grasp poses from the contact semantic map. Finally, a unified grasp evaluation model, PointNetGPD++, is designed to assess grasp quality and collision probability, substantially improving the reliability of identifying optimal grasps in cluttered scenarios. Our grasp generation method has demonstrated remarkable success, outperforming state-of-the-art (SOTA) methods by at least 4.7%, with an 81.0% average grasping success rate in real-world single-object grasping using a known hand, and by at least 9.0% when using an unknown hand. Moreover, in cluttered scenes, our method attains a 76.7% success rate, outperforming the SOTA method by 6.3%. We also propose a multi-modal, multi-fingered grasping dataset generation method; our multi-fingered hand grasping dataset outperforms previous datasets in scene diversity and modality diversity. More details and supplementary materials can be found at https://sites.google.com/view/contact-dexnet.
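
As a rough illustration of the CVAE mechanism this abstract names, the sketch below shows a minimal conditional VAE in PyTorch that maps a global point-cloud embedding to per-point contact-semantic logits. The class name echoes the abstract's CoSe-CVAE, but the architecture, dimensions, and interfaces are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a contact-semantic CVAE (not the authors' code).
# Condition: a global point-cloud embedding; output: per-point contact logits.
import torch
import torch.nn as nn

class CoSeCVAE(nn.Module):  # name borrowed from the abstract; structure assumed
    def __init__(self, n_points=1024, cond_dim=256, latent_dim=64, n_classes=4):
        super().__init__()
        self.n_points, self.n_classes = n_points, n_classes
        self.encoder = nn.Sequential(
            nn.Linear(n_points * n_classes + cond_dim, 512), nn.ReLU())
        self.to_mu = nn.Linear(512, latent_dim)
        self.to_logvar = nn.Linear(512, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + cond_dim, 512), nn.ReLU(),
            nn.Linear(512, n_points * n_classes),
        )

    def forward(self, contact_map, cond):
        # contact_map: (B, n_points, n_classes) one-hot labels; cond: (B, cond_dim)
        h = self.encoder(torch.cat([contact_map.flatten(1), cond], dim=1))
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        logits = self.decoder(torch.cat([z, cond], dim=1))
        return logits.view(-1, self.n_points, self.n_classes), mu, logvar

    @torch.no_grad()
    def sample(self, cond):
        # Draw a latent from the prior and decode a contact semantic map.
        z = torch.randn(cond.shape[0], self.to_mu.out_features, device=cond.device)
        logits = self.decoder(torch.cat([z, cond], dim=1))
        return logits.view(-1, self.n_points, self.n_classes).argmax(-1)
```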

ICRA Conference 2025 Conference Paper

Language-Guided Object-Centric Diffusion Policy for Generalizable and Collision-Aware Manipulation

  • Hang Li
  • Qian Feng
  • Zhi Zheng
  • Jianxiang Feng
  • Zhaopeng Chen
  • Alois C. Knoll

Learning from demonstrations faces challenges in generalizing beyond the training data and often lacks collision awareness. This paper introduces Lan-o3dp, a language-guided object-centric diffusion policy framework that can adapt to unseen situations such as cluttered scenes, shifting camera views, and ambiguous similar objects while offering training-free collision avoidance and achieving a high success rate with few demonstrations. We train a diffusion model conditioned on 3D point clouds of task-relevant objects to predict the robot's end-effector trajectories, enabling it to complete the tasks. During inference, we incorporate cost optimization into the denoising steps to guide the generated trajectory to be collision-free. We leverage open-set segmentation to obtain the 3D point clouds of related objects. We use a large language model to identify the target objects and possible obstacles by interpreting the user's natural language instructions. To effectively guide the conditional diffusion model using a time-independent cost function, we propose a novel guided generation mechanism based on the estimated clean trajectories. In simulation, we show that the diffusion policy based on the object-centric 3D representation achieves a much higher success rate (68.7%) compared to baselines with simple 2D (39.3%) and 3D scene (43.6%) representations across 21 challenging RLBench tasks with only 40 demonstrations. In real-world experiments, we extensively evaluated generalization in various unseen situations and validated the effectiveness of the proposed zero-shot cost-guided collision avoidance.
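
The core mechanism named here, guiding denoising with a time-independent cost evaluated on the estimated clean trajectory, can be sketched roughly as below (DDIM-style step, eta = 0). The guidance scale, schedule handling, and cost function are placeholders, not the paper's implementation.

```python
# Hypothetical sketch of cost-guided denoising (DDIM-style, eta = 0). The cost
# is evaluated on the estimated clean trajectory x0_hat, the mechanism the
# abstract names; guide_scale and cost_fn (e.g. a collision cost) are assumed.
import torch

def guided_ddim_step(model, x_t, t, t_prev, alpha_bar, cost_fn, guide_scale=0.1):
    with torch.no_grad():
        eps = model(x_t, t)                                   # predicted noise
    x0_hat = (x_t - (1 - alpha_bar[t]).sqrt() * eps) / alpha_bar[t].sqrt()
    x0_hat = x0_hat.detach().requires_grad_(True)
    grad, = torch.autograd.grad(cost_fn(x0_hat).sum(), x0_hat)
    x0_guided = (x0_hat - guide_scale * grad).detach()        # steer clean estimate
    # Deterministic DDIM update toward the guided clean estimate.
    return alpha_bar[t_prev].sqrt() * x0_guided + (1 - alpha_bar[t_prev]).sqrt() * eps
```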

IROS Conference 2025 Conference Paper

LensDFF: Language-enhanced Sparse Feature Distillation for Efficient Few-Shot Dexterous Manipulation

  • Qian Feng
  • Alois C. Knoll
  • David S. Martinez Lema
  • Zhaopeng Chen
  • Jianxiang Feng

Learning dexterous manipulation from few-shot demonstrations is a significant yet challenging problem for advanced, human-like robotic systems. Dense distilled feature fields have addressed this challenge by distilling rich semantic features from 2D visual foundation models into the 3D domain. However, their reliance on neural rendering models such as Neural Radiance Fields (NeRF) or Gaussian Splatting results in high computational costs. In contrast, previous approaches based on sparse feature fields either suffer from inefficiencies due to multi-view dependencies and extensive training or lack sufficient grasp dexterity. To overcome these limitations, we propose Language-ENhanced Sparse Distilled Feature Field (LensDFF), which efficiently distills view-consistent 2D features onto 3D points using our novel language-enhanced feature fusion strategy, thereby enabling single-view few-shot generalization. Based on LensDFF, we further introduce a few-shot dexterous manipulation framework that integrates grasp primitives into the demonstrations to generate stable and highly dexterous grasps. Moreover, we present a real2sim grasp evaluation pipeline for efficient grasp assessment and hyperparameter tuning. Through extensive simulation experiments based on the real2sim pipeline and real-world experiments, our approach achieves competitive grasping performance, outperforming state-of-the-art approaches. See our website for the code and videos: david-s-martinez.github.io/LensDFF.
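
A minimal sketch of the general idea of lifting single-view 2D features onto 3D points, with an assumed language-similarity weighting standing in for the paper's fusion strategy; the pinhole projection and all names are illustrative, not the LensDFF code.

```python
# Hypothetical sketch: lift single-view 2D features onto 3D points (pinhole
# projection) and weight them by similarity to a language embedding. Names,
# camera model, and the fusion rule are assumptions, not the LensDFF code.
import numpy as np

def distill_features(points_cam, feat_map, K, text_emb):
    """points_cam: (N, 3) points in the camera frame (z > 0);
    feat_map: (H, W, C) 2D features; K: (3, 3) intrinsics; text_emb: (C,)."""
    uvw = points_cam @ K.T
    uv = (uvw[:, :2] / uvw[:, 2:3]).astype(int)          # pixel coordinates
    H, W, _ = feat_map.shape
    u = uv[:, 0].clip(0, W - 1)
    v = uv[:, 1].clip(0, H - 1)
    feats = feat_map[v, u]                               # (N, C) per-point features
    sim = feats @ text_emb / (                           # cosine similarity to text
        np.linalg.norm(feats, axis=1) * np.linalg.norm(text_emb) + 1e-8)
    return feats * sim[:, None], sim                     # language-weighted features
```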

ICRA Conference 2024 Conference Paper

A Collision-Aware Cable Grasping Method in Cluttered Environment

  • Lei Zhang 0198
  • Kaixin Bai
  • Qiang Li 0001
  • Zhaopeng Chen
  • Jianwei Zhang 0001

We introduce a Cable Grasping-Convolutional Neural Network (CG-CNN) designed to facilitate robust cable grasping in cluttered environments. Utilizing physics simulations, we generate an extensive dataset that mimics the intricacies of cable grasping, factoring in potential collisions between cables and robotic grippers. We employ the Approximate Convex Decomposition technique to dissect the non-convex cable model, with grasp quality autonomously labeled based on simulated grasping attempts. The CG-CNN is refined using this simulated dataset and enhanced through domain randomization techniques. Subsequently, the trained model predicts grasp quality, guiding the optimal grasp pose to the robot’s controller for execution. Grasping efficacy is assessed across both synthetic and real-world settings. Given our model’s implicit collision sensitivity, we achieved commendable success rates of 92.3% for known cables and 88.4% for unknown cables, surpassing contemporary state-of-the-art approaches. Supplementary materials can be found at https://leizhang-public.github.io/cg-cnn/.
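
As a hedged illustration of the predict-quality-then-execute loop described above, here is a stand-in grasp-quality CNN and selection step in PyTorch; the architecture and per-candidate depth-crop input are assumptions, not CG-CNN itself.

```python
# Hypothetical stand-in for a grasp-quality CNN and the rank-then-execute step;
# the architecture and per-candidate depth-crop input are assumptions.
import torch
import torch.nn as nn

class GraspQualityCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1),
        )

    def forward(self, depth_crops):                   # (B, 1, H, W) depth crops
        return torch.sigmoid(self.net(depth_crops))   # grasp quality in [0, 1]

def best_grasp(model, candidate_poses, depth_crops):
    with torch.no_grad():
        scores = model(depth_crops).squeeze(1)        # one score per candidate
    return candidate_poses[int(scores.argmax())]      # send top grasp to controller
```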

ICRA Conference 2024 Conference Paper

Close the Sim2real Gap via Physically-based Structured Light Synthetic Data Simulation

  • Kaixin Bai
  • Lei Zhang 0198
  • Zhaopeng Chen
  • Fang Wan
  • Jianwei Zhang 0001

Despite the substantial progress in deep learning, its adoption in industrial robotics projects remains limited, primarily due to challenges in data acquisition and labeling. Previous sim2real approaches using domain randomization require extensive scene and model optimization. To address these issues, we introduce an innovative physically-based structured light simulation system, generating both RGB and physically realistic depth images, surpassing previous dataset generation tools. We create an RGBD dataset tailored for robotic industrial grasping scenarios and evaluate it across various tasks, including object detection, instance segmentation, and embedding sim2real visual perception in industrial robotic grasping. By reducing the sim2real gap and enhancing deep learning training, we facilitate the application of deep learning models in industrial settings. Project details are available at https://baikaixin-public.github.io/structured_light_3D_synthesizer/

IROS Conference 2024 Conference Paper

ToolEENet: Tool Affordance 6D Pose Estimation

  • Yunlong Wang
  • Lei Zhang 0198
  • Yuyang Tu
  • Hui Zhang 0070
  • Kaixin Bai
  • Zhaopeng Chen
  • Jianwei Zhang 0001

The exploration of robotic dexterous hands utilizing tools has recently attracted considerable attention. A significant challenge in this field is the precise awareness of a tool’s pose when grasped, as occlusion by the hand often degrades the quality of the estimation. Additionally, the tool’s overall pose often fails to accurately represent the contact interaction, thereby limiting the effectiveness of vision-guided, contact-dependent activities. To overcome this limitation, we present the innovative TOOLEE dataset, which, to the best of our knowledge, is the first to feature affordance segmentation of a tool’s end-effector (EE) along with its defined 6D pose based on its usage. Furthermore, we propose the ToolEENet framework for accurate 6D pose estimation of the tool’s EE. This framework begins by segmenting the tool’s EE from raw RGB-D data, then uses a diffusion model-based pose estimator for 6D pose estimation at a category-specific level. Addressing the issue of symmetry in pose estimation, we introduce a symmetry-aware pose representation that enhances the consistency of pose estimation. Our approach excels in this field, demonstrating high levels of precision and generalization. Furthermore, it shows great promise for application in contact-based manipulation scenarios. All data and codes are available on the project website: https://tooleenet-iros2024.github.io/

ICRA Conference 2022 Conference Paper

FFHNet: Generating Multi-Fingered Robotic Grasps for Unknown Objects in Real-time

  • Vincent Mayer
  • Qian Feng
  • Jun Deng
  • Yunlei Shi
  • Zhaopeng Chen
  • Alois C. Knoll

Grasping unknown objects with multi-fingered hands at high success rates and in real-time is an unsolved problem. Existing methods are limited in the speed of grasp synthesis or in the ability to synthesize a variety of grasps from the same observation. We introduce Five-finger Hand Net (FFHNet), an ML model that can generate a wide variety of high-quality multi-fingered grasps for unseen objects from a single view. Generating and evaluating grasps with FFHNet takes only 30 ms on a commodity GPU. To the best of our knowledge, FFHNet is the first ML-based real-time system for multi-fingered grasping with the ability to perform grasp inference at 30 frames per second (FPS). For training, we synthetically generate 180k grasp samples for 129 objects. We achieve 91% grasping success for unknown objects in simulation and demonstrate the model's ability to synthesize high-quality grasps for real unseen objects as well.
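
A minimal sketch of the sample-then-rank inference pattern the abstract describes: generate many candidate grasps from a latent prior, score them with an evaluator, and execute the best. The generator/evaluator interfaces and `latent_dim` attribute are assumptions, not the released FFHNet.

```python
# Hypothetical sketch of FFHNet-style sample-then-rank inference; the
# generator/evaluator interfaces and latent_dim attribute are assumptions.
import torch

@torch.no_grad()
def infer_grasp(generator, evaluator, view_embedding, n_samples=100):
    """view_embedding: (1, D) encoding of the single-view observation."""
    z = torch.randn(n_samples, generator.latent_dim)     # sample grasp latents
    cond = view_embedding.expand(n_samples, -1)          # repeat the condition
    grasps = generator(z, cond)                          # (n_samples, grasp_dim)
    scores = evaluator(grasps, cond).squeeze(-1)         # predicted success prob.
    return grasps[int(scores.argmax())]                  # best-scoring grasp
```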

IROS Conference 2022 Conference Paper

Learning 6-DoF Task-oriented Grasp Detection via Implicit Estimation and Visual Affordance

  • Wenkai Chen
  • Hongzhuo Liang
  • Zhaopeng Chen
  • Fuchun Sun 0001
  • Jianwei Zhang 0001

Currently, task-oriented grasp detection approaches are mostly based on pixel-level affordance detection and semantic segmentation. These pixel-level approaches rely heavily on the accuracy of a 2D affordance mask, and the generated grasp candidates are restricted to a small workspace. To mitigate these limitations, we first construct a novel affordance-based grasp dataset and propose a 6-DoF task-oriented grasp detection framework, which takes the observed object point cloud as input and predicts diverse 6-DoF grasp poses for different tasks. Specifically, the implicit estimation network and visual affordance network in this framework directly predict coarse grasp candidates and a corresponding 3D affordance heatmap for each potential task, respectively. Furthermore, the grasping scores of the coarse grasps are combined with the heatmap values to generate more accurate and finer candidates. Our proposed framework shows significant improvements over baselines for existing and novel objects on our simulation dataset. Although our framework is trained on simulated objects and environments, the final generated grasp candidates can be accurately and stably executed in real robot experiments when the object is randomly placed on a support surface.
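
The score-fusion step, combining each coarse grasp's score with the affordance heatmap value at its contact point, might look roughly like this; the linear weighting is an assumption.

```python
# Hypothetical sketch of the described score fusion: blend each coarse grasp's
# score with the task affordance heatmap value at its contact point. The
# linear weighting alpha is an assumption.
import numpy as np

def rank_task_grasps(grasp_scores, contact_idx, affordance_heatmap, alpha=0.5):
    """grasp_scores: (G,); contact_idx: (G,) point index per grasp;
    affordance_heatmap: (N,) per-point values for the current task."""
    fused = alpha * grasp_scores + (1 - alpha) * affordance_heatmap[contact_idx]
    return np.argsort(-fused)      # grasp indices sorted best-first
```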

IROS Conference 2021 Conference Paper

Combining Learning from Demonstration with Learning by Exploration to Facilitate Contact-Rich Tasks

  • Yunlei Shi
  • Zhaopeng Chen
  • Yansong Wu
  • Dimitri Henkel
  • Sebastian Riedel 0002
  • Hongxu Liu
  • Qian Feng
  • Jianwei Zhang 0001

Collaborative robots are expected to work alongside humans and, in some cases, directly replace human workers, thus effectively responding to rapid changes in assembly lines. Current methods for programming contact-rich tasks, particularly in heavily constrained spaces, tend to be fairly inefficient. Therefore, faster and more intuitive approaches to robot teaching are urgently required. This study focuses on combining visual servoing-based learning from demonstration (LfD) and force-based learning by exploration (LbE) to enable fast and intuitive programming of contact-rich tasks with minimal user effort. Two learning approaches were developed and integrated into a framework: one relying on human-to-robot motion mapping (the visual servoing approach) and the other relying on force-based reinforcement learning. The developed framework implements a non-contact demonstration teaching method based on the visual servoing approach and optimizes the demonstrated robot target positions according to the detected contact state. The framework is compared with the two most commonly used baseline techniques, i.e., teach-pendant-based teaching and hand-guiding teaching. Furthermore, the efficiency and reliability of the framework are validated via comparison experiments involving the teaching and execution of contact-rich tasks. The proposed framework shows the best performance in terms of teaching time, execution success rate, risk of damage, and ease of use.

IROS Conference 2021 Conference Paper

Learning compliant grasping and manipulation by teleoperation with adaptive force control

  • Chao Zeng 0002
  • Shuang Li 0014
  • Yiming Jiang 0001
  • Qiang Li 0001
  • Zhaopeng Chen
  • Chenguang Yang 0001
  • Jianwei Zhang 0001

In this work, we focus on improving the robot’s dexterous capability by exploiting visual sensing and adaptive force control. TeachNet, a vision-based teleoperation learning framework, is exploited to map human hand postures to a multi-fingered robot hand. We augment TeachNet, which is originally based on an imprecise kinematic mapping and position-only servoing, with a biomimetic learning-based compliance control algorithm for dexterous manipulation tasks. This compliance controller takes the mapped robotic joint angles from TeachNet as the desired goal and computes the desired joint torques. It is derived from a computational model of the biomimetic control strategy in human motor learning, which allows the control variables (impedance and feedforward force) to adapt online during the execution of the reference joint angle trajectories. The simultaneous adaptation of the impedance and feedforward profiles enables the robot to interact with the environment in a compliant manner. Our approach has been verified in multiple tasks in physics simulation, i.e., grasping, opening a door, turning a cap, and touching a mouse, and has shown more reliable performance than existing position control and fixed-gain force control approaches.
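
In the spirit of the biomimetic adaptation the abstract describes (impedance and feedforward force adapted online from tracking error), a hedged single-joint sketch follows; the adaptation laws, gains, and damping rule are placeholders, not the paper's controller.

```python
# Hypothetical single-joint sketch of biomimetic adaptive impedance control:
# stiffness K and feedforward torque tau_ff grow with tracking error and
# decay ("forget") when tracking is good. All gains are placeholders.
import numpy as np

def adaptive_impedance_step(q, dq, q_des, dq_des, K, tau_ff,
                            beta_k=5.0, beta_f=0.5, decay=0.01):
    e, de = q_des - q, dq_des - dq                    # position/velocity errors
    eps = e + 0.5 * de                                # combined tracking error
    K = K + beta_k * eps * e - decay * K              # adapt stiffness
    tau_ff = tau_ff + beta_f * eps - decay * tau_ff   # adapt feedforward torque
    D = 2.0 * np.sqrt(np.maximum(K, 1e-6))            # damping tied to stiffness
    tau = K * e + D * de + tau_ff                     # commanded joint torque
    return tau, K, tau_ff
```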

ICRA Conference 2021 Conference Paper

Proactive Action Visual Residual Reinforcement Learning for Contact-Rich Tasks Using a Torque-Controlled Robot

  • Yunlei Shi
  • Zhaopeng Chen
  • Hongxu Liu
  • Sebastian Riedel 0002
  • Chunhui Gao
  • Qian Feng
  • Jun Deng
  • Jianwei Zhang 0001

Contact-rich manipulation tasks are commonly found in modern manufacturing settings. However, manually designing a robot controller is considered hard for traditional control methods, as the controller requires an effective combination of modalities with vastly different characteristics. In this paper, we first consider incorporating operational-space visual and haptic information into a reinforcement learning (RL) method to solve the target uncertainty problem in unstructured environments. Moreover, we propose the novel idea of introducing a proactive action to solve a partially observable Markov decision process (POMDP) problem. With these two ideas, our method can either adapt to reasonable variations in unstructured environments or improve the sample efficiency of policy learning. We evaluated our method on a task that involved inserting a random-access memory (RAM) module using a torque-controlled robot, and tested it against the success rates of different baselines from traditional methods. We proved that our method is robust and can tolerate environmental variations.
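
The residual-action idea, executing a base (proactive) action plus a learned correction, reduces to a few lines; the controller and policy interfaces below are assumptions.

```python
# Hypothetical sketch of the residual-action idea: the executed command is a
# base (proactive) action plus a learned correction. Interfaces are assumed.
def residual_step(env, base_controller, policy, obs):
    a_base = base_controller(obs)       # scripted/proactive operational-space action
    a_residual = policy(obs)            # learned residual from visual + haptic input
    return env.step(a_base + a_residual)
```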

ICRA Conference 2020 Conference Paper

Center-of-Mass-based Robust Grasp Planning for Unknown Objects Using Tactile-Visual Sensors

  • Qian Feng
  • Zhaopeng Chen
  • Jun Deng
  • Chunhui Gao
  • Jianwei Zhang 0001
  • Alois C. Knoll

An unstable grasp pose can lead to slip; thus, an unstable grasp pose can be predicted through slip detection, and a regrasp is required afterwards to correct the grasp pose and finish the task. In this work, we propose a novel regrasp planner with multi-sensor modules that plans grasp adjustments using feedback from a slip detector. A regrasp planner is then trained to estimate the location of the center of mass, which helps the robot find an optimal grasp pose. The dataset in this work consists of 1,025 slip experiments and 1,347 regrasps collected with a pair of tactile sensors, an RGB-D camera, and a Franka Emika robot arm equipped with joint force/torque sensors. We show that our algorithm can successfully detect and classify slip for 5 unknown test objects with an accuracy of 76.88%, and that the regrasp planner increases the grasp success rate by 31.0% compared to a state-of-the-art vision-based grasping algorithm.

ICRA Conference 2019 Conference Paper

Reconstructing Human Hand Pose and Configuration using a Fixed-Base Exoskeleton

  • Aaron Pereira
  • Georg Stillfried
  • Thomas Baker
  • Annika Schmidt
  • Annika Maier
  • Benedikt Pleintinger
  • Zhaopeng Chen
  • Thomas Hulin

Accurate real-time estimation of the pose and configuration of the human hand attached to a dexterous haptic input device is crucial to improve the interaction possibilities for teleoperation and in virtual and augmented reality. In this paper, we present an approach to reconstruct the pose of the human hand and the joint angles of the fingers when wearing a novel fixed-base (grounded) hand exoskeleton. Using a kinematic model of the human hand built from MRI data, we can reconstruct the hand pose and joint angles without sensors on the human hand, from attachment points on the first three fingers and the palm. We test the accuracy of our approach using motion capture as a ground truth. This reconstruction can be used to determine contact geometry and force-feedback from virtual or remote objects in virtual reality or teleoperation.

ICRA Conference 2015 Conference Paper

An adaptive compliant multi-finger approach-to-grasp strategy for objects with position uncertainties

  • Zhaopeng Chen
  • Thomas Wimböck
  • Máximo A. Roa
  • Benedikt Pleintinger
  • Miguel Neves 0002
  • Christian Ott 0001
  • Christoph Borst 0001
  • Neal Y. Lii

This paper presents an adaptive and compliant approach-to-grasp strategy for multi-finger robotic hands, to improve the performance of autonomous grasping when encountering object position uncertainties. With the proposed approach-to-grasp strategy, the first robot finger to experience an unexpected impact pauses its movement in a compliant manner and remains in contact with the object to minimize unplanned motion of the target object. At the same time, the remaining fingers continuously and adaptively move toward re-adjusted grasping positions with respect to the first finger in contact with the object, without the need for online re-planning or re-grasping. An adaptive grasp control strategy based on a spatial virtual spring framework is proposed to achieve local (e.g., not resorting to the robotic arm) in-hand adjustments of the fingers not yet in contact. As such, these fingers can be adaptively driven to the adjusted desired positions to accomplish the grasp. Experimental results demonstrate that significantly larger position errors with respect to the hand workspace can be accommodated with the proposed adaptive compliant grasp control strategy: an increase of as much as 391% in position error area coverage has been achieved. Finally, beyond the quantitative analysis, additional observations from the extensive experimental trials are discussed qualitatively, to help examine several open issues and further understand the approach-to-grasp phases of robot hand tasks.
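
A toy sketch of the virtual-spring re-anchoring behavior described above: once the first finger contacts and pauses, the remaining fingers are pulled toward goals shifted by the observed contact offset. Stiffness, geometry, and the contact model are simplified assumptions.

```python
# Hypothetical sketch of virtual-spring fingertip commands: fingers not yet in
# contact are pulled toward grasp goals re-anchored on the first contact.
import numpy as np

def virtual_spring_forces(tips, goals, contact_offset, first_contact, k=50.0):
    """tips, goals: (F, 3) current/planned fingertip positions;
    contact_offset: (3,) shift of the first contact from its planned position;
    first_contact: index of the finger already touching the object."""
    tips, goals = np.asarray(tips), np.asarray(goals)
    goals_adj = goals + np.asarray(contact_offset)   # re-anchor goals on object
    forces = k * (goals_adj - tips)                  # spring pull toward new goals
    forces[first_contact] = 0.0                      # contacting finger holds still
    return forces
```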

ICRA Conference 2014 Conference Paper

Towards a functional evaluation of manipulation performance in dexterous robotic hand design

  • Máximo A. Roa
  • Zhaopeng Chen
  • Irene C. Staal
  • Jared N. Muirhead
  • Annika Maier
  • Benedikt Pleintinger
  • Christoph Borst 0001
  • Neal Y. Lii

Dexterous multifingered hands are the most complex and versatile variants of robotic end effectors. Compared to simpler grippers and underactuated hands, they should be more capable of grasping and, especially, manipulating different objects. This paper explores the relationship between kinematic design and manipulation performance of robotic hands. Some evaluation criteria frequently used by hand designers to verify kinematic configurations are revisited. The results from these criteria are scrutinized and compared with an evaluation of the manipulation workspace and the ranges of motion of in-hand manipulation for a set of predefined objects. Simulations and actual manipulation experiments are carried out with different kinematic configurations on a modular dexterous hand. The results show some disconnect between designs perceived as good through common evaluation criteria and their actual, realizable manipulation performance. This work finally gives some insight toward a more holistic approach to designing hands that better address grasping and manipulation for the intended tasks and applications.

IROS Conference 2010 Conference Paper

Experimental study on impedance control for the five-finger dexterous robot hand DLR-HIT II

  • Zhaopeng Chen
  • Neal Y. Lii
  • Thomas Wimböck
  • Shaowei Fan
  • Minghe Jin
  • Christoph Borst 0001
  • Hong Liu 0002

This paper presents experimental results on the five-finger dexterous robot hand DLR-HIT II, with Cartesian impedance control based on joint torque sensing and nonlinearity compensation for elastic dexterous robot joints. To improve the performance of the impedance controller, system parameter estimation with an extended Kalman filter and gravity compensation have been investigated on the robot hand. Experimental results show that, for this harmonic-drive robot hand with joint torque feedback, accurate position tracking and stable torque/force response can be achieved with the Cartesian and joint impedance controllers. In addition, an FPGA-based control architecture with flexible communication is proposed to implement the designed impedance controller.
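
A standard Cartesian impedance law with gravity compensation, the control structure this abstract evaluates, can be sketched as below; the matrices and interfaces are placeholders, not the DLR-HIT II controller.

```python
# Hypothetical sketch of a Cartesian impedance law with gravity compensation;
# all matrices/callables are placeholders, not the DLR-HIT II controller.
import numpy as np

def cartesian_impedance_torque(q, dq, x, x_des, J, gravity, Kc, Dc):
    """J: (6, n) fingertip Jacobian; Kc, Dc: (6, 6) stiffness and damping;
    gravity: callable returning the gravity torque vector g(q)."""
    x_err = np.asarray(x_des) - np.asarray(x)   # Cartesian pose error (6,)
    dx = J @ dq                                 # Cartesian velocity
    F = Kc @ x_err - Dc @ dx                    # virtual wrench at the fingertip
    return J.T @ F + gravity(q)                 # joint torques with gravity comp.
```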

IROS Conference 2010 Conference Paper

Toward understanding the effects of visual- and force-feedback on robotic hand grasping performance for space teleoperation

  • Neal Y. Lii
  • Zhaopeng Chen
  • Benedikt Pleintinger
  • Christoph Borst 0001
  • Gerhard Hirzinger
  • Andre Schiele

This paper introduces a study aimed at helping quantify the benefits of limited-performance force-feedback user input devices for space telemanipulation with a dexterous robotic arm. A teleoperated robotic hand has been developed for the European Space Agency by the German Aerospace Center (DLR) for a lunar rover prototype. Studies carried out on this telerobotic system investigated several criteria critical to telemanipulation in space: (1) grasping task completion time, (2) grasping task difficulty, (3) grasp quality, and (4) the difficulty for the operator of assessing grasp quality. Several test subjects were asked to remotely grasp regularly and irregularly shaped objects under different combinations of visual- and force-feedback conditions. This work categorized the benefits of visual- and force-feedback in teleoperated grasping through several performance metrics. Furthermore, it has been shown that, with local joint-level impedance control, good grasping performance with rigid, hard objects can be achieved even with limited force-feedback information and low communication bandwidth. On the other hand, a performance ceiling was found when grasping deformable objects, where the limited force-feedback setup cannot sufficiently convey the object boundary to the teleoperator.

IROS Conference 2008 Conference Paper

Multisensory five-finger dexterous hand: The DLR/HIT Hand II

  • Hong Liu 0002
  • Ke Wu
  • Peter Meusel
  • Nikolaus Seitz
  • Gerhard Hirzinger
  • Minghe Jin
  • Yiwei Liu 0001
  • Shaowei Fan

This paper presents a newly developed multisensory five-fingered dexterous robot hand: the DLR/HIT Hand II. The hand has an independent palm and five identical modular fingers; each finger has three DOFs and four joints. All the actuators and electronics are integrated in the finger body and the palm. By using powerful, super-flat brushless DC motors, tiny harmonic drives, and BGA-form DSPs and FPGAs, the whole finger is about one third smaller than its counterpart in the DLR/HIT Hand I. Thanks to the steel coupling mechanism, the transmission ratio of the distal phalanx is exactly 1:1 over the whole movement range. At the same time, the multisensory dexterous hand integrates position, force/torque, and temperature sensors. The hierarchical hardware structure of the hand consists of the finger DSPs, the finger FPGAs, the palm FPGA, and a PCI-based DSP/FPGA board. The hand can communicate externally via PPSeCo, CAN, and Internet. Instead of an extra cover, the packaging of the hand is implemented directly in the finger bodies and palm to make the hand smaller and more human-like. The whole hand weighs about 1.5 kg, and the fingertip force can reach 10 N.