Author name cluster

Ronghan Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

3 papers

1 author row

ICRA Conference 2025 Conference Paper

DetailRefine: Towards Fine-Grained and Efficient Online Monocular 3D Reconstruction

Fupeng Chu
Yang Cong
Ronghan Chen

Online monocular 3D reconstruction has attracted widespread attention as it promotes the application of robots in interactive scenarios. Most existing methods focus on 1) real-time reconstruction, 2) accurate voxel feature learning, and 3) effective voxel sparsification algorithms. To this end, 1) they adopt a coarse-to-fine pipeline, where all non-empty voxels are sent to the next level for refinement. However, this results in over-refinement of flat regions, leading to unnecessary computational overhead. Furthermore, 2) advanced methods focus on exploring view visibility but overlook the discriminability among visible views, which limits the representation of learned voxel features. Moreover, 3) existing sparsification algorithms struggle to distinguish detailed and empty voxels, resulting in either the loss of detailed voxels or the retention of empty voxels. To tackle these challenges, 1) we present Dynamic Detail Refinement (DDR) to allocate more voxels to detailed regions for refinement, which could alleviate the computational burden. Furthermore, 2) we propose Discriminability-Aware Fusion (DAF) to focus on discriminative views, which helps to capture accurate voxel features. In addition, 3) we propose Hierarchical Hybrid Sparsification (HHS) to balance global completeness and local refinement, which helps to preserve detailed voxels at hierarchical levels effectively. Extensive experiments conducted on the representative ScanNet (V2) and 7-Scenes datasets demonstrate the superiority of the proposed method.


IROS Conference 2025 Conference Paper

Learning Generalizable 3D Manipulation With 10 Demonstrations

Yu Ren
Yang Cong
Bohao Huang
Jiahao Long
Ronghan Chen
Hongbo Li
Huijie Fan

Learning robust and generalizable manipulation skills from few demonstrations remains a key challenge in robotics, with broad applications in industrial automation and service robotics. Although recent imitation learning methods have achieved impressive results, they often require a large amount of demonstration data and struggle to generalize across different spatial variants. In this work, we propose a framework that learns 3D manipulation policies from only 10 demonstrations while achieving robust generalization to unseen spatial configurations through semantic-guided perception and spatial-equivariant policy learning. Our framework consists of two key modules: a Semantic Guided Perception module that extracts task-aware 3D representations from RGB-D inputs using semantic priors and a Spatial Generalized Decision module implementing a diffusion-based policy that preserves spatial equivariance through denoising. Central to our framework is a spatially equivariant training strategy, which adapts 2D data augmentation principles to 3D manipulation by maintaining gripper-object spatial relationships during trajectory augmentation. We validate our framework through extensive experiments on both simulation benchmarks and real-world robotic systems. Our method demonstrates a significant improvement in success rates over state-of-the-art approaches on a series of challenging tasks, particularly under significant object pose variations. This work shows significant potential to advance efficient and generalizable manipulation skill learning in real-world applications.
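The trajectory augmentation idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the restriction to a yaw rotation plus a translation, and the position-only waypoints are all assumptions made for illustration. The point it shows is that applying one rigid transform to both the object and every gripper waypoint leaves their relative geometry unchanged.

```python
import math

def rigid_transform_z(point, yaw, translation):
    """Rotate a 3D point about the z-axis by `yaw` radians, then translate it."""
    x, y, z = point
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return (c * x - s * y + tx, s * x + c * y + ty, z + tz)

def augment_demo(object_pos, gripper_traj, yaw, translation):
    """Hypothetical augmentation: move the object and the whole gripper
    trajectory with the same rigid transform, so gripper-object distances
    are preserved in the augmented demonstration."""
    new_obj = rigid_transform_z(object_pos, yaw, translation)
    new_traj = [rigid_transform_z(p, yaw, translation) for p in gripper_traj]
    return new_obj, new_traj

# Usage: one demonstration, rotated by 90 degrees and shifted.
obj = (1.0, 0.0, 0.0)
traj = [(1.0, 0.0, 0.5), (1.0, 0.0, 0.2), (1.0, 0.0, 0.05)]
aug_obj, aug_traj = augment_demo(obj, traj, math.pi / 2, (0.3, -0.1, 0.0))
```

Because the transform is rigid, the distance from each waypoint to the object is the same before and after augmentation, which is the property the abstract describes as maintaining gripper-object spatial relationships.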


ICRA Conference 2024 Conference Paper

Marrying NeRF with Feature Matching for One-step Pose Estimation

Ronghan Chen
Yang Cong
Yu Ren

Given the image collection of an object, we aim at building a real-time image-based pose estimation method, which requires neither its CAD model nor hours of object-specific training. Recent NeRF-based methods provide a promising solution by directly optimizing the pose from pixel loss between rendered and target images. However, during inference, they require a long convergence time and suffer from local minima, making them impractical for real-time robot applications. We aim at solving this problem by marrying image matching with NeRF. With 2D matches and depth rendered by NeRF, we directly solve the pose in one step by building 2D-3D correspondences between target and initial view, thus allowing for real-time prediction. Moreover, to improve the accuracy of 2D-3D correspondences, we propose a 3D consistent point mining strategy, which effectively discards unfaithful points reconstructed by NeRF. Moreover, current NeRF-based methods that naively optimize pixel loss fail on occluded images. Thus, we further propose a 2D-match-based sampling strategy to exclude the occluded area. Experimental results on representative datasets prove that our method outperforms state-of-the-art methods, and improves inference efficiency by 90×, achieving real-time prediction at 6 FPS.
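As a rough illustration of how 2D matches and a rendered depth map can yield 2D-3D correspondences, the sketch below lifts matched pixels from the initial view into 3D with a pinhole camera model and pairs each lifted point with its matched pixel in the target view. This is not the paper's code: the function names, the pinhole intrinsics `(fx, fy, cx, cy)`, the nested-list depth map, and the rule of skipping non-positive depths are assumptions for illustration. The resulting pairs would then be passed to a PnP solver to recover the pose.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with a known depth to a 3D point in the camera
    frame, assuming a pinhole camera model."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def build_correspondences(matches, depth_map, intrinsics):
    """Pair 3D points from the initial view with 2D pixels in the target view.

    matches:    list of ((u0, v0), (u1, v1)) integer pixel matches,
                initial view first, target view second (hypothetical format).
    depth_map:  depth_map[v][u] is the rendered depth at pixel (u, v).
    intrinsics: (fx, fy, cx, cy) of the initial view.
    """
    fx, fy, cx, cy = intrinsics
    pairs = []
    for (u0, v0), (u1, v1) in matches:
        d = depth_map[v0][u0]
        if d <= 0:
            # Assumed convention: non-positive depth marks an invalid pixel.
            continue
        pairs.append((backproject(u0, v0, d, fx, fy, cx, cy), (u1, v1)))
    return pairs
```

Each returned pair holds a 3D point expressed in the initial camera frame and the 2D location where the same point is observed in the target image.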
