IROS 2025
Learning Generalizable 3D Manipulation With 10 Demonstrations
Abstract
Learning robust and generalizable manipulation skills from few demonstrations remains a key challenge in robotics, with broad applications in industrial automation and service robotics. Although recent imitation learning methods have achieved impressive results, they often require large amounts of demonstration data and struggle to generalize across spatial variations. In this work, we propose a framework that learns 3D manipulation policies from only 10 demonstrations while generalizing robustly to unseen spatial configurations through semantic-guided perception and spatially equivariant policy learning. Our framework consists of two key modules: a Semantic Guided Perception module, which extracts task-aware 3D representations from RGB-D inputs using semantic priors, and a Spatial Generalized Decision module, which implements a diffusion-based policy that preserves spatial equivariance through denoising. Central to our framework is a spatially equivariant training strategy, which adapts 2D data augmentation principles to 3D manipulation by maintaining gripper-object spatial relationships during trajectory augmentation. We validate our framework through extensive experiments on both simulation benchmarks and real-world robotic systems. Our method achieves substantially higher success rates than state-of-the-art approaches on a series of challenging tasks, particularly under large object pose variations. These results suggest that the framework can advance efficient and generalizable manipulation skill learning in real-world applications.
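The abstract does not spell out how trajectory augmentation preserves gripper-object relationships. One common way to get that property, shown here only as a minimal sketch and not as the authors' implementation, is to apply the same rigid transform to the observed scene points and to every gripper pose in a demonstration. The function names, the restriction to a yaw rotation plus a planar shift, and the `max_shift` parameter are all assumptions made for illustration.

```python
import numpy as np


def random_planar_transform(rng, max_shift=0.1):
    """Sample a rigid transform: yaw about the z axis plus an xy translation.

    Hypothetical helper; the transform family is an assumption, not from the paper.
    """
    yaw = rng.uniform(-np.pi, np.pi)
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    T[:2, 3] = rng.uniform(-max_shift, max_shift, size=2)
    return T


def augment_demo(points, gripper_poses, T):
    """Apply the same rigid transform T to the scene and to the trajectory.

    points: (N, 3) array of 3D points in the world frame.
    gripper_poses: (K, 4, 4) array of homogeneous gripper poses in the world frame.
    Because both are moved by the same T, the scene expressed in each gripper
    frame is unchanged.
    """
    new_points = points @ T[:3, :3].T + T[:3, 3]
    new_poses = T @ gripper_poses  # matmul broadcasts T over the K poses
    return new_points, new_poses
```

Under these assumptions, one demonstration can be expanded into many spatial variants by sampling several transforms with `random_planar_transform` and calling `augment_demo` once per sample; distances between the gripper and any scene point are identical before and after augmentation.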
Context
- Venue: IEEE/RSJ International Conference on Intelligent Robots and Systems
- Archive span: 1988-2025
- Indexed papers: 26578
- Paper id: 428687899449588792