IROS 2025 Conference Paper
Generalizable and Actionable Part Detection and Manipulation with SAM-rectified Segmentation and Iterative Pose Refinement
- Sucheng Qian
- Li Zhang
- Yanyan Wei
- Liu Liu
- Cewu Lu
The ability to perform cross-category object perception and manipulation is highly desirable for building intelligent robots. One promising approach is to define Generalizable and Actionable Parts (GAParts), such as buttons and handles, across both seen and unseen object categories. However, accurate cross-category perception of GAParts remains challenging due to large inter-category variations in object shape. To address this issue, we introduce SAMIR, a novel framework that uses SAM-rectified segmentation and Iterative pose Refinement for GAPart detection and manipulation. First, we introduce a Segment Anything Model (SAM) segmentation prior to rectify low-confidence, fragmented GAPart instance proposals. Second, beyond the zero-shot generalization of the SAM foundation model, we further fine-tune it with a lightweight adaptor on our task dataset. Finally, we propose an iterative pose refinement procedure that improves the accuracy of GAPart pose estimation. Our perception experiments on the GAPartNet dataset show that SAMIR consistently outperforms the baseline method on instance segmentation and pose estimation. Our manipulation experiments in the SAPIEN simulator show that SAMIR yields an improved manipulation success rate, and we also deploy our method on a real robot for real-world manipulation. Our code and video are available at sites.google.com/view/samir-gapart.
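The abstract does not spell out the refinement procedure, but the general idea of iterative pose refinement can be illustrated with a minimal sketch: repeatedly re-estimate a rigid transform that aligns predicted part points to observed points until the update vanishes. The sketch below is a hypothetical 2D version with known point correspondences (the paper's actual method operates on 3D GAPart poses); the function name `refine_pose_2d` is illustrative, not from the paper.

```python
import math

def refine_pose_2d(src, dst, iters=10):
    """Iteratively refine a 2D rigid pose (theta, tx, ty) aligning src to dst.

    src, dst: lists of corresponding (x, y) points. Illustrative sketch only;
    with known correspondences the closed-form update converges immediately,
    so the loop mainly demonstrates the refine-until-stable pattern.
    """
    theta, tx, ty = 0.0, 0.0, 0.0
    for _ in range(iters):
        # Transform src with the current pose estimate.
        c, s = math.cos(theta), math.sin(theta)
        cur = [(c * x - s * y + tx, s * x + c * y + ty) for x, y in src]
        # Centroids of the current prediction and the observation.
        n = len(src)
        mcx = sum(p[0] for p in cur) / n
        mcy = sum(p[1] for p in cur) / n
        mdx = sum(p[0] for p in dst) / n
        mdy = sum(p[1] for p in dst) / n
        # Optimal incremental rotation (2D Kabsch): atan2 of summed cross/dot.
        num = den = 0.0
        for (cx_, cy_), (dx_, dy_) in zip(cur, dst):
            ax, ay = cx_ - mcx, cy_ - mcy
            bx, by = dx_ - mdx, dy_ - mdy
            num += ax * by - ay * bx
            den += ax * bx + ay * by
        theta += math.atan2(num, den)
        # Recompute the translation so the rotated src centroid matches dst.
        c, s = math.cos(theta), math.sin(theta)
        rot = [(c * x - s * y, s * x + c * y) for x, y in src]
        nmx = sum(p[0] for p in rot) / n
        nmy = sum(p[1] for p in rot) / n
        tx, ty = mdx - nmx, mdy - nmy
    return theta, tx, ty
```

For example, points rotated by 0.5 rad and shifted by (2, -1) are recovered by `refine_pose_2d` up to numerical precision; the same refine-then-re-fit loop structure carries over to 3D pose estimation.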