IROS 2025
High-Precision Transformer-Based Visual Servoing for Humanoid Robots in Aligning Tiny Objects
Abstract
High-precision alignment of tiny objects remains a common and critical challenge for humanoid robots in the real world. To address this problem, this paper proposes a vision-based framework with which a humanoid robot precisely estimates and controls the relative position between a handheld tool and a target object, e.g., a screwdriver tip and a screw head slot. By fusing images from the robot's head and torso cameras with its head joint angles, the proposed Transformer-based visual servoing method effectively corrects the handheld tool's positional errors, especially at close distance. Experiments on M4-M8 screws demonstrate an average convergence error of 0.8-1.3 mm and a success rate of 93%-100%. Comparative analysis indicates that this high-precision alignment capability is enabled by the Distance Estimation Transformer architecture and the Multi-Perception-Head mechanism proposed in this paper.
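To make the closed-loop idea in the abstract concrete, the following is a minimal sketch of a generic proportional visual servoing loop. It is not the paper's controller: the function names `servo_step` and `run_servo`, the gain, the step clamp, and the 1 mm tolerance are all illustrative assumptions, and `estimate_offset` stands in for whatever perception model supplies the tool-to-target offset.

```python
import math


def servo_step(offset_mm, gain=0.5, max_step_mm=5.0):
    """Return a Cartesian correction (mm) opposing the estimated offset.

    offset_mm is assumed to be (tool position - target position).
    gain and max_step_mm are hypothetical tuning values.
    """
    step = [-gain * c for c in offset_mm]
    norm = math.sqrt(sum(c * c for c in step))
    if norm > max_step_mm:
        # Clamp large corrections so the tool never jumps too far at once.
        scale = max_step_mm / norm
        step = [c * scale for c in step]
    return step


def run_servo(estimate_offset, apply_correction, tol_mm=1.0, max_iters=50):
    """Iterate estimate -> correct until the offset norm is within tol_mm.

    estimate_offset: callable returning the current offset estimate (mm);
        a placeholder for a learned estimator.
    apply_correction: callable that moves the tool by the given step (mm).
    Returns (converged, iterations_used).
    """
    for i in range(max_iters):
        offset = estimate_offset()
        if math.sqrt(sum(c * c for c in offset)) <= tol_mm:
            return True, i
        apply_correction(servo_step(offset))
    return False, max_iters
```

In this sketch the estimator and the motion command are passed in as callables, so the loop can be exercised against a simple simulation before being connected to real perception and control.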
Context
- Venue: IEEE/RSJ International Conference on Intelligent Robots and Systems
- Archive span: 1988-2025
- Indexed papers: 26578
- Paper id: 530395974226768388