IROS 2025
Recognizing Skeleton-Based Actions As Points
Abstract
Recent advances in skeleton-based action recognition have been driven primarily by Graph Convolutional Networks (GCNs) and skeleton transformers. Conventional approaches model joint co-occurrences through skeletal connections but overlook the positional information inherent in 3D coordinates. Although hypergraphs partially address the limitation of pairwise aggregation in capturing higher-order kinematic dependencies, their topological definitions remain problematic. To address these problems, this paper proposes a skeleton-to-point network (Skeleton2Point) that models joints' positional relationships directly in three-dimensional space without the constraint of a fixed topology, and is, to our knowledge, the first to treat skeleton-based recognition as a point-cloud problem. However, using only the raw 3D coordinates discards the anatomical identity of each keypoint and its temporal position in the sequence. To address this limitation, we augment the three spatial coordinates with two additional dimensions, the anatomical index of each keypoint and its frame number, via a proposed Information Transform Module (ITM), extending the representation from a three-dimensional to a five-dimensional feature space. Furthermore, we propose a Cluster-Dispatch-based Interaction (CDI) module to sharpen the discrimination between local and global information. Compared with existing methods on the NTU-RGB+D 60 and NTU-RGB+D 120 datasets, Skeleton2Point achieves state-of-the-art performance on both the joint modality and stream fusion. In particular, on the challenging NTU-RGB+D 120 dataset, accuracies reach 90.63% under the X-Sub setting and 91.92% under the X-Set setting.
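The core representational idea, lifting each joint from a 3D coordinate to a 5D point that also carries its anatomical index and frame number, can be sketched as follows. This is a hypothetical illustration of the transformation described in the abstract, not the paper's actual ITM implementation; the `skeleton_to_points` function name and the assumption that indices are appended raw (rather than normalized or embedded) are ours.

```python
import numpy as np

def skeleton_to_points(seq):
    """Flatten a skeleton sequence of shape (T frames, V joints, 3 coords)
    into a point set where each point is (x, y, z, joint_index, frame_index).

    Hypothetical sketch of the 5-D lifting described in the abstract; the
    paper's ITM may scale or embed the two index channels differently.
    """
    T, V, _ = seq.shape
    joint_idx = np.tile(np.arange(V), T)    # anatomical index, repeats per frame
    frame_idx = np.repeat(np.arange(T), V)  # frame number, constant within a frame
    xyz = seq.reshape(T * V, 3)             # raw 3-D coordinates as points
    return np.concatenate(
        [xyz, joint_idx[:, None], frame_idx[:, None]], axis=1
    )

# Example: an NTU-style sequence with 64 frames and 25 joints
points = skeleton_to_points(np.zeros((64, 25, 3)))  # shape (1600, 5)
```

Without the two extra channels, a point-cloud backbone would see an unordered set of coordinates and could not distinguish, say, a wrist from an ankle at the same location, which is exactly the identity loss the abstract notes.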
Context
- Venue: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)