Weakly Scene Segmentation Using Efficient Transformer

Hao Huang 0003; Shuaihang Yuan; Congcong Wen; Yu Hao; Yi Fang 0006

IROS 2024

Conference Paper · Accepted Paper · Artificial Intelligence · Robotics

Details

Abstract

Current methods for large-scale point cloud scene semantic segmentation rely on manually annotated dense point-wise labels, which are costly, labor-intensive, and prone to errors. Consequently, gathering point cloud scenes with billions of labeled points is impractical in real-world scenarios. In this paper, we introduce a novel weak supervision approach to semantically segment large-scale indoor scenes, requiring only 1‰ of the points to be labeled. Specifically, we develop an efficient point neighbor Transformer to capture the geometry of local point cloud patches. To address the quadratic complexity of self-attention computation in Transformers, which is particularly costly for large-scale point clouds, we propose approximating the self-attention matrix using a low-rank and sparse decomposition. Using the point neighbor Transformer as the foundational block, we design a Low-rank Sparse Transformer Network (LST-Net) for weakly supervised large-scale point cloud scene semantic segmentation. Experimental results on two commonly used indoor point cloud scene segmentation benchmarks demonstrate that our model achieves performance comparable to that of both weakly supervised and fully supervised methods. Our code can be found at https://github.com/hhuang-code/LST-Net.
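
To make the idea of a low-rank plus sparse approximation concrete, the following is a minimal NumPy sketch; it is not the LST-Net implementation. It assumes a generic construction (a truncated SVD for the low-rank part, plus the largest residual entries per row for the sparse part), and the function names `softmax_attention` and `low_rank_plus_sparse`, the sizes, and the rank and sparsity settings are all hypothetical. For illustration it builds the full attention matrix first, so it does not itself deliver the complexity savings described above.

```python
import numpy as np

def softmax_attention(q, k):
    """Dense row-stochastic attention matrix softmax(q k^T / sqrt(d))."""
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=1, keepdims=True)

def low_rank_plus_sparse(a, rank, nnz_per_row):
    """Approximate a as L + S (hypothetical helper, not from the paper).

    L is the rank-`rank` truncated SVD of a; S keeps the `nnz_per_row`
    largest-magnitude entries of the residual a - L in each row.
    """
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
    residual = a - low_rank
    # Column indices of the largest-magnitude residual entries in each row.
    idx = np.argpartition(-np.abs(residual), nnz_per_row - 1, axis=1)
    idx = idx[:, :nnz_per_row]
    sparse = np.zeros_like(a)
    rows = np.arange(a.shape[0])[:, None]
    sparse[rows, idx] = residual[rows, idx]
    return low_rank, sparse

# Assumed toy sizes: 64 neighboring points with 16-dimensional features.
rng = np.random.default_rng(0)
n, d = 64, 16
q = rng.standard_normal((n, d))
k = rng.standard_normal((n, d))

attn = softmax_attention(q, k)
low_rank, sparse = low_rank_plus_sparse(attn, rank=8, nnz_per_row=4)

# Frobenius-norm errors of the low-rank part alone and of the combination.
err_low_rank = np.linalg.norm(attn - low_rank)
err_combined = np.linalg.norm(attn - low_rank - sparse)
```

Because the sparse term cancels the selected residual entries exactly, `err_combined` can never exceed `err_low_rank`. A practical method would need to estimate both parts without materializing `attn`.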

Keywords

Point cloud compression
Geometry
Weak supervision
Semantic segmentation
Benchmark testing
Transformers
Sparse matrices
Matrix decomposition
Mobile computing
Intelligent robots
Transformation Efficiency
Scene Segmentation
Point Cloud
Real-world Scenarios
Low-rank Decomposition
Time And Space
Convolutional Neural Network
Sparsity
K-nearest Neighbor
Time Complexity
Feature Learning
3D Space
Sparse Matrix
Feature Points
Space Complexity
Pair Of Points
Null Space
Linear Kernel
Low-rank Approximation
Self-attention Module
Neighboring Points
Human Pose Estimation
Hypersphere
Point Cloud Features
Annotated Training
Neighborhood Size
Point Cloud Representation
Robotic System

Context

Venue: IEEE/RSJ International Conference on Intelligent Robots and Systems
Archive span: 1988-2025
Indexed papers: 26578
Paper id: 1039467395511405547