IROS 2024
Weakly Scene Segmentation Using Efficient Transformer
Abstract
Current methods for large-scale point cloud scene semantic segmentation rely on manually annotated dense point-wise labels, which are costly, labor-intensive, and prone to errors. Consequently, gathering point cloud scenes with billions of labeled points is impractical in real-world scenarios. In this paper, we introduce a novel weak supervision approach to semantically segment large-scale indoor scenes, requiring only 1‰ of the points to be labeled. Specifically, we develop an efficient point neighbor Transformer to capture the geometry of local point cloud patches. To address the quadratic complexity of self-attention computation in Transformers, particularly for large-scale point clouds, we propose approximating the self-attention matrix using low-rank and sparse decomposition. Building on the point neighbor Transformer as foundational blocks, we design a Low-rank Sparse Transformer Network (LST-Net) for weakly supervised large-scale point cloud scene semantic segmentation. Experimental results on two commonly used indoor point cloud scene segmentation benchmarks demonstrate that our model achieves performance comparable to those of both weakly supervised and fully supervised methods. Our code can be found in https://github.com/hhuang-code/LST-Net.
Authors
Keywords
Context
- Venue
- IEEE/RSJ International Conference on Intelligent Robots and Systems
- Archive span
- 1988-2025
- Indexed papers
- 26578
- Paper id
- 1039467395511405547