Text to Point Cloud Localization with Multi-Level Negative Contrastive Learning

Dunqiang Liu; Shujun Huang; Wen Li; Siqi Shen; Cheng Wang

doi:10.1609/aaai.v39i5.32574

AAAI 2025

Conference Paper AAAI Technical Track on Computer Vision IV Artificial Intelligence

Abstract

Language-based localization is a crucial task in robotics and computer vision, enabling robots to understand spatial positions through language. Recent methods rely on contrastive learning to establish correspondences between the global features of texts and point clouds. However, the inherent ambiguity of textual descriptions makes it difficult to convey geometric information accurately, so forcing texts and point clouds into alignment in the feature space may compromise the expressiveness of the point clouds. Unlike previous methods, this paper proposes using language as a filter to distinguish dissimilar locations. To this end, we propose a robust multi-level negative contrastive learning framework for language-based localization that fully leverages the descriptive power of language for spatial localization. Our method learns multiple mismatched factors by minimizing the similarity between different locations at three levels: global, instance, and relation. Extensive experiments on the KITTI360Pose benchmark demonstrate that our method outperforms state-of-the-art methods. Specifically, we achieve a 56.3% improvement in Top-1 retrieval recall and a 45.9% improvement in 5m localization recall.
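The core idea in the abstract, pushing apart the features of mismatched text/location pairs at several levels, can be illustrated with a small sketch. This is not the authors' loss: it swaps in a simple margin hinge on the cosine similarity of negative pairs, and the function names, the `margin`, the per-level weights, and the toy feature vectors are all hypothetical. Only the three level names (global, instance, relation) come from the abstract.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def negative_hinge_loss(text_feats: List[Vector], cloud_feats: List[Vector],
                        margin: float = 0.0) -> float:
    """Hypothetical negative-only term: mean penalty over mismatched pairs.

    Row i of text_feats and row i of cloud_feats describe the same location;
    every pairing (i, j) with i != j is a negative whose similarity above
    `margin` is penalized, i.e. different locations are pushed apart.
    """
    n = len(text_feats)
    if n < 2:
        raise ValueError("need at least two locations to form negative pairs")
    penalties = [
        max(0.0, cosine(text_feats[i], cloud_feats[j]) - margin)
        for i in range(n)
        for j in range(n)
        if i != j
    ]
    return sum(penalties) / len(penalties)


def multi_level_negative_loss(levels: Dict[str, Tuple[List[Vector], List[Vector]]],
                              weights: Dict[str, float]) -> float:
    """Weighted sum of per-level negative terms (weighting is an assumption)."""
    return sum(weights[name] * negative_hinge_loss(t, c)
               for name, (t, c) in levels.items())


if __name__ == "__main__":
    text = [[1.0, 0.0], [0.0, 1.0]]
    distinct_clouds = [[1.0, 0.0], [0.0, 1.0]]    # negatives are orthogonal to each query
    confusable_clouds = [[0.0, 1.0], [1.0, 0.0]]  # negatives look like the query
    levels = {
        "global": (text, distinct_clouds),
        "instance": (text, distinct_clouds),
        "relation": (text, confusable_clouds),
    }
    weights = {"global": 1.0, "instance": 0.5, "relation": 0.5}
    print(multi_level_negative_loss(levels, weights))
```

In this toy run only the relation level has confusable negatives, so it alone contributes and the printed value is approximately 0.5. A negative-only term like this says nothing about how matched pairs should relate, so a complete system would need further components; the abstract does not specify them.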
