ProxyTTT: Proxy-driven Test-Time Training for Multi-modal Re-identification

Aihua Zheng; Zhaojun Liu; Xixi Wan; Chenglong Li; Jin Tang; Yan Yan

doi:10.1609/aaai.v40i16.38337

AAAI 2026

Conference Paper; AAAI Technical Track on Computer Vision XIII; Artificial Intelligence

Abstract

Multi-modal object re-identification (ReID) aims to retrieve specific targets by leveraging complementary cues from different sensing modalities. Despite recent progress, two key challenges remain: (1) a limited ability to jointly address modality and viewpoint discrepancies, and (2) the difficulty of effectively leveraging reliable target-domain data to improve generalization. To address these challenges, we propose Proxy-driven Test-Time Training (ProxyTTT), a unified framework that enhances both multi-modal identity representation learning and model generalization. During training, we introduce a Multi-Proxy Learning (MPL) mechanism to address the representation bias across different views and modalities. MPL disentangles fine-grained modality-specific and modality-common identity proxies, which serve as semantic anchors to align identity features across diverse perspectives and sensing modalities. This alignment strategy enables the model to learn robust and discriminative global identity representations under heterogeneous modality conditions. At test time, to reliably exploit target-domain data, we introduce Proxy-guided Entropy-based Selective Adaptation (PESA) for test-time training. Specifically, PESA leverages the semantic structure encoded by identity proxies to estimate prediction uncertainty via entropy, and selectively adapts the model using only high-confidence samples. This selective adaptation effectively mitigates the domain shift between training and deployment environments, improving the model’s generalization in real-world scenarios. Extensive experiments on four public multi-modal ReID benchmarks (RGBNT201, RGBNT100, MSVR310, and WMVeID863) demonstrate the effectiveness of ProxyTTT.
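The entropy-based selection step described for PESA can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: predictions are taken as a softmax over cosine similarities between test features and identity proxies, and a sample counts as high-confidence when its entropy falls below a fixed fraction of the maximum possible entropy. The function names, the `temperature` value, and the `threshold_ratio` value are hypothetical and are not taken from the paper.

```python
import numpy as np


def proxy_entropy(features, proxies, temperature=0.1):
    """Entropy of the softmax over feature-to-proxy cosine similarities.

    Assumed formulation (not from the paper): one proxy per identity,
    cosine similarity as the logit, scaled by a hypothetical temperature.
    """
    # L2-normalise so the dot product is a cosine similarity.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = proxies / np.linalg.norm(proxies, axis=1, keepdims=True)
    logits = f @ p.T / temperature
    # Subtract the row maximum for a numerically stable softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)
    # Shannon entropy per sample; the small constant avoids log(0).
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)


def select_confident(features, proxies, threshold_ratio=0.4, temperature=0.1):
    """Boolean mask of samples whose entropy is below a fraction of log(K).

    threshold_ratio is a hypothetical hyperparameter; log(K) is the entropy
    of a uniform prediction over K identity proxies.
    """
    entropy = proxy_entropy(features, proxies, temperature)
    max_entropy = np.log(proxies.shape[0])
    return entropy < threshold_ratio * max_entropy, entropy


# Toy data: three identity proxies and two test features.
proxies = np.eye(3)
features = np.array([
    [1.0, 0.0, 0.0],  # aligned with the first proxy: low entropy
    [1.0, 1.0, 1.0],  # equally similar to all proxies: maximal entropy
])
mask, entropy = select_confident(features, proxies)
```

In a test-time training loop, only the samples where `mask` is true would contribute to the adaptation update; the ambiguous second sample above would be skipped. How the selected samples are then used to update the model is not specified in the abstract and is left out of this sketch.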
