AAAI Conference 2026 Conference Paper
PEFT-BoA: Parameter-Efficient Fine-Tuning with Bag-of-Adapters for Multi-Modal Object Re-identification
- Hongchao Li
- Guangxing Liu
- Xixi Wang
- Baihe Liang
- YongLong Luo
Multi-modal object Re-identification (ReID) aims to retrieve individuals by leveraging complementary information from different modalities. Recent CLIP-based approaches show promising results, but they usually employ prompt-based or hybrid prompt-adapter tuning and still face the problems of the heterogeneous domain gap, fine-grained identity discrimination, and noisy instance interference. To address these problems, we introduce a novel Parameter-Efficient Fine-Tuning framework with a Bag-of-Adapters (PEFT-BoA), built on the pre-trained CLIP vision encoder, for multi-modal object ReID. Specifically, we first propose a Domain-specific Patch Adapter (DPA) designed to bridge the visual feature gap between the pre-trained and fine-tuned models at the local patch level. Meanwhile, we propose a Task-specific Class Adapter (TCA) that enhances fine-grained identity discrimination by optimizing the global class token. Finally, we propose an Instance-specific Fusion Adapter (IFA) that dynamically selects and combines only the most useful features across modalities for each instance. PEFT-BoA achieves superior performance on multi-modal object ReID benchmarks while using fewer trainable parameters (6.62M) and achieving higher training throughput (246.2 fps).
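The abstract does not specify the internal design of the DPA, TCA, and IFA modules; the sketch below is only a generic residual bottleneck adapter of the kind commonly inserted into frozen transformer encoders, shown to illustrate why adapter tuning is parameter-efficient. All names and the zero-initialization choice are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def bottleneck_adapter(x, w_down, w_up):
    """Generic residual bottleneck adapter (illustrative, not the paper's design):
    project tokens down to a low rank, apply ReLU, project back up, add skip."""
    h = np.maximum(x @ w_down, 0.0)  # down-projection + ReLU
    return x + h @ w_up              # up-projection with residual connection

rng = np.random.default_rng(0)
d, r = 8, 2                         # token dim and bottleneck rank (r << d)
x = rng.standard_normal((4, d))     # e.g., four patch tokens from a frozen encoder
w_down = rng.standard_normal((d, r)) * 0.01
w_up = np.zeros((r, d))             # zero-init: adapter starts as the identity map
out = bottleneck_adapter(x, w_down, w_up)
print(np.allclose(out, x))          # True: frozen-backbone behavior is preserved at init
```

With rank r much smaller than d, each adapter adds only 2·d·r trainable weights per layer while the backbone stays frozen, which is how frameworks of this kind keep the trainable budget small (6.62M parameters here).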