AAAI 2026 Conference Paper
DeFT-LoRA: Decoupled and Fused Tuning with LoRA Experts for Universal Cross-Domain Retrieval
- Ke Xu
- Xiaozheng Shen
- Shanshan Wang
- Mengzhu Wang
- Xun Yang
Universal Cross-Domain Retrieval (UCDR) aims to retrieve images across unseen domains and categories, a critical capability for real-world applications. While large-scale Vision-Language Models (VLMs) like CLIP offer strong zero-shot category generalization, they struggle with domain shifts. Existing methods often improve domain robustness at the cost of high computational overhead or by compromising the VLM's inherent knowledge. To address this, we propose Decoupled and Fused Tuning with LoRA (DeFT-LoRA), a novel and parameter-efficient framework that integrates Low-Rank Adaptation (LoRA) with a Mixture-of-Experts (MoE) mechanism. This approach resolves the intrinsic conflict between domain-invariant and domain-specific knowledge in a single adapter, enabling our model to construct a tailored domain adapter for each input image. We propose a three-stage training strategy, which first learns a shared Base LoRA for domain-invariant features, then derives Domain-Specific Experts to capture specific styles, and finally fuses them dynamically with a lightweight gating network. Extensive experiments on three UCDR benchmarks demonstrate that DeFT-LoRA achieves comparable or superior performance to state-of-the-art methods while requiring only 1.46% of CLIP's image-encoder parameters and reducing computational overhead, thereby establishing an exceptional balance between accuracy and efficiency.
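To make the fusion mechanism concrete, the following is a minimal, illustrative sketch of how a shared Base LoRA and gated Domain-Specific Expert LoRAs could combine in a single linear layer. All names, dimensions, and the gating form here are our own assumptions for exposition; they are not the paper's actual implementation, which operates inside CLIP's image encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not the paper's settings):
d, r, n_experts = 8, 2, 3  # feature dim, LoRA rank, number of domain experts

# Frozen pretrained weight, standing in for one CLIP image-encoder layer.
W = rng.normal(size=(d, d))

# Shared Base LoRA: a low-rank update intended to carry domain-invariant knowledge.
A_base = 0.01 * rng.normal(size=(r, d))
B_base = 0.01 * rng.normal(size=(d, r))

# Domain-Specific Expert LoRAs, one low-rank pair per seen domain.
experts = [(0.01 * rng.normal(size=(d, r)), 0.01 * rng.normal(size=(r, d)))
           for _ in range(n_experts)]

# Lightweight gating network (here: a single linear map + softmax).
G = 0.1 * rng.normal(size=(n_experts, d))

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def deft_lora_forward(x):
    """Frozen backbone output plus base LoRA plus gated expert LoRAs."""
    gates = softmax(G @ x)                      # per-image expert weights
    delta = B_base @ (A_base @ x)               # domain-invariant update
    for g, (B_e, A_e) in zip(gates, experts):
        delta = delta + g * (B_e @ (A_e @ x))   # weighted domain-specific updates
    return W @ x + delta

x = rng.normal(size=d)
y = deft_lora_forward(x)
```

Because the gate is computed from the input itself, each image receives its own mixture of domain experts, which is one plausible reading of "constructing a tailored domain adapter for each input image."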