
AAAI 2026

Stabilizing Cross-Modal Bidirectional Attribution: Few-Shot Adversarial Prompt Tuning for Robust Vision-Language Models

Conference Paper · AAAI Technical Track on Computer Vision II · Artificial Intelligence

Abstract

Large-scale pre-trained vision-language models (VLMs) such as CLIP show exceptional performance and zero-shot generalization, yet their reliability can be severely undermined by subtle adversarial perturbations. Our work reveals a critical cross-modal vulnerability: visual-only perturbations induce substantial, synchronous shifts in decision attribution maps across both the image and text modalities. This phenomenon signifies a fundamental disruption of the VLM's internal logic, since it alters both the model's perceptual focus and its decision rationale. To counter this vulnerability, we introduce Cross-modal Bidirectional Attribution guided Few-shot Adversarial Prompt Tuning (CBA-FAPT), a novel method that leverages the model's internal decision rationale as a regularizer for robust learning. The core mechanism of our framework is the alignment of a novel bidirectional attribution map that fuses two components: forward feature attention, which captures the model's perceptual focus, and backward decision gradients, which serve as a proxy for the model's decision rationale by quantifying how each feature influences the final outcome. By enforcing consistency of this bidirectional map between clean and adversarial examples, our approach corrects the model's internal logic on both fronts and effectively restores its adversarial robustness. Comprehensive experiments on 11 datasets demonstrate that CBA-FAPT outperforms the state of the art, establishing a superior trade-off between robust and natural accuracy.
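The abstract does not spell out the exact formulation of the bidirectional attribution map or the consistency objective, so the following PyTorch sketch is only one plausible reading: a Grad-CAM-style fusion of forward activations with backward gradients of the target logit, plus a cosine-consistency penalty between clean and adversarial maps. All names here (ToyEncoder, bidirectional_map, attribution_consistency_loss) are hypothetical, and the random perturbation stands in for a real attack such as PGD; none of this is taken from the paper itself.

import torch
import torch.nn.functional as F

class ToyEncoder(torch.nn.Module):
    # Stand-in for one VLM branch: returns intermediate features and logits.
    def __init__(self, dim: int = 32, classes: int = 10):
        super().__init__()
        self.backbone = torch.nn.Linear(dim, dim)
        self.head = torch.nn.Linear(dim, classes)

    def forward(self, x):
        feats = torch.relu(self.backbone(x))
        return feats, self.head(feats)

def bidirectional_map(features, logits, target):
    # Forward component: the feature activations themselves (perceptual focus).
    # Backward component: gradients of the target logit w.r.t. those features
    # (a proxy for the decision rationale). Their element-wise product gives a
    # Grad-CAM-style attribution map.
    score = logits.gather(1, target.unsqueeze(1)).sum()
    grads = torch.autograd.grad(score, features, create_graph=True)[0]
    return F.relu(features * grads)

def attribution_consistency_loss(map_clean, map_adv):
    # Penalize shifts of the adversarial map away from the (detached) clean
    # map, i.e. keep the decision rationale stable under perturbation.
    flat_c = map_clean.flatten(1).detach()
    flat_a = map_adv.flatten(1)
    return (1.0 - F.cosine_similarity(flat_a, flat_c, dim=1)).mean()

model = ToyEncoder()
x = torch.randn(4, 32)
x_adv = x + 0.03 * torch.randn_like(x)  # stand-in for a real attack (e.g. PGD)
y = torch.randint(0, 10, (4,))

feats_c, logits_c = model(x)
feats_a, logits_a = model(x_adv)
loss = F.cross_entropy(logits_a, y) + attribution_consistency_loss(
    bidirectional_map(feats_c, logits_c, y),
    bidirectional_map(feats_a, logits_a, y),
)
loss.backward()

In an actual CBA-FAPT-style setup, the optimized parameters would presumably be learnable prompt tokens rather than the full encoder, and the consistency would be enforced on both the image and text branches; the sketch above shows only the single-branch idea.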

Authors

Keywords

No keywords are indexed for this paper.

Context

Venue: AAAI Conference on Artificial Intelligence
Archive span: 1980-2026
Indexed papers: 28,718
Paper id: 929582941326441281