Bidirectional Contrastive Split Learning for Visual Question Answering

Yuwei Sun; Hideya Ochiai

doi:10.1609/aaai.v38i19.30158


AAAI 2024


Conference Paper AAAI Technical Track on Safe, Robust and Responsible AI Track Artificial Intelligence


Abstract

Visual Question Answering (VQA) based on multi-modal data facilitates real-life applications such as home robots and medical diagnoses. One significant challenge is to devise a robust decentralized learning framework for heterogeneous client models in settings where centralized data collection is avoided because of confidentiality concerns. This work tackles privacy-preserving VQA by decoupling a multi-modal model into representation modules and a contrastive module, leveraging inter-module gradient sharing and inter-client weight sharing. To this end, we propose Bidirectional Contrastive Split Learning (BiCSL) to train a global multi-modal model on the entire data distribution of decentralized clients. We employ a contrastive loss that enables more efficient self-supervised learning of the decentralized modules. Comprehensive experiments on the VQA-v2 dataset with five SOTA VQA models demonstrate the effectiveness of the proposed method. Furthermore, we inspect BiCSL's robustness against a dual-key backdoor attack on VQA. BiCSL shows significantly enhanced resilience to this multi-modal adversarial attack compared to the centralized learning method, which makes it a promising approach to decentralized multi-modal learning.
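
To make the two ingredients named in the abstract more concrete, the following is a minimal sketch and not the paper's implementation. It assumes a generic InfoNCE-style contrastive loss over paired embeddings and a plain element-wise average for inter-client weight sharing; the function names `info_nce` and `average_weights`, the temperature value, and the use of flat Python lists as embeddings and weights are all illustrative assumptions that do not come from the source.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce(queries, keys, temperature=0.1):
    """Mean InfoNCE-style loss (assumed form); queries[i] and keys[i] are the positive pair."""
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # Similarity of query i to every key, scaled by the temperature.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Numerically stable log-sum-exp over all keys.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of the matching key.
        total += log_denom - logits[i]
    return total / len(q)


def average_weights(client_weights):
    """Element-wise mean of per-client weight vectors (assumed sharing rule)."""
    n = len(client_weights)
    return [sum(ws) / n for ws in zip(*client_weights)]


if __name__ == "__main__":
    emb = [[1.0, 0.0], [0.0, 1.0]]
    print(info_nce(emb, emb))                            # small: pairs are aligned
    print(info_nce(emb, emb[::-1]))                      # large: pairs are swapped
    print(average_weights([[1.0, 2.0], [3.0, 4.0]]))
```

In this sketch the loss is low when each embedding is closest to its own partner and high when partners are swapped, which is the signal a contrastive module would propagate back as gradients; how BiCSL actually splits the computation between clients and the server is not specified in the abstract and is not modeled here.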
