i-Code: An Integrative and Composable Multimodal Learning Framework

Ziyi Yang; Yuwei Fang; Chenguang Zhu; Reid Pryzant; DongDong Chen; Yu Shi; Yichong Xu; Yao Qian; Mei Gao; Yi-Ling Chen; Liyang Lu; Yujia Xie; Robert Gmyr; Noel Codella; Naoyuki Kanda; Bin Xiao; Lu Yuan; Takuya Yoshioka; Michael Zeng; Xuedong Huang

doi:10.1609/aaai.v37i9.26290


AAAI 2023


Conference Paper | AAAI Technical Track on Machine Learning IV | Artificial Intelligence


Abstract

Human intelligence is multimodal; we integrate visual, linguistic, and acoustic signals to maintain a holistic worldview. Most current pretraining methods, however, are limited to one or two modalities. We present i-Code, a self-supervised pretraining framework where users may flexibly combine the modalities of vision, speech, and language into unified and general-purpose vector representations. In this framework, data from each modality are first given to pretrained single-modality encoders. The encoder outputs are then integrated with a multimodal fusion network, which uses novel merge- and co-attention mechanisms to effectively combine information from the different modalities. The entire system is pretrained end-to-end with new objectives including masked modality unit modeling and cross-modality contrastive learning. Unlike previous research using only video for pretraining, the i-Code framework can dynamically process single, dual, and triple-modality data during training and inference, flexibly projecting different combinations of modalities into a single representation space. Experimental results demonstrate how i-Code can outperform state-of-the-art techniques on five multimodal understanding tasks and single-modality benchmarks, improving by as much as 11% and demonstrating the power of integrative multimodal pretraining.
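The abstract names cross-modality contrastive learning as one pretraining objective but does not give its form. As a rough illustration only, the sketch below shows a generic symmetric InfoNCE-style loss between two batches of modality embeddings. This is an assumption about what such an objective commonly looks like, not the i-Code formulation; the function names, the temperature value, and the use of NumPy are all hypothetical choices made for the example.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def contrastive_loss(emb_a, emb_b, temperature=0.07):
    """Symmetric InfoNCE-style loss (hypothetical sketch, not the paper's loss).

    emb_a, emb_b: arrays of shape (batch, dim). Row i of emb_a and row i of
    emb_b are assumed to come from the same sample in two different
    modalities; every other row in the batch serves as a negative.
    """
    a = l2_normalize(emb_a)
    b = l2_normalize(emb_b)
    logits = a @ b.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(len(a))
    loss_ab = -log_softmax(logits)[idx, idx].mean()    # modality A -> B
    loss_ba = -log_softmax(logits.T)[idx, idx].mean()  # modality B -> A
    return 0.5 * (loss_ab + loss_ba)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech = rng.normal(size=(8, 16))                    # stand-in embeddings
    text = speech + 0.01 * rng.normal(size=(8, 16))      # well-aligned pairs
    print("aligned:   ", contrastive_loss(speech, text))
    print("misaligned:", contrastive_loss(speech, np.roll(text, 1, axis=0)))
```

With matching rows paired, the loss should come out lower than when the second batch is shifted so that no row lines up with its partner. That is the behaviour a contrastive objective rewards: embeddings of the same sample from different modalities are pulled together in a shared space.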

Keywords

ML: Multimodal Learning
ML: Representation Learning
ML: Unsupervised & Self-Supervised Learning

Context

Venue: AAAI Conference on Artificial Intelligence
