Arrow Research search
Back to IJCAI

IJCAI 2025

Interactive Multimodal Learning via Flat Gradient Modification

Conference Paper AI Ethics, Trust, Fairness Artificial Intelligence

Abstract

Due to the notorious modality imbalance phenomenon, multimodal learning (MML) struggles to achieve satisfactory performance. Recently, multimodal learning with alternating unimodal adaptation (MLA) has been proven effective in mitigating the interference between modalities by capturing interaction through orthogonal projection, thus relieving modality imbalance phenomenon to some extent. However, the projection strategy orthogonal to the original space can lead to poor plasticity as the alternating learning proceeds, thus affecting model performance. To address this issue, in this paper, we propose a novel multimodal learning method called interactiveMML via flat gradient modification (IGM) by employing a flat gradient modification strategy to enhance interactive MML. Specifically, we first employ a flat projection-based gradient modification strategy that is independent to the original space, aiming to avoid the poor plasticity issue. Then we introduce the sharpness-aware minimization (SAM)-based optimization strategy to fully exploit the flatness of the learning objective and further enhance interaction during learning. To this end, the plasticity problem can be avoided and the overall performance is improved. Extensive experiments on widely used datasets demonstrate that IGM outperforms various state-of-the-art (SOTA) baselines, achieving superior performance. The source code is available at https: //anonymous. 4open. science/r/method-CC45.

Authors

Keywords

  • Computer Vision: CV: Machine learning for vision
  • Machine Learning: ML: Applications
  • Machine Learning: ML: Multi-modal learning
  • Machine Learning: ML: Representation learning

Context

Venue
International Joint Conference on Artificial Intelligence
Archive span
1969-2025
Indexed papers
14525
Paper id
1027180561767158538