Balancing Multimodal Training Through Game-Theoretic Regularization

Konstantinos Kontras; Thomas Strypsteen; Christos Chatzichristos; Paul Liang; Matthew Blaschko; Maarten De Vos

NeurIPS 2025

Conference Paper · Main Conference Track · Artificial Intelligence · Machine Learning

Abstract

Multimodal learning holds promise for richer information extraction by capturing dependencies across data sources. Yet current training methods often underperform due to modality competition, a phenomenon in which modalities contend for training resources, leaving some underoptimized. This raises a pivotal question: how can we address training imbalances, ensure adequate optimization across all modalities, and achieve consistent performance improvements as we transition from unimodal to multimodal data? This paper proposes the Multimodal Competition Regularizer (MCR), inspired by a mutual information (MI) decomposition designed to prevent the adverse effects of competition in multimodal training. Our key contributions are: 1) a game-theoretic framework that adaptively balances modality contributions by encouraging each modality to maximize its informative role in the final prediction; 2) refined lower and upper bounds for each MI term that enhance the extraction of both task-relevant unique and shared information across modalities; 3) latent space permutations for conditional MI estimation, which significantly improve computational efficiency. MCR outperforms all previously suggested training strategies and simple baselines, demonstrating that training modalities jointly leads to important performance gains on synthetic and large real-world datasets. We release our code and models at https://github.com/kkontras/MCR.
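
For orientation, one standard way to decompose the information two modalities carry about a target is the chain rule of mutual information. The abstract does not state the exact decomposition MCR builds on, so the identity below is only an illustrative sketch; the symbols $X_1$, $X_2$ (the two modalities) and $Y$ (the prediction target) are assumed notation, not taken from the paper.

```latex
% Chain rule of mutual information (a textbook identity, shown for illustration only).
% X_1, X_2: the two modalities; Y: the prediction target (assumed notation).
\begin{align}
I(X_1, X_2; Y) &= I(X_1; Y) + I(X_2; Y \mid X_1) \\
               &= I(X_2; Y) + I(X_1; Y \mid X_2)
\end{align}
```

Read this way, each conditional term measures what one modality adds once the other is already known, which is the kind of quantity a conditional MI estimator would target.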
