Variational Learning is Effective for Large Deep Networks

Yuesong Shen; Nico Daheim; Bai Cong; Peter Nickl; Gian Maria Marconi; Clement Bazan; Rio Yokota; Iryna Gurevych; Daniel Cremers; Mohammad Emtiyaz Khan; Thomas Möllenhoff

ICML 2024

Conference Paper · Accept (Spotlight) · Artificial Intelligence · Machine Learning

Abstract

We give extensive empirical evidence against the common belief that variational learning is ineffective for large neural networks. We show that an optimizer called Improved Variational Online Newton (IVON) consistently matches or outperforms Adam for training large networks such as GPT-2 and ResNets from scratch. IVON’s computational costs are nearly identical to Adam but its predictive uncertainty is better. We show several new use cases of IVON where we improve finetuning and model merging in Large Language Models, accurately predict generalization error, and faithfully estimate sensitivity to data. We find overwhelming evidence that variational learning is effective. Code is available at https://github.com/team-approx-bayes/ivon.

Context

Venue: International Conference on Machine Learning