
ICML 2025

Layer-wise Quantization for Quantized Optimistic Dual Averaging

Conference Paper · Accept (poster) · Artificial Intelligence · Machine Learning

Abstract

Modern deep neural networks exhibit heterogeneity across their many layers of various types, such as residual and multi-head attention layers, due to varying structures (dimensions, activation functions, etc.) and distinct representation characteristics, which impact predictions. We develop a general layer-wise quantization framework with tight variance and code-length bounds that adapts to these heterogeneities over the course of training. We then apply this layer-wise quantization technique within distributed variational inequalities (VIs), proposing a novel Quantized Optimistic Dual Averaging (QODA) algorithm with adaptive learning rates that achieves competitive convergence rates for monotone VIs. We empirically show that QODA achieves up to a $150\%$ speedup over the baselines in end-to-end training time for training Wasserstein GAN on $12+$ GPUs.
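
To make the layer-wise quantization idea concrete, below is a minimal sketch, assuming a standard unbiased stochastic quantizer applied independently to each layer so that the quantization scale tracks that layer's own statistics. The function names (quantize_layer, quantize_layerwise) and the level count s are illustrative assumptions, not the paper's exact scheme or bounds.

import numpy as np

def quantize_layer(x: np.ndarray, s: int = 16, rng=None) -> np.ndarray:
    """Unbiased stochastic quantization of one layer's tensor to s levels."""
    rng = np.random.default_rng() if rng is None else rng
    scale = np.max(np.abs(x))
    if scale == 0.0:
        return np.zeros_like(x)
    ratio = np.abs(x) / scale * s            # position in [0, s]
    low = np.floor(ratio)
    prob = ratio - low                       # round up with this probability
    levels = low + (rng.random(x.shape) < prob)
    return np.sign(x) * levels * scale / s   # E[output] == x (unbiased)

def quantize_layerwise(layers, s: int = 16):
    """Quantize each layer independently so the scale adapts to layer heterogeneity."""
    return [quantize_layer(g, s) for g in layers]

# Two layers with very different magnitudes keep separate scales.
grads = [np.random.randn(4, 4), 1e-3 * np.random.randn(8)]
for g, q in zip(grads, quantize_layerwise(grads, s=8)):
    print(np.abs(g - q).max())

Because each layer is scaled by its own maximum magnitude, a small-magnitude layer is not crushed by a large-magnitude one, which is the intuition behind adapting compression to per-layer heterogeneity.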

Authors

Keywords

  • Adaptive Compression
  • Layer-wise Compression
  • Optimistic Dual Averaging
  • Distributed Variational Inequality

Context

Venue
International Conference on Machine Learning
Archive span
1993-2025
Indexed papers
16471
Paper id
961298883590541652