
TMLR 2025

νSAM: Memory-Efficient Sharpness-Aware Minimization via Nuclear Norm Constraints

Journal Article · Artificial Intelligence · Machine Learning

Abstract

Sharpness-aware minimization (SAM) has been shown to improve the generalization of neural networks. However, the method comes at the expense of storing a perturbation of the model parameters, which can be restrictive in memory-bound settings. We design a variant of SAM, called $\nu$SAM, which obtains a low-rank perturbation by modifying the perturbation constraint. The update almost entirely removes the memory footprint of the perturbation without increasing the computational complexity, achieving close to a $1/3$ memory saving with respect to the parameters when using SGD as the base optimizer. We demonstrate that $\nu$SAM performs comparably to SAM on vision transformers, both when training models from scratch and when fine-tuning. Interestingly, $\nu$SAM appears to significantly improve performance for MLP-Mixer architectures across both settings. The results are corroborated theoretically, where we show that SAM with an \emph{arbitrary} norm choice (which includes $\nu$SAM) can converge even with a fixed perturbation radius.
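To illustrate where the memory saving could come from, here is a minimal NumPy sketch. It assumes the standard interpretation of the abstract: SAM's ascent step on the $\ell_2$ ball yields a dense perturbation the size of the parameters, whereas maximizing the linearized loss over a nuclear-norm ball $\|\varepsilon\|_* \le \rho$ yields the rank-one matrix $\rho\, u_1 v_1^\top$ built from the top singular pair of the gradient, so only two vectors need to be stored. The function names and the exact update are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sam_perturbation(grad, rho=0.05):
    # Standard SAM ascent step on the l2 ball: the resulting dense
    # perturbation has the same shape as the parameter matrix and
    # must be kept in memory until the second gradient evaluation.
    return rho * grad / (np.linalg.norm(grad) + 1e-12)

def nusam_perturbation(grad, rho=0.05):
    # Hypothetical sketch of a nuclear-norm-constrained ascent step:
    # max <grad, eps> s.t. ||eps||_* <= rho is attained at the rank-one
    # matrix rho * u1 @ v1.T, where (u1, v1) is the top singular pair
    # of the gradient. Storing (rho * u1, v1) costs O(m + n) floats
    # instead of the O(m * n) needed for the dense SAM perturbation.
    u, _, vt = np.linalg.svd(grad, full_matrices=False)
    return rho * u[:, 0], vt[0, :]

rng = np.random.default_rng(0)
G = rng.standard_normal((64, 32))          # stand-in gradient matrix
scaled_u1, v1 = nusam_perturbation(G)
eps = np.outer(scaled_u1, v1)              # materialized only to inspect
print(np.linalg.matrix_rank(eps))          # rank-one perturbation
```

In practice the top singular pair would likely be obtained with a few power-iteration steps rather than a full SVD, keeping the per-step cost comparable to a gradient computation.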

Keywords

No keywords are indexed for this paper.

Context

Venue
Transactions on Machine Learning Research
Archive span
2022-2026
Indexed papers
3849
Paper id
775352128705911531