Unveiling m-Sharpness Through the Structure of Stochastic Gradient Noise

Haocheng Luo; Mehrtash Harandi; Dinh Phung; Trung Le

NeurIPS 2025

Conference Paper · Main Conference Track · Artificial Intelligence · Machine Learning

Abstract

Sharpness-aware minimization (SAM) has emerged as a highly effective technique for improving model generalization, but its underlying principles are not fully understood. We investigate m-sharpness, the phenomenon in which SAM's performance improves monotonically as the micro-batch size m used to compute perturbations decreases. This phenomenon is critical for distributed training, yet it lacks a rigorous explanation. We leverage an extended Stochastic Differential Equation (SDE) framework and analyze the structure of stochastic gradient noise (SGN) to characterize the dynamics of SAM variants, including n-SAM and m-SAM. Our analysis reveals that stochastic perturbations induce an implicit variance-based sharpness regularization whose strength increases as m decreases. Motivated by this insight, we propose Reweighted SAM (RW-SAM), which employs sharpness-weighted sampling to mimic the generalization benefits of m-SAM while remaining parallelizable. Comprehensive experiments validate both the theory and the proposed method.
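The abstract contrasts n-SAM, here taken to mean a single perturbation computed from the whole batch, with m-SAM, where each micro-batch of size m computes its own perturbation. What follows is a minimal NumPy sketch of that distinction. It assumes the standard SAM ascent step eps = rho * g / ||g|| and a toy per-example squared loss. The function names `per_example_grads` and `sam_gradient` and the loss are hypothetical illustrations rather than the authors' code, and the sketch does not implement RW-SAM, whose sampling weights the abstract does not specify.

```python
import numpy as np


def per_example_grads(w, X, y):
    """Per-example gradients of the toy loss 0.5 * (x_i . w - y_i)^2.

    The squared loss is an illustrative assumption, not the paper's setting.
    Returns an array of shape (batch, dim).
    """
    residual = X @ w - y          # shape (batch,)
    return residual[:, None] * X  # row i is the gradient of example i


def sam_gradient(w, X, y, rho, m):
    """Hypothetical m-SAM gradient: one perturbation per micro-batch of size m.

    Each micro-batch takes the SAM ascent step eps = rho * g / ||g|| using
    its own mean gradient g, then evaluates its gradient at w + eps.
    The micro-batch gradients are averaged. Setting m = len(X) gives
    n-SAM (one perturbation computed from the whole batch).
    """
    batch = len(X)
    if batch % m != 0:
        raise ValueError("this sketch assumes m divides the batch size")
    micro_grads = []
    for start in range(0, batch, m):
        Xb, yb = X[start:start + m], y[start:start + m]
        g = per_example_grads(w, Xb, yb).mean(axis=0)
        eps = rho * g / (np.linalg.norm(g) + 1e-12)  # guard against g == 0
        micro_grads.append(per_example_grads(w + eps, Xb, yb).mean(axis=0))
    return np.mean(micro_grads, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(64, 5))
    y = rng.normal(size=64)
    w = rng.normal(size=5)
    # Smaller m means more, noisier perturbations within the same batch.
    for m in (64, 16, 4, 1):
        g = sam_gradient(w, X, y, rho=0.05, m=m)
        print(f"m={m:2d}  ||SAM gradient|| = {np.linalg.norm(g):.4f}")
```

Because each micro-batch normalizes its own gradient, the perturbations for small m follow individual, noisier gradient directions rather than the batch average. This is the kind of stochastic perturbation that the abstract links to an implicit variance-based sharpness regularizer. With rho = 0, every choice of m reduces to the plain mini-batch gradient.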
