Understanding Surprising Generalization Phenomena in Deep Learning

Wei Hu

doi:10.1609/aaai.v38i20.30285

AAAI 2024

Conference Paper, New Faculty Highlights, Artificial Intelligence

Abstract

Deep learning has exhibited a number of surprising generalization phenomena that are not captured by classical statistical learning theory. This talk will survey some of my work on the theoretical characterizations of several such intriguing phenomena: (1) Implicit regularization: A major mystery in deep learning is that deep neural networks can often generalize well despite their excessive expressive capacity. Toward explaining this mystery, it has been suggested that commonly used gradient-based optimization algorithms enforce certain implicit regularization, which effectively constrains the model capacity. (2) Benign overfitting: In certain scenarios, a model can perfectly fit noisily labeled training data but still achieve near-optimal test error at the same time, which is very different from the classical notion of overfitting. (3) Grokking: In certain scenarios, a model initially achieves perfect training accuracy but no generalization (i.e., no better than a random predictor), and upon further training, transitions to almost perfect generalization. Theoretically establishing these properties often involves making appropriate high-dimensional assumptions on the problem as well as a careful analysis of the training dynamics.
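A toy sketch of the implicit-regularization idea in point (1), assuming a linear model with squared loss (a classical textbook setting, not the setting or analysis of the surveyed work). With more parameters than samples, the training loss has infinitely many global minima, yet full-batch gradient descent started from zero never leaves the span of the training inputs, so it converges to the minimum-Euclidean-norm interpolant. The function name `gd_least_squares`, the step size, and the data are hypothetical and chosen only for illustration.

```python
# Toy illustration (assumption: linear model, squared loss, zero initialization).
def gd_least_squares(X, y, lr=0.05, steps=5000):
    """Full-batch gradient descent on L(w) = 1/2 * sum_i (x_i . w - y_i)^2, starting from w = 0."""
    d = len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            r = sum(a * b for a, b in zip(xi, w)) - yi  # residual on sample i
            for j in range(d):
                grad[j] += r * xi[j]  # gradient is a multiple of x_i, so w stays in span(X)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# One sample, two features: every w with w1 + 2*w2 = 3 fits the data exactly.
X = [[1.0, 2.0]]
y = [3.0]
w = gd_least_squares(X, y)

# Minimum-norm interpolant for a single sample: x * y / ||x||^2.
norm_sq = sum(v * v for v in X[0])
w_min_norm = [v * y[0] / norm_sq for v in X[0]]

print([round(v, 6) for v in w])           # → [0.6, 1.2]
print([round(v, 6) for v in w_min_norm])  # → [0.6, 1.2]
```

Another exact fit such as w = (3, 0) has a larger norm; gradient descent selects the smallest-norm solution without any explicit penalty term. For deep networks no such closed form is available, which is why the analyses require studying the training dynamics directly.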

Authors

Wei Hu: State Key Laboratory for Novel Software Technology, Nanjing University, China; National Institute of Healthcare Data Science, Nanjing University, China

Keywords

Deep Learning
Generalization
Over-parameterization
Theory

Context

Venue: AAAI Conference on Artificial Intelligence