Generalization Bound of Gradient Flow through Training Trajectory and Data-dependent Kernel

Yilan Chen; Zhichao Wang; Wei Huang; Andi Han; Taiji Suzuki; Arya Mazumdar

NeurIPS 2025

Conference Paper Main Conference Track Artificial Intelligence · Machine Learning

Abstract

Gradient-based optimization methods have shown remarkable empirical success, yet their theoretical generalization properties remain only partially understood. In this paper, we establish a generalization bound for gradient flow that aligns with the classical Rademacher complexity bounds for kernel methods—specifically those based on the RKHS norm and kernel trace—through a data-dependent kernel called the loss path kernel (LPK). Unlike static kernels such as NTK, the LPK captures the entire training trajectory, adapting to both data and optimization dynamics, leading to tighter and more informative generalization guarantees. Moreover, the bound highlights how the norm of the training loss gradients along the optimization trajectory influences the final generalization performance. The key technical ingredients in our proof combine stability analysis of gradient flow with uniform convergence via Rademacher complexity. Our bound recovers existing kernel regression bounds for overparameterized neural networks and shows the feature learning capability of neural networks compared to kernel methods. Numerical experiments on real-world datasets validate that our bounds correlate well with the true generalization gap.
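
For orientation, the objects named in the abstract can be written out as follows. This is a sketch under assumed notation (parameters $w(t)$, per-sample loss $\ell(w, z)$, training set $S = \{z_i\}_{i=1}^n$, stopping time $T$, kernel Gram matrix $\mathbf{K}$), using the usual definition of the loss path kernel and the textbook Rademacher complexity bound for kernel classes. The paper's own normalization and constants may differ.

```latex
% Gradient flow on the empirical loss L_S(w) = (1/n) \sum_i \ell(w, z_i)
\frac{dw(t)}{dt} = -\nabla_w L_S(w(t))
                 = -\frac{1}{n} \sum_{i=1}^{n} \nabla_w \ell(w(t), z_i)

% Loss path kernel: loss-gradient inner products integrated along the trajectory
K_T(z, z') = \int_0^T \big\langle \nabla_w \ell(w(t), z),\, \nabla_w \ell(w(t), z') \big\rangle \, dt

% Chain rule applied to \ell(w(t), z), then integrated over [0, T]
\ell(w(T), z) = \ell(w(0), z) - \frac{1}{n} \sum_{i=1}^{n} K_T(z, z_i)

% Classical empirical Rademacher complexity bound for an RKHS ball of radius B
\hat{\mathfrak{R}}_S\big(\{ f \in \mathcal{H} : \|f\|_{\mathcal{H}} \le B \}\big)
  \le \frac{B \sqrt{\operatorname{tr}(\mathbf{K})}}{n}
```

The third identity is what makes the trained loss look like a kernel machine with kernel $K_T$, which suggests why a bound in terms of an RKHS norm and a kernel trace is a natural target.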
