Arrow Research search
Back to NeurIPS

NeurIPS 2025

Generalization Bound of Gradient Flow through Training Trajectory and Data-dependent Kernel

Conference Paper Main Conference Track Artificial Intelligence · Machine Learning

Abstract

Gradient-based optimization methods have shown remarkable empirical success, yet their theoretical generalization properties remain only partially understood. In this paper, we establish a generalization bound for gradient flow that aligns with the classical Rademacher complexity bounds for kernel methods—specifically those based on the RKHS norm and kernel trace—through a data-dependent kernel called the loss path kernel (LPK). Unlike static kernels such as NTK, the LPK captures the entire training trajectory, adapting to both data and optimization dynamics, leading to tighter and more informative generalization guarantees. Moreover, the bound highlights how the norm of the training loss gradients along the optimization trajectory influences the final generalization performance. The key technical ingredients in our proof combine stability analysis of gradient flow with uniform convergence via Rademacher complexity. Our bound recovers existing kernel regression bounds for overparameterized neural networks and shows the feature learning capability of neural networks compared to kernel methods. Numerical experiments on real-world datasets validate that our bounds correlate well with the true generalization gap.

Authors

Keywords

No keywords are indexed for this paper.

Context

Venue
Annual Conference on Neural Information Processing Systems
Archive span
1987-2025
Indexed papers
30776
Paper id
822959865353757409