Fast Training of Large Kernel Models with Delayed Projections

Amirhesam Abedsoltan; Siyuan Ma; Parthe Pandit; Misha Belkin

NeurIPS 2025

Conference Paper · Main Conference Track · Artificial Intelligence · Machine Learning

Abstract

Classical kernel machines have historically faced significant challenges in scaling to large datasets and model sizes, a key ingredient that has driven the success of neural networks. In this paper, we present a new methodology for building kernel machines that can scale efficiently with both data size and model size. Our algorithm introduces delayed projections to Preconditioned Stochastic Gradient Descent (PSGD), allowing the training of much larger models than was previously feasible. We validate our algorithm, EigenPro 4, across multiple datasets, demonstrating drastic training speedups without compromising performance. Our implementation is publicly available at: https://github.com/EigenPro/EigenPro.
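
To make the idea of a delayed projection concrete, here is a minimal NumPy sketch under stated assumptions. It is not the authors' implementation and does not use the EigenPro API. It assumes a Gaussian kernel, a model of the form f(x) = sum_j a_j K(x, z_j) over fixed centers Z, and plain (unpreconditioned) SGD in function space, so the preconditioning that PSGD relies on is deliberately left out. Each SGD step adds the current batch points as temporary centers; only every `delay` steps is the accumulated temporary part projected back onto the span of the model centers by solving a linear system with K(Z, Z). All names (`gaussian_kernel`, `project_onto_centers`, `train_delayed_projection`, `delay`, `jitter`) are hypothetical and chosen for illustration.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))


def project_onto_centers(K_zz, Z, tmp_X, tmp_w):
    """Project g = sum_i w_i K(., t_i) onto span{K(., z_j)} (RKHS projection).

    Returns coefficients c solving K(Z, Z) c = K(Z, T) w.
    """
    T = np.vstack(tmp_X)
    w = np.concatenate(tmp_w)
    return np.linalg.solve(K_zz, gaussian_kernel(Z, T) @ w)


def train_delayed_projection(X, y, Z, delay=5, batch_size=20, lr=1.0,
                             epochs=5, jitter=1e-6, seed=0):
    """Function-space SGD with the projection applied only every `delay` steps.

    Assumption: no preconditioner is used; this is plain SGD for illustration.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape[0], Z.shape[0]
    a = np.zeros(p)  # weights on the fixed model centers
    # Small jitter keeps the linear solve numerically stable.
    K_zz = gaussian_kernel(Z, Z) + jitter * np.eye(p)
    tmp_X, tmp_w = [], []  # temporary centers and weights since last projection
    step = 0
    for _ in range(epochs):
        perm = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # Prediction = model part + not-yet-projected temporary part.
            pred = gaussian_kernel(Xb, Z) @ a
            if tmp_X:
                pred += gaussian_kernel(Xb, np.vstack(tmp_X)) @ np.concatenate(tmp_w)
            # The SGD step adds the batch points as temporary centers.
            tmp_X.append(Xb)
            tmp_w.append(-lr / len(idx) * (pred - yb))
            step += 1
            if step % delay == 0:  # delayed projection back onto the model
                a += project_onto_centers(K_zz, Z, tmp_X, tmp_w)
                tmp_X, tmp_w = [], []
    if tmp_X:  # flush whatever is left after the last step
        a += project_onto_centers(K_zz, Z, tmp_X, tmp_w)
    return a


# Toy usage: 400 samples of sin(x), with 40 of them used as model centers.
data_rng = np.random.default_rng(1)
X = data_rng.uniform(-3.0, 3.0, size=(400, 1))
y = np.sin(X[:, 0])
Z = X[:40]

a = train_delayed_projection(X, y, Z)
initial_mse = float(np.mean(y ** 2))  # error of the zero model
final_mse = float(np.mean((gaussian_kernel(X, Z) @ a - y) ** 2))
```

In this sketch the cost of the linear solve is paid once per `delay` steps rather than once per step, which is the trade-off the abstract alludes to; setting `delay=1` recovers a projection after every batch.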

Context

Venue: Annual Conference on Neural Information Processing Systems