Prior Forgetting and In-Context Overfitting

Sungyoon Lee

NeurIPS 2025

Conference Paper Main Conference Track Artificial Intelligence · Machine Learning

Abstract

In-context learning (ICL) is one of the key capabilities behind the success of LLMs. At test time, ICL is known to operate in two modes: task recognition and task learning. In this paper, we investigate the emergence and dynamics of these two modes during pretraining. To gain an analytical understanding of how the ICL abilities are learned, we study the in-context random linear regression problem with a simple linear-attention-based transformer, and we define and disentangle the strengths of the task recognition and task learning abilities stored in the model's parameters. We show that, early in pretraining, the model learns the task learning and task recognition abilities together, but in the later phase it (a) gradually forgets the task recognition ability to recall previously learned tasks and (b) relies more on the given context. We call these phenomena (a) "prior forgetting" and (b) "in-context overfitting", respectively.
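To make the setting concrete, the following is a minimal sketch of in-context linear regression with a prediction that mixes a stored prior and a context-based estimate. It is not the paper's parameterization: the names `sample_prompt`, `predict`, `w_prior`, `beta`, and `gamma` are hypothetical. The context term `X.T @ y / n` is a commonly used simplification of what a single linear-attention layer can compute. Here `beta` stands in for the strength of task recognition (recalling a learned task) and `gamma` for the strength of task learning (using the context).

```python
import numpy as np


def sample_prompt(rng, n_ctx, dim, w):
    """Sample one noiseless in-context linear regression prompt for task vector w."""
    X = rng.standard_normal((n_ctx, dim))  # context inputs, one per row
    y = X @ w                              # context labels
    x_query = rng.standard_normal(dim)     # query input whose label is predicted
    return X, y, x_query


def predict(X, y, x_query, w_prior, beta, gamma):
    """Hypothetical two-term predictor (an assumption, not the paper's model).

    beta  scales a stored prior task vector (stand-in for task recognition).
    gamma scales the context estimate X^T y / n (stand-in for task learning).
    """
    in_context_estimate = X.T @ y / len(y)
    return x_query @ (beta * w_prior + gamma * in_context_estimate)


rng = np.random.default_rng(0)
dim, n_ctx = 4, 2000
w_true = rng.standard_normal(dim)   # task that generated the context
w_prior = rng.standard_normal(dim)  # task assumed to be stored during pretraining
X, y, x_query = sample_prompt(rng, n_ctx, dim, w_true)

# Pure task recognition: the context is ignored entirely.
pred_prior_only = predict(X, y, x_query, w_prior, beta=1.0, gamma=0.0)
# Pure task learning: the stored prior is ignored entirely.
pred_context_only = predict(X, y, x_query, w_prior, beta=0.0, gamma=1.0)
```

In this toy picture, prior forgetting would correspond to `beta` shrinking over pretraining and in-context overfitting to the prediction leaning increasingly on the `gamma` term. How the paper actually defines and measures the two strengths is not stated in the abstract.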

Authors

Sungyoon Lee

Context

Venue: Annual Conference on Neural Information Processing Systems