ICML 2024

In-Context Language Learning: Architectures and Algorithms

Conference Paper · Accept (Poster) · Artificial Intelligence · Machine Learning

Abstract

Some neural language models (LMs) exhibit a remarkable capacity for in-context learning (ICL): they can fit predictors to datasets provided as input. While the mechanisms underlying ICL are well studied in the context of synthetic problems like in-context linear regression, there is still some divergence between these model problems and the “real” ICL exhibited by LMs trained on large text corpora. In this paper, we study ICL through the lens of a new family of model problems we term in-context language learning (ICLL). In ICLL, LMs are presented with a set of strings from a formal language, and must generate additional strings from the same language. We focus on in-context learning of regular languages generated by random finite automata. We evaluate a diverse set of neural sequence models on regular ICLL tasks. We first show that Transformers significantly outperform neural sequence models with recurrent or convolutional representations on ICLL tasks. Next, we provide evidence that they do so by computing in-context n-gram statistics using specialized attention heads. Finally, we show that hard-wiring these heads into neural models improves performance not just on synthetic ICLL but also on natural language modeling, reducing the perplexity of 340M-parameter Transformers by up to 1.14 points (6.7%) on the SlimPajama dataset. Our results highlight the usefulness of in-context formal language learning as a tool for understanding ICL in models of natural text.
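
To make the ICLL setup concrete, the following is a minimal Python sketch of the two ingredients the abstract names: strings drawn from a random finite automaton, and next-symbol prediction from in-context n-gram counts. It is an illustration under stated assumptions, not the paper's procedure. The abstract does not specify how automata are sampled or how the n-gram heads are built, so the sampling scheme, the function names (sample_dfa, sample_string, accepts, ngram_counts, predict_next), and the "^" start-of-string marker are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def sample_dfa(num_states, alphabet, rng):
    """Sample a random partial DFA (assumed scheme, not the paper's).

    Each state gets a random non-empty subset of outgoing symbols, each
    leading to a uniformly chosen state. State 0 is the start state.
    """
    dfa = {}
    for state in range(num_states):
        k = rng.randint(1, len(alphabet))
        symbols = rng.sample(alphabet, k)
        dfa[state] = {a: rng.randrange(num_states) for a in symbols}
    return dfa


def sample_string(dfa, length, rng):
    """Walk the automaton from state 0, emitting one symbol per step."""
    state, out = 0, []
    for _ in range(length):
        sym = rng.choice(sorted(dfa[state]))
        out.append(sym)
        state = dfa[state][sym]
    return "".join(out)


def accepts(dfa, s):
    """Return True if every transition needed to read s exists."""
    state = 0
    for ch in s:
        if ch not in dfa[state]:
            return False
        state = dfa[state][ch]
    return True


def ngram_counts(examples, n):
    """Count, for each (n-1)-symbol context, which symbols follow it.

    "^" pads the start of each string, so it must not be in the alphabet.
    """
    counts = defaultdict(Counter)
    for s in examples:
        padded = "^" * (n - 1) + s
        for i in range(n - 1, len(padded)):
            counts[padded[i - n + 1:i]][padded[i]] += 1
    return counts


def predict_next(counts, prefix, n):
    """Predict the most frequent follower of the prefix's last n-1 symbols.

    Returns None when the context never occurred in the examples.
    """
    padded = "^" * (n - 1) + prefix
    context = padded[len(padded) - (n - 1):]
    followers = counts.get(context)
    if not followers:
        return None
    return followers.most_common(1)[0][0]
```

In this sketch, the example strings play the role of the in-context dataset: ngram_counts is computed only from the strings in the prompt, and predict_next continues a new prefix from those counts alone, with no learned parameters. A trained model would be expected to do something richer than this count-and-lookup baseline.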

Context

Venue
International Conference on Machine Learning