RetroLM: Retrieval-Augmented KVs for Long-Context Processing

Kun Luo; Zheng Liu; Shitao Xiao; Jiabei Chen; Hongjin Qian; Peitian Zhang; Shanshan Jiang; Bin Dong; Jun Zhao; Kang Liu

doi:10.1609/aaai.v40i38.40511

AAAI 2026

Conference Paper AAAI Technical Track on Natural Language Processing III Artificial Intelligence

Abstract

Long-context processing remains a significant challenge for large language models (LLMs). Retrieval-augmented generation (RAG) has recently emerged as a promising approach, enabling LLMs to selectively access relevant information from extended contexts to improve efficiency. However, existing RAG approaches often lag behind other efficient long-context processing methods, primarily because of two inherent limitations: inaccurate retrieval and fragmented contexts. To address these limitations, we propose RetroLM, a novel RAG framework designed for effective long-context processing. Unlike traditional approaches, RetroLM introduces KV-level retrieval augmentation, which partitions the LLM's KV cache into contiguous pages and performs encoding and decoding operations based on the retrieved KV pages. Built upon this framework, we further develop a specialized retriever for precise retrieval of critical pages and conduct unsupervised post-training to optimize the model’s ability to leverage retrieved information. Compared with traditional RAG, the new approach enhances robustness to retrieval inaccuracy, facilitates effective utilization of fragmented contexts, and avoids the cost of repeated context-encoding operations. We conduct extensive evaluations across several popular benchmarks, including LongBench, InfiniteBench, and RULER. RetroLM consistently outperforms existing long-context LLMs and RAG-based methods, especially in tasks requiring deep reasoning or extreme context lengths.
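The idea of partitioning a KV cache into contiguous pages and retrieving a subset of them can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the fixed `page_size`, and the scoring rule (dot product between a query vector and each page's mean key) are all assumptions made for illustration, standing in for the specialized retriever the abstract mentions.

```python
# Illustrative sketch only. All names and the scoring rule are hypothetical;
# RetroLM uses a specialized trained retriever, not this mean-key heuristic.

def partition_into_pages(kv_entries, page_size):
    """Split per-token (key, value) entries into contiguous fixed-size pages."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return [kv_entries[i:i + page_size]
            for i in range(0, len(kv_entries), page_size)]


def mean_key(page):
    """Summarize a page by the element-wise mean of its key vectors."""
    dim = len(page[0][0])
    return [sum(key[d] for key, _ in page) / len(page) for d in range(dim)]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_pages(pages, query, top_k):
    """Return the indices of the top_k highest-scoring pages.

    Indices are returned in their original order so that the selected
    pages keep their relative positions in the context.
    """
    scores = [dot(mean_key(page), query) for page in pages]
    ranked = sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_k])


# Toy cache: six tokens with 2-dimensional keys and placeholder values.
kv_cache = [
    ([1.0, 0.0], "v0"), ([1.0, 0.0], "v1"),
    ([0.0, 1.0], "v2"), ([0.0, 1.0], "v3"),
    ([0.5, 0.5], "v4"), ([0.5, 0.5], "v5"),
]
pages = partition_into_pages(kv_cache, page_size=2)
selected = retrieve_pages(pages, query=[0.0, 1.0], top_k=2)
print(selected)  # [1, 2]
```

In this toy setup only the selected pages would be passed on to attention, so the cached keys and values are reused rather than re-encoded from text. How pages are scored and how the model is trained to use them are left to the paper.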
