IROS 2025
ContextCache: Task-Aware Lifecycle Management for Memory-Efficient LLM Agent Deployment
Abstract
LLM-based agents have demonstrated remarkable capabilities in multi-step reasoning and task execution across domains such as robotics and autonomous systems. However, deploying these agents on resource-constrained platforms presents a fundamental challenge: minimizing inference latency while keeping memory usage within budget. Existing caching techniques (KVCache, PrefixCache, PromptCache) improve inference speed by reusing cached context but overlook the dependency relationships among LLM calls in agent workflows, leading to excessive memory usage or redundant recomputation across calls. To address this, we propose ContextCache, a task-aware lifecycle management framework that optimizes context-fragment caching for multi-step LLM agents. ContextCache predicts the lifespan of each context fragment and dynamically allocates and releases GPU memory accordingly. We evaluate our approach on a newly constructed dataset covering logistics coordination, assembly tasks, and health management. Experimental results demonstrate a 15% reduction in memory usage compared to state-of-the-art caching strategies, with no loss in inference efficiency, making our approach well-suited for real-world deployment in resource-constrained environments.
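To make the lifecycle idea concrete, the sketch below shows one way such task-aware release could work. It is a minimal illustration under assumed names (Fragment, ContextCache, register_fragment, advance_step), not the authors' implementation: each cached fragment carries a predicted last-use step, and the cache frees any fragment once the agent's execution passes that step.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """A cached context fragment (e.g., a KV-cache segment for a prompt span)."""
    frag_id: str
    size_bytes: int
    last_use_step: int  # predicted final agent step that reads this fragment


class ContextCache:
    """Illustrative task-aware lifecycle manager: fragments whose predicted
    lifespan has ended are released instead of lingering in GPU memory."""

    def __init__(self) -> None:
        self._fragments: dict[str, Fragment] = {}
        self.current_step = 0

    def register_fragment(self, frag_id: str, size_bytes: int,
                          last_use_step: int) -> None:
        # Allocation point: a real system would pin a KV-cache block on the GPU here.
        self._fragments[frag_id] = Fragment(frag_id, size_bytes, last_use_step)

    def advance_step(self) -> list[str]:
        """Move to the next agent step and release expired fragments."""
        self.current_step += 1
        expired = [f.frag_id for f in self._fragments.values()
                   if f.last_use_step < self.current_step]
        for frag_id in expired:
            del self._fragments[frag_id]  # a real backend would free GPU memory here
        return expired

    def memory_in_use(self) -> int:
        """Total bytes held by fragments still within their predicted lifespan."""
        return sum(f.size_bytes for f in self._fragments.values())
```

In a real deployment, releasing a fragment would return its KV-cache blocks to the GPU allocator rather than simply dropping a Python object, and the last-use step would come from the framework's lifespan predictor rather than being supplied by the caller.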
Context
- Venue: IEEE/RSJ International Conference on Intelligent Robots and Systems
- Archive span: 1988-2025
- Indexed papers: 26578
- Paper id: 396584553104323651