IROS 2025

ContextCache: Task-Aware Lifecycle Management for Memory-Efficient LLM Agent Deployment

Conference Paper Accepted Paper Artificial Intelligence · Robotics

Abstract

LLM-based agents have demonstrated remarkable capabilities in multi-step reasoning and task execution across domains such as robotics and autonomous systems. However, deploying these agents on resource-constrained platforms presents a fundamental challenge: minimizing latency while optimizing memory usage. Existing caching techniques (KVCache, PrefixCache, PromptCache) improve inference speed by reusing cached context but overlook the dependency relationships between LLM calls in agent workflows, leading to excessive memory usage or redundant recomputation across LLM calls. To address this, we propose ContextCache, a task-aware lifecycle management framework that optimizes context fragment caching for multi-step LLM agents. ContextCache predicts the lifespan of each context fragment and dynamically allocates and releases GPU memory accordingly. We evaluate our approach on a newly constructed dataset covering logistics coordination, assembly tasks, and health management. Experimental results demonstrate a 15% reduction in memory usage compared to state-of-the-art caching strategies, with no loss in inference efficiency, making our approach well-suited for real-world deployment in resource-constrained environments.
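The lifecycle idea in the abstract can be illustrated with a toy simulation. This is a minimal sketch and not the paper's implementation: it assumes the agent's plan is already known as an ordered list of LLM calls (for example, a topological order of a task dependency graph), so each fragment's lifespan is computed exactly rather than predicted. The names `Step`, `last_use`, and `simulate`, the example workflow, and the fragment sizes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One LLM call in an agent workflow (hypothetical structure)."""
    name: str
    fragments: list[str]  # context fragments this call reads


def last_use(steps: list[Step]) -> dict[str, int]:
    """Map each fragment to the index of the last step that reads it."""
    last: dict[str, int] = {}
    for i, step in enumerate(steps):
        for frag in step.fragments:
            last[frag] = i  # later steps overwrite earlier ones
    return last


def simulate(steps: list[Step], sizes: dict[str, int]) -> tuple[int, int]:
    """Return (peak with lifecycle release, peak when nothing is released)."""
    last = last_use(steps)
    resident: set[str] = set()
    peak_lifecycle = 0
    for i, step in enumerate(steps):
        # Load (or keep) every fragment this call needs.
        resident.update(step.fragments)
        peak_lifecycle = max(peak_lifecycle, sum(sizes[f] for f in resident))
        # Release fragments that no later call will read.
        resident -= {f for f in step.fragments if last[f] == i}
    # Baseline: every fragment stays cached until the workflow ends.
    peak_keep_all = sum(sizes[f] for f in last)
    return peak_lifecycle, peak_keep_all


# Hypothetical four-call workflow; sizes are in arbitrary memory units.
steps = [
    Step("plan", ["system", "task"]),
    Step("pick", ["system", "scene"]),
    Step("place", ["system", "scene", "target"]),
    Step("report", ["system", "log"]),
]
sizes = {"system": 4, "task": 2, "scene": 6, "target": 3, "log": 5}

print(simulate(steps, sizes))  # (13, 20)
```

In this toy workflow, releasing each fragment right after its last reader lowers the peak from 20 to 13 units while every call still finds its fragments cached, so nothing is recomputed. These numbers come from the made-up sizes above and say nothing about the paper's reported results.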

Authors

Keywords

  • Autonomous systems
  • Robot kinematics
  • Memory management
  • Graphics processing units
  • Dynamic scheduling
  • Cognition
  • Resource management
  • Assembly
  • Intelligent robots
  • Logistics
  • Lifecycle Management
  • Context-dependent
  • Autonomic System
  • Health Management
  • Task Execution
  • Memory Usage
  • GPU Memory
  • Inference Speed
  • Dynamic Allocation
  • Efficient Inference
  • Reduction In Usage
  • Resource-constrained Environments
  • Caching Scheme
  • Reduce Memory Usage
  • Resource Consumption
  • Directed Acyclic Graph
  • Memory Consumption
  • Task Context
  • Edge Devices
  • Dynamic Management
  • Task Dependency
  • Task Planning
  • Attentional State
  • SOTA Methods
  • Relevant Fragments
  • Plan Generation

Context

Venue
IEEE/RSJ International Conference on Intelligent Robots and Systems
Paper id
396584553104323651