Author name cluster

Ruikun Luo

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

3 papers
2 author rows

Possible papers

NeurIPS Conference 2025 Conference Paper

Sim-LLM: Optimizing LLM Inference at the Edge through Inter-Task KV Reuse

  • Ruikun Luo
  • Changwei Gu
  • Qiang He
  • Feifei Chen
  • Song Wu
  • Hai Jin
  • Yun Yang

KV cache technology, by storing key-value pairs, helps reduce the computational overhead incurred by large language models (LLMs). It facilitates their deployment on resource-constrained edge computing nodes like edge servers. However, as the complexity and size of tasks increase, KV cache usage leads to substantial GPU memory consumption. Existing research has focused on mitigating KV cache memory usage through sequence length reduction, task-specific compression, and dynamic eviction policies. However, these methods are computationally expensive for resource-constrained edge computing nodes. To tackle this challenge, this paper presents Sim-LLM, a novel inference optimization mechanism that leverages task similarity to reduce KV cache memory consumption for LLMs. By caching KVs from processed tasks and reusing them for subsequent similar tasks during inference, Sim-LLM significantly reduces memory consumption while boosting system throughput and increasing maximum batch size, all with minimal accuracy degradation. Evaluated on both A40 and A100 GPUs, Sim-LLM achieves a system throughput improvement of up to 39.40% and a memory reduction of up to 34.65%, compared to state-of-the-art approaches. Our source code is available at https://github.com/CGCL-codes/SimLLM.
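
The general idea of reusing cached KVs across similar tasks can be illustrated with a small sketch. This is not the Sim-LLM implementation. The class `SimilarityKVStore`, the cosine-similarity test, the `threshold` value, and the use of plain embedding vectors as task descriptors are all assumptions made for illustration; the abstract does not say how similarity is measured or how KVs are stored.

```python
import math
from typing import Any, Callable, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SimilarityKVStore:
    """Hypothetical store mapping task embeddings to cached KV payloads."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold  # assumed similarity cutoff for reuse
        self.entries: List[Tuple[List[float], Any]] = []

    def lookup(self, embedding: Sequence[float]) -> Optional[Any]:
        """Return the KV payload of the most similar stored task, if any
        stored task reaches the threshold; otherwise return None."""
        best_kv, best_sim = None, self.threshold
        for stored_embedding, kv in self.entries:
            sim = cosine(embedding, stored_embedding)
            if sim >= best_sim:
                best_kv, best_sim = kv, sim
        return best_kv

    def get_or_compute(
        self, embedding: Sequence[float], compute_kv: Callable[[], Any]
    ) -> Tuple[Any, bool]:
        """Reuse a similar task's KVs when possible; otherwise compute and
        cache new ones. Returns (kv, reused_flag)."""
        kv = self.lookup(embedding)
        if kv is not None:
            return kv, True
        kv = compute_kv()
        self.entries.append((list(embedding), kv))
        return kv, False
```

In this sketch a cache hit skips `compute_kv` entirely and stores nothing new, which is where the memory saving would come from; the accuracy cost of reusing another task's KVs depends on the threshold and is not modeled here.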

ICRA Conference 2016 Conference Paper

Considering avoidance and consistency in motion planning for human-robot manipulation in a shared workspace

  • Rafi Hayne
  • Ruikun Luo
  • Dmitry Berenson

This paper presents an approach to formulating the cost function for a motion planner intended for human-robot collaboration on manipulation tasks in a shared workspace. To be effective for human-robot collaboration, a robot should plan its motion so that it is both safe and efficient. To achieve this, we propose two factors to consider in the cost function for the robot's motion planner: (1) Avoidance of the workspace previously occupied by the human, so that the motion is as safe as possible, and (2) Consistency of the robot's motion, so that the motion is as predictable as possible for the human and they can perform their task without focusing undue attention on the robot. Our experiments in simulation and a human-robot workspace sharing study compare a cost function that uses only the first factor and a combined cost that uses both factors vs. a baseline method that is perfectly consistent but does not account for the human's previous motion. We find that, using either cost function, we outperform the baseline method in terms of task success rate without degrading the task completion time. The best task success rate is achieved with the cost function that includes both the avoidance and consistency terms.
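
A cost of this two-term shape can be sketched as a weighted sum. The sketch below is an assumption-laden illustration, not the paper's formulation: the grid-cell `occupancy` map, the waypoint-wise Euclidean distance used for consistency, and the weights `w_avoid` and `w_consist` are all hypothetical choices.

```python
import math
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def avoidance_cost(path: List[Cell], occupancy: Dict[Cell, float]) -> float:
    """Sum of how strongly the human previously occupied each waypoint cell.
    Cells missing from the (hypothetical) occupancy map count as 0."""
    return sum(occupancy.get(cell, 0.0) for cell in path)


def consistency_cost(path: List[Cell], previous_path: List[Cell]) -> float:
    """Sum of Euclidean distances between corresponding waypoints of the
    candidate path and the robot's previous path (assumed equal length)."""
    return sum(math.dist(p, q) for p, q in zip(path, previous_path))


def combined_cost(
    path: List[Cell],
    occupancy: Dict[Cell, float],
    previous_path: List[Cell],
    w_avoid: float = 1.0,
    w_consist: float = 1.0,
) -> float:
    """Weighted sum of the avoidance and consistency terms."""
    return (
        w_avoid * avoidance_cost(path, occupancy)
        + w_consist * consistency_cost(path, previous_path)
    )
```

Setting `w_consist` to 0 gives an avoidance-only cost, which mirrors the comparison between the two cost functions described in the abstract.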

IROS Conference 2015 Conference Paper

A framework for unsupervised online human reaching motion recognition and early prediction

  • Ruikun Luo
  • Dmitry Berenson

This paper focuses on recognition and prediction of human reaching motion in industrial manipulation tasks. Several supervised learning methods have been proposed for this purpose, but we seek a method that can build models on-the-fly and adapt to new people and new motion styles as they emerge. Thus, unlike previous work, we propose an unsupervised online learning approach to the problem, which requires no offline training or manual categorization of trajectories. Our approach consists of a two-layer library of Gaussian Mixture Models that can be used both for recognition and prediction. We do not assume that the number of motion classes is known a priori, and thus the library grows if it cannot explain a new observed trajectory. Given an observed portion of a trajectory, the framework can predict the remainder of the trajectory by first determining what GMM it belongs to, and then using Gaussian Mixture Regression to predict the remainder of the trajectory. We tested our method on motion-capture data recorded during assembly tasks. Our results suggest that the proposed framework outperforms supervised methods in terms of both recognition and prediction. We also show the benefit of using our two-layer framework over simpler approaches.
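
The recognize-then-predict step can be sketched with standard Gaussian Mixture Regression. This is a simplified illustration under stated assumptions, not the paper's two-layer framework: each model is a mixture over a scalar time `t` and a scalar position `x`, the library is a plain dictionary, and the names `Component`, `recognize`, and `gmr_predict` are hypothetical. Online growth of the library is not shown.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Component:
    """One bivariate Gaussian over (t, x) with a mixing weight."""
    weight: float
    mean_t: float
    mean_x: float
    var_t: float
    var_x: float
    cov_tx: float


def _pdf_t(t: float, c: Component) -> float:
    """Marginal density of t under one component."""
    return math.exp(-0.5 * (t - c.mean_t) ** 2 / c.var_t) / math.sqrt(
        2.0 * math.pi * c.var_t
    )


def _pdf_joint(t: float, x: float, c: Component) -> float:
    """Joint density of (t, x) under one component."""
    det = c.var_t * c.var_x - c.cov_tx ** 2
    dt, dx = t - c.mean_t, x - c.mean_x
    quad = (c.var_x * dt * dt - 2.0 * c.cov_tx * dt * dx + c.var_t * dx * dx) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def log_likelihood(gmm: List[Component], observed: List[Tuple[float, float]]) -> float:
    """Log-likelihood of observed (t, x) samples under a mixture."""
    total = 0.0
    for t, x in observed:
        density = sum(c.weight * _pdf_joint(t, x, c) for c in gmm)
        total += math.log(max(density, 1e-300))  # floor avoids log(0)
    return total


def recognize(library: Dict[str, List[Component]],
              observed: List[Tuple[float, float]]) -> str:
    """Pick the library model that best explains the observed partial trajectory."""
    return max(library, key=lambda name: log_likelihood(library[name], observed))


def gmr_predict(gmm: List[Component], t: float) -> float:
    """Gaussian Mixture Regression: conditional mean E[x | t]."""
    resp = [c.weight * _pdf_t(t, c) for c in gmm]
    total = sum(resp)
    if total == 0.0:
        raise ValueError("t is too far from every component to predict")
    return sum(
        (r / total) * (c.mean_x + c.cov_tx / c.var_t * (t - c.mean_t))
        for r, c in zip(resp, gmm)
    )
```

Given an observed prefix, one would call `recognize` to select a model and then evaluate `gmr_predict` at future time values to fill in the remainder of the trajectory.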