Author name cluster

Yong Cui

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers

2 author rows

AAAI Conference 2026 Conference Paper

KeepKV: Achieving Periodic Lossless KV Cache Compression for Efficient LLM Inference

Yuxuan Tian
Zihan Wang
Yebo Peng
Aomufei Yuan
Zhiming Wang
Bairen Yi
Xin Liu
Yong Cui

Efficient inference of large language models (LLMs) is hindered by an ever-growing key-value (KV) cache, making KV cache compression a critical research direction. Traditional methods selectively evict less important KV cache entries, which leads to information loss and hallucinations. Recently, merging-based strategies have been explored to retain more information by merging KV pairs that would otherwise be discarded; however, these existing approaches inevitably introduce inconsistencies in attention distributions before and after merging, causing degraded generation quality. To overcome this challenge, we propose KeepKV, a novel adaptive KV cache merging method designed to preserve performance under strict memory constraints, achieving single-step lossless compression and providing error bounds for multi-step compression. KeepKV introduces the Electoral Votes mechanism, which records merging history and adaptively adjusts attention scores. It further leverages a novel Zero Inference-Perturbation Merging method that compensates for attention loss resulting from cache merging. Extensive experiments on various benchmarks and LLM architectures demonstrate that KeepKV substantially reduces memory usage while retaining essential context information, achieving an inference throughput improvement of over 2x and maintaining superior generation quality even with a KV cache budget of only 10%.
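The attention inconsistency that merging can introduce is easy to see on a toy example. The sketch below is not the KeepKV algorithm; it is a minimal illustration, with hypothetical scalar logits and values, of how naively averaging two KV entries changes the attention output for a query, while a merge that preserves the combined softmax mass leaves the output unchanged for that same query. The function name `attention` and all numbers are assumptions made for illustration.

```python
import math

def attention(scores, values):
    """Softmax-weighted sum of scalar values for a single query."""
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total

# Hypothetical attention logits and scalar values for three cached entries.
scores = [2.0, 1.0, 0.5]
values = [1.0, 3.0, -2.0]
full = attention(scores, values)

# Naive merge of the last two entries: average both logits and values.
naive = attention(
    [scores[0], (scores[1] + scores[2]) / 2],
    [values[0], (values[1] + values[2]) / 2],
)

# Mass-preserving merge: the merged entry carries the summed softmax mass
# and the mass-weighted average of the two values.
w1, w2 = math.exp(scores[1]), math.exp(scores[2])
merged_score = math.log(w1 + w2)
merged_value = (w1 * values[1] + w2 * values[2]) / (w1 + w2)
exact = attention([scores[0], merged_score], [values[0], merged_value])

print(f"full={full:.4f} naive={naive:.4f} mass-preserving={exact:.4f}")
```

The mass-preserving identity holds only for the query whose logits were used to build the merged entry; later queries see different logits, so some error can reappear across steps.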


NeurIPS Conference 2025 Conference Paper

Fast Inference for Augmented Large Language Models

Rana Shahout
Cong Liang
Shiji Xin
Qianru Lao
Yong Cui
Minlan Yu
Michael Mitzenmacher

Augmented Large Language Models (LLMs) enhance standalone LLMs by integrating external data sources through API calls. In interactive applications, efficient scheduling is crucial for maintaining low request completion times, directly impacting user engagement. However, these augmentations introduce new scheduling challenges: the size of augmented requests (in tokens) no longer correlates proportionally with execution time, making traditional size-based scheduling algorithms like Shortest Job First less effective. Additionally, requests may require different handling during API calls, which must be incorporated into scheduling. This paper presents MARS, a novel inference framework that optimizes augmented LLM latency by explicitly incorporating system- and application-level considerations into scheduling. MARS introduces a predictive, memory-aware scheduling approach that integrates API handling and request prioritization to minimize completion time. We implement MARS on top of vLLM and evaluate its performance against baseline LLM inference systems, demonstrating improvements in end-to-end latency by 27%-85% and reductions in TTFT by 4%-96% compared to the existing augmented-LLM system, with even greater gains over vLLM. Our implementation is available online.
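A small numerical sketch shows why ordering by token count alone can mislead once API calls are involved. This is not MARS; it is a toy single-server model in which a request holds the server for its decoding time plus its API wait. The `Request` fields, the per-token cost, and all numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Request:
    name: str
    tokens: int       # request size in tokens
    api_wait: float   # hypothetical seconds spent waiting on external API calls

def service_time(r, sec_per_token=0.01):
    """Assumed total time a request occupies the server."""
    return r.tokens * sec_per_token + r.api_wait

def mean_completion(order):
    """Mean completion time when requests run back to back in this order."""
    clock, total = 0.0, 0.0
    for r in order:
        clock += service_time(r)
        total += clock
    return total / len(order)

requests = [
    Request("a", tokens=100, api_wait=5.0),
    Request("b", tokens=300, api_wait=0.0),
    Request("c", tokens=200, api_wait=0.5),
]

# Size-based Shortest Job First versus ordering by estimated total time.
by_tokens = sorted(requests, key=lambda r: r.tokens)
by_time = sorted(requests, key=service_time)

print([r.name for r in by_tokens], mean_completion(by_tokens))
print([r.name for r in by_time], mean_completion(by_time))
```

In this toy setup the smallest request in tokens is the slowest overall, so scheduling it first delays everything behind it. A real system would also have to decide what happens to the request's memory during the API call, which this sketch ignores.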


TCS Journal 2020 Journal Article

A general method for decomposing self-intersecting polygon to normal based on self-intersection points

Yong Cui
Qian Liu
Guo Chen
Hujun Zhang

Checking whether polygons are self-intersecting is an important step for GIS projects before they are published to the web. Automatically converting self-intersecting polygons into normal ones is practically useful, especially when numerous polygons need to be processed. Based on the relationships of self-intersection points, this paper presents an algorithm to convert a complex self-intersecting polygon into a normal one that has no self-intersecting part. Furthermore, using the relationships of the repeated points (the original self-intersection points) of the decomposed polygon, the resulting single simple polygon can be split into independent sub-polygons bounded by those points. The algorithm is easy to understand and highly efficient because it considers only the self-intersection point relationships of the polygon and does not depend on the edges or their directions. The algorithm uses a point structure in which the relationships of the self-intersection points are defined.
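The prerequisite for any such method is locating the self-intersection points. The sketch below covers only that step, using a brute-force check of every pair of non-adjacent edges; it is not the paper's decomposition algorithm or its point structure. The function names are hypothetical, and parallel or collinear edges are ignored for simplicity.

```python
def segment_intersection(p1, p2, p3, p4):
    """Return the proper crossing point of segments p1p2 and p3p4, or None."""
    d1 = (p2[0] - p1[0], p2[1] - p1[1])
    d2 = (p4[0] - p3[0], p4[1] - p3[1])
    denom = d1[0] * d2[1] - d1[1] * d2[0]  # 2D cross product d1 x d2
    if denom == 0:
        return None  # parallel or collinear: skipped in this sketch
    dx, dy = p3[0] - p1[0], p3[1] - p1[1]
    t = (dx * d2[1] - dy * d2[0]) / denom  # position along p1p2
    u = (dx * d1[1] - dy * d1[0]) / denom  # position along p3p4
    if 0 < t < 1 and 0 < u < 1:
        return (p1[0] + t * d1[0], p1[1] + t * d1[1])
    return None

def self_intersections(polygon):
    """List crossing points between non-adjacent edges of a closed polygon."""
    n = len(polygon)
    points = []
    for i in range(n):
        for j in range(i + 1, n):
            # Adjacent edges share a vertex and cannot properly cross.
            if j == i + 1 or (i == 0 and j == n - 1):
                continue
            hit = segment_intersection(
                polygon[i], polygon[(i + 1) % n],
                polygon[j], polygon[(j + 1) % n],
            )
            if hit is not None:
                points.append(hit)
    return points

bowtie = [(0, 0), (2, 2), (2, 0), (0, 2)]
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(self_intersections(bowtie))
print(self_intersections(square))
```

The pairwise check is quadratic in the number of edges; for large GIS datasets a sweep-line approach would be the usual replacement.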


YNIMG Journal 2012 Journal Article

Neural substrates of smoking cue reactivity: A meta-analysis of fMRI studies

Jeffrey M. Engelmann
Francesco Versace
Jason D. Robinson
Jennifer A. Minnix
Cho Y. Lam
Yong Cui
Victoria L. Brown
Paul M. Cinciripini

Reactivity to smoking-related cues may be an important factor that precipitates relapse in smokers who are trying to quit. The neurobiology of smoking cue reactivity has been investigated in several fMRI studies. We combined the results of these studies using activation likelihood estimation, a meta-analytic technique for fMRI data. Results of the meta-analysis indicated that smoking cues reliably evoke larger fMRI responses than neutral cues in the extended visual system, precuneus, posterior cingulate gyrus, anterior cingulate gyrus, dorsal and medial prefrontal cortex, insula, and dorsal striatum. Subtraction meta-analyses revealed that parts of the extended visual system and dorsal prefrontal cortex are more reliably responsive to smoking cues in deprived smokers than in non-deprived smokers, and that short-duration cues presented in event-related designs produce larger responses in the extended visual system than long-duration cues presented in blocked designs. The areas that were found to be responsive to smoking cues agree with theories of the neurobiology of cue reactivity, with two exceptions. First, there was a reliable cue reactivity effect in the precuneus, which is not typically considered a brain region important to addiction. Second, we found no significant effect in the nucleus accumbens, an area that plays a critical role in addiction, but this effect may have been due to technical difficulties associated with measuring fMRI data in that region. The results of this meta-analysis suggest that the extended visual system should receive more attention in future studies of smoking cue reactivity.


YNIMG Journal 2004 Journal Article

Engagement of the prefrontal cortex in representational momentum: an fMRI study

Hengyi Rao
Shihui Han
Yi Jiang
Yanping Xue
Hua Gu
Yong Cui
Dingguo Gao

Behavioral studies have identified a robust phenomenon in which an observer's memory of the final position of a moving target is shifted slightly further in its direction of motion, usually called representational momentum (RM). However, the neural substrates underlying RM are poorly understood. The current study measured hemodynamic responses in association with RM using functional magnetic resonance imaging (fMRI). Two experiments using block and event-related designs, respectively, were conducted in which subjects compared the orientation of a probe rectangle with the remembered orientation of the final inducing figures in a set of rotating rectangles. Both experiments showed that, relative to the control task in which behavioral data did not show RM effects, the RM task induced stronger activation in the prefrontal cortex. However, no activation was found in the MT/MST complex in association with RM. The fMRI results suggest that RM may not simply reflect implicit motion perception, and that high-level cognitive mechanisms underpinned by the prefrontal cortex may be involved in the RM effect.


IROS Conference 2003 Conference Paper

A unified adaptive force control of underwater vehicle-manipulator systems (UVMS)

Yong Cui
Junku Yuh

A unified adaptive force control approach for underwater vehicle-manipulator systems (UVMS) is proposed in this paper. First, a direct adaptive impedance control scheme is introduced. This controller is then incorporated into the unified force control strategy, which combines adaptive impedance control with hybrid position/force control by means of fuzzy switching to perform autonomous underwater manipulation. This approach combines the advantages of impedance control and hybrid control without requiring an accurate dynamic model of the system and has the potential to be effective in underwater environments. Extensive computer simulations are performed to verify the efficacy of the proposed control scheme based on a UVMS model with a 6-DOF autonomous underwater vehicle and a 3-DOF robot arm mounted on the vehicle.


ICRA Conference 2000 Conference Paper

A Unified Force Control Approach to Autonomous Underwater Manipulation

Yong Cui
Nilanjan Sarkar

A unified force control scheme for an autonomous underwater robotic system is proposed. This robotic system is composed of a six degree-of-freedom autonomous underwater vehicle (AUV) and a robotic arm that is mounted on the AUV. First, a dynamic model for the whole underwater manipulator system considering the hydrodynamic effects is derived. This model is then used to implement the proposed unified force control approach, which combines impedance control with hybrid position/force control by means of fuzzy switching to perform autonomous underwater manipulation. This approach combines the advantages of impedance control and hybrid control and has the potential to be effective in underwater environments. Extensive computer simulations are performed to verify the efficacy of the proposed control scheme, and the results are presented.


IROS Conference 1999 Conference Paper

Impedance control of underwater vehicle-manipulator systems (UVMS)

Yong Cui
Tarun Kanti Podder
Nilanjan Sarkar

An impedance control scheme for an autonomous underwater robotic system is proposed. This robotic system is composed of a six degree-of-freedom autonomous underwater vehicle (AUV) and a robotic arm that is mounted on the AUV. First, a dynamic model for the whole underwater vehicle-manipulator system (UVMS) considering various hydrodynamic effects is derived using a quasi-Lagrange method. This model is then used to implement the proposed impedance controller. The impedance controller is designed considering the whole UVMS as one dynamic system. Extensive computer simulations are performed to verify the efficacy of the proposed control scheme, and the results are presented in the paper.
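For readers unfamiliar with impedance control, its goal is to make the end effector respond to contact force like a chosen mass-spring-damper system. The sketch below simulates only that target behaviour on a single axis, M_d * e'' + B_d * e' + K_d * e = F_ext, where e is the deviation from the desired position. It is not the paper's controller and includes no vehicle, arm, or hydrodynamic model; the gains, time step, and function name are hypothetical.

```python
def simulate_impedance(f_ext, m_d=1.0, b_d=4.0, k_d=4.0, dt=0.001, steps=10_000):
    """Integrate the 1-DOF target impedance under a constant external force.

    Returns the final position deviation e after steps * dt seconds,
    using semi-implicit Euler integration.
    """
    e, v = 0.0, 0.0  # deviation from the desired position and its rate
    for _ in range(steps):
        a = (f_ext - b_d * v - k_d * e) / m_d  # target impedance dynamics
        v += a * dt
        e += v * dt
    return e

# Under a constant contact force the deviation settles near f_ext / k_d.
final_deviation = simulate_impedance(f_ext=2.0)
print(final_deviation)
```

Raising the stiffness K_d makes the arm hold position more firmly against contact, while lowering it makes the arm yield; choosing that trade-off is the main design decision in an impedance scheme.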
