Author name cluster

Yukai Miao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

3 papers

1 author row

NeurIPS Conference 2025 Conference Paper

Transcending Cost-Quality Tradeoff in Agent Serving via Session-Awareness

Yanyu Ren
Li Chen
Dan Li
Xizheng Wang
Zhiyuan Wu
Yukai Miao
Yu Bai

Large Language Model (LLM) agents are capable of task execution across various domains by autonomously interacting with environments and refining LLM responses based on feedback. However, existing model serving systems are not optimized for the unique demands of serving agents. Compared to classic model serving, agent serving has different characteristics: predictable request patterns, increasing quality requirements, and unique prompt formatting. We identify a key problem for agent serving: LLM serving systems lack session-awareness. They neither perform effective KV cache management nor precisely select the cheapest model that is still competent in each round. This leads to a cost-quality tradeoff, and we identify an opportunity to surpass it in an agent serving system. To this end, we introduce AgServe for AGile AGent SERVing. AgServe features a session-aware server that boosts KV cache reuse via Estimated-Time-of-Arrival-based eviction and in-place positional embedding calibration, a quality-aware client that performs session-aware model cascading through real-time quality assessment, and a dynamic resource scheduler that maximizes GPU utilization. With AgServe, we allow agents to select and upgrade models during the session lifetime and to achieve similar quality at much lower cost, effectively transcending the tradeoff. Extensive experiments on real testbeds demonstrate that AgServe (1) achieves response quality comparable to GPT-4o at 16.5% of the cost and (2) delivers a 1.8× improvement in quality relative to the tradeoff curve.


AAAI Conference 2021 Conference Paper

Improving the Efficiency and Effectiveness for BERT-based Entity Resolution

Bing Li
Yukai Miao
Yaoshu Wang
Yifang Sun
Wei Wang

BERT has set a new state-of-the-art performance on the entity resolution (ER) task, largely owing to fine-tuning pretrained language models and the deep pair-wise interaction. Albeit remarkably effective, it comes with a steep increase in computational cost, as the deep interaction requires exhaustively computing every tuple pair to search for coreferences. For the ER task, this is often prohibitively expensive due to the large cardinality of tuple pairs to be matched. To tackle this, we introduce a siamese network structure that independently encodes tuples using BERT but delays the pair-wise interaction via an enhanced alignment network. This siamese structure enables a dedicated blocking module to quickly filter out obviously dissimilar tuple pairs, and thus drastically reduces the cardinality of fine-grained matching. Further, blocking and entity matching are integrated into a multi-task learning framework for facilitating both tasks. Extensive experiments on multiple datasets demonstrate that our model significantly outperforms state-of-the-art models (including BERT) in both efficiency and effectiveness.


AAAI Conference 2020 Conference Paper

A Recurrent Model for Collective Entity Linking with Adaptive Features

Xiaoling Zhou
Yukai Miao
Wei Wang
Jianbin Qin

The vast amount of web data enables us to build knowledge bases with unprecedented quality and coverage. Named Entity Disambiguation (NED) is an important task that automatically resolves ambiguous mentions in free text to the correct target entries in the knowledge base. Traditional machine-learning-based methods for NED were outperformed and made obsolete by state-of-the-art deep-learning-based models. However, deep learning models are more complex, requiring a large amount of training data and lengthy training and parameter tuning. In this paper, we revisit traditional machine learning techniques and propose a light-weight, tuneable and time-efficient method without using deep learning or deep-learning-generated features. We propose novel adaptive features that focus on extracting discriminative features to better model similarities between candidate entities and the mention’s context. Using the learning-to-rank framework, we learn a local ranking model on both traditional features and the new adaptive features. While arriving at linking decisions individually via the local model, our method also takes into consideration the correlation between decisions by running multiple recurrent global models, which can be deemed a learned local search method. Our method attains performance comparable to state-of-the-art deep-learning-based methods on NED benchmark datasets while being significantly faster to train.
