
AAAI 2025

Dynamic Operator Optimization for Efficient Multi-Tenant LoRA Model Serving

Conference Paper · AAAI Technical Track on Machine Learning VII

Abstract

Low-Rank Adaptation (LoRA) has become increasingly popular for efficiently fine-tuning large language models (LLMs) with minimal resources. However, traditional methods that serve multiple LoRA models independently result in redundant computation and low GPU utilization. This paper addresses these inefficiencies by introducing Dynamic Operator Optimization (Dop), an automated optimization technique that dynamically tunes the Segmented Gather Matrix-Vector Multiplication (SGMV) operator for specific serving scenarios. SGMV's design enables batching GPU operations across different LoRA models, significantly improving computational efficiency. Dop uses a Search Space Constructor to build a hierarchical search space, dividing the program space into high-level structural sketches and low-level implementation details to ensure diversity and flexibility in operator implementations. An Optimization Engine then refines these implementations via evolutionary search, guided by a cost model that estimates program performance. This iterative process lets SGMV implementations adapt to different scenarios while maintaining high performance. We demonstrate that Dop improves throughput by 1.30-1.46 times on a state-of-the-art multi-tenant LoRA serving system.
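To make the batching idea concrete, here is a minimal NumPy sketch of the SGMV semantics the abstract describes: each request in a batch is routed to its own LoRA adapter, and rows belonging to the same adapter are processed as one segment. The function name, shapes, and segment layout are illustrative assumptions, not the paper's kernel; a real SGMV implementation fuses the gather and both matrix multiplications into a single GPU operator.

```python
import numpy as np

def sgmv(x, lora_A, lora_B, seg_ids):
    """Reference semantics of Segmented Gather Matrix-Vector Multiplication.

    x       : (batch, d_in)            input activations, one row per request
    lora_A  : (n_adapters, d_in, r)    per-adapter down-projections
    lora_B  : (n_adapters, r, d_out)   per-adapter up-projections
    seg_ids : (batch,)                 adapter index assigned to each request

    Returns the batched LoRA delta of shape (batch, d_out).
    """
    out = np.empty((x.shape[0], lora_B.shape[2]))
    # One segment per distinct adapter: all rows sharing an adapter are
    # multiplied against that adapter's low-rank factors in one batched step.
    for a in np.unique(seg_ids):
        rows = seg_ids == a
        out[rows] = (x[rows] @ lora_A[a]) @ lora_B[a]
    return out

# Illustrative usage: 6 requests spread across 3 adapters, rank r = 4.
rng = np.random.default_rng(0)
x = rng.standard_normal((6, 8))
A = rng.standard_normal((3, 8, 4))
B = rng.standard_normal((3, 4, 5))
ids = np.array([0, 0, 1, 1, 2, 2])
delta = sgmv(x, A, B, ids)  # shape (6, 5)
```

Grouping rows by adapter is what makes the per-adapter work batchable; the optimization problem Dop tackles is choosing a high-performance implementation of this fused operator for a given batch composition and hardware.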

Authors

Keywords

No keywords are indexed for this paper.

Context

Venue
AAAI Conference on Artificial Intelligence
Archive span
1980-2026