TAAS Journal 2026 Journal Article
Adaptive Scheduling of Multimodal Large Language Model in Intelligent Edge Computing
- Xingyu Yuan
- He Li
- Mianxiong Dong
- Kaoru Ota
Multimodal Large Language Models (MLLMs) integrate multimodal encoders with Large Language Models (LLMs) to overcome the limitations of text-only models. Traditional LLMs are deployed on high-performance cloud servers, but MLLMs, which process multimodal data, face high transmission latency and privacy risks when tasks are offloaded to the cloud. Intelligent edge computing is a promising solution for supporting such latency-sensitive and privacy-sensitive tasks. However, the heterogeneity of edge environments makes efficient MLLM inference challenging. In this work, we enhance MLLM inference efficiency in heterogeneous edge environments by decoupling MLLM into LLM and multimodal encoders, deploying the LLM on high-performance devices and the multimodal encoders on lower-capability devices. Additionally, we observe that processing MLLM tasks in edge environments involves numerous configuration parameters that impact inference speed and energy consumption in an unknown and possibly time-varying fashion. To address this challenge, we present an adaptive scheduling algorithm that assigns parameters to tasks or minimizing energy consumption while meeting maximum latency constraints. The results of extensive experimental trials demonstrate that the proposed approach consistently outperforms existing state-of-the-art methods, achieving significant improvements in both latency reduction and energy efficiency.