AAMAS 2025
Multimodal Agentic Model Predictive Control
Abstract
Control problems for autonomous AI agents, especially safety-critical applications such as autonomous vehicle control, require robust decision-making frameworks to ensure safe navigation in complex and dynamic environments. This necessitates approaches such as Agentic Model Predictive Control (MPC), which can anticipate future problems and plan for them accordingly. Recently, Multimodal Vision Language Models (VLMs) have emerged as a way to give semantic meaning to a scene, drawing on extremely large amounts of information and contextual understanding of the world. These models span a wide range of sizes, trading off speed for performance as they scale. This paper introduces a novel framework that integrates MPC with Multimodal VLMs to enhance the ability of autonomous vehicles to navigate and respond to real-world scenarios. Leveraging the open-source Waymax library released by Waymo, along with the Waymo Open Motion, Berkeley DeepDrive, and NuScenes datasets, our method uses Multimodal VLMs to detect and draw bounding boxes around important parts of the scene, such as pedestrians or other vehicles. These models can then be queried for specific attributes of identified objects, such as whether a vehicle is accelerating or decelerating, or whether a newly detected obstacle is on a collision course with the vehicle. By incorporating these and other semantic insights into an MPC framework, an autonomous vehicle can make more informed and more context-aware decisions to mitigate the risk of a collision and safely navigate its surroundings. We evaluate our approach in diverse simulated environments using VLMs of different scales, demonstrating improvements in safety metrics compared to traditional MPC methods. The integration of VLMs with MPC represents a significant advancement in autonomous decision-making, especially in dynamic and uncertain situations.
Our approach paves the way for future research in using Multimodal VLMs for more intelligent and adaptable autonomous agents.

This work is licensed under a Creative Commons Attribution International 4.0 License. Proc. of the 24th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2025), Y. Vorobeychik, S. Das, A. Nowé (eds.), May 19-23, 2025, Detroit, Michigan, USA. © 2025 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org).
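The abstract gives no implementation details, but the coupling it describes, VLM-derived semantics feeding an MPC objective, can be sketched roughly as follows. Everything below is an illustrative assumption, not the paper's actual method: the `Detection` schema, the risk weights, and the simple longitudinal rollout are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Detection:
    """One object's semantic attributes, as a VLM query might report them (hypothetical schema)."""
    kind: str                  # e.g. "pedestrian" or "vehicle"
    distance_m: float          # current gap between ego vehicle and object
    closing_speed_mps: float   # positive when the gap is shrinking
    on_collision_course: bool  # VLM's answer to a yes/no collision query

def semantic_risk_weight(det: Detection) -> float:
    """Map VLM attributes to a scalar weight on the MPC proximity penalty."""
    w = 1.0
    if det.kind == "pedestrian":
        w *= 5.0   # weight vulnerable road users more heavily
    if det.on_collision_course:
        w *= 10.0  # a flagged collision course dominates the cost
    return w

def rollout_cost(accels: List[float], v0: float, v_target: float,
                 det: Detection, dt: float = 0.5) -> float:
    """Finite-horizon cost: speed tracking plus a VLM-weighted proximity term."""
    v, gap, cost = v0, det.distance_m, 0.0
    v_obj = v0 - det.closing_speed_mps  # object's speed, assumed constant
    w = semantic_risk_weight(det)
    for a in accels:
        v = max(0.0, v + a * dt)         # ego speed update, no reversing
        gap -= (v - v_obj) * dt          # relative motion shrinks or grows the gap
        cost += (v - v_target) ** 2      # deviation from desired speed
        cost += w / max(gap, 0.1)        # soft barrier that blows up near the object
    return cost

def plan(candidates: List[List[float]], v0: float, v_target: float,
         det: Detection) -> List[float]:
    """Pick the candidate acceleration sequence with the lowest rollout cost."""
    return min(candidates, key=lambda seq: rollout_cost(seq, v0, v_target, det))
```

With a pedestrian flagged as being on a collision course 8 m ahead, the inflated proximity weight makes a braking sequence cheaper than holding speed, even though braking incurs a speed-tracking penalty; a real system would replace the scalar gap model with the vehicle dynamics and constraints of the MPC formulation.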
Context
- Venue: International Conference on Autonomous Agents and Multiagent Systems
- Archive span: 2002-2025
- Indexed papers: 7403
- Paper id: 465846809177197100