AAMAS 2025
Multimodal Agentic Model Predictive Control
Abstract
Control problems for autonomous AI agents, especially safety-critical applications such as autonomous vehicle control, require robust decision-making frameworks to ensure safe navigation in complex and dynamic environments. This necessitates approaches such as Agentic Model Predictive Control (MPC), which can anticipate future problems and plan for them accordingly. Recently, Multimodal Vision Language Models (VLMs) have emerged as a way to give semantic meaning to a scene, drawing on extremely large amounts of information and contextual understanding of the world. These models span a wide range of sizes, trading off speed for performance as they scale. This paper introduces a novel framework that integrates MPC with Multimodal VLMs to enhance the ability of autonomous vehicles to navigate and respond to real-world scenarios. Leveraging the open-source Waymax library released by Waymo, along with the Waymo Open Motion, Berkeley DeepDrive, and NuScenes datasets, our method uses Multimodal VLMs to detect and draw bounding boxes around important parts of the scene, such as pedestrians or other vehicles. These models can then be queried for specific attributes of identified objects, such as whether a vehicle is accelerating or decelerating, or whether a newly detected obstacle is on a collision course with the vehicle. By incorporating these and other semantic insights into an MPC framework, an autonomous vehicle can make more informed and more context-aware decisions to mitigate the risk of a collision and safely navigate its surroundings. We evaluate our approach in diverse simulated environments using VLMs of different scales, demonstrating improvements in safety metrics compared to traditional MPC methods. The integration of VLMs with MPC represents a significant advancement in autonomous decision-making, especially in dynamic and uncertain situations.
Our approach paves the way for future research in using Multimodal VLMs for more intelligent and adaptable autonomous agents.

This work is licensed under a Creative Commons Attribution International 4.0 License. Proc. of the 24th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2025), Y. Vorobeychik, S. Das, A. Nowé (eds.), May 19-23, 2025, Detroit, Michigan, USA. © 2025 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org).
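The abstract gives no implementation details, but the coupling it describes, VLM-derived semantics feeding an MPC objective, can be sketched roughly as follows. Everything below is an illustrative assumption, not the paper's actual method: the `Detection` schema, the risk weights, and the simple longitudinal rollout are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Detection:
    """One object's semantic attributes, as a VLM query might report them (hypothetical schema)."""
    kind: str                  # e.g. "pedestrian" or "vehicle"
    distance_m: float          # current gap between ego vehicle and object
    closing_speed_mps: float   # positive when the gap is shrinking
    on_collision_course: bool  # VLM's answer to a yes/no collision query

def semantic_risk_weight(det: Detection) -> float:
    """Map VLM attributes to a scalar weight on the MPC proximity penalty."""
    w = 1.0
    if det.kind == "pedestrian":
        w *= 5.0   # weight vulnerable road users more heavily
    if det.on_collision_course:
        w *= 10.0  # a flagged collision course dominates the cost
    return w

def rollout_cost(accels: List[float], v0: float, v_target: float,
                 det: Detection, dt: float = 0.5) -> float:
    """Finite-horizon cost: speed tracking plus a VLM-weighted proximity term."""
    v, gap, cost = v0, det.distance_m, 0.0
    v_obj = v0 - det.closing_speed_mps  # object's speed, assumed constant
    w = semantic_risk_weight(det)
    for a in accels:
        v = max(0.0, v + a * dt)         # ego speed update, no reversing
        gap -= (v - v_obj) * dt          # relative motion shrinks or grows the gap
        cost += (v - v_target) ** 2      # deviation from desired speed
        cost += w / max(gap, 0.1)        # soft barrier that blows up near the object
    return cost

def plan(candidates: List[List[float]], v0: float, v_target: float,
         det: Detection) -> List[float]:
    """Pick the candidate acceleration sequence with the lowest rollout cost."""
    return min(candidates, key=lambda seq: rollout_cost(seq, v0, v_target, det))
```

With a pedestrian flagged as being on a collision course 8 m ahead, the inflated proximity weight makes a braking sequence cheaper than holding speed, even though braking incurs a speed-tracking penalty; a real system would replace the scalar gap model with the vehicle dynamics and constraints of the MPC formulation.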
Context
- Venue: International Conference on Autonomous Agents and Multiagent Systems
- Archive span: 2002-2025
- Indexed papers: 7403
- Paper id: 465846809177197100