Attention to Threat-Relevant Objects: Reasoning Detection in Autonomous Driving via Multimodal Large Language Models

Yulin He; Wei Chen; Xinbiao Gan; Siqi Wang; Haotian Wang; Yusong Tan

doi:10.1609/aaai.v40i6.42470

Back to AAAI

AAAI 2026

Attention to Threat-Relevant Objects: Reasoning Detection in Autonomous Driving via Multimodal Large Language Models

Conference Paper AAAI Technical Track on Computer Vision III Artificial Intelligence

PDF Details DOI

Abstract

Perceiving threats is an innate human instinct. During driving, humans naturally focus their attention on objects that pose real potential risks. Motivated by this observation, we shift the focus from traditional class-based detection to a novel task termed threat-oriented reasoning detection in autonomous driving. This task aims to localize threat objects and reason about their threat levels from a driver-centric perspective. To support this task, we build a benchmark comprising diverse corner-case scenarios, annotated by multiple experienced drivers to reflect human-aligned threat cognition. Given the reasoning demands of this task, we then explore the capabilities of multi-modal large language models (MLLMs) and introduce two methods based on whether the MLLM supports object detection: 1) For MLLMs lacking detection capability, we introduce ThreatCoT, a plug-and-play training-free method that combines chain-of-thought (CoT) with a visual expert toolchain to support step-by-step reasoning. 2) For MLLMs with detection support, we introduce ThreatReasoner, an end-to-end reinforcement learning (RL)-based method built on the GRPO algorithm, which enables per-object reasoning through a fully unsupervised reward strategy. Both quantitative and qualitative experiments show that our methods can effectively unlock the new capabilities of MLLM in threat-oriented reasoning detection.

Attention to Threat-Relevant Objects: Reasoning Detection in Autonomous Driving via Multimodal Large Language Models

Abstract

Authors

Keywords

Context