MuSLR: Multimodal Symbolic Logical Reasoning

Jundong Xu; Hao Fei; Yuhui Zhang; Liangming Pan; Qijun Huang; Qian Liu; Preslav Nakov; Min-Yen Kan; William Yang Wang; Mong-Li Lee; Wynne Hsu

NeurIPS 2025

Conference Paper · Main Conference Track · Artificial Intelligence · Machine Learning

Abstract

Multimodal symbolic logical reasoning, which aims to deduce new facts from multimodal input via formal logic, is critical in high-stakes applications such as autonomous driving and medical diagnosis, as its rigorous, deterministic reasoning helps prevent serious consequences. To evaluate such capabilities of current state-of-the-art vision language models (VLMs), we introduce the first benchmark MuSLR for multimodal symbolic logical reasoning grounded in formal logical rules. MuSLR comprises 1,093 instances across 7 domains, including 35 atomic symbolic logic and 976 logical combinations, with reasoning depths ranging from 2 to 9. We evaluate 7 state-of-the-art VLMs on MuSLR and find that they all struggle with multimodal symbolic reasoning, with the best model, GPT-4.1, achieving only 46.8%. Thus, we propose LogiCAM, a modular framework that applies formal logical rules to multimodal inputs, boosting GPT-4.1’s Chain-of-Thought performance by 14.13%, and delivering even larger gains on complex logics such as first-order logic. We also conduct a comprehensive error analysis, showing that around 70% of failures stem from logical misalignment between modalities, offering key insights to guide future improvements.
