Retrieval-driven Reasoning for Deliberative Visual Classification

Jianye Xie; Lianyong Qi; Fan Wang; Anqi Wang; Wenjuan Gong; Danxin Wang; Wanchun Dou; Yang Cao; Shichao Pei; Xiaokang Zhou

doi:10.1609/aaai.v40i13.38084

AAAI 2026

Conference Paper AAAI Technical Track on Computer Vision X Artificial Intelligence

Abstract

Vision-Language Models (VLMs) have demonstrated remarkable capabilities in visual classification tasks. Existing methods for enhancing VLMs on this task often rely heavily on direct category-to-image matching, which limits generalization and results in suboptimal performance. In addition, these methods provide no understanding of why a specific category is chosen. To address these limitations, we introduce a new deliberative visual classification task that decomposes the classification process into multiple deliberative steps and leverages Large Language Models (LLMs) to perform explicit reasoning before the final decision. Specifically, we propose a Retrieval-driven Reasoning model (RdR) with two components, i.e., retrieval database construction and deliberative category prediction. The first component leverages LLMs to extract category-relevant descriptors and constructs a retrieval database for effective image–descriptor matching. The second component facilitates multiple deliberative steps and performs explicit reasoning based on the retrieved descriptors to augment the category prediction. Extensive experiments on multiple datasets demonstrate that RdR consistently outperforms strong baselines, highlighting its robustness and generalization ability.
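
The two components described above can be illustrated with a toy sketch. This is not the authors' implementation: the descriptor entries, the three-dimensional embeddings, and the function names `retrieve` and `predict` are hypothetical, and the LLM reasoning step is replaced here by a simple similarity-weighted vote over the retrieved descriptors. It only shows the general shape of "retrieve descriptors first, then decide from the retrieved evidence".

```python
import math
from collections import defaultdict

# Hypothetical descriptor database: (category, descriptor text, embedding).
# In the paper the descriptors come from an LLM; these values are made up.
DESCRIPTOR_DB = [
    ("zebra", "black and white stripes", [1.0, 0.0, 0.0]),
    ("zebra", "horse-like body", [0.0, 1.0, 0.0]),
    ("tiger", "orange fur with dark stripes", [0.7, 0.0, 0.7]),
    ("tiger", "large cat face", [0.0, 0.0, 1.0]),
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(image_embedding, db, k=2):
    """Step 1: return the k descriptors most similar to the image."""
    scored = [(cosine(image_embedding, emb), cat, text) for cat, text, emb in db]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def predict(image_embedding, db, k=2):
    """Step 2: decide from the retrieved evidence.

    Stand-in for the LLM reasoning step: sum the similarity scores of the
    retrieved descriptors per category and pick the best-supported one.
    """
    retrieved = retrieve(image_embedding, db, k)
    support = defaultdict(float)
    for score, category, _text in retrieved:
        support[category] += score
    best = max(support, key=support.get)
    return best, retrieved


# Hypothetical image embedding that is close to the "stripes" descriptor.
label, evidence = predict([0.9, 0.4, 0.0], DESCRIPTOR_DB, k=2)
```

The returned `evidence` list keeps the descriptor text next to each score, which is the part that makes the decision inspectable: a reader can see which descriptors supported the chosen category.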
