Disentangled Hypergraph-Guided Mamba Scanning for Fine-Grained Visual Recognition

Zhongwei Xiong; Hao Wang; Xiaoyan Yu; Lingling Li; Xuezhuan Zhao; Taisong Jin

doi:10.1609/aaai.v40i13.38098

AAAI 2026

Conference Paper; AAAI Technical Track on Computer Vision X; Artificial Intelligence

Abstract

Fine-grained Visual Recognition (FGVR) aims to distinguish between categories with subtle inter-class differences and large intra-class variations. While Vision Transformers with attention mechanisms have been widely adopted for FGVR, they usually suffer from high computational complexity and entangled global representations. Recent state-space models, exemplified by Mamba, have shown substantial potential in vision tasks owing to their linear scalability and rich sequence modeling capacity. Motivated by this, we propose DHMamba, a novel Mamba-based FGVR method. The proposed method leverages a hypergraph to guide selective scanning and strengthen Mamba’s capability in modeling fine-grained semantics. Furthermore, a Disentangled Local Scanning (DLS) module is introduced, which uses hyperedges to allocate distinct informative patches to independent channels, mitigating representational entanglement. Extensive experiments on multiple FGVR benchmarks demonstrate that the proposed DHMamba outperforms state-of-the-art methods, validating the efficacy of combining state-space modeling with hypergraph-based feature structuring.
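
The abstract does not give implementation details, so the following is only a minimal illustrative sketch of the general idea of hyperedge-guided scanning, not the DHMamba method itself. It rests on several assumptions that are not in the paper: hyperedges are formed by connecting the highest-norm patches to their nearest neighbors under cosine similarity, each hyperedge is scanned in raster order, and a fixed-decay linear recurrence stands in for a Mamba selective scan. All function names (build_hyperedges, hyperedge_scan_orders, toy_linear_scan) are hypothetical.

```python
import numpy as np

def build_hyperedges(feats, num_edges, k):
    """Toy hyperedge construction (an assumption, not the paper's rule):
    the `num_edges` highest-norm patches act as anchors, and each anchor is
    grouped with its k most cosine-similar patches (itself included)."""
    norms = np.linalg.norm(feats, axis=1)
    normed = feats / norms[:, None]
    anchors = np.argsort(-norms)[:num_edges]
    sim = normed[anchors] @ normed.T            # (num_edges, num_patches)
    return np.argsort(-sim, axis=1)[:, :k]      # patch indices per hyperedge

def hyperedge_scan_orders(members):
    """Order the patches of each hyperedge by index, which is raster order
    when patches are indexed row-major. Each row is one scan sequence."""
    return np.sort(members, axis=1)

def toy_linear_scan(seq, decay=0.9):
    """Stand-in for a state-space scan: h_t = decay * h_{t-1} + x_t.
    A real Mamba block uses input-dependent (selective) parameters."""
    h = np.zeros(seq.shape[-1])
    out = np.empty_like(seq)
    for t, x in enumerate(seq):
        h = decay * h + x
        out[t] = h
    return out

# 16 patches (a 4x4 grid) with 8-dimensional features, random for illustration.
rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 8))

members = build_hyperedges(feats, num_edges=3, k=4)
orders = hyperedge_scan_orders(members)

# Each hyperedge is scanned independently, giving one output "channel" each.
outputs = np.stack([toy_linear_scan(feats[order]) for order in orders])
print(outputs.shape)
```

Scanning each hyperedge as its own sequence keeps the recurrent state of one patch group from mixing with another, which is one plausible reading of allocating patches to independent channels; the actual DLS module may differ substantially.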
