Qimeng Yang Papers

EAAI Journal 2026 Journal Article

Lightweight dual-stream multi-scale feature fusion medical image multi-disease adaptation classification network based on guided enhancement

Wenlong Shi
Long Yu
Shengwei Tian
Qimeng Yang
Dezhi Zhang
Shirong Yu
Weidong Wu

In mobile healthcare scenarios, efficiently deploying lightweight image classification models on edge devices can significantly improve the accessibility and real-time performance of medical services, providing reliable technical support for settings such as remote areas with scarce medical resources, real-time diagnosis on mobile terminals, and remote image analysis. To address the insufficient cross-domain adaptability and inadequate feature extraction of lightweight models in medical image classification, this paper proposes a lightweight dual-stream multi-scale feature fusion medical adaptation classification network based on guided enhancement (DMF-MobileMamba). The network adopts a parallel dual-stream architecture that combines the local texture extraction capability of a Convolutional Neural Network (CNN) with the global long-range dependency modeling of an improved lightweight multi-scale adapter Mamba module, achieving heterogeneous feature complementarity through a decoupled design. A multi-scale attention modulation fusion module (MSA-Fusion) dynamically weights and fuses local and global features. The paper also proposes a Cross-level Guided Enhancement Attention Module (CLGE), which uses shallow high-resolution details to dynamically correct deep semantic biases and alleviate the representation mismatch between model levels. Experiments show that DMF-MobileMamba requires only 4.039 million (M) parameters and 2.438 Giga Floating-point Operations Per Second (GFLOPS). On six medical datasets, its classification accuracy is significantly better than that of mainstream advanced lightweight models, and it achieves a real-time inference speed of 134.64 milliseconds (ms) on mobile devices. It provides a high-precision, low-cost solution for resource-constrained scenarios.
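The abstract does not specify how MSA-Fusion computes its weights, so the following is only a minimal sketch of the general idea of dynamically weighted fusion of two feature streams. It assumes a per-channel sigmoid gate; the function names `sigmoid` and `gated_fusion` and the `gate_logits` input are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Map a real-valued gate logit to a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(local_feats, global_feats, gate_logits):
    """Blend two equal-length feature vectors channel by channel.

    Hypothetical illustration only: each channel gets a weight w in (0, 1)
    for the local (CNN-like) stream and 1 - w for the global stream.
    In a trained model the gate logits would be predicted from the features.
    """
    if not (len(local_feats) == len(global_feats) == len(gate_logits)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for local_value, global_value, logit in zip(local_feats, global_feats, gate_logits):
        w = sigmoid(logit)  # weight given to the local stream
        fused.append(w * local_value + (1.0 - w) * global_value)
    return fused
```

With all gate logits at zero, each weight is 0.5 and the result is the plain average of the two streams; large positive logits push the output toward the local features.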


AAAI Conference 2026 Conference Paper

Who Should I Trust? Explicit Confidence-Focused Multimodal Intent Recognition

Yi Liu
Qimeng Yang
Lanlan Lu

Multimodal intent recognition aims to understand user intentions by integrating information from multiple modalities, and it has attracted increasing attention in recently developed dialog systems. Existing studies have focused mainly on modeling semantic interactions within and across modalities, but they often overlook the reliability of each modality. In real-world scenarios, inputs may be corrupted by noisy audio, blurred or occluded video, or ambiguous text, making it difficult for the model to determine which modality to trust and how much to trust it. To address this challenge, we propose a method called explicit confidence-focused multimodal intent recognition (ECFMIR). The core idea of this approach is to assign each modality and each cross-modal association feature a dedicated confidence lens (CLens) that explicitly estimates the confidence level in a hypothetical manner. This design helps reduce uncertainty and mitigate the risk of incorrect predictions when handling conflicting inputs. Comprehensive experiments conducted on two benchmark multimodal intent recognition datasets demonstrate the effectiveness of our method. Further analysis reveals that ECFMIR achieves significant advantages for high-conflict categories and under low-resource conditions.
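The abstract does not describe how CLens estimates or applies confidence, so the sketch below only illustrates the broader idea of weighting each modality's prediction by an explicit confidence score. The function `confidence_weighted_fusion`, the modality names, and the normalized weighted average are all assumptions made for illustration, not the ECFMIR method.

```python
def confidence_weighted_fusion(predictions):
    """Fuse per-modality class probabilities using explicit confidences.

    `predictions` maps a modality name to a (confidence, probabilities) pair,
    where confidence is a non-negative float and probabilities is a list of
    class probabilities. Hypothetical illustration only: the fused score is
    the confidence-weighted average of the per-modality probabilities.
    """
    total_confidence = sum(conf for conf, _ in predictions.values())
    if total_confidence <= 0.0:
        raise ValueError("at least one modality needs positive confidence")

    num_classes = len(next(iter(predictions.values()))[1])
    fused = [0.0] * num_classes
    for conf, probs in predictions.values():
        if len(probs) != num_classes:
            raise ValueError("all modalities must score the same classes")
        for i, p in enumerate(probs):
            fused[i] += conf * p / total_confidence
    return fused


# Usage: confident text and noisy audio disagree about the intent.
fused = confidence_weighted_fusion({
    "text": (0.9, [0.8, 0.2]),
    "audio": (0.1, [0.2, 0.8]),
})
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

In this usage the low-confidence audio prediction is largely discounted, so the fused decision follows the text modality.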

