Arrow Research search

Author name cluster

Fei Huang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

57 papers
2 author rows

Possible papers

57

JBHI Journal 2026 Journal Article

3D-CNN Enhanced Multiscale Progressive Vision Transformer for AD Diagnosis

  • Fei Huang
  • Nanguang Chen
  • Anqi Qiu

Vision Transformer (ViT) applied to structural magnetic resonance imaging (sMRI) has demonstrated success in the diagnosis of Alzheimer’s disease (AD) and mild cognitive impairment (MCI). However, three key challenges have yet to be well addressed: 1) ViT requires a large labeled dataset to mitigate overfitting, while most current AD-related sMRI datasets fall short in sample size. 2) ViT neglects within-patch feature learning, e.g., local brain atrophy, which is crucial for AD diagnosis. 3) While ViT can better capture local features by reducing the patch size and increasing the number of patches, its computational complexity increases quadratically with the number of patches, incurring unbearable overhead. To this end, this paper proposes a 3D-convolutional neural network (CNN) Enhanced Multiscale Progressive ViT (3D-CNN-MPVT). First, a 3D-CNN is pre-trained on sMRI data to extract detailed local image features and alleviate overfitting. Second, an MPVT module is proposed with an inner CNN module to explicitly characterize the within-patch interactions that are conducive to AD diagnosis. Third, a stitch operation is proposed to merge cross-patch features and progressively reduce the number of patches. The inner CNN alongside the stitch operation in the MPVT module enhances local feature characterization while mitigating computational costs. Evaluations using the Alzheimer’s Disease Neuroimaging Initiative dataset with 6610 scans and the Open Access Series of Imaging Studies-3 with 1866 scans demonstrated its superior performance. With minimal preprocessing, our approach achieved an impressive 90% and 80% accuracy in AD classification and MCI conversion prediction, respectively, surpassing recent baselines.
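
A plausible minimal realization of the stitch operation described in this abstract, under the assumption that "stitching" concatenates adjacent patch tokens and projects them back to the embedding dimension, halving the patch count per stage; the authors' exact design may differ.

```python
import torch
import torch.nn as nn

class StitchMerge(nn.Module):
    """Merge cross-patch features and halve the number of patch tokens (illustrative)."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        b, n, d = tokens.shape                    # assumes an even number of patches n
        pairs = tokens.view(b, n // 2, 2 * d)     # stitch each pair of adjacent patches
        return self.proj(pairs)                   # (b, n/2, d): half as many patches

print(StitchMerge(256)(torch.randn(2, 64, 256)).shape)  # torch.Size([2, 32, 256])
```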

AAAI Conference 2026 Conference Paper

Efficient and Effective In-context Demonstration Selection with Coreset

  • Zihua Wang
  • Jiarui Wang
  • Haiyang Xu
  • Ming Yan
  • Fei Huang
  • Xu Yang
  • Xiu-Shen Wei
  • Siya Mi

In-context learning (ICL) has emerged as a powerful paradigm for Large Visual Language Models (LVLMs), enabling them to leverage a few examples directly from input contexts. However, the effectiveness of this approach relies heavily on the selection of demonstrations, a process that is NP-hard. Traditional strategies, including random, similarity-based, and infoscore-based sampling, often lead to inefficiencies or suboptimal performance, struggling to balance efficiency and effectiveness in demonstration selection. In this paper, we propose a novel demonstration selection framework named Coreset-based Dual Retrieval (CoDR). We show that samples within a diverse subset achieve higher expected mutual information. To implement this, we introduce a cluster-pruning method to construct a diverse coreset that aligns more effectively with the query while maintaining diversity. Additionally, we develop a dual retrieval mechanism that enhances the selection process by achieving global demonstration selection while preserving efficiency. Experimental results demonstrate that our method significantly improves ICL performance compared to existing strategies, providing a robust solution for effective and efficient demonstration selection.
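
A short sketch of the two ideas above under simplifying assumptions: cluster pruning builds a diverse coreset by keeping the demonstration closest to each cluster centroid, and selection then retrieves the query's nearest coreset members (the dual retrieval mechanism is reduced here to a single similarity pass). The function names and the k-means choice are illustrative, not from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_coreset(embeddings: np.ndarray, n_clusters: int = 8) -> np.ndarray:
    """Cluster-pruning: keep one representative demonstration per cluster."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(embeddings)
    coreset = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(embeddings[members] - km.cluster_centers_[c], axis=1)
        coreset.append(members[dists.argmin()])       # member closest to the centroid
    return np.array(coreset)

def select_demonstrations(query_emb: np.ndarray, embeddings: np.ndarray,
                          coreset: np.ndarray, k: int = 4) -> np.ndarray:
    sims = embeddings[coreset] @ query_emb            # cosine sim if inputs are normalized
    return coreset[np.argsort(-sims)[:k]]             # k most query-aligned coreset members

pool = np.random.randn(200, 64)
pool /= np.linalg.norm(pool, axis=1, keepdims=True)
core = build_coreset(pool)
print(select_demonstrations(pool[0], pool, core))
```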

AAAI Conference 2026 Conference Paper

Selective Weak-to-Strong Generalization

  • Hao Lang
  • Fei Huang
  • Yongbin Li

Future superhuman models will surpass the ability of humans, and humans will only be able to weakly supervise them. To alleviate the issue of lacking high-quality data for model alignment, some works on weak-to-strong generalization (W2SG) finetune a strong pretrained model with a weak supervisor so that it can generalize beyond weak supervision. However, the invariable use of weak supervision in existing methods exposes robustness issues, with a proportion of weak labels proving harmful to models. In this paper, we propose a selective W2SG framework that avoids using weak supervision when it is unnecessary. We train a binary classifier P(IK) to identify questions that a strong model can answer and use its self-generated labels for alignment. We further refine weak labels with a graph smoothing method. Extensive experiments on three benchmarks show that our method consistently outperforms competitive baselines. Further analyses show that P(IK) can generalize across tasks and difficulties, which indicates that selective W2SG can help superalignment.
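
A minimal sketch of the selection rule implied by this abstract: a P(IK) ("I know") score decides, per question, whether to keep the strong model's self-generated label or fall back to the (refined) weak label. The threshold and the stand-in callables are assumptions for illustration.

```python
from typing import Callable, List, Tuple

def build_training_labels(questions: List[str],
                          p_ik: Callable[[str], float],        # binary classifier P(IK)
                          self_label: Callable[[str], str],    # strong model's own answer
                          weak_label: Callable[[str], str],    # (refined) weak supervisor label
                          threshold: float = 0.5) -> List[Tuple[str, str]]:
    data = []
    for q in questions:
        # Use weak supervision only when the strong model is unlikely to know the answer.
        label = self_label(q) if p_ik(q) > threshold else weak_label(q)
        data.append((q, label))
    return data

demo = build_training_labels(
    ["easy question", "hard question"],
    p_ik=lambda q: 0.9 if "easy" in q else 0.2,
    self_label=lambda q: "strong model answer",
    weak_label=lambda q: "weak (graph-smoothed) label",
)
print(demo)
```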

NeurIPS Conference 2025 Conference Paper

CARE: Decoding-Time Safety Alignment via Rollback and Introspection Intervention

  • Xiaomeng Hu
  • Fei Huang
  • Chenhan Yuan
  • Junyang Lin
  • Tsung-Yi Ho

As large language models (LLMs) are increasingly deployed in real-world applications, ensuring the safety of their outputs during decoding has become a critical challenge. However, existing decoding-time interventions, such as Contrastive Decoding, often force a severe trade-off between safety and response quality. In this work, we propose CARE, a novel framework for decoding-time safety alignment that integrates three key components: (1) a guard model for real-time safety monitoring, enabling detection of potentially unsafe content; (2) a rollback mechanism with a token buffer to correct unsafe outputs efficiently at an earlier stage without disrupting the user experience; and (3) a novel introspection-based intervention strategy, where the model generates self-reflective critiques of its previous outputs and incorporates these reflections into the context to guide subsequent decoding steps. The framework achieves a superior safety-quality trade-off by using its guard model for precise interventions, its rollback mechanism for timely corrections, and our novel introspection method for effective self-correction. Experimental results demonstrate that our framework achieves a superior balance of safety, quality, and efficiency, attaining a low harmful response rate and minimal disruption to the user experience while maintaining high response quality.
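
A hedged sketch of the decoding loop this abstract describes: generated tokens sit in a small buffer, the guard model screens the buffer, and on an unsafe flag the buffer is rolled back while an introspective critique is appended to the context before decoding resumes. `generate_token`, `guard_is_unsafe`, and `introspect` are hypothetical stand-ins for the model, the guard model, and the critique step.

```python
from typing import Callable, List

def care_decode(generate_token: Callable[[List[str]], str],
                guard_is_unsafe: Callable[[List[str]], bool],
                introspect: Callable[[List[str]], str],
                context: List[str], max_tokens: int = 64, buffer_size: int = 8) -> List[str]:
    output, buffer = [], []
    for _ in range(max_tokens):
        buffer.append(generate_token(context + output + buffer))
        if guard_is_unsafe(buffer):                  # real-time safety monitoring
            critique = introspect(output + buffer)   # self-reflective critique
            context = context + [critique]           # reflection guides later decoding
            buffer = []                              # rollback: discard the unsafe draft
        elif len(buffer) >= buffer_size:
            output, buffer = output + buffer, []     # commit the screened buffer
    return output + buffer

# Toy demo with stand-in components.
toks = iter(["fine"] * 20 + ["risky"] + ["fine"] * 40)
print(" ".join(care_decode(lambda ctx: next(toks),
                           lambda buf: "risky" in buf,
                           lambda hist: "<critique>",
                           context=["<prompt>"], max_tokens=30)))
```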

AAAI Conference 2025 Conference Paper

Debate Helps Weak-to-Strong Generalization

  • Hao Lang
  • Fei Huang
  • Yongbin Li

Common methods for aligning already-capable models with desired behavior rely on the ability of humans to provide supervision. However, future superhuman models will surpass the capability of humans. Therefore, humans will only be able to weakly supervise superhuman models. This expected deficiency of human evaluation would weaken the safety of future AI systems. Scalable oversight and weak-to-strong generalization are two complementary approaches to tackle this issue. In this paper, we attempt to combine the strengths of these two approaches to further improve alignment. Specifically, we investigate ways of improving human supervision with a strong pretrained model and then supervise the strong model with enhanced weak human supervision. To make iterative empirical progress, we consider an analogy: can we use a strong model to improve weak model supervision and then use it to supervise the strong model? We empirically test it by finetuning a small weak model on ground truth labels with the additional help from a large strong model, and then finetuning the strong model on labels generated by the weak model. We find that debate can assist a weak model in extracting trustworthy information from an untrustworthy strong model, which provides leverage as context on samples when training a weak model. We also show that an ensemble of weak models helps exploit long arguments generated by strong model debaters and obtain a more robust supervision estimate. Extensive experiments on the OpenAI weak-to-strong NLP benchmarks show that the combination approach leads to better alignment, which indicates that debate has the potential to help weak-to-strong generalization.

NeurIPS Conference 2025 Conference Paper

Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free

  • Zihan Qiu
  • Zekun Wang
  • Bo Zheng
  • Zeyu Huang
  • Kaiyue Wen
  • Songlin Yang
  • Rui Men
  • Le Yu

Gating mechanisms have been widely utilized, from early models like LSTMs and Highway Networks to recent state space models, linear attention, and also softmax attention. Yet, existing literature rarely examines the specific effects of gating. In this work, we conduct comprehensive experiments to systematically investigate gating-augmented softmax attention variants. Specifically, we perform a comprehensive comparison over 30 variants of 15B Mixture-of-Experts (MoE) models and 1.7B dense models trained on a 3.5 trillion token dataset. Our central finding is that a simple modification, applying a head-specific sigmoid gate after the Scaled Dot-Product Attention (SDPA), consistently improves performance. This modification also enhances training stability, tolerates larger learning rates, and improves scaling properties. By comparing various gating positions and computational variants, we attribute this effectiveness to two key factors: (1) introducing non-linearity upon the low-rank mapping in the softmax attention, and (2) applying query-dependent sparse gating scores to modulate the SDPA output. Notably, we find this sparse gating mechanism mitigates massive activations and attention sinks, and enhances long-context extrapolation performance. We also release the related code (https://github.com/qiuzh20/gated_attention) and models (https://huggingface.co/QwQZh/gated_attention) to facilitate future research. Furthermore, the most effective SDPA output gating is used in the Qwen3-Next models (https://huggingface.co/collections/Qwen/qwen3-next).
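
A minimal sketch of the modification highlighted above, assuming the gate is a query-dependent linear-plus-sigmoid applied elementwise per head to the SDPA output; the exact parameterization in the released code may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedSelfAttention(nn.Module):
    """Softmax attention with a head-specific sigmoid gate after SDPA (illustrative)."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.gate = nn.Linear(d_model, n_heads * self.d_head)  # query-dependent, per-head gate
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        split = lambda z: z.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        attn = F.scaled_dot_product_attention(split(q), split(k), split(v), is_causal=True)
        # Sigmoid gate modulates the SDPA output elementwise, head by head.
        g = torch.sigmoid(self.gate(x)).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        attn = attn * g
        return self.out(attn.transpose(1, 2).reshape(b, t, -1))

x = torch.randn(2, 16, 512)
print(GatedSelfAttention(512, 8)(x).shape)  # torch.Size([2, 16, 512])
```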

NeurIPS Conference 2025 Conference Paper

Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation

  • Yuyang Wanyan
  • Xi Zhang
  • Haiyang Xu
  • Haowei Liu
  • Junyang Wang
  • Jiabo Ye
  • Yutong Kou
  • Ming Yan

In recent years, Multimodal Large Language Models (MLLMs) have been extensively utilized for multimodal reasoning tasks, including Graphical User Interface (GUI) automation. Unlike general offline multimodal tasks, GUI automation is executed in online interactive environments, necessitating step-by-step decision-making based on the real-time status of the environment. This task has a lower tolerance for decision-making errors at each step, as any mistake may cumulatively disrupt the process and potentially lead to irreversible outcomes such as deletions or payments. To address these issues, we introduce a pre-operative critic mechanism that provides effective feedback prior to the actual execution by reasoning about the potential outcome and correctness of actions. Specifically, we propose a Suggestion-aware Group Relative Policy Optimization (S-GRPO) strategy to construct our pre-operative critic model GUI-Critic-R1, incorporating a novel suggestion reward to enhance the reliability of the model's feedback. Furthermore, we develop a reasoning-bootstrapping-based data collection pipeline to create the GUI-Critic-Train and GUI-Critic-Test datasets, filling existing gaps in GUI critic data. Static experiments on GUI-Critic-Test across both mobile and web domains reveal that GUI-Critic-R1 offers significant advantages in critic accuracy compared to current MLLMs. Dynamic evaluation on a GUI automation benchmark further highlights the effectiveness and superiority of our model, as evidenced by improved success rates and operational efficiency. The code is available at https://github.com/X-PLUG/MobileAgent/tree/main/GUI-Critic-R1.

NeurIPS Conference 2025 Conference Paper

OpenOmni: Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Real-time Emotional Speech Synthesis

  • Run Luo
  • Ting-En Lin
  • Haonan Zhang
  • Yuchuan Wu
  • Xiong Liu
  • Yongbin Li
  • Longze Chen
  • Jiaming Li

Recent advancements in omnimodal learning have significantly improved understanding and generation across images, text, and speech, yet these developments remain predominantly confined to proprietary models. The lack of high-quality omnimodal datasets and the challenges of real-time emotional speech synthesis have notably hindered progress in open-source research. To address these limitations, we introduce OpenOmni, a two-stage training framework that integrates omnimodal alignment and speech generation to develop a state-of-the-art omnimodal large language model. In the alignment phase, a pretrained speech model undergoes further training on image-text tasks, enabling (near) zero-shot generalization from vision to speech and outperforming models trained on tri-modal datasets. In the speech generation phase, a lightweight decoder is trained on speech tasks with direct preference optimization, which enables real-time emotional speech synthesis with high fidelity. Extensive experiments demonstrate that OpenOmni surpasses state-of-the-art models across omnimodal, vision-language, and speech-language benchmarks. It achieves a 4-point absolute improvement on OmniBench over the leading open-source model VITA, despite using 5× fewer training examples and a smaller model size (7B vs. 7×8B). In addition, OpenOmni achieves real-time speech generation with less than 1 second of latency in non-autoregressive mode, reducing inference time by 5× compared to autoregressive methods, and improves emotion classification accuracy by 7.7%. The codebase is available at https://github.com/RainBowLuoCS/OpenOmni.

NeurIPS Conference 2025 Conference Paper

PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts

  • Yiming Wang
  • Pei Zhang
  • Jialong Tang
  • Hao-Ran Wei
  • Baosong Yang
  • Rui Wang
  • Chenshu Sun
  • Feitong Sun

In this paper, we introduce PolyMath, a multilingual mathematical reasoning benchmark covering 18 languages and 4 easy-to-hard difficulty levels. Our benchmark ensures difficulty comprehensiveness, language diversity, and high-quality translation, making it a highly discriminative multilingual mathematical benchmark in the era of reasoning LLMs. We conduct a comprehensive evaluation of advanced LLMs and find that even Qwen3-235B-A22B-Thinking and Gemini-2.5-Pro achieve only 54.6 and 52.2 benchmark scores, respectively, with about 40% accuracy at the highest difficulty level. From a language perspective, our benchmark reveals several key challenges for LLMs in multilingual reasoning: (1) Reasoning performance varies widely across languages for current LLMs; (2) Input-output language consistency is low in reasoning LLMs and may be correlated with performance; (3) Thinking length differs significantly by language for current LLMs. Additionally, we demonstrate that controlling the output language in the instructions has the potential to affect reasoning performance, especially for some low-resource languages, suggesting a promising direction for improving multilingual capabilities in LLMs.

NeurIPS Conference 2025 Conference Paper

Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding

  • Yiming Wang
  • Pei Zhang
  • Siyuan Huang
  • Baosong Yang
  • Zhuosheng Zhang
  • Fei Huang
  • Rui Wang

Test-time scaling enhances large language model performance by allocating additional compute resources during decoding. Best-of-N (BoN) sampling serves as a common sampling-based scaling technique, broadening the search space in parallel to find better solutions from the model distribution. However, its cost–performance trade-off is still underexplored. Two main challenges limit the efficiency of BoN sampling: (1) Generating N full samples consumes substantial GPU memory, reducing inference capacity under limited resources. (2) Reward models add extra memory and latency overhead, and training strong reward models introduces potential training data costs. Although some studies have explored efficiency improvements, none have addressed both challenges at once. To address this gap, we propose Self-Truncation Best-of-N (ST-BoN), a decoding method that avoids fully generating all N samples and eliminates the need for reward models. It leverages early sampling consistency in the model’s internal states to identify the most promising path and truncate suboptimal ones. In terms of cost, ST-BoN reduces dynamic GPU memory usage by over 80% and inference latency by 50%. In terms of cost–performance trade-off, ST-BoN achieves the same performance as Full-BoN while saving computational cost by 70%–80%, and under the same cost, it can improve accuracy by 3–4 points.
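
A self-contained sketch of the self-estimation step described above: after decoding a short prefix for each of the N candidates, each candidate is scored by the consistency (mean cosine similarity) of its pooled hidden state with the others, and only the most consistent path is kept. Pooling the prefix hidden states into a single vector and the argmax selection rule are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def select_most_consistent(prefix_hidden: torch.Tensor) -> int:
    """prefix_hidden: (N, d), one pooled hidden-state vector per candidate prefix."""
    sims = F.cosine_similarity(prefix_hidden.unsqueeze(1), prefix_hidden.unsqueeze(0), dim=-1)
    sims.fill_diagonal_(0.0)                          # ignore self-similarity
    consistency = sims.sum(dim=1) / (len(prefix_hidden) - 1)
    return int(consistency.argmax())                  # index of the path to keep decoding

hidden = torch.randn(8, 4096)                         # e.g. N = 8 sampled prefixes
keep = select_most_consistent(hidden)
print(f"continue decoding only candidate {keep}, truncate the rest")
```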

NeurIPS Conference 2025 Conference Paper

VLM-R³: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought

  • Chaoya Jiang
  • Yongrui Heng
  • Wei Ye
  • Haiyang Xu
  • Ming Yan
  • Ji Zhang
  • Fei Huang
  • Shikun Zhang

Recently, reasoning-based MLLMs have achieved a degree of success in generating long-form textual reasoning chains. However, they still struggle with complex tasks that necessitate dynamic and iterative focusing on and revisiting of visual regions to achieve precise grounding of textual reasoning in visual evidence. We introduce VLM-R³ (Visual Language Model with Region Recognition, Reasoning, and Refinement), a framework that equips an MLLM with the ability to (i) decide when additional visual evidence is needed, (ii) determine where to ground within the image, and (iii) seamlessly weave the relevant sub-image content back into an interleaved chain-of-thought. The core of our method is Region-Conditioned Reinforcement Policy Optimization (R-GRPO), a training paradigm that rewards the model for selecting informative regions, formulating appropriate transformations (e.g., crop, zoom), and integrating the resulting visual context into subsequent reasoning steps. To bootstrap this policy, we compile a modest but carefully curated Visuo-Lingual Interleaved Rationale (VLIR) corpus that provides step-level supervision on region selection and textual justification. Extensive experiments on MathVista, ScienceQA, and other benchmarks show that VLM-R³ sets a new state of the art in zero-shot and few-shot settings, with the largest gains appearing on questions demanding subtle spatial reasoning or fine-grained visual cue extraction.

NeurIPS Conference 2025 Conference Paper

VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning

  • Qiuchen Wang
  • Ruixue Ding
  • Yu Zeng
  • Zehui Chen
  • Lin Chen
  • Shihang Wang
  • Pengjun Xie
  • Fei Huang

Effectively retrieving, reasoning over, and understanding visually rich information remains a challenge for traditional Retrieval-Augmented Generation (RAG) methods. On the one hand, traditional text-based methods cannot handle visual-related information. On the other hand, current vision-based RAG approaches are often limited by fixed pipelines and frequently struggle to reason effectively due to the insufficient activation of the fundamental capabilities of models. As reinforcement learning (RL) has been proven to be beneficial for model reasoning, we introduce VRAG-RL, a novel RL framework tailored for complex reasoning across visually rich information. With this framework, VLMs interact with search engines, autonomously sampling single-turn or multi-turn reasoning trajectories with the help of visual perception tokens and undergoing continual optimization based on these samples. Our approach highlights key limitations of RL in RAG domains: (i) Prior multi-modal RAG approaches tend to merely incorporate images into the context, leading to insufficient reasoning token allocation and neglecting visual-specific perception; and (ii) When models interact with search engines, their queries often fail to retrieve relevant information due to the inability to articulate requirements, thereby leading to suboptimal performance. To address these challenges, we define an action space tailored for visually rich inputs, with actions including cropping and scaling, allowing the model to gather information from a coarse-to-fine perspective. Furthermore, to bridge the gap between users' original inquiries and the retriever, we employ a simple yet effective reward that integrates query rewriting and retrieval performance with a model-based reward. Our VRAG-RL optimizes VLMs for RAG tasks using specially designed RL strategies, aligning the model with real-world applications. Extensive experiments on diverse and challenging benchmarks show that VRAG-RL outperforms existing methods by 20% (Qwen2.5-VL-7B) and 30% (Qwen2.5-VL-3B), demonstrating the effectiveness of our approach. The code is available at https://github.com/Alibaba-NLP/VRAG.

NeurIPS Conference 2025 Conference Paper

WebDancer: Towards Autonomous Information Seeking Agency

  • Jialong Wu
  • Baixuan Li
  • Runnan Fang
  • Wenbiao Yin
  • Liwen Zhang
  • Zhenglin Wang
  • Zhengwei Tao
  • Ding-Chu Zhang

Addressing intricate real-world problems necessitates in-depth information seeking and multi-step reasoning. Recent progress in agentic systems, exemplified by Deep Research, underscores the potential for autonomous multi-step research. In this work, we present a cohesive paradigm for building end-to-end agentic information seeking agents from a data-centric and training-stage perspective. Our approach consists of four key stages: (1) browsing data construction, (2) trajectory sampling, (3) supervised fine-tuning for an effective cold start, and (4) reinforcement learning for enhanced generalisation. We instantiate this framework in a web agent based on the ReAct format, WebDancer. Empirical evaluations on the challenging GAIA and WebWalkerQA benchmarks demonstrate the strong performance of WebDancer, achieving considerable results and highlighting the efficacy of our training paradigm. Further analysis of agent training provides valuable insights and actionable, systematic pathways for developing more capable agentic models.

NeurIPS Conference 2025 Conference Paper

WritingBench: A Comprehensive Benchmark for Generative Writing

  • Yuning Wu
  • Jiahao Mei
  • Ming Yan
  • Chenliang Li
  • Shaopeng Lai
  • Yuran Ren
  • Zijia Wang
  • Ji Zhang

Recent advancements in large language models (LLMs) have significantly enhanced text generation capabilities, yet evaluating their performance in generative writing remains a challenge. Existing benchmarks primarily focus on generic text generation or are limited to a narrow set of writing tasks, failing to capture the diverse requirements of high-quality written content across various domains. To bridge this gap, we present WritingBench, a comprehensive benchmark designed to evaluate LLMs across 6 core writing domains and 100 subdomains. We further propose a query-dependent evaluation framework that empowers LLMs to dynamically generate instance-specific assessment criteria. This framework is complemented by a fine-tuned critic model for criteria-aware scoring, enabling evaluations in style, format, and length. The framework's validity is further demonstrated by its data curation capability, which enables a 7B-parameter model to outperform GPT-4o in writing. We open-source the benchmark, along with evaluation tools and modular framework components, to advance the development of LLMs in writing.

TMLR Journal 2024 Journal Article

A Survey on Out-of-Distribution Detection in NLP

  • Hao Lang
  • Yinhe Zheng
  • Yixuan Li
  • Jian Sun
  • Fei Huang
  • Yongbin Li

Out-of-distribution (OOD) detection is essential for the reliable and safe deployment of machine learning systems in the real world. Great progress has been made over the past years. This paper presents the first review of recent advances in OOD detection with a particular focus on natural language processing approaches. First, we provide a formal definition of OOD detection and discuss several related fields. We then categorize recent algorithms into three classes according to the data they used: (1) OOD data available, (2) OOD data unavailable + in-distribution (ID) label available, and (3) OOD data unavailable + ID label unavailable. Third, we introduce datasets, applications, and metrics. Finally, we summarize existing work and present potential future research topics.

NeurIPS Conference 2024 Conference Paper

Agent Planning with World Knowledge Model

  • Shuofei Qiao
  • Runnan Fang
  • Ningyu Zhang
  • Yuqi Zhu
  • Xiang Chen
  • Shumin Deng
  • Yong Jiang
  • Pengjun Xie

Recent endeavors towards directly using large language models (LLMs) as agent models to execute interactive planning tasks have shown commendable results. Despite their achievements, however, they still struggle with brainless trial-and-error in global planning and generating hallucinatory actions in local planning due to their poor understanding of the "real" physical world. Imitating humans' mental world knowledge model, which provides global prior knowledge before the task and maintains local dynamic knowledge during the task, in this paper we introduce a parametric World Knowledge Model (WKM) to facilitate agent planning. Concretely, we steer the agent model to self-synthesize knowledge from both expert and sampled trajectories. Then we develop WKM, providing prior task knowledge to guide global planning and dynamic state knowledge to assist local planning. Experimental results on three real-world simulated datasets with Mistral-7B, Gemma-7B, and Llama-3-8B demonstrate that our method can achieve superior performance compared to various strong baselines. We further analyze and illustrate that our WKM can effectively alleviate the blind trial-and-error and hallucinatory action issues, providing strong support for the agent's understanding of the world. Other interesting findings include: 1) our instance-level task knowledge can generalize better to unseen tasks, 2) a weak WKM can guide strong agent model planning, and 3) unified WKM training has promising potential for further development.

AAAI Conference 2024 Conference Paper

EcomGPT: Instruction-Tuning Large Language Models with Chain-of-Task Tasks for E-commerce

  • Yangning Li
  • Shirong Ma
  • Xiaobin Wang
  • Shen Huang
  • Chengyue Jiang
  • Hai-Tao Zheng
  • Pengjun Xie
  • Fei Huang

Recently, instruction-following Large Language Models (LLMs), represented by ChatGPT, have exhibited exceptional performance in general Natural Language Processing (NLP) tasks. However, the unique characteristics of E-commerce data pose significant challenges to general LLMs. An LLM tailored specifically for E-commerce scenarios, possessing robust cross-dataset/task generalization capabilities, is a pressing necessity. To solve this issue, in this work we propose the first E-commerce instruction dataset, EcomInstruct, with a total of 2.5 million instruction instances. EcomInstruct scales up the data size and task diversity by constructing atomic tasks from basic E-commerce data types, such as product information and user reviews. Atomic tasks are defined as intermediate tasks implicitly involved in solving a final task, which we also call Chain-of-Task tasks. We developed EcomGPT at different parameter scales by training the backbone model BLOOMZ with EcomInstruct. Benefiting from the fundamental semantic understanding capabilities acquired from the Chain-of-Task tasks, EcomGPT exhibits excellent zero-shot generalization capabilities. Extensive experiments and human evaluations demonstrate that EcomGPT outperforms ChatGPT in terms of cross-dataset/task generalization on E-commerce tasks. EcomGPT will be made public at https://github.com/Alibaba-NLP/EcomGPT.

JBHI Journal 2024 Journal Article

Ensemble Vision Transformer for Dementia Diagnosis

  • Fei Huang
  • Anqi Qiu

In recent years, deep learning has gained momentum in computer-aided Alzheimer's Disease (AD) diagnosis. This study introduces a novel approach, the Monte Carlo Ensemble Vision Transformer (MC-ViT), which develops an ensemble approach with the Vision Transformer (ViT). Instead of using traditional ensemble methods that deploy multiple learners, our approach employs a single Vision Transformer learner. By harnessing Monte Carlo sampling, this method produces a broad spectrum of classification decisions, enhancing MC-ViT performance. This technique adeptly overcomes the limitation of 3D patch convolutional neural networks that characterize only part of the whole brain anatomy, paving the way for a neural network adept at discerning 3D inter-feature correlations. Evaluations using the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset with 7199 scans and the Open Access Series of Imaging Studies-3 (OASIS-3) with 1992 scans showcased its performance. With minimal preprocessing, our approach achieved an impressive 90% accuracy in AD classification, surpassing both 2D-slice CNNs and 3D CNNs.
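
The abstract attributes the ensemble effect to Monte Carlo sampling from a single ViT learner. One common way to realize this is MC dropout, keeping dropout active at inference and averaging T stochastic forward passes; this is an illustrative reading rather than the paper's exact scheme.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def mc_ensemble_predict(model: nn.Module, x: torch.Tensor, T: int = 20) -> torch.Tensor:
    """Average the predictions of T stochastic forward passes of a single learner."""
    model.eval()
    for m in model.modules():                      # re-enable only the dropout layers
        if isinstance(m, nn.Dropout):
            m.train()
    probs = torch.stack([model(x).softmax(dim=-1) for _ in range(T)])
    return probs.mean(dim=0)                       # averaged class probabilities

# Toy stand-in for a 3D-input classifier.
toy = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 32, 64), nn.ReLU(),
                    nn.Dropout(0.2), nn.Linear(64, 2))
print(mc_ensemble_predict(toy, torch.randn(4, 32, 32, 32)).shape)  # torch.Size([4, 2])
```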

NeurIPS Conference 2024 Conference Paper

EvoCodeBench: An Evolving Code Generation Benchmark with Domain-Specific Evaluations

  • Jia Li
  • Ge Li
  • Xuanming Zhang
  • YunFei Zhao
  • Yihong Dong
  • Zhi Jin
  • Binhua Li
  • Fei Huang

How to evaluate Large Language Models (LLMs) in code generation remains an open question. Many benchmarks have been proposed, but they have two limitations, i.e., data leakage and lack of domain-specific evaluation. The former hurts the fairness of benchmarks, and the latter hinders practitioners from selecting superior LLMs for specific programming domains. To address these two limitations, we propose a new benchmark - EvoCodeBench, which has the following advances: (1) Evolving data. EvoCodeBench will be dynamically updated every period (e.g., 6 months) to avoid data leakage. This paper releases the first version - EvoCodeBench-2403, containing 275 samples from 25 repositories. (2) A domain taxonomy and domain labels. Based on the statistics of open-source communities, we design a programming domain taxonomy consisting of 10 popular domains. Based on the taxonomy, we annotate each sample in EvoCodeBench with a domain label. EvoCodeBench provides a broad platform for domain-specific evaluations. (3) Domain-specific evaluations. Besides the Pass@k, we compute the Domain-Specific Improvement (DSI) and define LLMs' comfort and strange domains. These evaluations help practitioners select superior LLMs in specific domains and discover the shortcomings of existing LLMs. Besides, EvoCodeBench is collected by a rigorous pipeline and aligns with real-world repositories in multiple aspects (e.g., code distributions). We evaluate 8 popular LLMs (e.g., gpt-4, DeepSeek Coder, StarCoder 2) on EvoCodeBench and summarize some insights. EvoCodeBench reveals the actual abilities of these LLMs in real-world repositories. For example, the highest Pass@1 of gpt-4 on EvoCodeBench-2403 is only 20.74%. Besides, we evaluate LLMs in different domains and discover their comfort and strange domains. For example, gpt-4 performs best in most domains but falls behind others in the Internet domain. StarCoder 2-15B unexpectedly performs well in the Database domain and even outperforms 33B LLMs. We release EvoCodeBench, all prompts, and LLMs' completions for further community analysis.
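
The benchmark reports Pass@k alongside its domain-specific metrics; for reference, the standard unbiased Pass@k estimator (given n generated samples of which c pass the tests) can be computed as below. The DSI metric is defined in the paper and is not reproduced here.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimator: pass@k = 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:          # every size-k draw must contain at least one passing sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(round(pass_at_k(n=20, c=3, k=1), 4))  # 0.15
print(round(pass_at_k(n=20, c=3, k=5), 4))
```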

IJCAI Conference 2024 Conference Paper

FactCHD: Benchmarking Fact-Conflicting Hallucination Detection

  • Xiang Chen
  • Duanzheng Song
  • Honghao Gui
  • Chenxi Wang
  • Ningyu Zhang
  • Yong Jiang
  • Fei Huang
  • Chengfei Lyu

Despite their impressive generative capabilities, LLMs are hindered by fact-conflicting hallucinations in real-world applications. The accurate identification of hallucinations in texts generated by LLMs, especially in complex inferential scenarios, is a relatively unexplored area. To address this gap, we present FactCHD, a dedicated benchmark designed for the detection of fact-conflicting hallucinations from LLMs. FactCHD features a diverse dataset that spans various factuality patterns, including vanilla, multi-hop, comparison, and set operation. A distinctive element of FactCHD is its integration of fact-based evidence chains, significantly enhancing the depth of evaluating the detectors' explanations. Experiments on different LLMs expose the shortcomings of current approaches in detecting factual errors accurately. Furthermore, we introduce TRUTH-TRIANGULATOR which synthesizes reflective considerations by tool-enhanced ChatGPT and LoRA-tuning based on Llama2, aiming to yield more credible detection through the amalgamation of predictive results and evidence.

NeurIPS Conference 2024 Conference Paper

MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model

  • Chaoya Jiang
  • Hongrui Jia
  • Haiyang Xu
  • Wei Ye
  • Mengfan Dong
  • Ming Yan
  • Ji Zhang
  • Fei Huang

This paper presents MaVEn, an innovative Multi-granularity Visual Encoding framework designed to enhance the capabilities of Multimodal Large Language Models (MLLMs) in multi-image reasoning. Current MLLMs primarily focus on single-image visual understanding, limiting their ability to interpret and integrate information across multiple images. MaVEn addresses this limitation by combining discrete visual symbol sequences, which abstract coarse-grained semantic concepts, with traditional continuous representation sequences that model fine-grained features. This dual approach bridges the semantic gap between visual and textual data, thereby improving the model's ability to process and interpret information from multiple images effectively. Additionally, we design a dynamic reduction mechanism for long-sequence continuous features to enhance multi-image processing efficiency. Experimental results demonstrate that MaVEn significantly enhances MLLMs' understanding in complex multi-image scenarios, while also improving performance in single-image contexts.

NeurIPS Conference 2024 Conference Paper

Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration

  • Junyang Wang
  • Haiyang Xu
  • Haitao Jia
  • Xi Zhang
  • Ming Yan
  • Weizhou Shen
  • Ji Zhang
  • Fei Huang

Mobile device operation tasks are increasingly becoming a popular multi-modal AI application scenario. Current Multi-modal Large Language Models (MLLMs), constrained by their training data, lack the capability to function effectively as operation assistants. Instead, MLLM-based agents, which enhance capabilities through tool invocation, are gradually being applied to this scenario. However, the two major navigation challenges in mobile device operation tasks, task progress navigation and focus content navigation, are difficult to solve effectively under the single-agent architecture of existing work. This is due to the overly long token sequences and the interleaved text-image data format, which limit performance. To address these navigation challenges effectively, we propose Mobile-Agent-v2, a multi-agent architecture for mobile device operation assistance. The architecture comprises three agents: a planning agent, a decision agent, and a reflection agent. The planning agent condenses lengthy, interleaved image-text operation histories and screen summaries into a pure-text task progress, which is then passed on to the decision agent. This reduction in context length makes it easier for the decision agent to navigate the task progress. To retain focus content, we design a memory unit that is updated with task progress by the decision agent. Additionally, to correct erroneous operations, the reflection agent observes the outcome of each operation and handles any mistakes accordingly. Experimental results indicate that Mobile-Agent-v2 achieves over a 30% improvement in task completion compared to the single-agent architecture of Mobile-Agent. The code is open-sourced at https://github.com/X-PLUG/MobileAgent.

AAAI Conference 2024 Conference Paper

Preference Ranking Optimization for Human Alignment

  • Feifan Song
  • Bowen Yu
  • Minghao Li
  • Haiyang Yu
  • Fei Huang
  • Yongbin Li
  • Houfeng Wang

Large language models (LLMs) often produce misleading content, emphasizing the need to align them with human values to ensure secure AI systems. Reinforcement learning from human feedback (RLHF) has been employed to achieve this alignment. However, it has two main drawbacks: (1) RLHF exhibits complexity, instability, and sensitivity to hyperparameters in contrast to SFT. (2) Despite massive trial-and-error, multiple sampling is reduced to pair-wise contrast, thus lacking contrasts from a macro perspective. In this paper, we propose Preference Ranking Optimization (PRO) as an efficient SFT algorithm to directly fine-tune LLMs for human alignment. PRO extends the pair-wise contrast to accommodate preference rankings of any length. By iteratively contrasting candidates, PRO instructs the LLM to prioritize the best response while progressively ranking the remaining responses. In this manner, PRO effectively transforms human alignment into aligning the probability ranking of n responses generated by the LLM with the preference ranking of humans towards these responses. Experiments show that PRO outperforms baseline algorithms, achieving comparable results to ChatGPT and human responses through automatic, reward-based, GPT-4-based, and human evaluations.
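
A hedged sketch of the ranking objective described above: with model scores for n responses sorted best to worst, each step contrasts the current best against everything ranked below it, extending pair-wise contrast to a full ranking. Using the (length-normalized) log-likelihood as the score is an assumption for illustration.

```python
import torch

def pro_ranking_loss(scores: torch.Tensor) -> torch.Tensor:
    """scores: (n,) model scores for responses sorted from most to least preferred."""
    loss = 0.0
    for k in range(len(scores) - 1):
        # Softmax over the k-th response and everything ranked below it;
        # the k-th response should win that contrast.
        loss = loss - torch.log_softmax(scores[k:], dim=0)[0]
    return loss

scores = torch.tensor([-1.2, -1.5, -2.3, -3.0], requires_grad=True)
print(pro_ranking_loss(scores))
```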

NeurIPS Conference 2024 Conference Paper

Self-Retrieval: End-to-End Information Retrieval with One Large Language Model

  • Qiaoyu Tang
  • Jiawei Chen
  • Zhuoqun Li
  • Bowen Yu
  • Yaojie Lu
  • Cheng Fu
  • Haiyang Yu
  • Hongyu Lin

The rise of large language models (LLMs) has significantly transformed both the construction and application of information retrieval (IR) systems. However, current interactions between IR systems and LLMs remain limited, with LLMs merely serving as components within IR systems and IR systems being constructed independently of LLMs. This separated architecture restricts knowledge sharing and deep collaboration between them. In this paper, we introduce Self-Retrieval, a novel end-to-end LLM-driven information retrieval architecture. Self-Retrieval unifies all essential IR functions within a single LLM, leveraging the inherent capabilities of LLMs throughout the IR process. Specifically, Self-Retrieval internalizes the retrieval corpus through self-supervised learning, transforms the retrieval process into sequential passage generation, and performs relevance assessment for reranking. Experimental results demonstrate that Self-Retrieval not only outperforms existing retrieval approaches by a significant margin, but also substantially enhances the performance of LLM-driven downstream applications like retrieval-augmented generation.

AAAI Conference 2024 Conference Paper

SeqGPT: An Out-of-the-Box Large Language Model for Open Domain Sequence Understanding

  • Tianyu Yu
  • Chengyue Jiang
  • Chao Lou
  • Shen Huang
  • Xiaobin Wang
  • Wei Liu
  • Jiong Cai
  • Yangning Li

Large language models (LLMs) have shown impressive abilities for open-domain NLP tasks. However, LLMs are sometimes too footloose for natural language understanding (NLU) tasks, which always have restricted output and input formats. Their performance on NLU tasks is highly related to prompts or demonstrations, and they are shown to be poor at performing several representative NLU tasks, such as event extraction and entity typing. To this end, we present SeqGPT, a bilingual (i.e., English and Chinese) open-source autoregressive model specially enhanced for open-domain natural language understanding. We express all NLU tasks with two atomic tasks, which define fixed instructions to restrict the input and output format but remain "open" for arbitrarily varied label sets. The model is first instruction-tuned with extremely fine-grained labeled data synthesized by ChatGPT and then further fine-tuned on 233 different atomic tasks from 152 datasets across various domains. The experimental results show that SeqGPT has decent classification and extraction ability, and is capable of performing language understanding tasks on unseen domains. We also conduct empirical studies on the scaling of data and model size as well as on the transfer across tasks. Our models are accessible at https://github.com/Alibaba-NLP/SeqGPT.

NeurIPS Conference 2024 Conference Paper

WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models

  • Peng Wang
  • Zexi Li
  • Ningyu Zhang
  • Ziwen Xu
  • Yunzhi Yao
  • Yong Jiang
  • Pengjun Xie
  • Fei Huang

Large language models (LLMs) need knowledge updates to keep up with ever-growing world facts and to correct hallucinated responses, motivating methods for lifelong model editing. Where the updated knowledge resides in memories is a fundamental question for model editing. In this paper, we find that editing either long-term memory (direct model parameters) or working memory (non-parametric knowledge of neural network activations/representations obtained by retrieval) results in an impossible triangle: reliability, generalization, and locality cannot be realized together in the lifelong editing setting. For long-term memory, directly editing the parameters causes conflicts with irrelevant pretrained knowledge or previous edits (poor reliability and locality). For working memory, retrieval-based activations can hardly make the model understand the edits and generalize (poor generalization). Therefore, we propose WISE to bridge the gap between memories. In WISE, we design a dual parametric memory scheme, which consists of a main memory for the pretrained knowledge and a side memory for the edited knowledge. We only edit the knowledge in the side memory and train a router to decide which memory to go through when given a query. For continual editing, we devise a knowledge-sharding mechanism in which different sets of edits reside in distinct subspaces of parameters and are subsequently merged into a shared memory without conflicts. Extensive experiments show that WISE can outperform previous model editing methods and overcome the impossible triangle under lifelong model editing in question answering, hallucination, and out-of-distribution settings across trending LLM architectures, e.g., GPT, LLaMA, and Mistral.
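
A hedged sketch of the dual parametric memory idea: a side copy of an FFN holds the edited knowledge, and a router decides per query whether activations flow through the main (pretrained) or side (edited) memory. The scalar sigmoid router and hard threshold here are illustrative assumptions, not the paper's exact routing criterion.

```python
import torch
import torch.nn as nn

class DualMemoryFFN(nn.Module):
    """Main memory for pretrained knowledge, side memory for edits, router in between."""
    def __init__(self, dim: int, threshold: float = 0.5):
        super().__init__()
        self.main = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.side = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.router = nn.Linear(dim, 1)            # scores whether the query hits an edit
        self.threshold = threshold

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        score = torch.sigmoid(self.router(h.mean(dim=1)))           # (batch, 1)
        use_side = (score > self.threshold).float().unsqueeze(-1)   # hard routing decision
        return use_side * self.side(h) + (1 - use_side) * self.main(h)

print(DualMemoryFFN(128)(torch.randn(2, 10, 128)).shape)  # torch.Size([2, 10, 128])
```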

EAAI Journal 2023 Journal Article

A general motion controller based on deep reinforcement learning for an autonomous underwater vehicle with unknown disturbances

  • Fei Huang
  • Jian Xu
  • Di Wu
  • Yunfei Cui
  • Zheping Yan
  • Wen Xing
  • Xun Zhang

This paper studies the application of deep Reinforcement Learning (RL) in the motion control of an underactuated autonomous underwater vehicle (AUV) with unknown disturbances. Firstly, a general state space, action space and reward function are designed for motion control problems rather than each specific motion control task, which ensures the generality of our method. Furthermore, a virtual AUV model with partial random disturbances is established, and on this basis, a simulation training method is developed to solve the problems of extremely high risk and extremely low efficiency caused by training in actual experiments. Then, in order to directly deploy the optimal control policy obtained through simulation training to an actual AUV, we employ Extended State Observers (ESOs) to estimate the unknown disturbances in five degrees of freedom, and give a deployment method using the estimated values as the disturbance state vector and compensation vector. Combining the above training method and deployment method, a novel general motion controller is proposed. Finally, four different AUV motion control simulations are carried out, and the results confirm the generality and effectiveness of our proposed controller.

AAAI Conference 2023 Conference Paper

Adversarial Self-Attention for Language Understanding

  • Hongqiu Wu
  • Ruixue Ding
  • Hai Zhao
  • Pengjun Xie
  • Fei Huang
  • Min Zhang

Deep neural models (e.g., Transformer) naturally learn spurious features, which create a "shortcut" between the labels and inputs, thus impairing generalization and robustness. This paper advances the self-attention mechanism to a robust variant for Transformer-based pre-trained language models (e.g., BERT). We propose the Adversarial Self-Attention mechanism (ASA), which adversarially biases the attention to effectively suppress the model's reliance on features (e.g., specific keywords) and encourage its exploration of broader semantics. We conduct comprehensive evaluations across a wide range of tasks for both the pre-training and fine-tuning stages. For pre-training, ASA yields remarkable performance gains compared to naive training over longer steps. For fine-tuning, ASA-empowered models outperform naive models by a large margin in terms of both generalization and robustness.

NeurIPS Conference 2023 Conference Paper

Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs

  • Jinyang Li
  • Binyuan Hui
  • Ge Qu
  • Jiaxi Yang
  • Binhua Li
  • Bowen Li
  • Bailin Wang
  • Bowen Qin

Text-to-SQL parsing, which aims at converting natural language instructions into executable SQLs, has gained increasing attention in recent years. In particular, GPT-4 and Claude-2 have shown impressive results in this task. However, most of the prevalent benchmarks, i.e., Spider and WikiSQL, focus on database schemas with few rows of database content, leaving a gap between academic study and real-world applications. To mitigate this gap, we present BIRD, a BIg benchmark for laRge-scale Database grounded in text-to-SQL tasks, containing 12,751 pairs of text-to-SQL data and 95 databases with a total size of 33.4 GB, spanning 37 professional domains. Our emphasis on database values highlights the new challenges of dirty database contents, external knowledge between NL questions and database contents, and SQL efficiency, particularly in the context of massive databases. To solve these problems, text-to-SQL models must feature database value comprehension in addition to semantic parsing. The experimental results demonstrate the significance of database values in generating accurate text-to-SQLs for big databases. Furthermore, even the most popular and effective text-to-SQL model, i.e., GPT-4, only achieves 54.89% execution accuracy, which is still far from the human result of 92.96%, proving that challenges still stand. We also provide an efficiency analysis to offer insights into generating text-to-efficient-SQLs that are beneficial to industries. We believe that BIRD will contribute to advancing real-world applications of text-to-SQL research. The leaderboard and source code are available at https://bird-bench.github.io/.

NeurIPS Conference 2023 Conference Paper

Debiased and Denoised Entity Recognition from Distant Supervision

  • Haobo Wang
  • Yiwen Dong
  • Ruixuan Xiao
  • Fei Huang
  • Gang Chen
  • Junbo Zhao

While distant supervision has been extensively explored and exploited in NLP tasks like named entity recognition, a major obstacle stems from the inevitably noisy distant labels tagged unsupervisedly. A few past works approach this problem by adopting a self-training framework with a sample-selection mechanism. In this work, we innovatively identify two types of biases that were omitted by prior work, and these biases lead to inferior performance in the distant-supervised NER setup. First, we characterize the noise concealed in the distant labels as highly structural rather than fully randomized. Second, the self-training framework would ubiquitously introduce an inherent bias that causes erroneous behavior in both sample selection and, eventually, prediction. To cope with these problems, we propose a novel self-training framework, dubbed DesERT. This framework augments the conventional NER predictive pathway to a dual form that effectively adapts the sample-selection process to conform to its innate distributional-bias structure. The other crucial component of DesERT is a debiased module aiming to enhance the token representations, and hence the quality of the pseudo-labels. Extensive experiments are conducted to validate DesERT. The results show that our framework establishes a new state-of-the-art performance, achieving a +2.22% average F1 score improvement on five standardized benchmarking datasets. Lastly, DesERT demonstrates its effectiveness under a new DSNER benchmark where additional distant supervision comes from the ChatGPT model.

NeurIPS Conference 2023 Conference Paper

EMMA-X: An EM-like Multilingual Pre-training Algorithm for Cross-lingual Representation Learning

  • Ping Guo
  • Xiangpeng Wei
  • Yue Hu
  • Baosong Yang
  • Dayiheng Liu
  • Fei Huang
  • Jun Xie

Expressing universal semantics common to all languages is helpful to understand the meanings of complex and culture-specific sentences. The research theme underlying this scenario focuses on learning universal representations across languages with the usage of massive parallel corpora. However, due to the sparsity and scarcity of parallel data, there is still a big challenge in learning authentic "universals" for any two languages. In this paper, we propose Emma-X: an EM-like Multilingual pre-training Algorithm, to learn Cross-lingual universals with the aid of excessive multilingual non-parallel data. Emma-X unifies the cross-lingual representation learning task and an extra semantic relation prediction task within an EM framework. Both the extra semantic classifier and the cross-lingual sentence encoder approximate the semantic relation of two sentences, and supervise each other until convergence. To evaluate Emma-X, we conduct experiments on xrete, a newly introduced benchmark containing 12 widely studied cross-lingual tasks that fully depend on sentence-level representations. Results reveal that Emma-X achieves state-of-the-art performance. Further geometric analysis of the built representation space with three requirements demonstrates the superiority of Emma-X over advanced models.

AAAI Conference 2023 Conference Paper

Graphix-T5: Mixing Pre-trained Transformers with Graph-Aware Layers for Text-to-SQL Parsing

  • Jinyang Li
  • Binyuan Hui
  • Reynold Cheng
  • Bowen Qin
  • Chenhao Ma
  • Nan Huo
  • Fei Huang
  • Wenyu Du

The task of text-to-SQL parsing, which aims at converting natural language questions into executable SQL queries, has garnered increasing attention in recent years. One of the major challenges in text-to-SQL parsing is domain generalization, i.e., how to generalize well to unseen databases. Recently, the pre-trained text-to-text transformer model, namely T5, though not specialized for text-to-SQL parsing, has achieved state-of-the-art performance on standard benchmarks targeting domain generalization. In this work, we explore ways to further augment the pre-trained T5 model with specialized components for text-to-SQL parsing. Such components are expected to introduce structural inductive bias into text-to-SQL parsers, thus improving the model’s capacity for (potentially multi-hop) reasoning, which is critical for generating structure-rich SQLs. To this end, we propose a new architecture, GRAPHIX-T5, a mixed model with the standard pre-trained transformer model augmented by specially-designed graph-aware layers. Extensive experiments and analysis demonstrate the effectiveness of GRAPHIX-T5 across four text-to-SQL benchmarks: SPIDER, SYN, REALISTIC and DK. GRAPHIX-T5 surpasses all other T5-based parsers by a significant margin, achieving new state-of-the-art performance. Notably, GRAPHIX-T5-large reaches performance superior to the original T5-large by 5.7% on exact match (EM) accuracy and 6.6% on execution accuracy (EX). This even outperforms T5-3B by 1.2% on EM and 1.5% on EX.

IJCAI Conference 2023 Conference Paper

One Model for All Domains: Collaborative Domain-Prefix Tuning for Cross-Domain NER

  • Xiang Chen
  • Lei Li
  • Shuofei Qiao
  • Ningyu Zhang
  • Chuanqi Tan
  • Yong Jiang
  • Fei Huang
  • Huajun Chen

Cross-domain NER is a challenging task for addressing the low-resource problem in practical scenarios. Previous typical solutions mainly obtain an NER model with pre-trained language models (PLMs) using data from a rich-resource domain and adapt it to the target domain. Owing to the mismatch issue among entity types in different domains, previous approaches normally tune all parameters of the PLMs, ending up with an entirely new NER model for each domain. Moreover, current models only focus on leveraging knowledge in one general source domain while failing to successfully transfer knowledge from multiple sources to the target. To address these issues, we introduce Collaborative Domain-Prefix Tuning for cross-domain NER (CP-NER) based on text-to-text generative PLMs. Specifically, we present text-to-text generation grounded on domain-related instructors to transfer knowledge to new-domain NER tasks without structural modifications. We utilize frozen PLMs and conduct collaborative domain-prefix tuning to stimulate the potential of PLMs to handle NER tasks across various domains. Experimental results on the Cross-NER benchmark show that the proposed approach has flexible transfer ability and performs better on both one-source and multiple-source cross-domain NER tasks.

NeurIPS Conference 2023 Conference Paper

RRHF: Rank Responses to Align Language Models with Human Feedback

  • Hongyi Yuan
  • Zheng Yuan
  • Chuanqi Tan
  • Wei Wang
  • Songfang Huang
  • Fei Huang

Reinforcement Learning from Human Feedback (RLHF) facilitates the alignment of large language models with human preferences, significantly enhancing the quality of interactions between humans and models. InstructGPT implements RLHF through several stages, including Supervised Fine-Tuning (SFT), reward model training, and Proximal Policy Optimization (PPO). However, PPO is sensitive to hyperparameters and requires multiple models in its standard implementation, making it hard to train and scale up to larger parameter counts. In contrast, we propose a novel learning paradigm called RRHF, which scores sampled responses from different sources via the logarithm of conditional probabilities and learns to align these probabilities with human preferences through a ranking loss. RRHF can leverage sampled responses from various sources, including the model's own responses, responses from other large language models, and human expert responses, and learns to rank them. RRHF only needs 1 to 2 models during tuning and can efficiently align language models with human preferences robustly without complex hyperparameter tuning. Additionally, RRHF can be considered an extension of SFT and reward model training while being simpler than PPO in terms of coding, model counts, and hyperparameters. We evaluate RRHF on the Helpful and Harmless dataset, demonstrating comparable alignment performance to PPO by reward model score and human labeling. Extensive experiments show that the performance of RRHF is highly related to sampling quality, which suggests RRHF is a best-of-n learner.
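
A minimal sketch of the ranking loss this abstract describes, assuming `logp` holds the length-normalized conditional log-probability the model assigns to each candidate response and `reward` holds the corresponding preference scores; pairs where a less-preferred response outscores a more-preferred one are penalized, and a cross-entropy (SFT-style) term on the best response is added. The tensor plumbing is illustrative.

```python
import torch

def rrhf_ranking_loss(logp: torch.Tensor, reward: torch.Tensor) -> torch.Tensor:
    """logp, reward: (n,) one entry per candidate response."""
    # worse[i, j] is True when response i is preferred over response j.
    worse = reward.unsqueeze(1) > reward.unsqueeze(0)
    # margin[i, j] = logp[j] - logp[i]; positive means the worse response scores higher.
    margin = logp.unsqueeze(0) - logp.unsqueeze(1)
    rank = torch.relu(margin)[worse].sum()
    sft = -logp[reward.argmax()]                   # keep likelihood high on the best response
    return rank + sft

logp = torch.tensor([-1.1, -0.8, -2.0], requires_grad=True)   # model scores
reward = torch.tensor([0.9, 0.2, 0.5])                         # preference scores
print(rrhf_ranking_loss(logp, reward))
```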

NeurIPS Conference 2023 Conference Paper

SPA: A Graph Spectral Alignment Perspective for Domain Adaptation

  • Zhiqing Xiao
  • Haobo Wang
  • Ying Jin
  • Lei Feng
  • Gang Chen
  • Fei Huang
  • Junbo Zhao

Unsupervised domain adaptation (UDA) is a pivotal problem in machine learning for extending an in-domain model to distinctive target domains where the data distributions differ. Most prior works focus on capturing inter-domain transferability but largely overlook rich intra-domain structures, which empirically results in even worse discriminability. In this work, we introduce a novel graph SPectral Alignment (SPA) framework to tackle this tradeoff. The core of our method is briefly condensed as follows: (i) by casting the DA problem into graph primitives, SPA composes a coarse graph alignment mechanism with a novel spectral regularizer towards aligning the domain graphs in eigenspaces; (ii) we further develop a fine-grained message propagation module, built upon a novel neighbor-aware self-training mechanism, for enhanced discriminability in the target domain. On standardized benchmarks, extensive experiments show that SPA surpasses existing cutting-edge DA methods. Coupled with dense model analysis, we conclude that our approach indeed possesses superior efficacy, robustness, discriminability, and transferability. Code and data are available at: https://github.com/CrownX/SPA.

NeurIPS Conference 2023 Conference Paper

SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents

  • Shuzheng Si
  • Wentao Ma
  • Haoyu Gao
  • Yuchuan Wu
  • Ting-En Lin
  • Yinpei Dai
  • Hangyu Li
  • Rui Yan

Task-oriented dialogue (TOD) models have made significant progress in recent years. However, previous studies primarily focus on datasets written by annotators, which has resulted in a gap between academic research and real-world spoken conversation scenarios. While several small-scale spoken TOD datasets are proposed to address robustness issues such as ASR errors, they ignore the unique challenges in spoken conversation. To tackle the limitations, we introduce SpokenWOZ, a large-scale speech-text dataset for spoken TOD, containing 8 domains, 203k turns, 5.7k dialogues and 249 hours of audio from human-to-human spoken conversations. SpokenWOZ further incorporates common spoken characteristics such as word-by-word processing and reasoning in spoken language. Based on these characteristics, we present cross-turn slot and reasoning slot detection as new challenges. We conduct experiments on various baselines, including text-modal models, newly proposed dual-modal models, and LLMs, e.g., ChatGPT. The results show that the current models still have substantial room for improvement in spoken conversation, where the most advanced dialogue state tracker only achieves 25.65% in joint goal accuracy and the SOTA end-to-end model only correctly completes the user request in 52.1% of dialogues. Our dataset, code, and leaderboard are available at https://spokenwoz.github.io/SpokenWOZ-github.io/.

NeurIPS Conference 2022 Conference Paper

Decoupling Knowledge from Memorization: Retrieval-augmented Prompt Learning

  • Xiang Chen
  • Lei Li
  • Ningyu Zhang
  • Xiaozhuan Liang
  • Shumin Deng
  • Chuanqi Tan
  • Fei Huang
  • Luo Si

Prompt learning approaches have made waves in natural language processing by inducing better few-shot performance, yet they still follow a parametric-based learning paradigm in which forgetting and rote memorization can cause unstable generalization. Specifically, vanilla prompt learning may struggle to utilize atypical instances by rote during fully-supervised training or overfit shallow patterns with low-shot data. To alleviate such limitations, we develop RetroPrompt with the motivation of decoupling knowledge from memorization to help the model strike a balance between generalization and memorization. In contrast with vanilla prompt learning, RetroPrompt constructs an open-book knowledge-store from training instances and implements a retrieval mechanism during input, training and inference, thus equipping the model with the ability to retrieve related contexts from the training corpus as cues for enhancement. Extensive experiments demonstrate that RetroPrompt can obtain better performance in both few-shot and zero-shot settings. Besides, we further illustrate that our proposed RetroPrompt can yield better generalization abilities on new datasets. Detailed analysis of memorization indeed reveals that RetroPrompt can reduce the reliance of language models on memorization, thus improving generalization for downstream tasks. Code is available at https://github.com/zjunlp/PromptKG/tree/main/research/RetroPrompt.
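
The open-book retrieval step can be pictured with the minimal sketch below: a store of training-instance embeddings is queried by cosine similarity, and the nearest texts are prepended to the input as cues. The class names, separator token, and encoder are placeholders, not RetroPrompt's actual interface.

```python
# Hypothetical retrieval-augmented prompt construction (illustrative only).
import numpy as np

class RetrievalStore:
    def __init__(self, embeddings, texts):
        # embeddings: (N, d) float array for N training instances; texts: list of N strings
        self.emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
        self.texts = texts

    def retrieve(self, query_emb, top_k=3):
        q = query_emb / np.linalg.norm(query_emb)
        scores = self.emb @ q                      # cosine similarity to every stored instance
        idx = np.argsort(-scores)[:top_k]
        return [self.texts[i] for i in idx]

def build_prompt(query_text, store, query_emb):
    cues = store.retrieve(query_emb)
    return " [SEP] ".join(cues + [query_text])     # retrieved contexts prepended as cues
```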

AAAI Conference 2022 Conference Paper

From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression

  • Runxin Xu
  • Fuli Luo
  • Chengyu Wang
  • Baobao Chang
  • Jun Huang
  • Songfang Huang
  • Fei Huang

Pre-trained Language Models (PLMs) have achieved great success in various Natural Language Processing (NLP) tasks under the pre-training and fine-tuning paradigm. With large quantities of parameters, PLMs are computation-intensive and resource-hungry. Hence, model pruning has been introduced to compress large-scale PLMs. However, most prior approaches only consider task-specific knowledge towards downstream tasks, but ignore the essential task-agnostic knowledge during pruning, which may cause catastrophic forgetting and lead to poor generalization ability. To maintain both task-agnostic and task-specific knowledge in our pruned model, we propose ContrAstive Pruning (CAP) under the paradigm of pre-training and fine-tuning. It is designed as a general framework, compatible with both structured and unstructured pruning. Unified in contrastive learning, CAP enables the pruned model to learn from the pre-trained model for task-agnostic knowledge, and from the fine-tuned model for task-specific knowledge. Besides, to better retain the performance of the pruned model, the snapshots (i.e., the intermediate models at each pruning iteration) also serve as effective supervision for pruning. Our extensive experiments show that adopting CAP consistently yields significant improvements, especially in extremely high sparsity scenarios. With only 3% of model parameters reserved (i.e., 97% sparsity), CAP successfully achieves 99.2% and 96.3% of the original BERT performance on the QQP and MNLI tasks. In addition, our probing experiments demonstrate that the model pruned by CAP tends to achieve better generalization ability.
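
A hypothetical reading of the contrastive part of CAP is sketched below: the pruned model's representation is pulled toward the frozen pre-trained model (task-agnostic knowledge) and the fine-tuned model (task-specific knowledge) with an InfoNCE-style loss added to the task loss. The snapshot supervision mentioned in the abstract is omitted, and all function names are my own.

```python
# Illustrative contrastive-pruning objective (not the released CAP code).
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature=0.05):
    """anchor, positive: (B, d) representations; row i of `positive` is the
    positive for row i of `anchor`, the other rows act as in-batch negatives."""
    a = F.normalize(anchor, dim=1)
    p = F.normalize(positive, dim=1)
    logits = a @ p.t() / temperature                     # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)

def cap_loss(pruned_repr, pretrained_repr, finetuned_repr, task_loss, lam=0.1):
    l_agnostic = info_nce(pruned_repr, pretrained_repr.detach())   # task-agnostic teacher
    l_specific = info_nce(pruned_repr, finetuned_repr.detach())    # task-specific teacher
    return task_loss + lam * (l_agnostic + l_specific)
```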

AAAI Conference 2022 Conference Paper

GALAXY: A Generative Pre-trained Model for Task-Oriented Dialog with Semi-supervised Learning and Explicit Policy Injection

  • Wanwei He
  • Yinpei Dai
  • Yinhe Zheng
  • Yuchuan Wu
  • Zheng Cao
  • Dermot Liu
  • Peng Jiang
  • Min Yang

Pre-trained models have proved to be powerful in enhancing task-oriented dialog systems. However, current pre-training methods mainly focus on enhancing dialog understanding and generation tasks while neglecting the exploitation of dialog policy. In this paper, we propose GALAXY, a novel pre-trained dialog model that explicitly learns dialog policy from limited labeled dialogs and large-scale unlabeled dialog corpora via semi-supervised learning. Specifically, we introduce a dialog act prediction task for policy optimization during pre-training and employ a consistency regularization term to refine the learned representation with the help of unlabeled dialogs. We also implement a gating mechanism to weigh suitable unlabeled dialog samples. Empirical results show that GALAXY substantially improves the performance of task-oriented dialog systems, and achieves new state-of-the-art results on benchmark datasets: In-Car, MultiWOZ 2.0 and MultiWOZ 2.1, improving their end-to-end combined scores by 2.5, 5.3 and 5.5 points, respectively. We also show that GALAXY has a stronger few-shot ability than existing models under various low-resource settings. For reproducibility, we release the code and data at https://github.com/siat-nlp/GALAXY.

AAAI Conference 2022 Short Paper

Learning to Ask for Data-Efficient Event Argument Extraction (Student Abstract)

  • Hongbin Ye
  • Ningyu Zhang
  • Zhen Bi
  • Shumin Deng
  • Chuanqi Tan
  • Hui Chen
  • Fei Huang
  • Huajun Chen

Event argument extraction (EAE) is an important task in information extraction to discover specific argument roles. In this study, we cast EAE as a question-based cloze task and empirically analyze the performance of fixed discrete token templates. As generating human-annotated question templates is often time-consuming and labor-intensive, we further propose a novel approach called “Learning to Ask,” which can learn optimized question templates for EAE without human annotations. Experiments using the ACE-2005 dataset demonstrate that our method based on optimized questions achieves state-of-the-art performance in both the few-shot and supervised settings.

IJCAI Conference 2022 Conference Paper

Meta-Learning Based Knowledge Extrapolation for Knowledge Graphs in the Federated Setting

  • Mingyang Chen
  • Wen Zhang
  • Zhen Yao
  • Xiangnan Chen
  • Mengxiao Ding
  • Fei Huang
  • Huajun Chen

We study the knowledge extrapolation problem to embed new components (i.e., entities and relations) that come with emerging knowledge graphs (KGs) in the federated setting. In this problem, a model trained on an existing KG needs to embed an emerging KG with unseen entities and relations. To solve this problem, we introduce the meta-learning setting, where a set of tasks are sampled on the existing KG to mimic the link prediction task on the emerging KG. Based on sampled tasks, we meta-train a graph neural network framework that can construct features for unseen components based on structural information and output embeddings for them. Experimental results show that our proposed method can effectively embed unseen components and outperforms models that consider inductive settings for KGs and baselines that directly use conventional KG embedding methods.

IJCAI Conference 2021 Conference Paper

Automatically Paraphrasing via Sentence Reconstruction and Round-trip Translation

  • Zilu Guo
  • Zhongqiang Huang
  • Kenny Q. Zhu
  • Guandan Chen
  • Kaibo Zhang
  • Boxing Chen
  • Fei Huang

Paraphrase generation plays key roles in NLP tasks such as question answering, machine translation, and information retrieval. In this paper, we propose a novel framework for paraphrase generation. It simultaneously decodes the output sentence using a pretrained wordset-to-sequence model and a round-trip translation model. We evaluate this framework on Quora, WikiAnswers, MSCOCO and Twitter, and show its advantage over previous state-of-the-art unsupervised methods and distantly-supervised methods by significant margins on all datasets. For Quora and WikiAnswers, our framework even performs better than some strongly supervised methods with domain adaptation. Further, we show that the generated paraphrases can be used to augment the training data for machine translation to achieve substantial improvements.

AAAI Conference 2021 Conference Paper

Bridging the Domain Gap: Improve Informal Language Translation via Counterfactual Domain Adaptation

  • Ke Wang
  • Guandan Chen
  • Zhongqiang Huang
  • Xiaojun Wan
  • Fei Huang

Despite the near-human performance already achieved on formal texts such as news articles, neural machine translation still has difficulty in dealing with “user-generated” texts that have diverse linguistic phenomena but lack large-scale high-quality parallel corpora. To address this problem, we propose a counterfactual domain adaptation method to better leverage both large-scale source-domain data (formal texts) and small-scale target-domain data (informal texts). Specifically, by considering effective counterfactual conditions (the concatenations of source-domain texts and the target-domain tag), we construct counterfactual representations to fill the sparse latent space of the target domain caused by a small amount of data, that is, bridging the gap between the source-domain data and the target-domain data. Experiments on English-to-Chinese and Chinese-to-English translation tasks show that our method outperforms the base model that is trained only on the informal corpus by a large margin, and consistently surpasses different baseline methods by +1.12 ∼ 4.34 BLEU points on different datasets. Furthermore, we also show that our method achieves competitive performance on cross-domain language translation on four language pairs.

AAAI Conference 2021 Conference Paper

Contrastive Triple Extraction with Generative Transformer

  • Hongbin Ye
  • Ningyu Zhang
  • Shumin Deng
  • Mosha Chen
  • Chuanqi Tan
  • Fei Huang
  • Huajun Chen

Triple extraction is an essential task in information extraction for natural language processing and knowledge graph construction. In this paper, we revisit the end-to-end triple extraction task for sequence generation. Since generative triple extraction may struggle to capture long-term dependencies and generate unfaithful triples, we introduce a novel model, contrastive triple extraction with a generative transformer. Specifically, we introduce a single shared transformer module for encoder-decoder-based generation. To generate faithful results, we propose a novel triplet contrastive training objective. Moreover, we introduce two mechanisms to further improve model performance (i.e., batch-wise dynamic attention masking and triple-wise calibration). Experimental results on three datasets (i.e., NYT, WebNLG, and MIE) show that our approach achieves better performance than that of the baselines.

IJCAI Conference 2021 Conference Paper

Document-level Relation Extraction as Semantic Segmentation

  • Ningyu Zhang
  • Xiang Chen
  • Xin Xie
  • Shumin Deng
  • Chuanqi Tan
  • Mosha Chen
  • Fei Huang
  • Luo Si

Document-level relation extraction aims to extract relations among multiple entity pairs from a document. Previously proposed graph-based or transformer-based models utilize the entities independently, regardless of global information among relational triples. This paper approaches the problem by predicting an entity-level relation matrix to capture local and global information, parallel to the semantic segmentation task in computer vision. Herein, we propose a Document U-shaped Network for document-level relation extraction. Specifically, we leverage an encoder module to capture the context information of entities and a U-shaped segmentation module over the image-style feature map to capture global interdependency among triples. Experimental results show that our approach can obtain state-of-the-art performance on three benchmark datasets DocRED, CDR, and GDA.
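
To visualize the "entity-level relation matrix as an image" framing, here is a toy pairwise-feature-map head with a plain convolutional stack standing in for the paper's U-shaped segmentation module; shapes and layer choices are my own assumptions, not the published architecture.

```python
# Toy relation-matrix head for document-level RE (illustrative only).
import torch
import torch.nn as nn

class RelationMatrixHead(nn.Module):
    def __init__(self, ent_dim, num_relations):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(2 * ent_dim, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(128, num_relations, kernel_size=1),
        )

    def forward(self, ent_repr):
        """ent_repr: (N, d) entity representations for one document."""
        n, d = ent_repr.shape
        pair = torch.cat(
            [ent_repr.unsqueeze(1).expand(n, n, d),     # head entity features
             ent_repr.unsqueeze(0).expand(n, n, d)],    # tail entity features
            dim=-1)                                      # (N, N, 2d) "image"
        feat = pair.permute(2, 0, 1).unsqueeze(0)        # (1, 2d, N, N)
        return self.conv(feat).squeeze(0).permute(1, 2, 0)  # (N, N, num_relations)
```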

AAAI Conference 2021 Conference Paper

Dynamic Hybrid Relation Exploration Network for Cross-Domain Context-Dependent Semantic Parsing

  • Binyuan Hui
  • Ruiying Geng
  • Qiyu Ren
  • Binhua Li
  • Yongbin Li
  • Jian Sun
  • Fei Huang
  • Luo Si

Semantic parsing has long been a fundamental problem in natural language processing. Recently, cross-domain context-dependent semantic parsing has become a new focus of research. Central to the problem is the challenge of leveraging contextual information of both natural language utterance and database schemas in the interaction history. In this paper, we present a dynamic graph framework that is capable of effectively modelling contextual utterances, tokens, database schemas, and their complicated interaction as the conversation proceeds. The framework employs a dynamic memory decay mechanism that incorporates inductive bias to integrate enriched contextual relation representation, which is further enhanced with a powerful reranking model. At the time of writing, we demonstrate that the proposed framework outperforms all existing models by large margins, achieving new state-of-the-art performance on two large-scale benchmarks, the SParC and CoSQL datasets. Specifically, the model attains a 55.8% question-match and 30.8% interaction-match accuracy on SParC, and a 46.8% question-match and 17.0% interaction-match accuracy on CoSQL.

AAAI Conference 2021 Conference Paper

Knowledge-aware Named Entity Recognition with Alleviating Heterogeneity

  • Binling Nie
  • Ruixue Ding
  • Pengjun Xie
  • Fei Huang
  • Chen Qian
  • Luo Si

Named Entity Recognition (NER) is a fundamental and important research topic for many downstream NLP tasks, aiming at detecting and classifying named entities (NEs) mentioned in unstructured text into pre-defined categories. Learning from labeled data only is far from enough when it comes to domain-specific or temporally-evolving entities (e.g., medical terminologies or restaurant names). Luckily, open-source Knowledge Bases (KBs) (e.g., Wikidata and Freebase) contain NEs that are manually labeled with predefined types in different domains, which is potentially beneficial to identify entity boundaries and recognize entity types more accurately. However, the type system of a domain-specific NER task is typically independent of that of current KBs and thus inevitably exhibits a heterogeneity issue, which makes matching between the original NER and KB types (e.g., Person in NER potentially matches President in KBs) less likely, or introduces unintended noise without considering domain-specific knowledge (e.g., Band in NER should be mapped to Out of Entity Types in the restaurant-related task). To better incorporate and denoise the abundant knowledge in KBs, we propose a new KB-aware NER framework (KaNa), which utilizes type-heterogeneous knowledge to improve NER. Specifically, for an entity mention along with a set of candidate entities that are linked from KBs, KaNa first uses a type projection mechanism that maps the mention type and entity types into a shared space to homogenize the heterogeneous entity types. Then, based on projected types, a noise detector filters out certain less-confident candidate entities in an unsupervised manner. Finally, the filtered mention-entity pairs are injected into a NER model as a graph to predict answers. The experimental results demonstrate KaNa’s state-of-the-art performance on five public benchmark datasets from different domains.

AAAI Conference 2021 Conference Paper

Nested Named Entity Recognition with Partially-Observed TreeCRFs

  • Yao Fu
  • Chuanqi Tan
  • Mosha Chen
  • Songfang Huang
  • Fei Huang

Named entity recognition (NER) is a well-studied task in natural language processing. However, the widely-used sequence labeling framework has difficulty detecting entities with nested structures. In this work, we view nested NER as constituency parsing with partially-observed trees and model it with partially-observed TreeCRFs. Specifically, we view all labeled entity spans as observed nodes in a constituency tree, and other spans as latent nodes. With the TreeCRF we achieve a uniform way to jointly model the observed and the latent nodes. To compute the probability of partial trees with partial marginalization, we propose a variant of the Inside algorithm, the MASKED INSIDE algorithm, which supports different inference operations for different nodes (evaluation for the observed, marginalization for the latent, and rejection for nodes incompatible with the observed) with an efficient parallelized implementation, thus significantly speeding up training and inference. Experiments show that our approach achieves state-of-the-art (SOTA) F1 scores on the ACE2004 and ACE2005 datasets, and shows comparable performance to SOTA models on the GENIA dataset. We release the code at https://github.com/FranxYao/Partially-Observed-TreeCRFs.
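
A much-simplified, unlabeled version of an inside pass with span masks is sketched below to show how partial observations can be folded into the recursion (disallowed spans simply contribute -inf). The actual MASKED INSIDE algorithm handles labels, the observed/latent/rejected distinction, and parallelization more carefully; this is only my own sequential approximation.

```python
# Simplified masked inside pass over unlabeled binary trees (illustrative only).
import torch

def masked_inside(span_score, allowed):
    """span_score: (n, n) log-potentials for spans (i, j) with i <= j;
    allowed: (n, n) bool mask, False for spans ruled out by the partial
    observation (e.g. spans crossing an observed entity)."""
    n = span_score.size(0)
    beta = torch.full((n, n), float("-inf"))
    for i in range(n):
        if allowed[i, i]:
            beta[i, i] = span_score[i, i]          # single-token spans
    for width in range(1, n):
        for i in range(n - width):
            j = i + width
            if not allowed[i, j]:
                continue                           # rejected span stays -inf
            splits = torch.stack([beta[i, k] + beta[k + 1, j] for k in range(i, j)])
            beta[i, j] = span_score[i, j] + torch.logsumexp(splits, dim=0)
    return beta[0, n - 1]                          # log-partition over allowed trees
```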

AAAI Conference 2021 Conference Paper

Unsupervised Learning of Deterministic Dialogue Structure with Edge-Enhanced Graph Auto-Encoder

  • Yajing Sun
  • Yong Shan
  • Chengguang Tang
  • Yue Hu
  • Yinpei Dai
  • Jing Yu
  • Jian Sun
  • Fei Huang

It is important for task-oriented dialogue systems to discover the dialogue structure (i.e., the general dialogue flow) from dialogue corpora automatically. Previous work models dialogue structure by first extracting latent states for each utterance and then calculating the transition probabilities among states. These two-stage methods ignore the contextual information when calculating the probabilities, which makes the transitions between the states ambiguous. This paper proposes a conversational graph (CG) to represent deterministic dialogue structure, where nodes and edges represent the utterance and context information, respectively. An unsupervised Edge-Enhanced Graph Auto-Encoder (EGAE) architecture is designed to model local-contextual and global-structural information for conversational graph learning. Furthermore, a self-supervised objective is introduced with the response selection task to guide the unsupervised learning of the dialogue structure. Experimental results on several public datasets demonstrate that the novel model outperforms several alternatives in aggregating utterances with similar semantics. The effectiveness of the learned dialogue structure is also verified by more than 5% joint accuracy improvement in the downstream task of low-resource dialogue state tracking.

AAAI Conference 2020 Conference Paper

Boundary Enhanced Neural Span Classification for Nested Named Entity Recognition

  • Chuanqi Tan
  • Wei Qiu
  • Mosha Chen
  • Rui Wang
  • Fei Huang

Named entity recognition (NER) is a well-studied task in natural language processing. However, the widely-used sequence labeling framework usually has difficulty detecting entities with nested structures. The span-based method, which can easily detect nested entities in different subsequences, is naturally suitable for the nested NER problem. However, previous span-based methods have two main issues. First, classifying all subsequences is computationally expensive and very inefficient at inference. Second, span-based methods mainly focus on learning span representations but lack explicit boundary supervision. To tackle the above two issues, we propose a boundary enhanced neural span classification model. In addition to classifying the span, we propose incorporating an additional boundary detection task to predict those words that are boundaries of entities. The two tasks are jointly trained under a multitask learning framework, which enhances the span representation with additional boundary supervision. In addition, the boundary detection model has the ability to generate high-quality candidate spans, which greatly reduces the time complexity during inference. Experiments show that our approach outperforms all existing methods and achieves F1 scores of 85.3, 83.9, and 78.3 on the ACE2004, ACE2005, and GENIA datasets, respectively.
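
The joint boundary-and-span setup can be sketched as two heads trained together, as below. Layer names and the span-feature construction (endpoint concatenation) are placeholders rather than the paper's exact architecture.

```python
# Minimal sketch of jointly-trained boundary and span-classification heads
# (hypothetical names; not the paper's implementation).
import torch
import torch.nn as nn

class BoundarySpanModel(nn.Module):
    def __init__(self, hidden, num_types):
        super().__init__()
        self.start_head = nn.Linear(hidden, 2)        # token is an entity start / not
        self.end_head = nn.Linear(hidden, 2)          # token is an entity end / not
        self.span_head = nn.Linear(hidden * 2, num_types + 1)   # +1 for "no entity"

    def forward(self, token_repr, span_pairs):
        """token_repr: (T, H); span_pairs: non-empty list of (i, j) candidate spans."""
        start_logits = self.start_head(token_repr)    # (T, 2)
        end_logits = self.end_head(token_repr)        # (T, 2)
        span_feats = torch.stack(
            [torch.cat([token_repr[i], token_repr[j]]) for i, j in span_pairs])
        span_logits = self.span_head(span_feats)      # (num_spans, num_types + 1)
        return start_logits, end_logits, span_logits

# Training would sum three cross-entropy terms (boundary starts, boundary ends,
# span types); at inference, predicted starts/ends would propose the candidate spans.
```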

AAAI Conference 2020 Conference Paper

Knowing What, How and Why: A Near Complete Solution for Aspect-Based Sentiment Analysis

  • Haiyun Peng
  • Lu Xu
  • Lidong Bing
  • Fei Huang
  • Wei Lu
  • Luo Si

Target-based sentiment analysis or aspect-based sentiment analysis (ABSA) refers to addressing various sentiment analysis tasks at a fine-grained level, which includes but is not limited to aspect extraction, aspect sentiment classification, and opinion extraction. There exist many solvers of the above individual subtasks or a combination of two subtasks, and they can work together to tell a complete story, i.e., the discussed aspect, the sentiment on it, and the cause of the sentiment. However, no previous ABSA research has tried to provide a complete solution in one shot. In this paper, we introduce a new subtask under ABSA, named aspect sentiment triplet extraction (ASTE). Particularly, a solver of this task needs to extract triplets (What, How, Why) from the inputs, which show WHAT the targeted aspects are, HOW their sentiment polarities are and WHY they have such polarities (i.e., opinion reasons). For instance, one triplet from “Waiters are very friendly and the pasta is simply average” could be (‘Waiters’, positive, ‘friendly’). We propose a two-stage framework to address this task. The first stage predicts what, how and why in a unified model, and then the second stage pairs up the predicted what (how) and why from the first stage to output triplets. In the experiments, our framework has set a benchmark performance in this novel triplet extraction task. Meanwhile, it outperforms a few strong baselines adapted from state-of-the-art related methods.

IJCAI Conference 2020 Conference Paper

Learning with Noise: Improving Distantly-Supervised Fine-grained Entity Typing via Automatic Relabeling

  • Haoyu Zhang
  • Dingkun Long
  • Guangwei Xu
  • Muhua Zhu
  • Pengjun Xie
  • Fei Huang
  • Ji Wang

Fine-grained entity typing (FET) is a fundamental task for various entity-leveraging applications. Although great success has been made, existing systems still have challenges in handling noisy samples in training data introduced by distant supervision methods. To address this noise, previous studies either focus on processing the clean samples (i.e., those with only one label) and noisy samples (i.e., those with multiple labels) with different strategies, or filter the noisy labels based on the assumption that the distantly-supervised label set certainly contains the correct type label. In this paper, we propose a probabilistic automatic relabeling method which treats all training samples uniformly. Our method aims to estimate the pseudo-truth label distribution of each sample, and the pseudo-truth distribution will be treated as part of the trainable parameters which are jointly updated during the training process. The proposed approach does not rely on any prerequisite or extra supervision, making it effective in real applications. Experiments on several benchmarks show that our method outperforms previous approaches and alleviates the noisy labeling problem.
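
One way to picture the trainable pseudo-truth distribution is the sketch below, where each sample owns a row of logits initialized from its distant labels and updated jointly with the typing model. This is my illustration of the idea with placeholder names, not the authors' formulation.

```python
# Illustrative joint relabeling sketch (not the paper's code).
import torch
import torch.nn.functional as F

class PseudoLabels(torch.nn.Module):
    def __init__(self, distant_labels):
        """distant_labels: (N, C) multi-hot matrix from distant supervision."""
        super().__init__()
        init = torch.log(distant_labels.float() + 1e-6)   # favor the distant labels at start
        self.logits = torch.nn.Parameter(init)            # one trainable row per sample

    def forward(self, sample_ids):
        return F.softmax(self.logits[sample_ids], dim=-1)  # pseudo-truth distribution

def relabeling_loss(model_logits, pseudo, sample_ids):
    # Cross-entropy between the typing model's prediction and the soft pseudo-truth.
    log_p = F.log_softmax(model_logits, dim=-1)
    return -(pseudo(sample_ids) * log_p).sum(dim=-1).mean()
```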

TCS Journal 2019 Journal Article

Minimum degree condition for proper connection number 2

  • Fei Huang
  • Xueliang Li
  • Zhongmei Qin
  • Colton Magnant

A path in an edge-colored graph is called a proper path if no two adjacent edges of the path receive the same color. For a connected graph G, the proper connection number pc(G) of G is defined as the minimum number of colors needed to color its edges so that every pair of distinct vertices of G is connected by at least one proper path in G. Recently, Li and Magnant in [8] posed the following conjecture: If G is a connected noncomplete graph of order n ≥ 5 and minimum degree δ(G) ≥ n/4, then pc(G) = 2. In this paper, we show that this conjecture is true except for two small graphs on 7 and 8 vertices, respectively. As a byproduct we obtain that if G is a connected bipartite graph of order n ≥ 4 with δ(G) ≥ (n + 6)/8, then pc(G) = 2.

TCS Journal 2019 Journal Article

Paths and trails in edge-colored weighted graphs

  • Runjie Miao
  • Jinjiang Yuan
  • Fei Huang

Let (G, c, w) be an edge-colored weighted graph, where G is a nontrivial connected graph, c is an edge-coloring of G, and w is an edge-weighting of G. A path, a trail, a cycle, or a closed trail of G, say F, is called proper under the edge-coloring c if every two consecutive edges of F receive different colors in c. Let s and t be two specified nonadjacent vertices in G. In this paper, we study the problems of finding, in (G, c, w), the minimum weighted proper s-t-path, the minimum weighted proper s-t-trail, the minimum weighted proper cycle, the minimum weighted proper closed trail, the maximum weighted proper s-t-path, and the maximum weighted proper s-t-trail. When the minimization problems are considered we assume that (G, c, w) has no negative proper cycle, and when the maximization problems are considered we assume that (G, c, w) has no proper closed trail. We show that all these problems are solvable in polynomial time.

ECAI Conference 2016 Conference Paper

A Novel Cross-Modal Topic Correlation Model for Cross-Media Retrieval

  • Yong Cheng
  • Fei Huang
  • Cheng Jin 0001
  • Yuejie Zhang
  • Tao Zhang 0022

A novel cross-modal topic correlation model CMTCM is developed in this paper to facilitate more effective cross-modal analysis and cross-media retrieval for large-scale multimodal document collections. It can be modeled as a cross-modal topic correlation model which explores the inter-related correlation distribution over the deep representations of multimodal documents. It integrates the deep multimodal document representation, relational topic correlation modeling, and cross-modal topic correlation learning, which aims to characterize the correlations between the heterogeneous topic distributions of inter-related visual images and semantic texts, and measure their association degree more precisely. Very positive results were obtained in our experiments using a large quantity of public data.

ECAI Conference 2016 Conference Paper

Enhancing Sketch-Based Image Retrieval via Deep Discriminative Representation

  • Fei Huang
  • Yong Cheng
  • Cheng Jin 0001
  • Yuejie Zhang
  • Tao Zhang 0022

In this paper we aim to employ deep learning to enhance SBIR via deep discriminative representation. Our main contributions focus on: 1) The deep discriminative representation is established to bridge both the visual appearance gap and the semantic gap between sketches and images; 2) The deep learning pattern is applied to our SBIR model through training on our transformed sketch-like images to overcome the rarity of training sketches. Our experiments on a large number of public sketch and image data have obtained very positive results.

AAAI Conference 2005 Conference Paper

Clustering and Classifying Person Names by Origin

  • Fei Huang

In natural language processing, information about a person’s geographical origin is an important feature for named entity transliteration and question answering. We propose a language-independent name origin clustering and classification framework. Provided with a small amount of bilingual name translation pairs with labeled origins, we measure origin similarities based on the perplexities of name character language and translation models. We group similar origins into clusters, then train a Bayesian classifier with different features. It achieves 84% classification accuracy with source names only, and 91% with both source and target name pairs. We apply the origin clustering and classification technique to a name transliteration task. The cluster-specific transliteration model dramatically improves the transliteration accuracy from 3.8% to 55%, reducing the transliteration character error rate from 50.3 to 13.5. Adding more unlabeled name pairs to the cluster-specific name transliteration model further improves the transliteration accuracy.
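
The perplexity-based origin scoring can be illustrated with a toy character-bigram model per origin, as in the sketch below. Add-alpha smoothing and all function names are my own choices, and the translation-model perplexities also used in the paper are omitted.

```python
# Toy character-bigram origin scoring (illustrative only).
import math
from collections import Counter, defaultdict

def train_char_bigram(names, alpha=1.0):
    counts, totals, vocab = defaultdict(Counter), Counter(), set()
    for name in names:
        chars = ["<s>"] + list(name.lower()) + ["</s>"]
        vocab.update(chars)
        for prev, cur in zip(chars, chars[1:]):
            counts[prev][cur] += 1
            totals[prev] += 1
    V = len(vocab)
    def logprob(prev, cur):                       # add-alpha smoothed bigram log-probability
        return math.log((counts[prev][cur] + alpha) / (totals[prev] + alpha * V))
    return logprob

def perplexity(name, logprob):
    chars = ["<s>"] + list(name.lower()) + ["</s>"]
    lp = sum(logprob(p, c) for p, c in zip(chars, chars[1:]))
    return math.exp(-lp / (len(chars) - 1))

def classify_origin(name, models):
    # models: dict mapping origin label -> trained logprob function
    return min(models, key=lambda origin: perplexity(name, models[origin]))
```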