Arrow Research search

Author name cluster

Yanhui Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

2 papers
1 author row

Possible papers


AAAI Conference 2026 Conference Paper

From Attribution to Action: Jointly ALIGNing Predictions and Explanations

  • Dongsheng Hong
  • Chao Chen
  • Yanhui Chen
  • Shanshan Lin
  • Zhihao Chen
  • Xiangwen Liao

Explanation-guided learning (EGL) has shown promise in aligning model predictions with interpretable reasoning, particularly in computer vision tasks. However, most approaches rely on external annotations or heuristic-based segmentation to supervise model explanations, which can be noisy, imprecise, and difficult to scale. In this work, we provide both empirical and theoretical evidence that low-quality supervision signals can degrade model performance rather than improve it. In response, we propose ALIGN, a novel framework that jointly trains a classifier and a masker in an iterative manner. The masker learns to produce soft, task-relevant masks that highlight informative regions, while the classifier is optimized for both prediction accuracy and alignment between its saliency maps and the learned masks. By leveraging high-quality masks as guidance, ALIGN improves both interpretability and generalizability. Experiments on two domain generalization benchmarks, VLCS and Terra Incognita, show that ALIGN consistently outperforms six strong baselines in both in-distribution and out-of-distribution settings. ALIGN also yields superior explanation quality in terms of sufficiency and comprehensiveness, highlighting its effectiveness in producing accurate and interpretable models.
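
The classifier objective described in this abstract (prediction accuracy plus alignment between saliency maps and learned masks) can be illustrated with a minimal Python sketch. The function names, the mean-squared-error alignment term, and the weight `lam` are all assumptions made for illustration; the abstract does not specify the loss actually used in ALIGN.

```python
import math

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class for one sample."""
    return -math.log(max(probs[label], 1e-12))

def alignment_penalty(saliency, mask):
    """Mean squared difference between a saliency map and a soft mask.

    Both inputs are flat lists of values in [0, 1] with equal length.
    MSE is an assumed choice; the abstract does not name the metric.
    """
    if len(saliency) != len(mask):
        raise ValueError("saliency and mask must have the same length")
    return sum((s - m) ** 2 for s, m in zip(saliency, mask)) / len(mask)

def joint_loss(probs, label, saliency, mask, lam=1.0):
    """Prediction loss plus a weighted alignment term (lam is hypothetical)."""
    return cross_entropy(probs, label) + lam * alignment_penalty(saliency, mask)
```

In this sketch, a saliency map that matches the mask exactly contributes nothing to the loss, so only the prediction term remains; any mismatch raises the loss in proportion to `lam`.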

EAAI Journal 2025 Journal Article

Diffusion-based vision-language model for zero-shot anomaly detection in medical images

  • Yanhui Chen
  • Hongkang Tao
  • Zan Yang
  • Yunkang Cao
  • Chen Jiang
  • Longhua Hu
  • Pengwen Xiong
  • Haobo Qiu

With the rapid advancement of diagnostic technology, the ability to detect pathological areas such as tumors and polyps has significantly improved. This progress provides medical imaging specialists with more precise visual information to support anomaly identification, diagnosis, treatment planning, and patient monitoring. However, existing unsupervised and semi-supervised anomaly detection methods struggle with data privacy constraints, limited annotated medical datasets, and challenges in generalization. Zero-Shot Anomaly Detection (ZSAD), which enables the detection of unseen categories without requiring class-specific training, has emerged as a promising solution by leveraging the vision-language alignment capabilities of Vision-Language Models (VLMs), such as Contrastive Language-Image Pretraining (CLIP). Despite recent progress, ZSAD remains hindered by high noise levels, sparse targets, and poor adaptability in complex medical imaging scenarios. To address these issues, we propose a novel framework: DiffusionCLIP, a diffusion-based VLM for zero-shot anomaly detection in two-dimensional medical images. Specifically, DiffusionCLIP integrates diffusion models into the VLM to progressively denoise multi-level features extracted from the CLIP visual encoder, enhancing feature robustness and discriminability. A multi-level feature fusion strategy is designed to aggregate multi-scale representations from different depths of the visual encoder, ensuring complementary semantic alignment across layers. In addition, a dynamically modulated weight loss function is introduced to adaptively balance the learning of hard and easy samples, further improving model generalization. Extensive experiments on multiple benchmark medical imaging datasets demonstrate that the proposed method significantly outperforms existing zero-shot anomaly detection approaches in terms of accuracy, robustness, and generalization.
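
The abstract does not define its dynamically modulated weight loss. As a stand-in, the widely used focal loss shows one way a loss can down-weight easy samples and emphasize hard ones. The sketch below is that swapped-in technique, not the DiffusionCLIP loss; the function name and the default `gamma` are illustrative assumptions.

```python
import math

def focal_weighted_loss(p_true, gamma=2.0):
    """Focal-style loss for one sample (a stand-in, not the paper's loss).

    p_true: predicted probability of the correct class, in (0, 1].
    gamma:  focusing parameter; gamma = 0 recovers plain cross-entropy.
    """
    p = min(max(p_true, 1e-12), 1.0)
    # The factor (1 - p) ** gamma shrinks the loss of confident (easy) samples.
    return -((1.0 - p) ** gamma) * math.log(p)
```

With `gamma = 0` the modulating factor is 1 and the function reduces to cross-entropy; larger `gamma` values shift more of the total loss toward samples the model currently gets wrong.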