Arrow Research search

Author name cluster

Wenting Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers
1 author row

Possible papers (9)

JBHI Journal 2026 Journal Article

ACGM: Attribute-Centric Graph Modeling Network for Concurrent Missing Tabular Data Imputation and COVID-19 Prognosis

  • Zhuoru Wu
  • Wenting Chen
  • Xuechen Li
  • Filippo Ruffini
  • Shaonan Liu
  • Lorenzo Tronchin
  • Domenico Albano
  • Eliodoro Faiella

COVID-19 prognosis using clinical tabular data faces significant challenges due to missing values and class imbalance. Existing methods often overlook the complex high-order interrelationships among clinical attributes and struggle with training stability on imbalanced datasets. We propose ACGM, an attribute-centric graph modeling network that simultaneously addresses missing data imputation and COVID-19 prognosis. ACGM consists of three key modules: an attributes preprocessing module (APM) for coarse-grained imputation initialization, a graph-enhanced attributes imputation module (GEAIM) that models high-order inter-attribute relationships through graph structures, and a graph-enhanced disease prognosis module (GEDPM) that leverages these complex attribute interactions for final prediction. GEAIM and GEDPM employ a mean-teacher strategy with attribute graph matching to preserve high-order relationships, enhance training stability, and maintain the structural integrity of attribute interactions. Extensive experiments on four public COVID-19 tabular datasets demonstrate the superiority of ACGM over existing methods. Through comprehensive interpretability analysis, we identify that attributes such as LDH, Difficulty In Breathing, and SaO2 significantly impact COVID-19 prognosis, aligning well with clinical insights and radiologist assessments.
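
The mean-teacher strategy the abstract mentions is a standard technique in which a teacher network tracks an exponential moving average (EMA) of the student's weights. A minimal sketch, assuming illustrative layer sizes, loss terms, and a 0.99 decay (none taken from the paper):

```python
# Hedged sketch of a generic mean-teacher loop; module shapes and the
# consistency term are illustrative assumptions, not ACGM's actual code.
import copy
import torch
import torch.nn as nn

student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)  # the teacher is never updated by gradients

@torch.no_grad()
def ema_update(teacher: nn.Module, student: nn.Module, decay: float = 0.99):
    """Exponential moving average: teacher <- decay*teacher + (1-decay)*student."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(decay).add_(s_param, alpha=1.0 - decay)

# One training step: supervise the student, keep teacher/student predictions
# consistent, then refresh the teacher weights.
x, y = torch.randn(16, 32), torch.randint(0, 2, (16,))
loss = nn.functional.cross_entropy(student(x), y)
loss = loss + nn.functional.mse_loss(student(x), teacher(x))  # consistency term
loss.backward()
# ... optimizer.step() ...
ema_update(teacher, student)
```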

AAAI Conference 2026 Conference Paper

FaNe: Towards Fine-Grained Cross-Modal Contrast with False-Negative Reduction and Text-Conditioned Sparse Attention

  • Peng Zhang
  • Zhihui Lai
  • Wenting Chen
  • Xu Wu
  • Heng Kong

Medical vision-language pre-training (VLP) offers significant potential for advancing medical image understanding by leveraging paired image-report data. However, existing methods are limited by False Negatives (FaNe) induced by semantically similar texts and insufficient fine-grained cross-modal alignment. To address these limitations, we propose FaNe, a semantic-enhanced VLP framework. To mitigate false negatives, we introduce a semantic-aware positive pair mining strategy based on text-text similarity with adaptive normalization. Furthermore, we design a text-conditioned sparse attention pooling module to enable fine-grained image-text alignment through localized visual representations guided by textual cues. To strengthen intra-modal discrimination, we develop a hard-negative aware contrastive loss that adaptively reweights semantically similar negatives. Extensive experiments on five downstream medical imaging benchmarks demonstrate that FaNe achieves state-of-the-art performance across image classification, object detection, and semantic segmentation, validating the effectiveness of our framework.
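
The hard-negative aware contrastive loss described above can be pictured as an InfoNCE objective whose negatives are reweighted by text-text similarity. A hedged sketch with an assumed softmax weighting scheme (the function name, `tau`, and `beta` are illustrative, not FaNe's published formulation):

```python
# Sketch: negatives whose reports are semantically close to the anchor's
# report receive larger weight in the contrastive denominator.
import torch
import torch.nn.functional as F

def hard_negative_nce(img_emb, txt_emb, tau=0.07, beta=5.0):
    """InfoNCE with negatives up-weighted by text-text similarity (assumed scheme)."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    n = img_emb.size(0)
    logits = img_emb @ txt_emb.t() / tau             # (N, N) image-to-text scores
    txt_sim = txt_emb @ txt_emb.t()                  # anchor text vs. other texts
    weights = torch.softmax(beta * txt_sim, dim=-1)  # similar texts -> larger weight
    eye = torch.eye(n, dtype=torch.bool)
    weights = weights.masked_fill(eye, 0.0) * (n - 1)  # negatives average ~1
    weights = weights + eye.float()                     # positive pair keeps weight 1
    denom = torch.logsumexp(logits + (weights + 1e-8).log(), dim=-1)
    return (denom - logits.diagonal()).mean()

loss = hard_negative_nce(torch.randn(8, 256), torch.randn(8, 256))
```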

AAAI Conference 2025 Conference Paper

DAMPER: A Dual-Stage Medical Report Generation Framework with Coarse-Grained MeSH Alignment and Fine-Grained Hypergraph Matching

  • Xiaofei Huang
  • Wenting Chen
  • Jie Liu
  • Qisheng Lu
  • Xiaoling Luo
  • Linlin Shen

Medical report generation is crucial for clinical diagnosis and patient management, summarizing diagnoses and recommendations based on medical imaging. However, existing work often overlooks the clinical pipeline involved in report writing, where physicians typically conduct an initial quick review followed by a detailed examination. Moreover, current alignment methods may lead to misaligned relationships. To address these issues, we propose DAMPER, a dual-stage framework for medical report generation that mimics the clinical pipeline of report writing in two stages. The first stage, MeSH-Guided Coarse-Grained Alignment (MCG), aligns chest X-ray (CXR) image features with Medical Subject Headings (MeSH) features to generate a rough keyphrase representation of the overall impression. The second stage, Hypergraph-Enhanced Fine-Grained Alignment (HFG), constructs hypergraphs for image patches and report annotations, modeling high-order relationships within each modality and performing hypergraph matching to capture semantic correlations between image regions and textual phrases. Finally, the coarse-grained visual features, generated MeSH representations, and visual hypergraph features are fed into a report decoder to produce the final medical report. Extensive experiments on public datasets demonstrate the effectiveness of DAMPER in generating comprehensive and accurate medical reports, outperforming state-of-the-art methods across various evaluation metrics.
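
As a rough illustration of the hypergraph idea, one can build an incidence matrix from k-nearest-neighbour groupings, smooth node features through hyperedges, and score soft correspondences between the two node sets. Everything below (kNN hyperedges, cosine-similarity assignment) is my own simplification, not DAMPER's HFG module:

```python
import torch
import torch.nn.functional as F

def knn_hypergraph(x, k=3):
    """Incidence matrix H (nodes x hyperedges): hyperedge j groups node j
    with its k nearest neighbours."""
    n = x.size(0)
    idx = torch.cdist(x, x).topk(k + 1, largest=False).indices  # self + k NNs
    H = torch.zeros(n, n)
    H.scatter_(0, idx.t(), 1.0)  # column j marks the members of hyperedge j
    return H

def hypergraph_smooth(x, H):
    """One round of node -> hyperedge -> node message passing."""
    edge_feat = (H.t() @ x) / H.sum(0).unsqueeze(1).clamp(min=1)
    return (H @ edge_feat) / H.sum(1, keepdim=True).clamp(min=1)

def matching_loss(img_nodes, txt_nodes):
    """Encourage each image patch to match some phrase confidently."""
    sim = F.normalize(img_nodes, dim=-1) @ F.normalize(txt_nodes, dim=-1).t()
    assign = sim.softmax(dim=-1)
    return -(assign * sim).sum(dim=-1).mean()

patches, phrases = torch.randn(12, 64), torch.randn(7, 64)
img_nodes = hypergraph_smooth(patches, knn_hypergraph(patches))
txt_nodes = hypergraph_smooth(phrases, knn_hypergraph(phrases))
loss = matching_loss(img_nodes, txt_nodes)
```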

NeurIPS Conference 2025 Conference Paper

EndoBench: A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis

  • Shengyuan Liu
  • Boyun Zheng
  • Wenting Chen
  • Zhihao Peng
  • Zhenfei Yin
  • Jing Shao
  • Jiancong Hu
  • Yixuan Yuan

Endoscopic procedures are essential for diagnosing and treating internal diseases, and multi-modal large language models (MLLMs) are increasingly applied to assist in endoscopy analysis. However, current benchmarks are limited, as they typically cover specific endoscopic scenarios and a small set of clinical tasks, failing to capture the real-world diversity of endoscopic scenarios and the full range of skills needed in clinical workflows. To address these issues, we introduce EndoBench, the first comprehensive benchmark specifically designed to assess MLLMs across the full spectrum of endoscopic practice with multi-dimensional capacities. EndoBench encompasses 4 distinct endoscopic scenarios, 12 specialized clinical tasks with 12 secondary subtasks, and 5 levels of visual prompting granularities, resulting in 6,832 rigorously validated VQA pairs from 21 diverse datasets. Our multi-dimensional evaluation framework mirrors the clinical workflow—spanning anatomical recognition, lesion analysis, spatial localization, and surgical operations—to holistically gauge the perceptual and diagnostic abilities of MLLMs in realistic scenarios. We benchmark 23 state-of-the-art models, including general-purpose, medical-specialized, and proprietary MLLMs, and establish human clinician performance as a reference standard. Our extensive experiments reveal: (1) proprietary MLLMs outperform open-source and medical-specialized models overall, but still trail human experts; (2) medical-domain supervised fine-tuning substantially boosts task-specific accuracy; and (3) model performance remains sensitive to prompt format and clinical task complexity. EndoBench establishes a new standard for evaluating and advancing MLLMs in endoscopy, highlighting both progress and persistent gaps between current models and expert clinical reasoning. We publicly release our benchmark and code.

NeurIPS Conference 2025 Conference Paper

MedChain: Bridging the Gap Between LLM Agents and Clinical Practice with Interactive Sequence

  • Jie Liu
  • Wenxuan Wang
  • Zizhan Ma
  • Guolin Huang
  • Yihang Su
  • Kao-Jung Chang
  • Haoliang Li
  • Linlin Shen

Clinical decision making (CDM) is a complex, dynamic process crucial to healthcare delivery, yet it remains a significant challenge for artificial intelligence systems. While Large Language Model (LLM)-based agents have been tested on general medical knowledge using licensing exams and knowledge question-answering tasks, their performance on CDM in real-world scenarios is limited by the lack of comprehensive benchmarks that mirror actual medical practice. To address this gap, we present MedChain, a dataset of 12,163 clinical cases that covers five key stages of the clinical workflow. MedChain distinguishes itself from existing benchmarks with three key features of real-world clinical practice: personalization, interactivity, and sequentiality. Further, to tackle real-world CDM challenges, we also propose MedChain-Agent, an AI system that integrates a feedback mechanism and a MedCase-RAG module to learn from previous cases and adapt its responses. MedChain-Agent demonstrates remarkable adaptability in gathering information dynamically and handling sequential clinical tasks, significantly outperforming existing approaches. The relevant dataset and code will be released upon acceptance of this paper.

AAAI Conference 2025 Conference Paper

S³-Mamba: Small-Size-Sensitive Mamba for Lesion Segmentation

  • Gui Wang
  • Yuexiang Li
  • Wenting Chen
  • Meidan Ding
  • Wooi Ping Cheah
  • Rong Qu
  • Jianfeng Ren
  • Linlin Shen

Small lesions play a critical role in the early diagnosis and intervention of severe infections. Popular models often struggle to segment small lesions, as they occupy only a minor portion of an image, while down-sampling operations inevitably lose focus on their local features. To tackle these challenges, we propose a Small-Size-Sensitive Mamba (S³-Mamba), which promotes sensitivity to small lesions across three dimensions: channel, spatial, and training strategy. Specifically, an Enhanced Visual State Space block is designed to focus on small lesions through multiple residual connections that preserve local features, selectively amplifying important details while suppressing irrelevant ones through channel-wise attention. A Tensor-based Cross-feature Multi-scale Attention is designed to integrate input image features and intermediate-layer features with edge features and exploit the attentive support of features across multiple scales, thereby retaining spatial details of small lesions at various granularities. Finally, we introduce a novel regularized curriculum learning scheme that automatically assesses lesion size and sample difficulty, gradually shifting focus from easy samples to hard ones such as small lesions. Extensive experiments on three medical image segmentation datasets show the superiority of our S³-Mamba, especially in segmenting small lesions.
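
A simplified picture of size-aware curriculum weighting: derive a per-sample difficulty from lesion size and ramp the weight of hard samples up with training progress. The schedule below is an illustrative assumption, not the paper's regularized curriculum formulation:

```python
import torch

def curriculum_weights(lesion_frac, step, total_steps, sharpness=5.0):
    """lesion_frac: lesion pixels / image pixels per sample (small = hard).
    Weight ramps from easy samples toward hard ones as training progresses."""
    progress = step / total_steps                  # 0 -> 1 over training
    difficulty = 1.0 - lesion_frac.clamp(0, 1)     # smaller lesion -> harder
    return torch.sigmoid(sharpness * (progress - difficulty))

w = curriculum_weights(torch.tensor([0.30, 0.02]), step=100, total_steps=1000)
# A per-sample loss would then be combined as (w * per_sample_loss).sum() / w.sum(),
# so early training is dominated by the easier, larger-lesion samples.
```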

NeurIPS Conference 2024 Conference Paper

Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning

  • Chong Ma
  • Hanqi Jiang
  • Wenting Chen
  • Yiwei Li
  • Zihao Wu
  • Xiaowei Yu
  • Zhengliang Liu
  • Lei Guo

In medical multi-modal frameworks, the alignment of cross-modality features presents a significant challenge. Existing works have learned features that are implicitly aligned from the data, without considering the explicit relationships in the medical context, and this reliance on data may lead to poor generalization of the learned alignment relationships. In this work, we propose the Eye-gaze Guided Multi-modal Alignment (EGMA) framework to harness eye-gaze data for better alignment of medical visual and textual features. We explore the natural auxiliary role of radiologists' eye-gaze data in aligning medical images and text, introducing a novel approach that uses eye-gaze data collected synchronously by radiologists during diagnostic evaluations. We conduct downstream tasks of image classification and image-text retrieval on four medical datasets, where EGMA achieves state-of-the-art performance and stronger generalization across different datasets. Additionally, we explore the impact of varying amounts of eye-gaze data on model performance, highlighting the feasibility and utility of integrating this auxiliary data into multi-modal alignment frameworks.

JBHI Journal 2022 Journal Article

Dynamic Depth-Aware Network for Endoscopy Super-Resolution

  • Wenting Chen
  • Yifan Liu
  • Jiancong Hu
  • Yixuan Yuan

Endoscopy super-resolution (SR) plays an important role in improving diagnostic results and reducing the misdiagnosis rate. Even though recent studies have investigated SR for endoscopy, these methods apply equal importance to the whole image and do not consider the relationships among pixels, especially the depth information, which can provide diagnosis-related information for clinicians. To address this problem, we propose a dynamic depth-aware network for endoscopy super-resolution, which represents the first effort to comprehensively integrate depth information into the SR task for endoscopic images. It includes a depth-wise feature extracting branch (DW-B) and a depth-guided SR branch (DGSR-B). The DW-B extracts a representative feature for each depth level (i.e., a depth matrix) to provide auxiliary information and guide the super-resolution of texture at different depth levels. In DGSR-B, a depth-guided block (DGB) consisting of depth-focus normalization (DFN) is introduced to inject both the depth matrix and the depth map into the LR image feature, so as to guide the image generation for each depth region. To adaptively super-resolve regions at different depth levels, we devise a dynamic depth-aware loss that assigns different trainable weights to each region for SR optimization. Extensive experiments have been conducted on two main publicly available datasets, i.e., the Kvasir dataset and the EndoScene dataset, and the superior performance verifies the effectiveness of our method for the SR task and polyp segmentation. Source code is to be released.
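
Depth-focus normalization can be sketched along the lines of spatially-conditioned normalization: instance-normalize the image features, then rescale and shift them with parameters predicted from the depth map, so each depth region is generated differently. Layer names and sizes below are assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn

class DepthFocusNorm(nn.Module):
    """Illustrative depth-conditioned normalization (SPADE-style sketch)."""
    def __init__(self, feat_channels: int, hidden: int = 32):
        super().__init__()
        self.norm = nn.InstanceNorm2d(feat_channels, affine=False)
        self.shared = nn.Sequential(nn.Conv2d(1, hidden, 3, padding=1), nn.ReLU())
        self.to_gamma = nn.Conv2d(hidden, feat_channels, 3, padding=1)
        self.to_beta = nn.Conv2d(hidden, feat_channels, 3, padding=1)

    def forward(self, feat, depth):
        # depth: (B, 1, H, W) map resized to the feature resolution
        h = self.shared(depth)
        return self.norm(feat) * (1 + self.to_gamma(h)) + self.to_beta(h)

dfn = DepthFocusNorm(64)
out = dfn(torch.randn(2, 64, 32, 32), torch.rand(2, 1, 32, 32))
```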

AAAI Conference 2021 Conference Paper

Translate the Facial Regions You Like Using Self-Adaptive Region Translation

  • Wenshuang Liu
  • Wenting Chen
  • Zhanjia Yang
  • Linlin Shen

With the progression of Generative Adversarial Networks (GANs), image translation methods have achieved increasingly remarkable performance. However, most available methods only achieve image-level translation, which cannot precisely control the regions to be translated. In this paper, we propose a novel self-adaptive region translation network (SART) for region-level translation, which uses region-adaptive instance normalization (RIN) and a region matching loss (RML) for this task. We first encode the style and content image for each region with a style and a content encoder. To translate both the shape and texture of the target region, we inject region-adaptive style features into the decoder via RIN. To ensure independent translation among different regions, RML is proposed to measure the similarity between the non-translated/translated regions of the content and translated images. Extensive experiments on three publicly available datasets, i.e., Morph, RaFD and CelebAMask-HQ, suggest that our approach demonstrates clear improvement over state-of-the-art methods like StarGAN, SEAN and FUNIT. Our approach has a further advantage in precise control of the regions to be translated, enabling region-level expression changes and step-by-step make-up. The video demo is available at https://youtu.be/DvIdmcR2LEc.
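
Region-adaptive instance normalization can be sketched as AdaIN applied independently inside each region mask, so every facial region carries its own style statistics. The helper below is illustrative (the function name and toy masks are mine, not SART's code):

```python
import torch

def region_adain(content, style, masks, eps=1e-5):
    """content, style: (B, C, H, W); masks: (B, R, H, W) binary region masks.
    Within each region, renormalize content statistics to match style statistics."""
    out = torch.zeros_like(content)
    for r in range(masks.size(1)):
        m = masks[:, r:r + 1]                                  # (B, 1, H, W)
        area = m.sum(dim=(2, 3), keepdim=True).clamp(min=1)
        c_mu = (content * m).sum(dim=(2, 3), keepdim=True) / area
        c_var = ((content - c_mu) ** 2 * m).sum(dim=(2, 3), keepdim=True) / area
        s_mu = (style * m).sum(dim=(2, 3), keepdim=True) / area
        s_var = ((style - s_mu) ** 2 * m).sum(dim=(2, 3), keepdim=True) / area
        normed = (content - c_mu) / (c_var + eps).sqrt()
        out = out + m * (normed * (s_var + eps).sqrt() + s_mu)  # style only inside region
    return out

masks = torch.zeros(1, 2, 16, 16)
masks[:, 0, :, :8], masks[:, 1, :, 8:] = 1.0, 1.0              # two toy regions
styled = region_adain(torch.randn(1, 3, 16, 16), torch.randn(1, 3, 16, 16), masks)
```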