Author name cluster

Yufei Shi

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

3 papers

1 author row

JBHI Journal 2026 Journal Article

RIHA: Report-Image Hierarchical Alignment for Radiology Report Generation

Yucheng Chen
Yang Yu
Yufei Shi
Conghao Xiong
Xulei Yang
Si Yong Yeo

Radiology report generation (RRG) has emerged as a promising approach to alleviate radiologists' workload and reduce human errors by automatically generating diagnostic reports from medical images. A key challenge in RRG is achieving fine-grained alignment between complex visual features and the hierarchical structure of long-form radiology reports. Although recent methods have improved image-text representation learning, they often treat reports as flat sequences, overlooking their structured sections and semantic hierarchies. This simplification hinders precise cross-modal alignment and weakens RRG accuracy. To address this challenge, we propose RIHA (Report-Image Hierarchical Alignment Transformer), a novel end-to-end framework that performs multi-level alignment between radiological images and their corresponding reports across paragraph, sentence, and word levels. This hierarchical alignment enables more precise cross-modal mapping, essential for capturing the nuanced semantics embedded in clinical narratives. Specifically, RIHA introduces a Visual Feature Pyramid (VFP) to extract multi-scale visual features and a Text Feature Pyramid (TFP) to represent multi-granularity textual structures. These components are integrated through a Cross-modal Hierarchical Alignment (CHA) module, leveraging optimal transport to effectively align visual and textual features across various levels. Furthermore, we incorporate Relative Positional Encoding (RPE) into the decoder to model spatial and semantic relationships among tokens, enhancing the token-level alignment between visual features and generated text. Extensive experiments on two benchmark chest X-ray datasets, IU-Xray and MIMIC-CXR, demonstrate that RIHA outperforms existing state-of-the-art models in both natural language generation and clinical efficacy metrics.

AAAI Conference 2026 Conference Paper

Think Then Rewrite: Reasoning Enhanced Query Rewriting for Domain Specific Retrieval

Ang Li
Yufei Shi
Yuxuan Si
Yiquan Wu
Ming Cai
Xu Tan
Yi Wang
Changlong Sun

Query rewriting is a crucial task for improving retrieval, especially in professional domains such as law and medicine, where user queries are often underspecified and ambiguous. While large language models (LLMs) offer strong understanding and generation capabilities, existing LLM-based approaches reduce the task to text transformation or expansion, neglecting reasoning to disambiguate queries, which fails to bridge the cognitive gap between user queries and specialized documents. In this paper, we propose Think-Then-Rewrite (TTR), a reinforcement learning based framework that unleashes LLMs' reasoning ability for domain-specific query rewriting. TTR introduces a contrastive mutual information reward to encourage the LLM to generate reasoning processes that effectively distinguish confusing distractors. To boost early-stage training, TTR also constructs golden query rewrites as off-policy data, providing strong guidance for RL learning. A mixed-policy optimization then combines on-policy and off-policy signals, ensuring both effectiveness and stability. Extensive experiments on legal and medical retrieval benchmarks demonstrate that TTR achieves state-of-the-art performance.

JBHI Journal 2023 Journal Article

A Spatiotemporal Graph Attention Network Based on Synchronization for Epileptic Seizure Prediction

Yao Wang
Yufei Shi
Yinlin Cheng
Zhipeng He
Xiaoyan Wei
Ziyi Chen
Yi Zhou

Accurate early prediction of epileptic seizures can provide timely treatment for patients. Previous studies have mainly focused on a single temporal or spatial dimension, making it difficult to take both relationships into account. Therefore, the effective properties of electroencephalograms (EEGs) may not be fully evaluated. To solve this problem, we propose a spatiotemporal graph attention network (STGAT) based on synchronization. The spatial and functional connectivity information between EEG channels was extracted by using the phase locking values (PLVs) first, which allowed multichannel EEG signals to be modeled as graph signals. Afterward, the STGAT model was used to dynamically learn the temporal correlation properties of EEG sequences and explore the spatial topological structure information of multiple channels. Experimental results demonstrated that the STGAT model was able to obtain spatiotemporal correlations and achieve good results on two benchmark datasets. The accuracy, specificity and sensitivity were 98.74%, 99.21% and 98.87%, respectively, on the CHB-MIT dataset. Moreover, all evaluation indices of the private dataset had reached more than 98.8%, with the area under the curve (AUC) reaching 99.96%. The proposed method is superior or comparable to the state-of-the-art models. Extensive experiments demonstrate that our end-to-end automatic seizure prediction model can be extended to design clinical assistant decision systems.
