Arrow Research search

Author name cluster

Yi Lin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

19 papers
2 author rows

Possible papers (19)

AAAI 2026 · Conference Paper

A Disease-Aware Dual-Stage Framework for Chest X-ray Report Generation

  • Puzhen Wu
  • Hexin Dong
  • Yi Lin
  • Yihao Ding
  • Yifan Peng

Radiology report generation from chest X-rays is an important task in artificial intelligence with the potential to greatly reduce radiologists' workload and shorten patient wait times. Despite recent advances, existing approaches often lack sufficient disease-awareness in visual representations and adequate vision-language alignment to meet the specialized requirements of medical image analysis. As a result, these models usually overlook critical pathological features on chest X-rays and struggle to generate clinically accurate reports. To address these limitations, we propose a novel dual-stage disease-aware framework for chest X-ray report generation. In Stage 1, our model learns Disease-Aware Semantic Tokens (DASTs) corresponding to specific pathology categories through cross-attention mechanisms and multi-label classification, while simultaneously aligning vision and language representations via contrastive learning. In Stage 2, we introduce a Disease-Visual Attention Fusion (DVAF) module to integrate disease-aware representations with visual features, along with a Dual-Modal Similarity Retrieval (DMSR) mechanism that combines visual and disease-specific similarities to retrieve relevant exemplars, providing contextual guidance during report generation. Extensive experiments on benchmark datasets (i.e., CheXpert Plus, IU X-ray, and MIMIC-CXR) demonstrate that our disease-aware framework achieves state-of-the-art performance in chest X-ray report generation, with significant improvements in clinical accuracy and linguistic quality.

AAAI 2026 · Conference Paper

GeWu: A Culturally-Grounded Chinese Benchmark for Multi-Stage Social Bias Evaluation in Large Language Models

  • Yi Lin
  • Ziyi Zhou
  • Jiashi Gao
  • Xinwei Guo
  • Jiaxin Zhang
  • Haiyan Wu
  • Xin Yao
  • Xuetao Wei

With the rapid deployment of Chinese large language models (LLMs), culturally-grounded bias evaluation remains understudied due to the dominance of English benchmarks and simplistic Chinese scenarios. To address this, we propose GeWu, a comprehensive benchmark featuring a culturally-aware dataset of 60,192 questions spanning 14 social groups with fine-grained Chinese contexts, significantly exceeding existing resources in breadth and depth. Our two-stage evaluation first quantifies bias via multiple-choice questions using a novel probability-based scoring mechanism to sensitively capture bias tendencies, distilling high-bias scenarios into GeWu-1K. This refined subset then enables multi-turn dialogue evaluations for in-depth analysis under realistic conditions. Experiments reveal that GeWu effectively exposes social biases in state-of-the-art Chinese LLMs, with 13.93% of scenarios eliciting universal bias across all models. This highlights persistent challenges and provides actionable insights for bias mitigation in Chinese contexts.

NeurIPS 2025 · Conference Paper

Brain Harmony: A Multimodal Foundation Model Unifying Morphology and Function into 1D Tokens

  • Zijian Dong
  • Ruilin Li
  • Joanna Chong
  • Niousha Dehestani
  • Yinghui Teng
  • Yi Lin
  • Zhizhou Li
  • Yichi Zhang

We present Brain Harmony (BrainHarmonix), the first multimodal brain foundation model that unifies structural morphology and functional dynamics into compact 1D token representations. The model was pretrained on two of the largest neuroimaging datasets to date, encompassing 64,594 T1-weighted structural MRI 3D volumes (~14 million images) and 70,933 functional MRI (fMRI) time series. BrainHarmonix is grounded in two foundational neuroscience principles: structure complements function (structural and functional modalities offer distinct yet synergistic insights into brain organization), and function follows structure (brain functional dynamics are shaped by cortical morphology). The modular pretraining process involves single-modality training with geometric pre-alignment followed by modality fusion through shared brain hub tokens. Notably, our dynamics encoder uniquely handles fMRI time series with heterogeneous repetition times (TRs), addressing a major limitation in existing models. BrainHarmonix is also the first to deeply compress high-dimensional neuroimaging signals into unified, continuous 1D tokens, forming a compact latent space of the human brain. BrainHarmonix achieves strong generalization across diverse downstream tasks, including neurodevelopmental and neurodegenerative disorder classification and cognition prediction, consistently outperforming previous approaches. Our models, pretrained on 8 H100 GPUs, aim to catalyze a new era of AI-driven neuroscience powered by large-scale multimodal neuroimaging.

JBHI 2025 · Journal Article

RPD: Regional Prior Distillation for Breast Cancer Diagnosis in Ultrasound Images

  • Yi Lin
  • Haosen Wang
  • Yingnan Zhao
  • Dan Lu
  • Yanchen Xu
  • Jiexiao Xue
  • Xi Chen
  • Jingchi Jiang

Breast cancer is the leading cause of death among women worldwide. Ultrasound imaging is an important means for the early detection of breast cancer, improving the survival rate. Due to the shortage of experienced sonographers, computer-aided systems for breast cancer recognition have become particularly important. Some recent studies analyze tumor types in lesion regions but rely on predefined ROIs. Others recognize cancer in the whole ultrasound image but suffer from the highly variable proportion, location, and quantity of tumor lesions. In this paper, we propose a regional prior distillation (RPD) framework for breast cancer diagnosis in ultrasound images. To enhance the analysis of the tumor region, we propose an Image-Cross Attention (ICA) to fuse the predefined ROI prior information with ultrasound images and train a prior-fused model. To remove the constraint of predefined ROIs, we propose a Distribution Distillation Learning (DDL) to distill the prior-fused sample distribution from the prior-fused model into a diagnostic model, which analyzes the disease from only ultrasound images, based on the knowledge distillation paradigm of the teacher-student framework. Comprehensive experiments are conducted on multi-institutional datasets to validate the proposed RPD framework. The results demonstrate the following points. The ICA fuses regional prior information adequately, leading to a high-performance prior-fused model. The DDL distills the prior information effectively, enhancing the diagnostic model's focus on tumor lesions. The performance of the diagnostic model surpasses that of current SOTA methods by 1.66% in accuracy and 0.64% in AUC. In addition, the diagnostic model is robust to slight perturbations and achieves good generalization performance.

JBHI 2024 · Journal Article

Adaptive Fusion of Deep Learning With Statistical Anatomical Knowledge for Robust Patella Segmentation From CT Images

  • Jiachen Zhao
  • Tianshu Jiang
  • Yi Lin
  • Lok-Chun Chan
  • Ping-Keung Chan
  • Chunyi Wen
  • Hao Chen

Knee osteoarthritis (KOA), a leading joint disease, can be detected by examining the shape of the patella to spot potential abnormal variations. To assist doctors in the diagnosis of KOA, a robust automatic patella segmentation method is highly demanded in clinical practice. Deep learning methods, especially convolutional neural networks (CNNs), have been widely applied to medical image segmentation in recent years. Nevertheless, poor image quality and limited data still impose challenges on segmentation via CNNs. On the other hand, statistical shape models (SSMs) can generate shape priors which give anatomically reliable segmentation for varying instances. Thus, in this work, we propose an adaptive fusion framework that explicitly combines deep neural networks with anatomical knowledge from SSMs for robust patella segmentation. Our adaptive fusion framework adjusts the weight of each segmentation candidate in fusion according to its segmentation performance. We also propose a voxel-wise refinement strategy to make the segmentation of CNNs more anatomically correct. Extensive experiments and thorough assessment have been conducted on various mainstream CNN backbones for patella segmentation in low-data regimes, demonstrating that our framework can be flexibly attached to a CNN model, significantly improving its performance when labeled training data are limited and input image quality is poor.

JBHI 2024 · Journal Article

EIRAD: An Evidence-Based Dialogue System With Highly Interpretable Reasoning Path for Automatic Diagnosis

  • Lian Yan
  • Yi Guan
  • Haotian Wang
  • Yi Lin
  • Yang Yang
  • Boran Wang
  • Jingchi Jiang

Dialogue Systems for Medical Diagnosis (DSMD) based on reinforcement learning (RL) can simulate patient-doctor interactions, playing a crucial role in clinical diagnosis. However, due to the complexity of disease etiology, DSMD faces the challenge of low efficiency in diagnostic evidence search. Moreover, a solely RL-based DSMD, without the constraints of professional medical knowledge, often generates irrational, meaningless, or even erroneous symptom inquiries, leading to poor interpretability of diagnostic paths and high misdiagnosis rates. To address these issues, we propose an Evidence-based dialogue system with a highly Interpretable Reasoning path for Automatic Diagnosis (EIRAD) grounded in a medical knowledge graph (MKG). Specifically, our automated diagnostic model captures key symptoms for suspected diseases by explicitly leveraging the topology of the MKG, enhancing the interpretability and accuracy of diagnosis. To expedite the retrieval of factual evidence, we develop two mechanisms: 1) a mapping mechanism between the entity set of the MKG and the DSMD's diagnostic evidence and diseases, whereby EIRAD prunes disease and symptom nodes irrelevant to the patient's symptoms from the MKG, truncating the invalid actions of the RL-based DSMD; and 2) a reward mechanism integrating the effectiveness of symptom inquiry and the accuracy of disease diagnosis. The comprehensive reward system is suitable for intelligent consultation and can effectively drive the DSMD to accelerate evidence collection. Experimental results demonstrate that our model significantly outperforms competitive benchmark methods in symptom inquiry efficiency and diagnostic accuracy.

AAAI 2024 · Conference Paper

FlightBERT++: A Non-autoregressive Multi-Horizon Flight Trajectory Prediction Framework

  • Dongyue Guo
  • Zheng Zhang
  • Zhen Yan
  • Jianwei Zhang
  • Yi Lin

Flight Trajectory Prediction (FTP) is an essential task in Air Traffic Control (ATC), which can assist air traffic controllers in managing airspace more safely and efficiently. Existing approaches generally perform multi-horizon FTP tasks in an autoregressive manner, thereby suffering from error accumulation and low-efficiency problems. In this paper, a novel framework, called FlightBERT++, is proposed to i) forecast multi-horizon flight trajectories directly in a non-autoregressive way, and ii) address the limitation of the binary encoding (BE) representation in FlightBERT. Specifically, FlightBERT++ is implemented by a generalized encoder-decoder architecture, in which the encoder learns the temporal-spatial patterns from historical observations and the decoder predicts the flight status for the future horizons. Compared with the conventional architecture, an innovative horizon-aware contexts generator is dedicatedly designed to consider the prior horizon information, which further enables non-autoregressive multi-horizon prediction. Moreover, a differential prompted decoder is proposed to enhance the capability of differential prediction by leveraging the stationarity of the differential sequence. Experimental results on a real-world dataset demonstrate that FlightBERT++ outperforms competitive baselines in both FTP performance and computational efficiency.

IJCAI 2024 · Conference Paper

LocMoE: A Low-overhead MoE for Large Language Model Training

  • Jing Li
  • Zhijie Sun
  • Xuan He
  • Li Zeng
  • Yi Lin
  • Entong Li
  • Binfan Zheng
  • Rongqian Zhao

The Mixture-of-Experts (MoE) model is a widespread distributed and integrated learning method for large language models (LLMs), favored for its ability to sparsify and expand models efficiently. However, the performance of MoE is limited by load imbalance and the high latency of All-to-All communication, along with relatively redundant computation owing to large expert capacity. Load imbalance may result from existing routing policies that consistently tend to select certain experts. The frequent inter-node communication in the All-to-All procedure also significantly prolongs the training time. To alleviate these performance problems, we propose a novel routing strategy that combines load balance and locality by converting part of the inter-node communication into intra-node communication. Notably, we elucidate that there is a minimum threshold for expert capacity, calculated from the maximal angular deviation between the gating weights of the experts and the assigned tokens. We port these modifications to the PanGu-Σ model based on the MindSpore framework with multi-level routing and conduct experiments on Ascend clusters. The experimental results demonstrate that the proposed LocMoE reduces training time per epoch by 12.68% to 22.24% compared to classical routers, such as the hash router and switch router, without impacting model accuracy.

AAAI 2024 · Conference Paper

PromptMRG: Diagnosis-Driven Prompts for Medical Report Generation

  • Haibo Jin
  • Haoxuan Che
  • Yi Lin
  • Hao Chen

Automatic medical report generation (MRG) is of great research value as it has the potential to relieve radiologists from the heavy burden of report writing. Despite recent advancements, accurate MRG remains challenging due to the need for precise clinical understanding and disease identification. Moreover, the imbalanced distribution of diseases makes the challenge even more pronounced, as rare diseases are underrepresented in training data, making their diagnosis unreliable. To address these challenges, we propose diagnosis-driven prompts for medical report generation (PromptMRG), a novel framework that aims to improve the diagnostic accuracy of MRG with the guidance of diagnosis-aware prompts. Specifically, PromptMRG is based on an encoder-decoder architecture with an extra disease classification branch. When generating reports, the diagnostic results from the classification branch are converted into token prompts to explicitly guide the generation process. To further improve diagnostic accuracy, we design cross-modal feature enhancement, which retrieves similar reports from the database to assist the diagnosis of a query image by leveraging the knowledge of a pre-trained CLIP. Moreover, the disease imbalance issue is addressed by applying an adaptive logit-adjusted loss to the classification branch based on the individual learning status of each disease, overcoming the text decoder's inability to manipulate disease distributions. Experiments on two MRG benchmarks show the effectiveness of the proposed method, which obtains state-of-the-art clinical efficacy performance on both datasets.

ICRA 2023 · Conference Paper

Efficient Implicit Neural Reconstruction Using LiDAR

  • Dongyu Yan
  • Xiaoyang Lyu
  • Jieqi Shi
  • Yi Lin

Modeling scene geometry using implicit neural representations has revealed advantages in accuracy, flexibility, and low memory usage. Previous approaches have demonstrated impressive results using color or depth images but still have difficulty handling poor lighting conditions and large-scale scenes. Methods taking a global point cloud as input require accurate registration and ground-truth coordinate labels, which limits their application scenarios. In this paper, we propose a new method that uses sparse LiDAR point clouds and rough odometry to reconstruct a fine-grained implicit occupancy field efficiently within a few minutes. We introduce a new loss function that supervises directly in 3D space without 2D rendering, avoiding information loss. We also refine the poses of input frames in an end-to-end manner, creating consistent geometry without global point cloud registration. As far as we know, our method is the first to reconstruct an implicit scene representation from LiDAR-only input. Experiments on synthetic and real-world datasets, including indoor and outdoor scenes, show that our method is effective, efficient, and accurate, obtaining results comparable with existing methods that use dense input.

YNIMG 2022 · Journal Article

Neural correlates of individual differences in predicting ambiguous sounds comprehension level

  • Yi Lin
  • Yu Tsao
  • Po-Jang Hsieh

This study investigated brain activation during auditory processing as a biomarker for the prediction of future perceptual learning performance. Cochlear implant simulated sounds (vocoded sounds) are degraded signals. Participants with normal hearing who were trained with these ambiguous sounds showed varied speech comprehension levels. We discovered that the neuronal signatures from untrained participants forecasted their future ambiguous speech comprehension levels. Participants' brain activations for auditory information processing were measured before (t1) they underwent a five-day vocoded sounds training session. We showed that the pre-training (t1) activities in the inferior frontal gyrus (IFG) correlate with the fifth-day (t2) vocoded sound comprehension performance. To further predict participants' future (t2) performances, we split the participants into two groups (i.e., good and bad learners) based on their fifth-day performance; a linear support vector machine (SVM) was trained to classify (predict) the remaining participants' groups. We found that pre-training (t1) fMRI activities in the bilateral IFG, angular gyrus (AG), and supramarginal gyrus (SMG) showed discriminability between future (t2) good and bad learners. These findings suggest that neural correlates of individual differences in auditory processing can potentially be used to predict participants' future cognition and behaviors.

IROS 2021 · Conference Paper

Design of an SSVEP-based BCI Stimuli System for Attention-based Robot Navigation in Robotic Telepresence

  • Xingchao Wang
  • Xiaopeng Huang
  • Yi Lin
  • Liguang Zhou
  • Zhenglong Sun 0001
  • Yangsheng Xu

Brain-computer interface (BCI)-based robotic telepresence provides an opportunity for people with disabilities to control robots remotely without any actual physical movement. However, traditional BCI systems usually require the user to select the navigation direction from visual stimuli on a fixed background, which makes it difficult to control the robot in a dynamic environment during locomotion. In this paper, a novel SSVEP-based BCI stimuli system is proposed for robotic telepresence. The system utilizes the live video streamed from the robot's onboard camera as the input. By altering and flickering the detected objects in the scene at different frequencies, predefined based on their relative positions on the screen, the robot can be navigated according to the user's attention in a dynamic manner. To better differentiate multiple objects (more than the number of predefined frequencies), a task-related component analysis (TRCA) model was trained with a priori offline experimental data to select the front objects with priority. Experiments were conducted to validate the proposed system. Using the system, four human subjects were able to control a humanoid robot to navigate through multiple objects to reach the desired goal, with a success rate of 87.5% on average.

YNIMG 2021 · Journal Article

Learning dynamic graph embeddings for accurate detection of cognitive state changes in functional brain networks

  • Yi Lin
  • Defu Yang
  • Jia Hou
  • Chenggang Yan
  • Minjeong Kim
  • Paul J Laurienti
  • Guorong Wu

Mounting evidence shows that brain functions and cognitive states are dynamically changing even in the resting state rather than remaining at a single constant state. Due to the relatively small changes in BOLD (blood-oxygen-level-dependent) signals across tasks, it is difficult to detect the change of cognitive status without requiring prior knowledge of the experimental design. To address this challenge, we present a dynamic graph learning approach to generate an ensemble of subject-specific dynamic graph embeddings, which allows us to use brain networks to disentangle cognitive events more accurately than using raw BOLD signals. The backbone of our method is essentially a representation learning process for projecting BOLD signals into a latent vertex-temporal domain with the greater biological underpinning of brain activities. Specifically, the learned representation domain is jointly formed by (1) a set of harmonic waves that govern the topology of whole-brain functional connectivities and (2) a set of Fourier bases that characterize the temporal dynamics of functional changes. In this regard, our dynamic graph embeddings provide a new methodology to investigate how these self-organized functional fluctuation patterns oscillate along with the evolving cognitive status. We have evaluated our proposed method on both simulated data and working memory task-based fMRI datasets, where our dynamic graph embeddings achieve higher accuracy in detecting multiple cognitive states than other state-of-the-art methods.

JBHI 2020 · Journal Article

Bi-Modality Medical Image Synthesis Using Semi-Supervised Sequential Generative Adversarial Networks

  • Xin Yang
  • Yi Lin
  • Zhiwei Wang
  • Xin Li
  • Kwang-Ting Cheng

In this paper, we propose a bi-modality medical image synthesis approach based on a sequential generative adversarial network (GAN) and semi-supervised learning. Our approach consists of two generative modules that synthesize images of the two modalities in a sequential order. A method for measuring the synthesis complexity is proposed to automatically determine the synthesis order in our sequential GAN. Images of the modality with a lower complexity are synthesized first, and the counterparts with a higher complexity are generated later. Our sequential GAN is trained end-to-end in a semi-supervised manner. In supervised training, the joint distribution of bi-modality images is learned from real paired images of the two modalities by explicitly minimizing the reconstruction losses between the real and synthetic images. To avoid overfitting to limited training images, in unsupervised training the marginal distribution of each modality is learned from unpaired images by minimizing the Wasserstein distance between the distributions of real and fake images. We comprehensively evaluate the proposed model on two synthesis tasks using three types of evaluation metrics and user studies. Visual and quantitative results demonstrate the superiority of our method over state-of-the-art methods, with reasonable visual quality and clinical significance. Code is made publicly available at https://github.com/hust-linyi/Multimodal-Medical-Image-Synthesis.