Arrow Research search

Author name cluster

Yitian Zhao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

17 papers
1 author row

Possible papers

17

JBHI Journal 2026 Journal Article

Heterophily-Aware Spectral GCN for Population-Level Brain Disorder Prediction

  • Hao Zhang
  • Liping Wang
  • Yitian Zhao
  • Jianyang Xie
  • Tiyu Fang
  • Ran Song
  • Wei Zhang

Integrating resting-state functional magnetic resonance imaging (rs-fMRI) and phenotypic data is a promising way to build a comprehensive population graph for the prediction of brain disorders using graph neural networks (GNNs). However, existing GNN-based methods face two limitations: the complexity of relationships between subjects poses challenges in constructing a well-defined population graph, and the inherent node heterophily within the population graph is often overlooked. To address these limitations, we propose a population graph with a phenotypic encoder, which leverages rs-fMRI and phenotypic data to model complex relationships between subjects and enables GNNs to learn population-level features. We also design a heterophily-aware spectral graph convolution network that incorporates local similarity-based learning to assess node homophily and addresses the heterophily issue. Experiments demonstrate that our method performs well in classifying both Alzheimer's Disease and Autism Spectrum Disorder. In addition, it can distinguish between progressive and stable mild cognitive impairment, facilitating timely interventions for these diseases.
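As background for the heterophily discussion in the abstract above: a per-node homophily score is commonly computed as the fraction of a node's neighbours that share its label. The sketch below is this standard label-based metric only, not the paper's similarity-based estimator (which operates on learned features):

```python
import numpy as np

def node_homophily(edges, labels):
    """Per-node homophily: fraction of a node's neighbours that carry
    the same label as the node itself (0 = fully heterophilous,
    1 = fully homophilous). Isolated nodes score 0."""
    n = len(labels)
    same = np.zeros(n)
    deg = np.zeros(n)
    for u, v in edges:  # undirected edges, counted for both endpoints
        deg[u] += 1
        deg[v] += 1
        if labels[u] == labels[v]:
            same[u] += 1
            same[v] += 1
    return np.where(deg > 0, same / np.maximum(deg, 1), 0.0)

# Path graph 0-1-2 with labels [0, 0, 1]: node 1 has one matching
# and one mismatching neighbour, so its score is 0.5.
scores = node_homophily([(0, 1), (1, 2)], [0, 0, 1])
```

Population graphs for brain-disorder prediction often have low homophily, which is why spectral methods designed for homophilous graphs can underperform there.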

JBHI Journal 2025 Journal Article

$\text{MR}^{2}$-Net: Retinal OCTA Image Stitching via Multi-Scale Representation Learning and Dynamic Location Guidance

  • Haiting Mao
  • Yuhui Ma
  • Dan Zhang
  • Yanda Meng
  • Shaodong Ma
  • Yuchuan Qiao
  • Huazhu Fu
  • Caifeng Shan

Optical coherence tomography angiography (OCTA) plays a crucial role in quantifying and analyzing retinal vascular diseases. However, the limited field of view (FOV) inherent in most commercial OCTA imaging systems poses a significant challenge for clinicians, restricting the ability to analyze larger retinal regions at high resolution. Automatic stitching of OCTA scans of adjacent regions may provide a promising solution to extend the region of interest. However, commonly-used stitching algorithms face difficulties in achieving effective alignment due to the noise, artifacts and dense vasculature present in OCTA images. To address these challenges, we propose a novel retinal OCTA image stitching network, named $\text{MR}^{2}$-Net, which integrates multi-scale representation learning and dynamic location guidance. In the first stage, an image registration network with progressive multi-resolution feature fusion is proposed to derive deep semantic information effectively. Additionally, we introduce a dynamic guidance strategy to locate the foveal avascular zone (FAZ) and constrain registration errors in overlapping vascular regions. In the second stage, an image fusion network based on multiple mask constraints and adjacent image aggregation (AIA) strategies is developed to further eliminate the artifacts in the overlapping areas of stitched images, thereby achieving precise vessel alignment. To validate the effectiveness of our method, we conduct a series of experiments on two delicately constructed datasets, i.e., OPTOVUE-OCTA and SVision-OCTA. Experimental results demonstrate that our method outperforms other image stitching methods and effectively generates high-quality wide-field OCTA images, achieving structural similarity index (SSIM) scores of 0.8264 and 0.8014 on the two datasets, respectively.

AAAI Conference 2025 Conference Paper

DARR: A Dual-Branch Arithmetic Regression Reasoning Framework for Solving Machine Number Reasoning

  • Chengtai Li
  • Yee Yang Tan
  • Yuting He
  • Jianfeng Ren
  • Ruibin Bai
  • Yitian Zhao
  • Heng Yu
  • Xudong Jiang

Abstract visual reasoning (AVR) is a critical ability of humans, and it has been widely studied, but arithmetic visual reasoning, a unique task in AVR to reason over number sense, is less studied in the literature. To facilitate this research, we construct a Machine Number Reasoning (MNR) dataset to assess the model's ability in arithmetic visual reasoning over number sense and spatial layouts. To solve the MNR tasks, we propose a Dual-branch Arithmetic Regression Reasoning (DARR) framework, which includes an Intra-Image Arithmetic Regression Reasoning (IIARR) module and a Cross-Image Arithmetic Regression Reasoning (CIARR) module. The IIARR includes a set of Intra-Image Regression Blocks to identify the correct number orders and the underlying arithmetic rules within individual images, and an Order Gate to determine the correct number order. The CIARR establishes the arithmetic relations across different images through a '3-to-1' regressor and a set of '2-to-1' regressors, with a Selection Gate to select the most suitable '2-to-1' regressor and a gated fusion to combine the two kinds of regressors. Experiments on the MNR dataset show that the DARR outperforms state-of-the-art models for arithmetic visual reasoning.

NeurIPS Conference 2025 Conference Paper

DSRF: A Dynamic and Scalable Reasoning Framework for Solving RPMs

  • Chengtai Li
  • Yuting He
  • Jianfeng Ren
  • Ruibin Bai
  • Yitian Zhao
  • Xudong Jiang

Abstract Visual Reasoning (AVR) entails discerning latent patterns in visual data and inferring underlying rules. Existing solutions often lack scalability and adaptability, as deep architectures tend to overfit training data, and static neural networks fail to dynamically capture diverse rules. To tackle the challenges, we propose a Dynamic and Scalable Reasoning Framework (DSRF) that greatly enhances the reasoning ability by widening the network instead of deepening it, and dynamically adjusting the reasoning network to better fit novel samples instead of a static network. Specifically, we design a Multi-View Reasoning Pyramid (MVRP) to capture complex rules through layered reasoning to focus features at each view on distinct combinations of attributes, widening the reasoning network to cover more attribute combinations analogous to complex reasoning rules. Additionally, we propose a Dynamic Domain-Contrast Prediction (DDCP) block to handle varying task-specific relationships dynamically by introducing a Gram matrix to model feature distributions, and a gate matrix to capture subtle domain differences between context and target features. Extensive experiments on six AVR tasks demonstrate DSRF’s superior performance, achieving state-of-the-art results under various settings. Code is available here: https://github.com/UNNCRoxLi/DSRF.

JBHI Journal 2025 Journal Article

Enhancing Trustworthiness of Semantic Segmentation in Cataract Surgery Videos via Intra-Phase Label Propagation

  • Mingen Zhang
  • Yuanyuan Gu
  • Xu Chen
  • Botian Zheng
  • Donghan Wu
  • Jinxian Zhang
  • Yufei Wu
  • Yonghuai Liu

Accurate segmentation of semantic features is a pivotal procedure for cataract surgery assistance, surgical skill assessment and related applications. However, previous studies have failed to consider the instance-level feature similarity of instruments across different surgical phases in cataract surgery videos, leading to unreliable decision-making regarding instrument categories. In this study, we propose a label propagation framework to effectively leverage the consistency of phase-specific instruments, which utilizes the initial frame labels from each surgical phase to predict masks for the remaining frames, achieving precise and trustworthy semantic segmentation of cataract surgery videos. Specifically, we design a pseudo-label generation and filtering strategy to automatically obtain highly reliable initial frame labels for each surgical phase. In addition, we establish a fixed-size memory bank with an adaptive update module to ensure long-term applicability in real surgical environments. To address the common problem of blurred edges in cataract surgery scenes, we develop a semantic edge perception module to allow the model to focus on and distinguish the edges of different objects. The proposed method achieved an mIoU of 80.7% and 88.8% on a publicly available dataset (14 categories) and a private dataset (12 categories) with a total of 9,723 frames, respectively, significantly outperforming the state-of-the-art methods and other label propagation-based approaches. Furthermore, our method minimizes memory consumption and maintains about 30 FPS while processing long video sequences.

JBHI Journal 2025 Journal Article

Rethinking Data Augmentation for Single-Source Domain Generalization in OCT Image Segmentation

  • Jiayi Lu
  • Shaodong Ma
  • Yonghuai Liu
  • Yuhui Ma
  • Lei Mou
  • Yang Jiang
  • Yitian Zhao

Domain shifts between samples acquired with different instruments are one of the major challenges in accurate segmentation of Optical Coherence Tomography (OCT) images. Given that OCT images may be acquired with different devices in different clinical centers, this study presents a style and structure data augmentation (SSDA) method to improve the adaptability of segmentation models. Inspired by our initial analysis of OCT domain differences, we propose an innovative hypothesis that domain shifts are primarily due to differences in image style and anatomical structure, which further guides the design of our method. By designing a modality-specific NURBS curve for style enhancement and implementing global and local elastic deformation fields, SSDA addresses both stylistic and structural variations in OCT data. Global deformations simulate changes in retinal curvature, while local deformations model layer-specific changes observed in OCT images. We validate our hypothesis through a comprehensive evaluation conducted on five OCT data domains, each differing in device type and imaging conditions. We train models on each of these domains for single-domain generalisation experiments and evaluate performance on the remaining unseen domains. The results show that SSDA outperforms existing methods when segmenting OCT images from different sources with different requirements for retinal layer segmentation. Specifically, across five different source domain generalisation experiments, SSDA achieves approximately 1.6% higher Dice and 2.6% improved mIoU, underscoring its superior segmentation accuracy and robust generalisation across all evaluated unseen domains.
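The Dice and mIoU gains quoted in the abstract above refer to the standard overlap metrics for segmentation. A minimal binary-mask sketch of both, using the usual definitions (a generic illustration, not the paper's evaluation code):

```python
import numpy as np

def dice_and_iou(pred, target, eps=1e-8):
    """Dice coefficient and IoU (Jaccard index) for binary masks.
    Dice = 2|A∩B| / (|A|+|B|);  IoU = |A∩B| / |A∪B|."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = 2.0 * inter / (pred.sum() + target.sum() + eps)
    iou = inter / (union + eps)
    return dice, iou

# One of four pixels overlaps: Dice = 2/3, IoU = 1/2.
dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 0, 0])
```

mIoU then simply averages the per-class IoU over all classes (here, retinal layers).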

JBHI Journal 2025 Journal Article

Super-Resolution Reconstruction of OCTA Via Multi-Field-of-View Representation Learning

  • Huaying Hao
  • Shaoyi Leng
  • Yanda Meng
  • Yonghuai Liu
  • Yalin Zheng
  • Huazhu Fu
  • Jiong Zhang
  • Quanyong Yi

High-resolution Optical Coherence Tomography Angiography (OCTA) images are essential for morphological analysis and biomarker measurement of the retinal vasculature. They can also provide underlying biomarkers for the accurate analysis of eye-related diseases. The trade-off between high resolution (HR) and a large scanning field-of-view (FOV) is a long-standing problem for OCTA imaging instruments. A large-FOV image provides more retinal information with a shorter acquisition time but often suffers from low resolution (LR), high scatter noise, and poor vascular contrast. In order to obtain HR OCTA images with a larger FOV, we propose a novel self-similar dynamic domain adaptation network based on cross-field-of-view representation learning. The network enables LR images (i.e., $6\times 6~\text{mm}^{2}$) to learn HR image (i.e., $3\times 3~\text{mm}^{2}$) feature representations specialized for OCTA by constructing feature mapping relations for cross-field-of-view OCTA scans. To be specific, a multiple random degradation model is proposed for HR images to generate various synthetic LR images. Further, we propose a dynamic domain adaptation framework that prompts dynamic feature alignment of the LR image reconstruction results with those of synthetic LR images. Finally, a novel self-similar supervision loss is proposed to optimize the reconstruction results from LR to HR by exploiting the similarity between vessels in different regions. Experimental results on three OCTA datasets show that the proposed method surpasses existing state-of-the-art ones, significantly enhancing retinal structure segmentation and disease classification. Our OCTA dataset (the first dataset in this research area with paired $3\times 3~\text{mm}^{2}$ and $6\times 6~\text{mm}^{2}$ OCTA images) and code are publicly available.

AAAI Conference 2024 Conference Paper

Dynamic Semantic-Based Spatial Graph Convolution Network for Skeleton-Based Human Action Recognition

  • Jianyang Xie
  • Yanda Meng
  • Yitian Zhao
  • Anh Nguyen
  • Xiaoyun Yang
  • Yalin Zheng

Graph convolutional networks (GCNs) have attracted great attention and achieved remarkable performance in skeleton-based action recognition. However, most previous works are designed to refine the skeleton topology without considering the types of different joints and edges, making them unable to represent semantic information. In this paper, we propose a dynamic semantic-based graph convolution network (DS-GCN) for skeleton-based human action recognition, where joint and edge types are encoded in the skeleton topology in an implicit way. Specifically, two semantic modules, a joint type-aware adaptive topology and an edge type-aware adaptive topology, are proposed. Combining the proposed semantic modules with temporal convolution, a powerful framework named DS-GCN is developed for skeleton-based action recognition. Extensive experiments on two datasets, NTU-RGB+D and Kinetics-400, show that the proposed semantic modules are general enough to be utilized in various backbones for boosting recognition accuracy. Meanwhile, the proposed DS-GCN notably outperforms state-of-the-art methods. The code is released at https://github.com/davelailai/DS-GCN.

IJCAI Conference 2024 Conference Paper

Regression Residual Reasoning with Pseudo-labeled Contrastive Learning for Uncovering Multiple Complex Compositional Relations

  • Chengtai Li
  • Yuting He
  • Jianfeng Ren
  • Ruibin Bai
  • Yitian Zhao
  • Heng Yu
  • Xudong Jiang

Abstract Visual Reasoning (AVR) has been widely studied in the literature. Our study reveals that AVR models tend to rely on appearance matching rather than a genuine understanding of underlying rules. We hence develop a challenging benchmark, Multiple Complex Compositional Reasoning (MC2R), composed of diverse compositional rules on attributes with intentionally increased variations. It aims to identify two outliers from five given images, in contrast to the single-answer questions in previous AVR tasks. To solve MC2R tasks, Regression Residual Reasoning with Pseudo-labeled Contrastive Learning (R3PCL) is proposed, which first transforms the original problem by selecting three images following the same rule, and iteratively regresses one normal image using the other two, allowing the model to gradually comprehend the underlying rules. The proposed PCL leverages a set of min-max operations to generate more reliable pseudo labels, and exploits contrastive learning with data augmentation on pseudo-labeled images to boost the discrimination and generalization of features. Experimental results on two AVR datasets show that the proposed R3PCL significantly outperforms state-of-the-art models.

AAAI Conference 2024 Conference Paper

Scale Optimization Using Evolutionary Reinforcement Learning for Object Detection on Drone Imagery

  • Jialu Zhang
  • Xiaoying Yang
  • Wentao He
  • Jianfeng Ren
  • Qian Zhang
  • Yitian Zhao
  • Ruibin Bai
  • Xiangjian He

Object detection in aerial imagery presents a significant challenge due to large scale variations among objects. This paper proposes an evolutionary reinforcement learning agent, integrated within a coarse-to-fine object detection framework, to optimize the scale for more effective detection of objects in such images. Specifically, a set of patches potentially containing objects are first generated. A set of rewards measuring the localization accuracy, the accuracy of predicted labels, and the scale consistency among nearby patches are designed in the agent to guide the scale optimization. The proposed scale-consistency reward ensures similar scales for neighboring objects of the same category. Furthermore, a spatial-semantic attention mechanism is designed to exploit the spatial semantic relations between patches. The agent employs the proximal policy optimization strategy in conjunction with the evolutionary strategy, effectively utilizing both the current patch status and historical experience embedded in the agent. The proposed model is compared with state-of-the-art methods on two benchmark datasets for object detection on drone imagery. It significantly outperforms all the compared methods. Code is available at https://github.com/UNNC-CV/EvOD/.

JBHI Journal 2022 Journal Article

Explainable Diabetic Retinopathy Detection and Retinal Image Generation

  • Yuhao Niu
  • Lin Gu
  • Yitian Zhao
  • Feng Lu

Though deep learning has shown successful performance in classifying the label and severity stage of certain diseases, most such models give little explanation of how they make predictions. Inspired by Koch's Postulates, the foundation in evidence-based medicine (EBM) for identifying pathogens, we propose to exploit the interpretability of deep learning applications in medical diagnosis. By isolating neuron activation patterns from a diabetic retinopathy (DR) detector and visualizing them, we can determine the symptoms that the DR detector identifies as evidence for its predictions. To be specific, we first define novel pathological descriptors using activated neurons of the DR detector to encode both spatial and appearance information of lesions. Then, to visualize the symptoms encoded in the descriptors, we propose Patho-GAN, a new network to synthesize medically plausible retinal images. By manipulating these descriptors, we can even arbitrarily control the position, quantity, and categories of generated lesions. We also show that our synthesized images carry the symptoms directly related to diabetic retinopathy diagnosis. Our generated images are both qualitatively and quantitatively superior to those produced by previous methods. Besides, compared to existing methods that take hours to generate an image, our second-level generation speed gives Patho-GAN the potential to be an effective solution for data augmentation.

JBHI Journal 2022 Journal Article

Multi-Scale Interactive Network With Artery/Vein Discriminator for Retinal Vessel Classification

  • Jingfei Hu
  • Hua Wang
  • Guang Wu
  • Zhaohui Cao
  • Lei Mou
  • Yitian Zhao
  • Jicong Zhang

Automatic classification of retinal arteries and veins plays an important role in assisting clinicians to diagnose cardiovascular and eye-related diseases. However, due to the high degree of anatomical variation across the population, and the presence of inconsistent labels caused by the subjective judgment of annotators in available training data, most existing methods generally suffer from blood vessel discontinuity and arteriovenous confusion; the artery/vein (A/V) classification task therefore still faces great challenges. In this work, we propose a multi-scale interactive network with an A/V discriminator for retinal artery and vein recognition, which can reduce arteriovenous confusion and alleviate the disturbance of noisy labels. A multi-scale interaction (MI) module is designed in the encoder to realize cross-space multi-scale feature interaction on fundus images, effectively integrating high-level and low-level context information. In particular, we also design an ingenious A/V discriminator (AVD) that utilizes the independent and shared information between arteries and veins, combined with a topology loss, to further strengthen the model's ability to resolve arteriovenous confusion. In addition, we adopt a sample re-weighting (SW) strategy to effectively alleviate the disturbance from data labeling errors. The proposed model is verified on three publicly available fundus image datasets (AV-DRIVE, HRF, LES-AV) and a private dataset. We achieve accuracies of 97.47%, 96.91%, 97.79%, and 98.18% respectively on these four datasets. Extensive experimental results demonstrate that our method achieves competitive performance compared with state-of-the-art methods for A/V classification. To address the problem of training data scarcity, we publicly release 100 fundus images with A/V annotations to promote relevant research in the community.

JBHI Journal 2022 Journal Article

Sparse-Based Domain Adaptation Network for OCTA Image Super-Resolution Reconstruction

  • Huaying Hao
  • Cong Xu
  • Dan Zhang
  • Qifeng Yan
  • Jiong Zhang
  • Yue Liu
  • Yitian Zhao

Retinal Optical Coherence Tomography Angiography (OCTA) with high resolution is important for the quantification and analysis of retinal vasculature. However, the resolution of OCTA images is inversely proportional to the field of view at the same sampling frequency, which is not conducive to clinicians analyzing larger vascular areas. In this paper, we propose a novel Sparse-based domain Adaptation Super-Resolution network (SASR) for the reconstruction of realistic $6\times 6~\text{mm}^{2}$/low-resolution (LR) OCTA images to high-resolution (HR) representations. To be more specific, we first perform a simple degradation of the $3\times 3~\text{mm}^{2}$/high-resolution (HR) image to obtain the synthetic LR image. An efficient registration method is then employed to register the synthetic LR image with its corresponding $3\times 3~\text{mm}^{2}$ image region within the $6\times 6~\text{mm}^{2}$ image to obtain the cropped realistic LR image. We then propose a multi-level super-resolution model for the fully-supervised reconstruction of the synthetic data, guiding the reconstruction of the realistic LR images through a generative-adversarial strategy that allows the synthetic and realistic LR images to be unified in the feature domain. Finally, a novel sparse edge-aware loss is designed to dynamically optimize the vessel edge structure. Extensive experiments on two OCTA sets have shown that our method performs better than state-of-the-art super-resolution reconstruction methods. In addition, we have investigated the performance of the reconstruction results on retinal structure segmentation, which further validates the effectiveness of our approach.

AIIM Journal 2021 Journal Article

A Decision Tree-Initialised Neuro-fuzzy Approach for Clinical Decision Support

  • Tianhua Chen
  • Changjing Shang
  • Pan Su
  • Elpida Keravnou-Papailiou
  • Yitian Zhao
  • Grigoris Antoniou
  • Qiang Shen

Apart from the need for superior accuracy, healthcare applications of intelligent systems also demand the deployment of interpretable machine learning models which allow clinicians to interrogate and validate extracted medical knowledge. Fuzzy rule-based models are generally considered interpretable, as they are able to reflect the associations between medical conditions and associated symptoms through the use of linguistic if-then statements. Systems built on top of fuzzy sets are particularly appealing for medical applications since they tolerate the vague and imprecise concepts that are often embedded in medical entities such as symptom descriptions and test results. They facilitate an approximate reasoning framework which mimics human reasoning and supports the linguistic delivery of medical expertise often expressed in statements such as ‘weight low’ or ‘glucose level high’ while describing symptoms. This paper proposes an approach that performs data-driven learning of accurate and interpretable fuzzy rule bases for clinical decision support. The approach starts with the generation of a crisp rule base through a decision tree learning mechanism, capable of capturing simple rule structures. The crisp rule base is then transformed into a fuzzy rule base, which forms the input to the framework of the adaptive network-based fuzzy inference system (ANFIS), thereby further optimising the parameters of both rule antecedents and consequents. Experimental studies on popular medical data benchmarks demonstrate that the proposed work is able to learn compact rule bases involving simple rule antecedents, with statistically better or comparable performance to those achieved by state-of-the-art fuzzy classifiers.

JBHI Journal 2020 Journal Article

Automatic Segmentation and Visualization of Choroid in OCT with Knowledge Infused Deep Learning

  • Huihong Zhang
  • Jianlong Yang
  • Kang Zhou
  • Fei Li
  • Yan Hu
  • Yitian Zhao
  • Ce Zheng
  • Xiulan Zhang

The choroid provides oxygen and nourishment to the outer retina and is thus related to the pathology of various ocular diseases. Optical coherence tomography (OCT) is advantageous for visualizing and quantifying the choroid in vivo. However, its application in the study of the choroid is still limited for two reasons. (1) The lower boundary of the choroid (the choroid-sclera interface) in OCT is fuzzy, which makes automatic segmentation difficult and inaccurate. (2) The visualization of the choroid is hindered by the vessel shadows from the superficial layers of the inner retina. In this paper, we propose to incorporate medical and imaging prior knowledge with deep learning to address these two problems. We propose a biomarker-infused global-to-local network (Bio-Net) for choroid segmentation, which not only regularizes the segmentation via predicted choroid thickness, but also leverages a global-to-local segmentation strategy to provide global structure information and suppress overfitting. For eliminating the retinal vessel shadows, we propose a deep-learning pipeline which first locates the shadows using their projection on the retinal pigment epithelium layer, and then predicts the contents of the choroidal vasculature at the shadow locations with an edge-to-texture generative adversarial inpainting network. The results show our method outperforms existing methods on both tasks. We further apply the proposed method in a clinical prospective study of the pathology of glaucoma, which demonstrates its capacity to detect the structural and vascular changes of the choroid related to the elevation of intra-ocular pressure.

JBHI Journal 2020 Journal Article

Introducing the GEV Activation Function for Highly Unbalanced Data to Develop COVID-19 Diagnostic Models

  • Joshua Bridge
  • Yanda Meng
  • Yitian Zhao
  • Yong Du
  • Mingfeng Zhao
  • Renrong Sun
  • Yalin Zheng

Fast and accurate diagnosis is essential for the efficient and effective control of the COVID-19 pandemic that is currently disrupting the whole world. Despite the prevalence of the COVID-19 outbreak, relatively few diagnostic images are openly available to develop automatic diagnosis algorithms. Traditional deep learning methods often struggle when data is highly unbalanced, with many cases in one class and only a few cases in another; new methods must be developed to overcome this challenge. We propose a novel activation function based on the generalized extreme value (GEV) distribution from extreme value theory, which improves performance over the traditional sigmoid activation function when one class significantly outweighs the other. We demonstrate the proposed activation function on a publicly available dataset and externally validate on a dataset consisting of 1,909 healthy chest X-rays and 84 COVID-19 X-rays. The proposed method achieves an improved area under the receiver operating characteristic curve (DeLong's p-value < 0.05) compared to the sigmoid activation. Our method is also demonstrated on a dataset of healthy and pneumonia vs. COVID-19 X-rays and a set of computerized tomography images, achieving improved sensitivity. The proposed GEV activation function significantly improves upon the previously used sigmoid activation for binary classification. This new paradigm is expected to play a significant role in the fight against COVID-19 and other diseases, with relatively few training cases available.
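As background on the activation described above: the GEV cumulative distribution function is skewed, unlike the symmetric sigmoid, which is the property the abstract exploits for unbalanced classes. A minimal sketch of a GEV-CDF activation follows (a generic illustration with an assumed fixed shape parameter `xi`; the paper's exact parameterization, e.g. learnable location/scale, may differ):

```python
import numpy as np

def gev_activation(z, xi=0.1):
    """GEV CDF used as an output activation in place of the sigmoid.

    F(z) = exp(-(1 + xi*z)^(-1/xi)) on the support 1 + xi*z > 0;
    as xi -> 0 this reduces to the Gumbel CDF exp(-exp(-z)).
    Its asymmetry lets the output approach 0 and 1 at different
    rates, which can help when one class heavily outweighs the other.
    """
    z = np.asarray(z, dtype=float)
    if abs(xi) < 1e-8:                      # Gumbel limit as xi -> 0
        return np.exp(-np.exp(-z))
    t = np.maximum(1.0 + xi * z, 1e-12)     # enforce the support constraint
    return np.exp(-t ** (-1.0 / xi))

p = gev_activation(0.0, xi=0.0)             # Gumbel case: exp(-exp(0)) = exp(-1)
```

In practice the function would replace the final sigmoid of a binary classifier, with the logit `z` produced by the network as usual.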