Arrow Research search

Author name cluster

Lequan Yu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

31 papers
2 author rows

Possible papers

31

AAAI Conference 2026 Conference Paper

FDP: A Frequency-Decomposition Preprocessing Pipeline for Unsupervised Anomaly Detection in Brain MRI

  • Hao Li
  • Zhenfeng Zhuang
  • Jingyu Lin
  • Yu Liu
  • Yifei Chen
  • Qiong Peng
  • Lequan Yu
  • Liansheng Wang

Due to the diversity of brain anatomy and the scarcity of annotated data, supervised anomaly detection for brain MRI remains challenging, driving the development of unsupervised anomaly detection (UAD) approaches. Current UAD methods typically utilize synthetically generated noise perturbations on healthy MRIs to train generative models for normal anatomy reconstruction, enabling anomaly detection via residual maps. However, such simulated anomalies lack the biophysical fidelity and morphological complexity characteristic of true clinical lesions. To advance UAD in brain MRI, we conduct the first systematic frequency-domain analysis of pathological signatures, revealing two key properties: (1) anomalies exhibit unique frequency patterns distinguishable from normal anatomy, and (2) low-frequency signals maintain consistent representations across healthy scans. These insights motivate our Frequency-Decomposition Preprocessing (FDP) framework—the first UAD method to leverage frequency-domain reconstruction for simultaneous pathology suppression and anatomical preservation. FDP can integrate seamlessly with existing anomaly simulation techniques, consistently enhancing detection performance across diverse architectures while maintaining diagnostic fidelity. Experimental results demonstrate that FDP consistently improves anomaly detection performance when integrated with existing methods. Notably, FDP achieves a 17.63% increase in DICE score with LDM while maintaining robust improvements across multiple baselines.
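
The abstract gives no implementation details, but the core decomposition step can be sketched with a standard FFT low-pass split; the 2D-slice setting, the ideal circular mask, and the cutoff value below are illustrative assumptions, not the paper's actual pipeline.

    import numpy as np

    def frequency_decompose(slice_2d: np.ndarray, cutoff: float = 0.1):
        """Split a 2D MRI slice into low- and high-frequency components.

        cutoff is the radius (as a fraction of the spectrum half-width)
        of an ideal low-pass mask -- an illustrative choice only.
        """
        spectrum = np.fft.fftshift(np.fft.fft2(slice_2d))
        h, w = slice_2d.shape
        yy, xx = np.ogrid[:h, :w]
        dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
        low_mask = dist <= cutoff * min(h, w) / 2

        low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
        high = slice_2d - low   # residual keeps the high-frequency detail
        return low, high

    # Toy usage: per the paper's observations, low-frequency content is
    # consistent across healthy scans, while anomalies leave distinctive
    # frequency signatures that such a split helps expose.
    slice_2d = np.random.rand(128, 128).astype(np.float32)
    low, high = frequency_decompose(slice_2d)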

AAAI Conference 2026 Conference Paper

PASS: Probabilistic Agentic Supernet Sampling for Interpretable and Adaptive Chest X-Ray Reasoning

  • Yushi Feng
  • Junye Du
  • Yingying Hong
  • Qifan Wang
  • Lequan Yu

Existing tool-augmented agentic systems are limited in the real world by (i) black-box reasoning steps that undermine trust in decision-making and pose safety risks, (ii) poor multimodal integration, which is inherently critical for healthcare tasks, and (iii) rigid and computationally inefficient agentic pipelines. We introduce PASS (Probabilistic Agentic Supernet Sampling), the first multimodal framework to address these challenges for Chest X-Ray (CXR) reasoning. PASS adaptively samples agentic workflows over a multi-tool graph, yielding decision paths annotated with interpretable probabilities. Given the complex CXR reasoning task with multimodal medical data, PASS leverages its learned task-conditioned distribution over the agentic supernet to adaptively select the most suitable tool at each supernet layer, offering probability-annotated trajectories for post-hoc audits and directly enhancing medical AI safety. PASS also continuously compresses salient findings into an evolving personalized memory, while dynamically deciding whether to deepen its reasoning path or invoke an early exit for efficiency. To optimize a Pareto frontier balancing performance and cost, we design a novel three-stage training procedure, including expert knowledge warm-up, contrastive path-ranking, and cost-aware reinforcement learning. To facilitate rigorous evaluation, we introduce CAB-E, a comprehensive benchmark for multi-step, safety-critical, free-form reasoning. Experiments across various benchmarks validate that PASS significantly outperforms strong baselines on multiple metrics (e.g., accuracy, LLM-Judge, semantic similarity) while balancing computational costs, pointing towards a new paradigm of interpretable, adaptive, and multimodal medical agentic systems.
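
As a rough picture of sampling a probability-annotated workflow from a layered supernet, here is a minimal sketch; the layer contents, tool names, softmax policy, and early-exit rule are invented for illustration and are not PASS's actual design.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical supernet: each layer offers a few tools plus an early exit.
    layers = [
        ["detector", "segmenter", "report_retriever"],
        ["measurement_tool", "vqa_model", "EXIT"],
        ["report_writer", "EXIT"],
    ]
    # Task-conditioned logits would come from a learned policy; random here.
    logits = [rng.normal(size=len(tools)) for tools in layers]

    path, path_prob = [], 1.0
    for tools, layer_logits in zip(layers, logits):
        probs = np.exp(layer_logits) / np.exp(layer_logits).sum()  # softmax
        idx = rng.choice(len(tools), p=probs)
        path.append((tools[idx], float(probs[idx])))
        path_prob *= probs[idx]
        if tools[idx] == "EXIT":        # dynamic early exit for efficiency
            break

    print(path, path_prob)  # probability-annotated trajectory for audit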

NeurIPS Conference 2025 Conference Paper

Amplifying Prominent Representations in Multimodal Learning via Variational Dirichlet Process

  • Tsai Hor Chan
  • Feng Wu
  • Yihang Chen
  • Guosheng Yin
  • Lequan Yu

Developing effective multimodal fusion approaches has become increasingly essential in many real-world scenarios, such as health care and finance. The key challenge is how to preserve the feature expressiveness in each modality while learning cross-modal interactions. Previous approaches primarily focus on the cross-modal alignment, while over-emphasis on the alignment of marginal distributions of modalities may impose excess regularization and obstruct meaningful representations within each modality. The Dirichlet process (DP) mixture model is a powerful Bayesian non-parametric method that can amplify the most prominent features by its richer-gets-richer property, which allocates increasing weights to them. Inspired by this unique characteristic of DP, we propose a new DP-driven multimodal learning framework that automatically achieves an optimal balance between prominent intra-modal representation learning and cross-modal alignment. Specifically, we assume that each modality follows a mixture of multivariate Gaussian distributions and further adopt DP to calculate the mixture weights for all the components. This paradigm allows DP to dynamically allocate the contributions of features and select the most prominent ones, leveraging its richer-gets-richer property, thus facilitating multimodal feature fusion. Extensive experiments on several multimodal datasets demonstrate the superior performance of our model over other competitors. Ablation analysis further validates the effectiveness of DP in aligning modality distributions and its robustness to changes in key hyperparameters. Code is anonymously available at https://github.com/HKU-MedAI/DPMM.git
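
The richer-gets-richer property the abstract leans on is easiest to see in the textbook stick-breaking construction of DP mixture weights; this truncated sketch is generic and is not the authors' variational implementation.

    import numpy as np

    def stick_breaking_weights(alpha: float, K: int, rng=None):
        """Truncated stick-breaking: w_k = v_k * prod_{j<k} (1 - v_j)."""
        rng = rng or np.random.default_rng(0)
        v = rng.beta(1.0, alpha, size=K)
        remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
        return v * remaining

    # Small alpha concentrates mass on a few components ("richer gets
    # richer"), which is what lets the DP amplify the most prominent
    # mixture components; large alpha spreads the weights out.
    print(stick_breaking_weights(alpha=0.5, K=10))
    print(stick_breaking_weights(alpha=5.0, K=10))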

JBHI Journal 2025 Journal Article

Completed Feature Disentanglement Learning for Multimodal MRIs Analysis

  • Tianling Liu
  • Hongying Liu
  • Fanhua Shang
  • Lequan Yu
  • Tong Han
  • Liang Wan

Multimodal MRIs play a crucial role in clinical diagnosis and treatment. Feature disentanglement (FD)-based methods, aiming at learning superior feature representations for multimodal data analysis, have achieved significant success in multimodal learning (MML). Typically, existing FD-based methods separate multimodal data into modality-shared and modality-specific features, and employ concatenation or attention mechanisms to integrate these features. However, our preliminary experiments indicate that these methods can lead to a loss of shared information among subsets of modalities when the inputs contain more than two modalities, and such information is critical for prediction accuracy. Furthermore, these methods do not adequately interpret the relationships between the decoupled features at the fusion stage. To address these limitations, we propose a novel Complete Feature Disentanglement (CFD) strategy that recovers the information lost during feature decoupling. Specifically, the CFD strategy not only identifies modality-shared and modality-specific features, but also decouples shared features among subsets of multimodal inputs, termed modality-partial-shared features. We further introduce a new Dynamic Mixture-of-Experts Fusion (DMF) module that dynamically integrates these decoupled features by explicitly learning the local-global relationships among them. The effectiveness of our approach is validated through classification tasks on three multimodal MRI datasets. Extensive experimental results demonstrate that our approach outperforms other state-of-the-art MML methods by clear margins.

ICML Conference 2025 Conference Paper

Cross-Modal Alignment via Variational Copula Modelling

  • Feng Wu
  • Tsai Hor Chan
  • Fuying Wang
  • Guosheng Yin
  • Lequan Yu

Various data modalities are common in real-world applications (e.g., EHRs, medical images, and clinical notes in healthcare). Thus, it is essential to develop multimodal learning methods to aggregate information from multiple modalities. The main challenge is appropriately aligning and fusing the representations of different modalities into a joint distribution. Existing methods mainly rely on concatenation or the Kronecker product, oversimplifying the interaction structure between modalities and indicating a need to model more complex interactions. Additionally, the joint distribution of latent representations with higher-order interactions is underexplored. The copula is a powerful statistical structure for modelling the interactions between variables, as it bridges the joint distribution and the marginal distributions of multiple variables. In this paper, we propose a novel copula modelling-driven multimodal learning framework, which focuses on learning the joint distribution of various modalities to capture the complex interactions among them. The key idea is interpreting the copula model as a tool to align the marginal distributions of the modalities efficiently. By assuming a Gaussian mixture distribution for each modality and a copula model on the joint distribution, our model can also generate accurate representations for missing modalities. Extensive experiments on public MIMIC datasets demonstrate the superior performance of our model over other competitors. The code is anonymously available at https://github.com/HKU-MedAI/CMCM.
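
To make the copula idea concrete, here is a generic two-modality Gaussian-copula sketch: each marginal is coupled through a correlated Gaussian via probability transforms; the chosen marginals and the fixed correlation are illustrative assumptions, not the paper's learned model.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Couple two modality representations through a Gaussian copula.
    corr = 0.8
    cov = np.array([[1.0, corr], [corr, 1.0]])
    z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=5000)

    u = stats.norm.cdf(z)          # copula sample: uniform marginals, coupled
    x_img = stats.norm.ppf(u[:, 0], loc=2.0, scale=0.5)  # "image" marginal
    x_txt = stats.expon.ppf(u[:, 1], scale=1.5)          # "notes" marginal

    # The marginals are whatever we chose for each modality; the cross-modal
    # dependence lives entirely in the copula, which is the separation the
    # paper exploits for alignment and missing-modality generation.
    print(np.corrcoef(x_img, x_txt)[0, 1])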

IJCAI Conference 2025 Conference Paper

DERI: Cross-Modal ECG Representation Learning with Deep ECG-Report Interaction

  • Jian Chen
  • Xiaoru Dong
  • Wei Wang
  • Shaorui Zhou
  • Lequan Yu
  • Xiping Hu

Electrocardiogram (ECG) is widely used to diagnose cardiac conditions via deep learning methods. Although existing self-supervised learning (SSL) methods have achieved great performance in learning representations for ECG-based cardiac condition classification, the clinical semantics cannot be effectively captured. To overcome this limitation, we propose to learn cross-modal ECG representations that contain more clinical semantics via a novel framework with Deep ECG-Report Interaction (DERI). Specifically, we design a novel framework combining multiple alignments and mutual feature reconstructions to learn effective representations of the ECG with the clinical report, which fuses the clinical semantics of the report. An RME-Module inspired by masked modeling is proposed to improve ECG representation learning. Furthermore, we extend ECG representation learning to report generation with a language model, which is significant for evaluating clinical semantics in the learned representations and even for clinical applications. Comprehensive experiments with various settings on multiple datasets show the superior performance of our DERI. Our code is released at https://github.com/cccccj-03/DERI.

AAAI Conference 2025 Conference Paper

Dynamic Entity-Masked Graph Diffusion Model for Histopathology Image Representation Learning

  • Zhenfeng Zhuang
  • Min Cen
  • Yanfeng Li
  • Fangyu Zhou
  • Lequan Yu
  • Baptiste Magnier
  • Liansheng Wang

Significant disparities between the features of natural images and those inherent to histopathological images make it challenging to directly apply and transfer pre-trained models from natural images to histopathology tasks. Moreover, the frequent lack of annotations in histopathology patch images has driven researchers to explore self-supervised learning methods like mask reconstruction for learning representations from large amounts of unlabeled data. Crucially, previous mask-based efforts in self-supervised learning have often overlooked the spatial interactions among entities, which are essential for constructing accurate representations of pathological entities. To address these challenges, constructing graphs of entities is a promising approach. In addition, the diffusion reconstruction strategy has recently shown superior performance through its random intensity noise addition technique for enhancing robust learned representations. Therefore, we introduce H-MGDM, a novel self-supervised Histopathology image representation learning method built on a Dynamic Entity-Masked Graph Diffusion Model. Specifically, we propose to use complementary subgraphs as latent diffusion conditions and self-supervised targets, respectively, during pre-training. The graph can embed entities' topological relationships and enhance representation, while dynamic conditions and targets can improve fine-grained pathological reconstruction. We pretrain our model on three large histopathological datasets. The advanced predictive performance and interpretability of H-MGDM are clearly demonstrated on comprehensive downstream tasks such as classification and survival analysis across six datasets.

ICLR Conference 2025 Conference Paper

From Layers to States: A State Space Model Perspective to Deep Neural Network Layer Dynamics

  • Qinshuo Liu
  • Weiqin Zhao
  • Wei Huang
  • Yanwen Fang
  • Lequan Yu
  • Guodong Li

The depth of neural networks is a critical factor for their capability, with deeper models often demonstrating superior performance. Motivated by this, significant efforts have been made to enhance layer aggregation (reusing information from previous layers to better extract features at the current layer) and thereby improve the representational power of deep neural networks. However, previous works have primarily addressed this problem from a discrete-state perspective, which is not suitable as the number of network layers grows. This paper instead treats the outputs of layers as states of a continuous process and leverages the state space model (SSM) to design the aggregation of layers in very deep neural networks. Moreover, inspired by its advances in modeling long sequences, the Selective State Space Model (S6) is employed to design a new module called Selective State Space Model Layer Aggregation (S6LA). This module combines traditional CNN or transformer architectures within a sequential framework, enhancing the representational capabilities of state-of-the-art vision networks. Extensive experiments show that S6LA delivers substantial improvements in both image classification and detection tasks, highlighting the potential of integrating SSMs with contemporary deep learning techniques.
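
A minimal way to see "layer outputs as states of a continuous process" is a linear state-space recurrence over the layer sequence; the fixed scalar dynamics below are a toy assumption, whereas S6/S6LA learns input-dependent (selective) dynamics.

    import numpy as np

    def ssm_layer_aggregate(layer_outputs, a=0.9, b=0.1):
        """Treat the sequence of layer outputs x_1..x_L as SSM inputs:
        h_l = a * h_{l-1} + b * x_l, so the state at layer l carries
        information from all earlier layers.

        In S6LA the (a, b) dynamics would be learned and selective;
        fixed scalars are used here only for illustration.
        """
        h = np.zeros_like(layer_outputs[0])
        states = []
        for x in layer_outputs:
            h = a * h + b * x   # aggregated state blends earlier layers
            states.append(h)
        return states

    feats = [np.random.rand(4, 16) for _ in range(24)]  # 24 "layers"
    agg = ssm_layer_aggregate(feats)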

ICML Conference 2025 Conference Paper

From Token to Rhythm: A Multi-Scale Approach for ECG-Language Pretraining

  • Fuying Wang
  • Jiacheng Xu
  • Lequan Yu

Electrocardiograms (ECGs) play a vital role in monitoring cardiac health and diagnosing heart diseases. However, traditional deep learning approaches for ECG analysis rely heavily on large-scale manual annotations, which are both time-consuming and resource-intensive to obtain. To overcome this limitation, self-supervised learning (SSL) has emerged as a promising alternative, enabling the extraction of robust ECG representations that can be efficiently transferred to various downstream tasks. While previous studies have explored SSL for ECG pretraining and multi-modal ECG-language alignment, they often fail to capture the multi-scale nature of ECG signals. As a result, these methods struggle to learn generalized representations due to their inability to model the hierarchical structure of ECG data. To address this gap, we introduce MELP, a novel Multi-scale ECG-Language Pretraining model that fully leverages hierarchical supervision from ECG-text pairs. MELP first pretrains a cardiology-specific language model to enhance its understanding of clinical text. It then applies three levels of cross-modal supervision, at the token, beat, and rhythm levels, to align ECG signals with textual reports, capturing structured information across different time scales. We evaluate MELP on three public ECG datasets across multiple tasks, including zero-shot ECG classification, linear probing, and transfer learning. Experimental results demonstrate that MELP outperforms existing SSL methods, underscoring its effectiveness and adaptability across diverse clinical applications. Our code is available at https://github.com/HKU-MedAI/MELP.
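
The three supervision levels can be pictured as pooling token embeddings into beat embeddings and beats into a rhythm embedding, each aligned with text at its own granularity; the shapes and mean pooling below are illustrative assumptions, not MELP's architecture.

    import numpy as np

    # Hypothetical ECG encoder output: 12 beats x 32 tokens x 128-d features.
    tokens = np.random.rand(12, 32, 128).astype(np.float32)

    beat_emb = tokens.mean(axis=1)      # (12, 128)  beat-level embeddings
    rhythm_emb = beat_emb.mean(axis=0)  # (128,)     whole-recording embedding

    # Each level would then be contrastively aligned with text at a matching
    # granularity (report tokens, sentences, full report) during pretraining.
    print(tokens.shape, beat_emb.shape, rhythm_emb.shape)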

AAAI Conference 2025 Conference Paper

Large Images Are Gaussians: High-Quality Large Image Representation with Levels of 2D Gaussian Splatting

  • Lingting Zhu
  • Guying Lin
  • Jinnan Chen
  • Xinjie Zhang
  • Zhenchao Jin
  • Zhao Wang
  • Lequan Yu

While Implicit Neural Representations (INRs) have demonstrated significant success in image representation, they are often hindered by large training memory and slow decoding speed. Recently, Gaussian Splatting (GS) has emerged as a promising solution in 3D reconstruction due to its high-quality novel view synthesis and rapid rendering capabilities, positioning it as a valuable tool for a broad spectrum of applications. In particular, a GS-based representation, 2DGS, has shown potential for image fitting. In our work, we present Large Images are Gaussians (LIG), which delves deeper into the application of 2DGS for image representations, addressing the challenge of fitting large images with 2DGS in the situation of numerous Gaussian points, through two distinct modifications: 1) we adopt a variant of representation and optimization strategy, facilitating the fitting of a large number of Gaussian points; 2) we propose a Level-of-Gaussian approach for reconstructing both coarse low-frequency initialization and fine high-frequency details. Consequently, we successfully represent large images as Gaussian points and achieve high-quality large image representation, demonstrating its efficacy across various types of large images.
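
The basic idea of representing an image as a sum of 2D Gaussian points can be sketched in a few lines; this toy renderer uses isotropic Gaussians and brute-force accumulation, unlike real 2DGS rasterizers, and all parameter choices are illustrative.

    import numpy as np

    def render_gaussians(means, scales, colors, H=64, W=64):
        """Accumulate isotropic 2D Gaussians onto an image grid.

        A toy, unoptimized splatting renderer: real 2DGS uses anisotropic
        covariances and tile-based rasterization; this only shows the idea
        of an image as a sum of Gaussian points.
        """
        yy, xx = np.mgrid[0:H, 0:W]
        img = np.zeros((H, W, 3), dtype=np.float32)
        for (cy, cx), s, c in zip(means, scales, colors):
            w = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * s ** 2))
            img += w[..., None] * np.asarray(c, dtype=np.float32)
        return img.clip(0.0, 1.0)

    # Fitting would optimize means/scales/colors against a target image;
    # here they are random just to exercise the renderer.
    rng = np.random.default_rng(0)
    img = render_gaussians(rng.uniform(0, 64, (50, 2)),
                           rng.uniform(1, 4, 50),
                           rng.uniform(0, 1, (50, 3)))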

NeurIPS Conference 2025 Conference Paper

MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks

  • Yinghao Zhu
  • Ziyi He
  • Haoran Hu
  • Xiaochen Zheng
  • Xichen Zhang
  • Wang Wang
  • Junyi Gao
  • Liantao Ma

The rapid advancement of Large Language Models (LLMs) has stimulated interest in multi-agent collaboration for addressing complex medical tasks. However, the practical advantages of multi-agent collaboration approaches remain insufficiently understood. Existing evaluations often lack generalizability, failing to cover diverse tasks reflective of real-world clinical practice, and frequently omit rigorous comparisons against both single-LLM-based and established conventional methods. To address this critical gap, we introduce MedAgentBoard, a comprehensive benchmark for the systematic evaluation of multi-agent collaboration, single-LLM, and conventional approaches. MedAgentBoard encompasses four diverse medical task categories: (1) medical (visual) question answering, (2) lay summary generation, (3) structured Electronic Health Record (EHR) predictive modeling, and (4) clinical workflow automation, across text, medical images, and structured EHR data. Our extensive experiments reveal a nuanced landscape: while multi-agent collaboration demonstrates benefits in specific scenarios, such as enhancing task completeness in clinical workflow automation, it does not consistently outperform advanced single LLMs (e.g., in textual medical QA) or, critically, specialized conventional methods that generally maintain better performance in tasks like medical VQA and EHR-based prediction. MedAgentBoard offers a vital resource and actionable insights, emphasizing the necessity of a task-specific, evidence-based approach to selecting and developing AI solutions in medicine. It underscores that the inherent complexity and overhead of multi-agent collaboration must be carefully weighed against tangible performance gains. All code, datasets, detailed prompts, and experimental results are open-sourced at this link.

NeurIPS Conference 2025 Conference Paper

Variational Pólya Tree

  • Lu Xu
  • Tsai Hor Chan
  • Lequan Yu
  • Kwok Lam
  • Guosheng Yin

Density estimation is essential for generative modeling, particularly with the rise of modern neural networks. While existing methods capture complex data distributions, they often lack interpretability and uncertainty quantification. Bayesian nonparametric methods, especially the Pólya tree, offer a robust framework that addresses these issues by accurately capturing function behavior over small intervals. Traditional techniques like Markov chain Monte Carlo (MCMC) face high computational complexity and scalability limitations, hindering the use of Bayesian nonparametric methods in deep learning. To tackle this, we introduce the variational Pólya tree (VPT) model, which employs stochastic variational inference to compute posterior distributions. This model provides a flexible, nonparametric Bayesian prior that captures latent densities and works well with stochastic gradient optimization. We also leverage the joint distribution likelihood for a more precise variational posterior approximation than traditional mean-field methods. We evaluate the model performance on both real data and images, and demonstrate its competitiveness with other state-of-the-art deep density estimation methods. We also explore its ability to enhance interpretability and uncertainty quantification. Code is available at https://github.com/howardchanth/var-polya-tree.

AAAI Conference 2024 Conference Paper

Boosting Multiple Instance Learning Models for Whole Slide Image Classification: A Model-Agnostic Framework Based on Counterfactual Inference

  • Weiping Lin
  • Zhenfeng Zhuang
  • Lequan Yu
  • Liansheng Wang

Multiple instance learning is an effective paradigm for whole slide image (WSI) classification, where labels are only provided at the bag level. However, instance-level prediction is also crucial as it offers insights into fine-grained regions of interest. Existing multiple instance learning methods either solely focus on training a bag classifier or lack sufficient capability for instance-level prediction. In this work, we propose a novel model-agnostic framework to boost existing multiple instance learning models and improve WSI classification performance at both the bag and instance levels. Specifically, we propose a counterfactual inference-based sub-bag assessment method and a hierarchical instance searching strategy to help search for reliable instances and obtain their accurate pseudo labels. Furthermore, an instance classifier is well-trained to produce accurate predictions. The instance embedding it generates is treated as a prompt to refine the instance feature for bag prediction. This framework is model-agnostic, capable of adapting to existing multiple instance learning models, including those without specific mechanisms like attention. Extensive experiments on three datasets demonstrate the competitive performance of our method. Code will be available at https://github.com/centurion-crawler/CIMIL.

NeurIPS Conference 2024 Conference Paper

Free Lunch in Pathology Foundation Model: Task-specific Model Adaptation with Concept-Guided Feature Enhancement

  • Yanyan Huang
  • Weiqin Zhao
  • Yihang Chen
  • Yu Fu
  • Lequan Yu

Whole slide image (WSI) analysis is gaining prominence within the medical imaging field. Recent advances in pathology foundation models have shown the potential to extract powerful feature representations from WSIs for downstream tasks. However, these foundation models are usually designed for general-purpose pathology image analysis and may not be optimal for specific downstream tasks or cancer types. In this work, we present Concept Anchor-guided Task-specific Feature Enhancement (CATE), an adaptable paradigm that can boost the expressivity and discriminativeness of pathology foundation models for specific downstream tasks. Based on a set of task-specific concepts derived from the pathology vision-language model with expert-designed prompts, we introduce two interconnected modules to dynamically calibrate the generic image features extracted by foundation models for certain tasks or cancer types. Specifically, we design a Concept-guided Information Bottleneck module to enhance task-relevant characteristics by maximizing the mutual information between image features and concept anchors while suppressing superfluous information. Moreover, a Concept-Feature Interference module is proposed to utilize the similarity between calibrated features and concept anchors to further generate discriminative task-specific features. Extensive experiments on public WSI datasets demonstrate that CATE significantly enhances the performance and generalizability of MIL models. Additionally, heatmap and UMAP visualization results also reveal the effectiveness and interpretability of CATE.

JBHI Journal 2024 Journal Article

Hybrid Masked Image Modeling for 3D Medical Image Segmentation

  • Zhaohu Xing
  • Lei Zhu
  • Lequan Yu
  • Zhiheng Xing
  • Liang Wan

Masked image modeling (MIM) with transformer backbones has recently been exploited as a powerful self-supervised pre-training technique. Existing MIM methods adopt the strategy of masking random patches of the image and reconstructing the missing pixels, which considers only low-level semantic information and incurs a long pre-training time. This paper presents HybridMIM, a novel hybrid self-supervised learning method based on masked image modeling for 3D medical image segmentation. Specifically, we design a two-level masking hierarchy to specify which and how patches in sub-volumes are masked, effectively providing constraints from higher-level semantic information. We then learn the semantic information of medical images at three levels, including: 1) partial region prediction to reconstruct key contents of the 3D image, which largely reduces the pre-training time burden (pixel-level); 2) patch-masking perception to learn the spatial relationship between the patches in each sub-volume (region-level); and 3) drop-out-based contrastive learning between samples within a mini-batch, which further improves the generalization ability of the framework (sample-level). The proposed framework is versatile, supporting both CNN and transformer encoder backbones, and also enables pre-training decoders for image segmentation. We conduct comprehensive experiments on five widely-used public medical image segmentation datasets, including BraTS2020, BTCV, MSD Liver, MSD Spleen, and BraTS2023. The experimental results show the clear superiority of HybridMIM against competing supervised methods, masked pre-training approaches, and other self-supervised methods, in terms of quantitative metrics, speed performance, and qualitative observations.

AAAI Conference 2024 Conference Paper

Memory-Efficient Prompt Tuning for Incremental Histopathology Classification

  • Yu Zhu
  • Kang Li
  • Lequan Yu
  • Pheng Ann Heng

Recent studies have made remarkable progress in histopathology classification. Building on these successes, contemporary works have proposed to further upgrade the model towards a more generalizable and robust direction by incrementally learning from sequentially delivered domains. Unlike previous parameter-isolation-based approaches that usually demand massive computation resources during model updating, we present a memory-efficient prompt tuning framework to cultivate model generalization potential at economical memory cost. For each incoming domain, we reuse the existing parameters of the initial classification model and attach lightweight trainable prompts to it for customized tuning. Considering the domain heterogeneity, we perform decoupled prompt tuning, where we adopt a domain-specific prompt for each domain to independently investigate its distinctive characteristics, and one domain-invariant prompt shared across all domains to continually explore the common content embedding throughout time. All domain-specific prompts are appended to the prompt bank and isolated from further changes to prevent forgetting the distinctive features of early-seen domains, while the domain-invariant prompt is passed on and iteratively evolves by style-augmented prompt refining to improve model generalization capability over time. Specifically, we construct a graph with existing prompts and build a style-augmented graph attention network to guide the domain-invariant prompt in exploring the overlapped latent embedding among all delivered domains for more domain-generic representations. We have extensively evaluated our framework with two histopathology tasks, i.e., breast cancer metastasis classification and epithelium-stroma tissue classification, where our approach yielded superior performance and memory efficiency over the competing methods.

TMLR Journal 2024 Journal Article

Single-Shot Plug-and-Play Methods for Inverse Problems

  • Yanqi Cheng
  • Lipei Zhang
  • Zhenda Shen
  • Shujun Wang
  • Lequan Yu
  • Raymond H. Chan
  • Carola-Bibiane Schönlieb
  • Angelica I Aviles-Rivero

The utilisation of Plug-and-Play (PnP) priors in inverse problems has become increasingly prominent in recent years. This preference is based on the mathematical equivalence between the general proximal operator and the regularised denoiser, facilitating the adaptation of various off-the-shelf denoiser priors to a wide range of inverse problems. However, existing PnP models predominantly rely on pre-trained denoisers using large datasets. In this work, we introduce Single-Shot PnP methods (SS-PnP), shifting the focus to solving inverse problems with minimal data. First, we integrate Single-Shot proximal denoisers into iterative methods, enabling training with single instances. Second, we propose implicit neural priors based on a novel function that preserves relevant frequencies to capture fine details while avoiding the issue of vanishing gradients. We demonstrate, through extensive numerical and visual experiments, that our method leads to better approximations.
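
The PnP pattern of using a denoiser in place of a proximal operator is compact enough to sketch with proximal gradient descent; the Gaussian-blur forward operator, step size, and smoothing denoiser below are placeholder assumptions standing in for the paper's single-shot implicit neural prior.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def pnp_deblur(y, A, At, denoise, step=1.0, iters=50):
        """Plug-and-Play proximal gradient for y = A(x) + noise:
        a gradient step on the data term, then a denoiser standing in
        for the prior's proximal operator."""
        x = At(y)
        for _ in range(iters):
            x = x - step * At(A(x) - y)   # data-fidelity gradient step
            x = denoise(x)                # off-the-shelf denoiser as prox
        return x

    blur = lambda x: gaussian_filter(x, sigma=2.0)      # forward operator A
    y = blur(np.random.rand(64, 64)) + 0.01 * np.random.randn(64, 64)
    x_hat = pnp_deblur(y, A=blur, At=blur,              # blur is self-adjoint
                       denoise=lambda x: gaussian_filter(x, sigma=0.5))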

JBHI Journal 2024 Journal Article

Synthesizing Feature-Aligned and Category-Aware Electronic Medical Records for Intracranial Aneurysm Rupture Prediction

  • Qian Yang
  • Caizi Li
  • Chubin Ou
  • Kang Li
  • Xiangyun Liao
  • Chuanzhi Duan
  • Lequan Yu
  • Weixin Si

Rupture prediction is crucial for precise treatment and follow-up management of patients with intracranial aneurysms (IAs). Considerable machine learning (ML) methods have been proposed to improve rupture prediction by leveraging electronic medical records (EMRs); however, data scarcity and category imbalance strongly influence performance. Thus, we propose a novel data synthesis method, a Transformer-based conditional GAN (TransCGAN), to synthesize highly authentic and category-aware EMRs to address the above challenges. Specifically, we align the feature-wise context relationship and distribution between synthetic and original data to enhance synthetic data quality. To achieve this, we integrate the Transformer structure into the GAN to match the contextual relationships by processing the long-range dependencies among clinical factors, and introduce a statistical loss to maintain distributional consistency by constraining the mean and variance of the synthesized features. Additionally, a conditional module is designed to assign the category of the synthesized data, thereby addressing the challenge of category imbalance. Subsequently, the synthetic data are merged with the original data to form a large-scale and category-balanced training dataset for IA rupture prediction. Experimental results show that using TransCGAN's synthetic data enhances classifier performance, achieving an AUC of 0.89 and outperforming state-of-the-art resampling methods by 5%–33% in F1 score.
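
The statistical loss constraining feature-wise mean and variance reduces to a few lines of moment matching; this generic sketch stands in for the actual objective, whose weighting against the adversarial losses is not specified here.

    import numpy as np

    def moment_matching_loss(real: np.ndarray, fake: np.ndarray) -> float:
        """Penalize feature-wise mean/variance gaps between real and
        synthetic EMR batches (rows = records, columns = clinical factors)."""
        mean_gap = np.mean((real.mean(axis=0) - fake.mean(axis=0)) ** 2)
        var_gap = np.mean((real.var(axis=0) - fake.var(axis=0)) ** 2)
        return float(mean_gap + var_gap)

    real = np.random.rand(256, 20)
    fake = np.random.rand(256, 20) * 1.2       # slightly off distribution
    print(moment_matching_loss(real, fake))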

NeurIPS Conference 2023 Conference Paper

Adaptive Uncertainty Estimation via High-Dimensional Testing on Latent Representations

  • Tsai Hor Chan
  • Kin Wai Lau
  • Jiajun Shen
  • Guosheng Yin
  • Lequan Yu

Uncertainty estimation aims to evaluate the confidence of a trained deep neural network. However, existing uncertainty estimation approaches rely on low-dimensional distributional assumptions and thus suffer from the high dimensionality of latent features. Existing approaches tend to focus on the uncertainty of discrete classification probabilities, which leads to poor generalizability to uncertainty estimation for other tasks. Moreover, most of the literature requires seeing the out-of-distribution (OOD) data during training for better estimation of uncertainty, which limits uncertainty estimation performance in practice because OOD data are typically unseen. To overcome these limitations, we propose a new framework using data-adaptive high-dimensional hypothesis testing for uncertainty estimation, which leverages the statistical properties of the feature representations. Our method directly operates on latent representations and thus does not require retraining the feature encoder under a modified objective. The test statistic relaxes the feature distribution assumptions to high dimensionality, and it is more discriminative to uncertainties in the latent representations. We demonstrate that encoding features with Bayesian neural networks can enhance testing performance and lead to more accurate uncertainty estimation. We further introduce a family-wise testing procedure to determine the optimal threshold of OOD detection, which minimizes the false discovery rate (FDR). Extensive experiments validate the satisfactory performance of our framework on uncertainty estimation and task-specific prediction over a variety of competitors. The experiments on the OOD detection task also show satisfactory performance of our method when the OOD data are unseen in the training. Codes are available at https://github.com/HKU-MedAI/bnn_uncertainty.

JBHI Journal 2023 Journal Article

ARR-GCN: Anatomy-Relation Reasoning Graph Convolutional Network for Automatic Fine-Grained Segmentation of Organ's Surgical Anatomy

  • Yinli Tian
  • Wenjian Qin
  • Fei Xue
  • Ricardo Lambo
  • Meiyan Yue
  • Songhui Diao
  • Lequan Yu
  • Yaoqin Xie

Anatomical resection (AR) based on anatomical sub-regions is a promising method of precise surgical resection, which has been proven to improve long-term survival by reducing local recurrence. The fine-grained segmentation of an organ's surgical anatomy (FGS-OSA), i.e., segmenting an organ into multiple anatomic regions, is critical for localizing tumors in AR surgical planning. However, automatically obtaining FGS-OSA results in computer-aided methods faces the challenges of appearance ambiguities among sub-regions (i.e., inter-sub-region appearance ambiguities) caused by similar HU distributions in different sub-regions of an organ's surgical anatomy, invisible boundaries, and similarities between anatomical landmarks and other anatomical information. In this paper, we propose a novel fine-grained segmentation framework termed the "anatomic relation reasoning graph convolutional network" (ARR-GCN), which incorporates prior anatomic relations into framework learning. In ARR-GCN, a graph is constructed based on the sub-regions to model the classes and their relations. Further, to obtain discriminative initial node representations in graph space, a sub-region center module is designed. Most importantly, to explicitly learn the anatomic relations, the prior anatomic relations among the sub-regions are encoded in the form of an adjacency matrix and embedded into the intermediate node representations to guide framework learning. ARR-GCN was validated on two FGS-OSA tasks: i) liver segment segmentation and ii) lung lobe segmentation. On both tasks, ARR-GCN outperformed other state-of-the-art segmentation methods and yielded promising performance in suppressing ambiguities among sub-regions.

NeurIPS Conference 2023 Conference Paper

IDRNet: Intervention-Driven Relation Network for Semantic Segmentation

  • Zhenchao Jin
  • Xiaowei Hu
  • Lingting Zhu
  • Luchuan Song
  • Li Yuan
  • Lequan Yu

Co-occurrent visual patterns suggest that pixel relation modeling facilitates dense prediction tasks, which inspires the development of numerous context modeling paradigms, e.g., multi-scale-driven and similarity-driven context schemes. Despite the impressive results, these existing paradigms often suffer from inadequate or ineffective contextual information aggregation due to reliance on large amounts of predetermined priors. To alleviate the issues, we propose a novel Intervention-Driven Relation Network (IDRNet), which leverages a deletion diagnostics procedure to guide the modeling of contextual relations among different pixels. Specifically, we first group pixel-level representations into semantic-level representations with the guidance of pseudo labels and further improve the distinguishability of the grouped representations with a feature enhancement module. Next, a deletion diagnostics procedure is conducted to model relations of these semantic-level representations via perceiving the network outputs, and the extracted relations are utilized to guide the semantic-level representations to interact with each other. Finally, the interacted representations are utilized to augment original pixel-level representations for final predictions. Extensive experiments are conducted to validate the effectiveness of IDRNet quantitatively and qualitatively. Notably, our intervention-driven context scheme brings consistent performance improvements to state-of-the-art segmentation frameworks and achieves competitive results on popular benchmark datasets, including ADE20K, COCO-Stuff, PASCAL-Context, LIP, and Cityscapes.

AAAI Conference 2023 Conference Paper

MulGT: Multi-Task Graph-Transformer with Task-Aware Knowledge Injection and Domain Knowledge-Driven Pooling for Whole Slide Image Analysis

  • Weiqin Zhao
  • Shujun Wang
  • Maximus Yeung
  • Tianye Niu
  • Lequan Yu

Whole slide images (WSIs) have been widely used to assist automated diagnosis in the deep learning field. However, most previous works only discuss the single-task setting, which is not aligned with the real clinical setting, where pathologists often conduct multiple diagnosis tasks simultaneously. Also, it is commonly recognized that the multi-task learning paradigm can improve learning efficiency by exploiting commonalities and differences across multiple tasks. To this end, we present a novel multi-task framework (i.e., MulGT) for WSI analysis built on a specially designed Graph-Transformer equipped with Task-aware Knowledge Injection and Domain Knowledge-driven Graph Pooling modules. Basically, with the Graph Neural Network and Transformer as building blocks, our framework is able to learn task-agnostic low-level local information as well as task-specific high-level global representation. Considering that different tasks in WSI analysis depend on different features and properties, we also design a novel Task-aware Knowledge Injection module to transfer the task-shared graph embedding into task-specific feature spaces to learn more accurate representations for different tasks. Further, we elaborately design a novel Domain Knowledge-driven Graph Pooling module for each task to improve both the accuracy and robustness of different tasks by leveraging the different diagnosis patterns of multiple tasks. We evaluated our method on two public WSI datasets from TCGA projects, i.e., esophageal carcinoma and kidney carcinoma. Experimental results show that our method outperforms single-task counterparts and state-of-the-art methods on both tumor typing and staging tasks.

JBHI Journal 2022 Journal Article

All-Around Real Label Supervision: Cyclic Prototype Consistency Learning for Semi-Supervised Medical Image Segmentation

  • Zhe Xu
  • Yixin Wang
  • Donghuan Lu
  • Lequan Yu
  • Jiangpeng Yan
  • Jie Luo
  • Kai Ma
  • Yefeng Zheng

Semi-supervised learning has substantially advanced medical image segmentation since it alleviates the heavy burden of acquiring the costly expert-examined annotations. Especially, the consistency-based approaches have attracted more attention for their superior performance, wherein the real labels are only utilized to supervise their paired images via supervised loss while the unlabeled images are exploited by enforcing the perturbation-based "unsupervised" consistency without explicit guidance from those real labels. However, intuitively, the expert-examined real labels contain more reliable supervision signals. Observing this, we ask an unexplored but interesting question: can we exploit the unlabeled data via explicit real label supervision for semi-supervised training? To this end, we discard the previous perturbation-based consistency but absorb the essence of non-parametric prototype learning. Based on the prototypical networks, we then propose a novel cyclic prototype consistency learning (CPCL) framework, which is constructed by a labeled-to-unlabeled (L2U) prototypical forward process and an unlabeled-to-labeled (U2L) backward process. These two processes synergistically enhance the segmentation network by encouraging more discriminative and compact features. In this way, our framework turns previous "unsupervised" consistency into new "supervised" consistency, obtaining the "all-around real label supervision" property of our method. Extensive experiments on brain tumor segmentation from MRI and kidney segmentation from CT images show that our CPCL can effectively exploit the unlabeled data and outperform other state-of-the-art semi-supervised medical image segmentation methods.
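
The prototype machinery behind the L2U/U2L cycle boils down to class-wise averaging of labeled features and nearest-prototype assignment; the sketch below shows only that core step, with shapes and cosine scoring as illustrative assumptions.

    import numpy as np

    def class_prototypes(feats, labels, n_classes):
        """feats: (N, D) pixel/voxel features; labels: (N,) integer classes.
        A prototype is the mean feature of each class (non-parametric)."""
        return np.stack([feats[labels == c].mean(axis=0)
                         for c in range(n_classes)])

    def assign_by_prototype(feats, protos):
        """Label each feature by its nearest prototype (cosine similarity)."""
        f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
        p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
        return (f @ p.T).argmax(axis=1)

    # L2U direction: prototypes from labeled features pseudo-label unlabeled
    # ones; U2L reverses the roles and closes the consistency cycle.
    rng = np.random.default_rng(0)
    labeled = rng.normal(size=(100, 16)); labels = rng.integers(0, 2, 100)
    protos = class_prototypes(labeled, labels, n_classes=2)
    pseudo = assign_by_prototype(rng.normal(size=(50, 16)), protos)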

AAAI Conference 2022 Conference Paper

H^2-MIL: Exploring Hierarchical Representation with Heterogeneous Multiple Instance Learning for Whole Slide Image Analysis

  • Wentai Hou
  • Lequan Yu
  • Chengxuan Lin
  • Helong Huang
  • Rongshan Yu
  • Jing Qin
  • Liansheng Wang

Current representation learning methods for whole slide image (WSI) with pyramidal resolutions are inherently homogeneous and flat, which cannot fully exploit the multiscale and heterogeneous diagnostic information of different structures for comprehensive analysis. This paper presents a novel graph neural network-based multiple instance learning framework (i.e., H^2-MIL) to learn hierarchical representation from a heterogeneous graph with different resolutions for WSI analysis. A heterogeneous graph with the "resolution" attribute is constructed to explicitly model the feature and spatial-scaling relationship of multi-resolution patches. We then design a novel resolution-aware attention convolution (RAConv) block to learn compact yet discriminative representation from the graph, which tackles the heterogeneity of node neighbors with different resolutions and yields more reliable message passing. More importantly, to explore the task-related structured information of the WSI pyramid, we elaborately design a novel iterative hierarchical pooling (IH-Pool) module to progressively aggregate the heterogeneous graph based on scaling relationships of different nodes. We evaluated our method on two public WSI datasets from the TCGA project, i.e., esophageal cancer and kidney cancer. Experimental results show that our method clearly outperforms the state-of-the-art methods on both tumor typing and staging tasks.

NeurIPS Conference 2022 Conference Paper

Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning

  • Fuying Wang
  • Yuyin Zhou
  • Shujun Wang
  • Varut Vardhanabhuti
  • Lequan Yu

Learning medical visual representations directly from paired radiology reports has become an emerging topic in representation learning. However, existing medical image-text joint learning methods are limited by instance- or local-level supervision analysis, ignoring disease-level semantic correspondences. In this paper, we present a novel Multi-Granularity Cross-modal Alignment (MGCA) framework for generalized medical visual representation learning by harnessing the naturally exhibited semantic correspondences between medical images and radiology reports at three different levels, i.e., pathological region-level, instance-level, and disease-level. Specifically, we first incorporate the instance-wise alignment module by maximizing the agreement between image-report pairs. Further, for token-wise alignment, we introduce a bidirectional cross-attention strategy to explicitly learn the matching between fine-grained visual tokens and text tokens, followed by contrastive learning to align them. More importantly, to leverage the high-level inter-subject relationship semantic correspondences (e.g., disease-level), we design a novel cross-modal disease-level alignment paradigm to enforce cross-modal cluster assignment consistency. Extensive experimental results on seven downstream medical image datasets covering image classification, object detection, and semantic segmentation tasks demonstrate the stable and superior performance of our framework.
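
The instance-wise alignment module is essentially a symmetric contrastive (CLIP-style) objective between paired image and report embeddings; the sketch below is that standard loss, with the temperature value as an assumed hyperparameter.

    import numpy as np

    def instance_alignment_loss(img, txt, tau=0.07):
        """Symmetric InfoNCE over a batch of paired image/report embeddings.
        img, txt: (B, D); row i of each side belongs to the same study."""
        img = img / np.linalg.norm(img, axis=1, keepdims=True)
        txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
        logits = img @ txt.T / tau                  # (B, B) similarities
        # Cross-entropy with the diagonal (true pairs) as targets, both ways.
        log_sm = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        i2t = -np.mean(np.diag(log_sm))             # match image -> report
        log_sm_t = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
        t2i = -np.mean(np.diag(log_sm_t))           # match report -> image
        return (i2t + t2i) / 2

    rng = np.random.default_rng(0)
    print(instance_alignment_loss(rng.normal(size=(8, 64)),
                                  rng.normal(size=(8, 64))))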

AIIM Journal 2021 Journal Article

NIA-Network: Towards improving lung CT infection detection for COVID-19 diagnosis

  • Wei Li
  • Jinlin Chen
  • Ping Chen
  • Lequan Yu
  • Xiaohui Cui
  • Yiwei Li
  • Fang Cheng
  • Wen Ouyang

During pandemics (e.g., COVID-19), physicians have to focus on diagnosing and treating patients, which often means that only a limited amount of labeled CT images is available. Although recent semi-supervised learning algorithms may alleviate the problem of annotation scarcity, limited real-world CT images still cause those algorithms to produce inaccurate detection results, especially in real-world COVID-19 cases. Existing models often cannot detect the small infected regions in COVID-19 CT images; this challenge implicitly means that many patients with minor symptoms are misdiagnosed and develop more severe symptoms, causing higher mortality. In this paper, we propose a new method to address this challenge. It can detect not only severe cases but also minor symptoms using real-world COVID-19 CT images, in which the source domain includes only limited labeled CT images while the target domain has many unlabeled CT images. Specifically, we adopt Network-in-Network and Instance Normalization to build a new module (termed the NI module) and extract discriminative representations from CT images of both source and target domains. A domain classifier is utilized to implement infected-region adaptation from the source domain to the target domain in an adversarial learning manner, and learns a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We call our model NIA-Network (Network-in-Network, Instance Normalization and Adversarial Learning), and conduct extensive experiments on two COVID-19 datasets to validate our approach. The experimental results show that our model can effectively detect infected regions with different sizes and achieve the highest diagnostic accuracy compared with existing SOTA methods.

AAAI Conference 2020 Conference Paper

Towards Cross-Modality Medical Image Segmentation with Online Mutual Knowledge Distillation

  • Kang Li
  • Lequan Yu
  • Shujun Wang
  • Pheng-Ann Heng

The success of deep convolutional neural networks is partially attributed to the massive amount of annotated training data. However, in practice, medical data annotations are usually expensive and time-consuming to obtain. Considering that multi-modality data with the same anatomic structures are widely available in clinical routine, in this paper we aim to exploit the prior knowledge (e.g., shape priors) learned from one modality (aka the assistant modality) to improve the segmentation performance on another modality (aka the target modality) and make up for annotation scarcity. To alleviate the learning difficulties caused by modality-specific appearance discrepancy, we first present an Image Alignment Module (IAM) to narrow the appearance gap between assistant and target modality data. We then propose a novel Mutual Knowledge Distillation (MKD) scheme to thoroughly exploit the modality-shared knowledge to facilitate target-modality segmentation. To be specific, we formulate our framework as an integration of two individual segmentors. Each segmentor not only explicitly extracts one modality's knowledge from the corresponding annotations, but also implicitly explores the other modality's knowledge from its counterpart in a mutual-guided manner. The ensemble of the two segmentors further integrates the knowledge from both modalities and generates reliable segmentation results on the target modality. Experimental results on the public multi-class cardiac segmentation data, i.e., MM-WHS 2017, show that our method achieves large improvements on CT segmentation by utilizing additional MRI data and outperforms other state-of-the-art multi-modality learning methods.
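
The mutual-guidance idea amounts to each segmentor adding a KL term toward its counterpart's softened predictions on top of its own supervised loss; this sketch shows only that bidirectional distillation term, with the temperature as an assumption.

    import numpy as np

    def softmax(z, T=1.0):
        z = z / T
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def mutual_kd_loss(logits_a, logits_b, T=2.0):
        """Symmetric KL between the two segmentors' softened per-pixel
        predictions; each model also keeps its own supervised loss."""
        p = softmax(logits_a, T)
        q = softmax(logits_b, T)
        kl_pq = np.sum(p * (np.log(p + 1e-8) - np.log(q + 1e-8)), axis=-1)
        kl_qp = np.sum(q * (np.log(q + 1e-8) - np.log(p + 1e-8)), axis=-1)
        return float(np.mean(kl_pq + kl_qp) * T * T)

    rng = np.random.default_rng(0)
    a = rng.normal(size=(32, 4))   # 32 pixels, 4 cardiac classes
    b = rng.normal(size=(32, 4))
    print(mutual_kd_loss(a, b))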

YNIMG Journal 2018 Journal Article

VoxResNet: Deep voxelwise residual networks for brain segmentation from 3D MR images

  • Hao Chen
  • Qi Dou
  • Lequan Yu
  • Jing Qin
  • Pheng-Ann Heng

Segmentation of key brain tissues from 3D medical images is of great significance for brain disease diagnosis, progression assessment and monitoring of neurologic conditions. While manual segmentation is time-consuming, laborious, and subjective, automated segmentation is quite challenging due to the complicated anatomical environment of brain and the large variations of brain tissues. We propose a novel voxelwise residual network (VoxResNet) with a set of effective training schemes to cope with this challenging problem. The main merit of residual learning is that it can alleviate the degradation problem when training a deep network so that the performance gains achieved by increasing the network depth can be fully leveraged. With this technique, our VoxResNet is built with 25 layers, and hence can generate more representative features to deal with the large variations of brain tissues than its rivals using hand-crafted features or shallower networks. In order to effectively train such a deep network with limited training data for brain segmentation, we seamlessly integrate multi-modality and multi-level contextual information into our network, so that the complementary information of different modalities can be harnessed and features of different scales can be exploited. Furthermore, an auto-context version of the VoxResNet is proposed by combining the low-level image appearance features, implicit shape information, and high-level context together for further improving the segmentation performance. Extensive experiments on the well-known benchmark (i.e., MRBrainS) of brain segmentation from 3D magnetic resonance (MR) images corroborated the efficacy of the proposed VoxResNet. Our method achieved the first place in the challenge out of 37 competitors including several state-of-the-art brain segmentation methods. Our method is inherently general and can be readily applied as a powerful tool to many brain-related studies, where accurate segmentation of brain structures is critical.

AAAI Conference 2017 Conference Paper

Fine-Grained Recurrent Neural Networks for Automatic Prostate Segmentation in Ultrasound Images

  • Xin Yang
  • Lequan Yu
  • Lingyun Wu
  • Yi Wang
  • Dong Ni
  • Jing Qin
  • Pheng-Ann Heng

Boundary incompleteness raises great challenges to automatic prostate segmentation in ultrasound images. Shape prior can provide strong guidance in estimating the missing boundary, but traditional shape models often suffer from hand-crafted descriptors and local information loss in the fitting procedure. In this paper, we attempt to address those issues with a novel framework. The proposed framework can seamlessly integrate feature extraction and shape prior exploring, and estimate the complete boundary in a sequential manner. Our framework is composed of three key modules. Firstly, we serialize the static 2D prostate ultrasound images into dynamic sequences and then predict prostate shapes by sequentially exploring shape priors. Intuitively, we propose to learn the shape prior with the biologically plausible Recurrent Neural Networks (RNNs). This module is corroborated to be effective in dealing with the boundary incompleteness. Secondly, to alleviate the bias caused by different serialization manners, we propose a multi-view fusion strategy to merge shape predictions obtained from different perspectives. Thirdly, we further implant the RNN core into a multiscale Auto-Context scheme to successively refine the details of the shape prediction map. With extensive validation on challenging prostate ultrasound images, our framework bridges severe boundary incompleteness and achieves the best performance in prostate boundary delineation when compared with several advanced methods. Additionally, our approach is general and can be extended to other medical image segmentation tasks, where boundary incompleteness is one of the main challenges.

JBHI Journal 2017 Journal Article

Integrating Online and Offline Three-Dimensional Deep Learning for Automated Polyp Detection in Colonoscopy Videos

  • Lequan Yu
  • Hao Chen
  • Qi Dou
  • Jing Qin
  • Pheng Ann Heng

Automated polyp detection in colonoscopy videos has been demonstrated to be a promising way for colorectal cancer prevention and diagnosis. Traditional manual screening is time-consuming, operator dependent, and error prone; hence, automated detection approaches are in high demand in clinical practice. However, automated polyp detection is very challenging due to high intraclass variations in polyp size, color, shape, and texture, and low interclass variations between polyps and hard mimics. In this paper, we propose a novel offline and online three-dimensional (3-D) deep learning integration framework by leveraging the 3-D fully convolutional network (3D-FCN) to tackle this challenging problem. Compared with previous methods employing hand-crafted features or 2-D convolutional neural networks, the 3D-FCN is capable of learning more representative spatio-temporal features from colonoscopy videos, and hence has more powerful discrimination capability. More importantly, we propose a novel online learning scheme to deal with the problem of limited training data by harnessing the specific information of an input video in the learning process. We integrate offline and online learning to effectively reduce the number of false positives generated by the offline network and further improve the detection performance. Extensive experiments on the dataset of the MICCAI 2015 Challenge on Polyp Detection demonstrated the better performance of our method when compared with other competitors.

AAAI Conference 2017 Conference Paper

Volumetric ConvNets with Mixed Residual Connections for Automated Prostate Segmentation from 3D MR Images

  • Lequan Yu
  • Xin Yang
  • Hao Chen
  • Jing Qin
  • Pheng Ann Heng

Automated prostate segmentation from 3D MR images is very challenging due to large variations of prostate shape and indistinct prostate boundaries. We propose a novel volumetric convolutional neural network (ConvNet) with mixed residual connections to cope with this challenging problem. Compared with previous methods, our volumetric ConvNet has two compelling advantages. First, it is implemented in a 3D manner and can fully exploit the 3D spatial contextual information of input data to perform efficient, precise, and volume-to-volume prediction. Second, and more importantly, the novel combination of residual connections (i.e., long and short) can greatly improve the training efficiency and discriminative capability of our network by enhancing information propagation within the ConvNet both locally and globally. While the forward propagation of location information can improve segmentation accuracy, the smooth backward propagation of the gradient flow can accelerate convergence speed and enhance the discrimination capability. Extensive experiments on the open MICCAI PROMISE12 challenge dataset corroborated the effectiveness of the proposed volumetric ConvNet with mixed residual connections. Our method ranked first in the challenge, outperforming other competitors by a large margin with respect to most evaluation metrics. The proposed volumetric ConvNet is general enough and can be easily extended to other medical image analysis tasks, especially those with limited training data.
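
The mix of long and short residual connections can be pictured in a few lines: short skips inside blocks, long skips from the encoder path to the decoder path; the identity-shaped "conv" placeholder below stands in for the actual 3D convolutions, and down/upsampling is omitted.

    import numpy as np

    def conv(x):
        """Placeholder for a 3D conv + nonlinearity; shape-preserving here."""
        return np.maximum(0.0, x + 0.01 * np.random.randn(*x.shape))

    def residual_block(x):
        return x + conv(conv(x))        # short skip: smooths gradient flow

    def encoder_decoder(volume, depth=3):
        skips, x = [], volume
        for _ in range(depth):          # encoder path
            x = residual_block(x)
            skips.append(x)             # stash features for long skips
        for s in reversed(skips):       # decoder path
            x = residual_block(x) + s   # long skip: forwards location info
        return x

    out = encoder_decoder(np.random.rand(16, 16, 16).astype(np.float32))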