Arrow Research search

Author name cluster

Yuexiang Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

22 papers
1 author row

Possible papers

22

NeurIPS Conference 2025 Conference Paper

Degradation-Aware Dynamic Schrödinger Bridge for Unpaired Image Restoration

  • Jingjun Yi
  • Qi Bi
  • Hao Zheng
  • Huimin Huang
  • Yixian Shen
  • Haolan Zhan
  • Wei Ji
  • Yawen Huang

Image restoration is a fundamental task in computer vision and machine learning, which learns a mapping between clear images and images degraded under various conditions (e.g., blur, low light, haze). Yet, most existing image restoration methods are highly restricted by the requirement of degraded and clear image pairs, which limits their generalization and feasibility in the many real-world scenarios where paired images are unavailable. To address this bottleneck, we propose a Degradation-aware Dynamic Schrödinger Bridge (DDSB) for unpaired image restoration. Its general idea is to learn a Schrödinger Bridge between the clear and degraded image distributions while emphasizing physical degradation priors to reduce the accumulation of errors during the restoration process. A Degradation-aware Optimal Transport (DOT) learning scheme is accordingly devised. Training a degradation model to learn the inverse restoration process is particularly challenging, as it must be applicable across different stages of the iterative restoration process. A Dynamic Transport with Consistency (DTC) learning objective is further proposed to reduce the loss of image details in the early iterations and thereby refine the degradation model. Extensive experiments on multiple image degradation tasks show its state-of-the-art performance over prior arts.

AAAI Conference 2025 Conference Paper

DGFamba: Learning Flow Factorized State Space for Visual Domain Generalization

  • Qi Bi
  • Jingjun Yi
  • Hao Zheng
  • Haolan Zhan
  • Wei Ji
  • Yawen Huang
  • Yuexiang Li

Domain generalization aims to learn a representation from the source domain that can be generalized to arbitrary unseen target domains. A fundamental challenge for visual domain generalization is the domain gap caused by dramatic style variation while the image content remains stable. Selective state space models, exemplified by VMamba, demonstrate a global receptive field for representing the content. However, how to exploit the domain-invariant property of selective state spaces is rarely explored. In this paper, we propose a novel Flow Factorized State Space model, dubbed DGFamba, for visual domain generalization. To maintain domain consistency, we map the style-augmented and the original state embeddings via flow factorization. In this latent flow space, each state embedding from a certain style is specified by a latent probability path. By aligning these probability paths in the latent space, the state embeddings are able to represent the same content distribution regardless of style differences. Extensive experiments conducted on various visual domain generalization settings show its state-of-the-art performance.

JBHI Journal 2025 Journal Article

Federated Pseudo Modality Generation for Incomplete Multi-Modal MRI Reconstruction

  • Yunlu Yan
  • Chun-Mei Feng
  • Yuexiang Li
  • Ping Li
  • Rick Siow Mong Goh
  • Baiying Lei
  • Weiming Wang
  • David Dagan Feng

While multi-modal learning has been widely used for MRI reconstruction, it relies on paired multi-modal data, which are difficult to acquire in real clinical scenarios. Especially in the federated setting, it is common for several medical institutions to suffer from missing modalities or even to hold only single-modal data. Therefore, it is infeasible to deploy a standard federated learning framework under such conditions. In this paper, we propose a novel communication-efficient federated learning framework (namely Fed-PMG) to address the missing-modality challenge in federated multi-modal MRI reconstruction. Specifically, we utilize a pseudo modality generation mechanism to recover the missing modality for each single-modal client by sharing the distribution information of the amplitude spectrum in frequency space. However, sharing the original amplitude spectra leads to heavy communication costs. To reduce the communication cost, we introduce a clustering scheme that projects the set of amplitude spectra onto a finite number of cluster centroids, which are then shared among the clients. With such an elaborate design, our approach can effectively complete the missing modality within an acceptable communication cost. Extensive experimental results demonstrate that our proposed method outperforms state-of-the-art methods and reaches performance close to the ideal scenario (i.e., all clients have the full set of modalities).
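
The abstract's key communication trick, clustering the amplitude spectra and sharing only the centroids, can be sketched in a few lines. The following is a minimal NumPy/scikit-learn illustration assuming 2D single-channel slices; the function names, shapes, and the phase-amplitude recombination step are assumptions for illustration, not the released Fed-PMG code.

```python
import numpy as np
from sklearn.cluster import KMeans

def amplitude_centroids(images, k=8):
    """Cluster the amplitude spectra of a client's images into k centroids (shared instead of raw spectra)."""
    amps = np.stack([np.abs(np.fft.fft2(img)).ravel() for img in images])
    km = KMeans(n_clusters=k, n_init=10).fit(amps)
    return km.cluster_centers_.reshape(k, *images[0].shape)

def pseudo_modality(own_image, shared_centroid):
    """Combine the local image's phase with a received amplitude centroid to synthesise a pseudo modality."""
    phase = np.angle(np.fft.fft2(own_image))
    return np.real(np.fft.ifft2(shared_centroid * np.exp(1j * phase)))

# toy usage with random stand-ins for single-modal MRI slices
client_slices = [np.random.rand(64, 64) for _ in range(32)]
centroids = amplitude_centroids(client_slices, k=4)        # only these k arrays leave the client
fake_modality = pseudo_modality(np.random.rand(64, 64), centroids[0])
```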

NeurIPS Conference 2025 Conference Paper

Learning a Cross-Modal Schrödinger Bridge for Visual Domain Generalization

  • Hao Zheng
  • Jingjun Yi
  • Qi Bi
  • Huimin Huang
  • Haolan Zhan
  • Yawen Huang
  • Yuexiang Li
  • Xian Wu

Domain generalization aims to train models that perform robustly on unseen target domains without access to target data. Vision-language foundation models have opened a new avenue owing to their inherent out-of-distribution generalization capability. However, static alignment to class-level textual anchors remains insufficient to handle the dramatic distribution discrepancy of diverse domain-specific visual features. In this work, we propose a novel cross-domain Schrödinger Bridge (SB) method, namely SBGen, which explicitly formulates the stochastic semantic evolution to gain better generalization to unseen domains. Technically, the proposed SBGen consists of three key components: (1) text-guided domain-aware feature selection to isolate semantically aligned image tokens; (2) stochastic cross-domain evolution to simulate the SB dynamics via a learnable time-conditioned drift; and (3) stochastic domain-agnostic interpolation to construct semantically grounded feature trajectories. Empirically, SBGen achieves state-of-the-art performance on domain generalization in both classification and segmentation. This work highlights the importance of modeling domain shifts as structured stochastic processes grounded in semantic alignment.

AAAI Conference 2025 Conference Paper

S³-Mamba: Small-Size-Sensitive Mamba for Lesion Segmentation

  • Gui Wang
  • Yuexiang Li
  • Wenting Chen
  • Meidan Ding
  • Wooi Ping Cheah
  • Rong Qu
  • Jianfeng Ren
  • Linlin Shen

Small lesions play a critical role in the early diagnosis and intervention of severe infections. Popular models often struggle to segment small lesions, as they occupy only a minor portion of an image, while down-sampling operations inevitably lose focus on their local features. To tackle these challenges, we propose a Small-Size-Sensitive Mamba (S³-Mamba), which promotes sensitivity to small lesions across three dimensions: channel, spatial, and training strategy. Specifically, an Enhanced Visual State Space block is designed to focus on small lesions through multiple residual connections that preserve local features, and to selectively amplify important details while suppressing irrelevant ones through channel-wise attention. A Tensor-based Cross-feature Multi-scale Attention is designed to integrate input-image and intermediate-layer features with edge features and to exploit the attentive support of features across multiple scales, thereby retaining spatial details of small lesions at various granularities. Finally, we introduce a novel regularized curriculum learning scheme to automatically assess lesion size and sample difficulty and gradually shift focus from easy samples to hard ones such as small lesions. Extensive experiments on three medical image segmentation datasets show the superiority of our S³-Mamba, especially in segmenting small lesions.
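
The regularized curriculum learning component is described only at a high level; as a loose illustration of size-aware curriculum weighting (an assumed scheme, not the paper's actual design), one could schedule per-sample loss weights from lesion area:

```python
import torch

def curriculum_weights(lesion_fraction, epoch, total_epochs, tiny=1e-3):
    """Toy schedule: down-weight very small (hard) lesions early in training and let their
    weight grow towards 1.0 as training proceeds. Purely illustrative."""
    progress = epoch / max(total_epochs - 1, 1)                          # 0 at the start, 1 at the end
    difficulty = torch.clamp(tiny / (lesion_fraction + 1e-8), max=1.0)   # tiny lesions -> difficulty near 1
    return 1.0 - difficulty * (1.0 - progress)

areas = torch.tensor([0.20, 0.01, 0.0005])                    # lesion area as a fraction of the image
print(curriculum_weights(areas, epoch=0, total_epochs=100))   # hard (tiny-lesion) samples start down-weighted
```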

AAAI Conference 2024 Conference Paper

Combinatorial CNN-Transformer Learning with Manifold Constraints for Semi-supervised Medical Image Segmentation

  • Huimin Huang
  • Yawen Huang
  • Shiao Xie
  • Lanfen Lin
  • Ruofeng Tong
  • Yen-Wei Chen
  • Yuexiang Li
  • Yefeng Zheng

Semi-supervised learning (SSL), as one of the dominant paradigms, aims at leveraging unlabeled data to deal with the annotation dilemma of supervised learning, and has attracted much attention in medical image segmentation. Most existing approaches employ a unitary convolutional neural network (CNN) and enforce consistency of predictions under small perturbations applied to inputs or models. The drawbacks of such a learning paradigm are that (1) CNN-based models place severe limitations on global learning and (2) rich and diverse class-level distributions are inhibited. In this paper, we present a novel CNN-Transformer learning framework in the manifold space for semi-supervised medical image segmentation. First, at the intra-student level, we propose a novel class-wise consistency loss to facilitate the learning of both discriminative and compact target feature representations. Then, at the inter-student level, we align the CNN and Transformer features using a prototype-based optimal transport method. Extensive experiments show that our method outperforms previous state-of-the-art methods on three public medical image segmentation benchmarks.

JBHI Journal 2024 Journal Article

Cross-Modal Vertical Federated Learning for MRI Reconstruction

  • Yunlu Yan
  • Hong Wang
  • Yawen Huang
  • Nanjun He
  • Lei Zhu
  • Yong Xu
  • Yuexiang Li
  • Yefeng Zheng

Federated learning enables multiple hospitals to cooperatively learn a shared model without privacy disclosure. Existing methods commonly assume that the data from different hospitals have the same modalities. However, such a setting is difficult to satisfy in practice, since imaging guidelines may differ between hospitals, which makes the number of individuals with the same set of modalities limited. To this end, we formulate this practical-yet-challenging cross-modal vertical federated learning task, in which data from multiple hospitals have different modalities and only a small amount of multi-modality data is collected from the same individuals. To tackle this situation, we develop a novel framework, namely Federated Consistent Regularization constrained Feature Disentanglement (Fed-CRFD), for boosting MRI reconstruction by effectively exploring the overlapping samples (i.e., the same patients with different modalities at different hospitals) and solving the domain shift problem caused by different modalities. Particularly, our Fed-CRFD involves an intra-client feature disentanglement scheme to decouple data into modality-invariant and modality-specific features, where the modality-invariant features are leveraged to mitigate the domain shift problem. In addition, a cross-client latent representation consistency constraint is proposed specifically for the overlapping samples to further align the modality-invariant features extracted from different modalities. Hence, our method can fully exploit the multi-source data from hospitals while alleviating the domain shift problem. Extensive experiments on two typical MRI datasets demonstrate that our network clearly outperforms state-of-the-art MRI reconstruction methods.

NeurIPS Conference 2024 Conference Paper

Learning Frequency-Adapted Vision Foundation Model for Domain Generalized Semantic Segmentation

  • Qi Bi
  • Jingjun Yi
  • Hao Zheng
  • Haolan Zhan
  • Yawen Huang
  • Wei Ji
  • Yuexiang Li
  • Yefeng Zheng

The emerging vision foundation model (VFM) has inherited the ability to generalize to unseen images. Nevertheless, the key challenge of domain-generalized semantic segmentation (DGSS) lies in the domain gap attributed to cross-domain styles, i.e., the variance of urban landscapes and environment dependencies. Hence, maintaining the style-invariant property under varying domain styles becomes the key bottleneck in harnessing VFMs for DGSS. The frequency space after a Haar wavelet transformation provides a feasible way to decouple the style information from the domain-invariant content, since the content and style information are retained in the low- and high-frequency components of the space, respectively. To this end, we propose a novel Frequency-Adapted (FADA) learning scheme to advance the frontier. Its overall idea is to tackle the content and style information separately by frequency tokens throughout the learning process. Particularly, the proposed FADA consists of two branches, i.e., a low- and a high-frequency branch. The former stabilizes the scene content, while the latter learns the scene styles and eliminates their impact on DGSS. Experiments conducted on various DGSS settings show the state-of-the-art performance of our FADA and its versatility across a variety of VFMs. Source code is available at https://github.com/BiQiWHU/FADA.
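
The low/high-frequency split the abstract relies on is a single-level Haar wavelet transform. A minimal PyTorch sketch of that decomposition is shown below (the branch adapters themselves are omitted; tensor shapes are assumptions):

```python
import torch

def haar_dwt2(x):
    """One-level 2D Haar transform of a (B, C, H, W) tensor with even H and W.
    Returns the low-frequency band (content-dominant) and three high-frequency bands (style/detail)."""
    a = x[..., 0::2, 0::2]   # top-left of each 2x2 block
    b = x[..., 0::2, 1::2]   # top-right
    c = x[..., 1::2, 0::2]   # bottom-left
    d = x[..., 1::2, 1::2]   # bottom-right
    ll = (a + b + c + d) / 2
    lh, hl, hh = (a - b + c - d) / 2, (a + b - c - d) / 2, (a - b - c + d) / 2
    return ll, (lh, hl, hh)

feat = torch.randn(2, 256, 32, 32)   # hypothetical VFM feature map
ll, highs = haar_dwt2(feat)
# a FADA-style scheme would then adapt the two groups with separate low- and high-frequency branches
```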

AAAI Conference 2024 Conference Paper

Learning Generalized Medical Image Segmentation from Decoupled Feature Queries

  • Qi Bi
  • Jingjun Yi
  • Hao Zheng
  • Wei Ji
  • Yawen Huang
  • Yuexiang Li
  • Yefeng Zheng

Domain-generalized medical image segmentation requires models to learn from multiple source domains and generalize well to arbitrary unseen target domains. Such a task is both technically challenging and clinically practical, due to the domain shift problem (i.e., images collected from different hospitals and scanners). Existing methods focus on either learning shape-invariant representations or reaching consensus among the source domains. An ideal generalized representation is supposed to show similar pattern responses within the same channel for cross-domain images. However, to deal with the significant distribution discrepancy, the network tends to capture similar patterns across multiple channels, while different cross-domain patterns are also allowed to rest in the same channel. To address this issue, we propose to leverage channel-wise decoupled deep features as queries. With the aid of a cross-attention mechanism, the long-range dependency between deep and shallow features can be fully mined and then guide the learning of generalized representations. Besides, a relaxed deep whitening transformation is proposed to learn channel-wise decoupled features in a feasible way. The proposed decoupled feature query (DFQ) scheme can be seamlessly integrated into a Transformer segmentation model in an end-to-end manner. Extensive experiments show its state-of-the-art performance, notably outperforming the runner-up by 1.31% and 1.98% in DSC on the generalized fundus and prostate benchmarks, respectively. Source code is available at https://github.com/BiQiWHU/DFQ.
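
To make the "channel-wise decoupled features as queries" idea concrete, here is a rough cross-attention sketch in PyTorch. It is an assumption-level illustration (one query token per deep-feature channel, attending over shallow-feature tokens), not the paper's DFQ module or its relaxed whitening transformation.

```python
import torch
import torch.nn as nn

class DecoupledFeatureQuery(nn.Module):
    """Each channel of the deep feature map becomes one query token; cross-attention then mines its
    long-range dependency on shallow-feature tokens."""
    def __init__(self, spatial, dim, heads=4):
        super().__init__()
        self.to_query = nn.Linear(spatial, dim)   # project a flattened channel map to a query token
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, deep_feat, shallow_tokens):
        # deep_feat: (B, C, H, W); shallow_tokens: (B, N, dim)
        q = self.to_query(deep_feat.flatten(2))            # (B, C, dim): one query per channel
        out, _ = self.cross_attn(q, shallow_tokens, shallow_tokens)
        return out                                         # (B, C, dim) channel-wise guided features

block = DecoupledFeatureQuery(spatial=16 * 16, dim=256)
guided = block(torch.randn(2, 64, 16, 16), torch.randn(2, 1024, 256))
```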

NeurIPS Conference 2024 Conference Paper

Samba: Severity-aware Recurrent Modeling for Cross-domain Medical Image Grading

  • Qi Bi
  • Jingjun Yi
  • Hao Zheng
  • Wei Ji
  • Haolan Zhan
  • Yawen Huang
  • Yuexiang Li
  • Yefeng Zheng

Disease grading is a crucial task in medical image analysis. Due to the continuous progression of diseases, i.e., the variability within the same level and the similarity between adjacent stages, accurate grading is highly challenging. Furthermore, in real-world scenarios, models trained on limited source-domain datasets should also be capable of handling data from unseen target domains. Because of cross-domain variation, the feature distributions of the source and unseen target domains can differ dramatically, leading to a substantial decrease in model performance. To address these challenges in cross-domain disease grading, we propose a Severity-aware Recurrent Modeling (Samba) method in this paper. As the core objective of most staging tasks is to identify the most severe lesions, which may occupy only a small portion of the image, we propose to encode image patches in a sequential and recurrent manner. Specifically, a state space model is tailored to store and transport the severity information through hidden states. Moreover, to mitigate the impact of cross-domain variation, an Expectation-Maximization (EM) based state recalibration mechanism is designed to map the patch embeddings into a more compact space. We model the feature distributions of different lesions with a Gaussian Mixture Model (GMM) and reconstruct the intermediate features from learnable severity bases. Extensive experiments show that the proposed Samba outperforms the VMamba baseline by an average accuracy of 23.5%, 5.6%, and 4.1% on the cross-domain grading of fatigue fracture, breast cancer, and diabetic retinopathy, respectively. Source code is available at https://github.com/BiQiWHU/Samba.
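
The EM-based state recalibration can be pictured as fitting a Gaussian Mixture Model to the patch embeddings and rebuilding each embedding from the component means ("severity bases"). A toy scikit-learn sketch under that reading follows; it is not the paper's learnable, in-network implementation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def em_recalibrate(patch_embeddings, n_bases=4):
    """Fit a GMM to the patch embeddings of one image and reconstruct each embedding as a
    responsibility-weighted combination of the component means."""
    gmm = GaussianMixture(n_components=n_bases, covariance_type="diag").fit(patch_embeddings)
    resp = gmm.predict_proba(patch_embeddings)   # (N, n_bases) soft assignments
    return resp @ gmm.means_                     # (N, D) features mapped into a more compact space

patches = np.random.randn(196, 64)               # hypothetical patch embeddings of one image
recalibrated = em_recalibrate(patches)
```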

IS Journal 2024 Journal Article

Unraveling Complexity: An Exploration Into Large-Scale Multimodal Signal Processing

  • Zhenyu Wen
  • Yuheng Ye
  • Jie Su
  • Taotao Li
  • Jinhao Wan
  • Shilian Zheng
  • Zhen Hong
  • Shibo He

Advanced communication systems and military reconnaissance are increasingly prevalent in high-tech environments, greatly supported by flourishing signal processing technologies. The recent exponential proliferation of sensors has led to an unprecedented expansion in the scale and diversity of signals across various modalities. Such an influx poses significant challenges in effectively integrating multimodal signal data to deliver comprehensive and interpretive solutions across a diverse range of applications. In this article, we provide an overview of the core issues, challenges, and future research directions in the different stages of developing large-scale multimodal signal processing models. Additionally, we present a preliminary investigation into signal representation learning, in which we propose a contrastive-learning-based framework to extract fine-grained signal features under few-shot conditions. Our proposed framework achieves a 24.1% performance improvement over baseline approaches, consistently demonstrating superiority over state-of-the-art methods. The code is available at https://github.com/YYH211/LSM.
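
The contrastive-learning framework is not detailed in the abstract; as a generic point of reference, a standard InfoNCE objective over two augmented views of the same signal segments looks as follows (an assumption about the general recipe, not the LSM repository's code):

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE loss between two augmented views; matching rows are the positive pairs."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                   # (B, B) cosine similarities
    targets = torch.arange(len(z1), device=z1.device)    # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

view_a = torch.randn(16, 128)   # embeddings of one augmentation of 16 signal segments
view_b = torch.randn(16, 128)   # embeddings of a second augmentation
loss = info_nce(view_a, view_b)
```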

JBHI Journal 2023 Journal Article

Blind Super-Resolution of 3D MRI via Unsupervised Domain Transformation

  • Hexiang Zhou
  • Yawen Huang
  • Yuexiang Li
  • Yi Zhou
  • Yefeng Zheng

High-resolution medical images can be effectively used for clinical diagnosis. However, the acquisition of high-resolution images is difficult and often limited by medical instruments. Super-resolution (SR) methods provide a solution, in which high-resolution (HR) images are reconstructed from low-resolution (LR) ones. Most existing deep neural networks for 3D medical image SR are trained in a non-blind manner, where LR images are directly degraded from HR data via a pre-determined downscaling method. Such approaches rely heavily on the assumed degradation model, resulting in inevitable deviations in real clinical practice. Blind super-resolution, a more attractive research line for this field, aims to generate HR images from LR inputs containing unknown degradation. Towards generalizing SR models to diverse types of degradation, we propose a robust blind SR method for 3D medical images that works in an unsupervised manner with domain correction and upscaling treatment. First, a CycleGAN-based architecture is implemented to translate the LR data from the source domain to the target one for domain correction. Then, an upscaling network is learned via pre-determined HR-LR couples for reconstruction. The proposed framework is able to automatically learn noise and blur correction kernels for unpaired 3D SR of magnetic resonance images (MRI). Our method achieves better and more robust performance in reconstructing HR images from LR MRI with multiple unknown degradation processes, and shows its superiority to other state-of-the-art supervised models and cycle-consistency based methods, especially in severe distortion cases.

AAAI Conference 2023 Conference Paper

ClassFormer: Exploring Class-Aware Dependency with Transformer for Medical Image Segmentation

  • Huimin Huang
  • Shiao Xie
  • Lanfen Lin
  • Ruofeng Tong
  • Yen-Wei Chen
  • Hong Wang
  • Yuexiang Li
  • Yawen Huang

Vision Transformers have recently shown impressive performance on medical image segmentation. Despite their strong capability of modeling long-range dependencies, current methods still give rise to two main concerns from a class-level perspective: (1) intra-class problem: existing methods fail to extract class-specific correspondences among different pixels, which may lead to poor object coverage and/or boundary prediction; (2) inter-class problem: existing methods fail to model explicit category dependencies among various objects, which may result in inaccurate localization. In light of these two issues, we propose a novel transformer, called ClassFormer, powered by two appealing components, i.e., an intra-class dynamic transformer and an inter-class interactive transformer, to address the challenge of fully exploring compactness and discrepancy. Technically, the intra-class dynamic transformer is first designed to decouple the representations of different categories with an adaptive selection mechanism for compact learning, which optimally highlights informative features to reflect the salient keys/values from multiple scales. We further introduce the inter-class interactive transformer to capture the category dependency among different objects and model class tokens as representative class centers to guide global semantic reasoning. As a consequence, feature consistency is ensured at the expense of intra-class penalization, while the inter-class constraint strengthens the feature discriminability between different categories. Extensive empirical evidence shows that ClassFormer can be easily plugged into any architecture and yields improvements over state-of-the-art methods on three public benchmarks.

AAAI Conference 2023 Conference Paper

Combating Mode Collapse via Offline Manifold Entropy Estimation

  • Haozhe Liu
  • Bing Li
  • Haoqian Wu
  • Hanbang Liang
  • Yawen Huang
  • Yuexiang Li
  • Bernard Ghanem
  • Yefeng Zheng

Generative Adversarial Networks (GANs) have shown compelling results in various tasks and applications in recent years. However, mode collapse remains a critical problem in GANs. In this paper, we propose a novel training pipeline to address the mode collapse issue of GANs. Different from existing methods, we propose to generalize the discriminator as a feature embedding and to maximize the entropy of the distribution in the embedding space learned by the discriminator. Specifically, two regularization terms, i.e., Deep Local Linear Embedding (DLLE) and Deep Isometric feature Mapping (DIsoMap), are introduced to encourage the discriminator to learn the structural information embedded in the data, such that the embedding space learned by the discriminator can be well formed. Based on the well-learned embedding space supported by the discriminator, a non-parametric entropy estimator is designed to efficiently maximize the entropy of the embedding vectors, serving as an approximation of maximizing the entropy of the generated distribution. By improving the discriminator and maximizing the distance between the most similar samples in the embedding space, our pipeline effectively reduces mode collapse without sacrificing the quality of generated samples. Extensive experimental results show the effectiveness of our method, which outperforms the GAN baseline MaF-GAN on CelebA (9.13 vs. 12.43 in FID) and surpasses the recent state-of-the-art energy-based model on the ANIMEFACE dataset (2.80 vs. 2.26 in Inception score).
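
The non-parametric entropy estimator is described as maximizing the distance between the most similar samples in the embedding space. A nearest-neighbour (Kozachenko-Leonenko-style) surrogate along those lines can be sketched as below; this is one plausible reading, not the paper's exact estimator.

```python
import torch

def nn_entropy_surrogate(embeddings, eps=1e-12):
    """Mean log distance from each embedding to its nearest neighbour; pushing the most similar
    pairs apart increases the surrogate."""
    diff = embeddings.unsqueeze(1) - embeddings.unsqueeze(0)            # (B, B, D) pairwise differences
    dist = (diff.pow(2).sum(-1) + eps).sqrt()                           # (B, B) Euclidean distances
    mask = torch.eye(len(embeddings), dtype=torch.bool, device=dist.device)
    nn_dist = dist.masked_fill(mask, float("inf")).min(dim=1).values    # distance to nearest other sample
    return torch.log(nn_dist).mean()

emb = torch.randn(32, 128, requires_grad=True)     # stand-in for discriminator embeddings of generated samples
regulariser = -nn_entropy_surrogate(emb)           # minimising this spreads out the most similar samples
regulariser.backward()
```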

NeurIPS Conference 2023 Conference Paper

Dynamically Masked Discriminator for GANs

  • Wentian Zhang
  • Haozhe Liu
  • Bing Li
  • Jinheng Xie
  • Yawen Huang
  • Yuexiang Li
  • Yefeng Zheng
  • Bernard Ghanem

Training Generative Adversarial Networks (GANs) remains a challenging problem. The discriminator trains the generator by learning the distribution of real/generated data. However, the distribution of generated data changes throughout the training process, which is difficult for the discriminator to learn. In this paper, we propose a novel method for GANs from the viewpoint of online continual learning. We observe that the discriminator, trained on historically generated data, often slows down its adaptation to changes in the newly arriving generated data, which accordingly decreases the quality of generated results. By treating the generated data in training as a stream, we propose to detect whether the discriminator slows down the learning of new knowledge in the generated data, so that we can explicitly enforce the discriminator to learn new knowledge fast. Particularly, we propose a new discriminator that automatically detects its retardation and then dynamically masks its features, such that the discriminator can adaptively learn the temporally varying distribution of generated data. Experimental results show our method outperforms state-of-the-art approaches.

NeurIPS Conference 2023 Conference Paper

Learning Visual Prior via Generative Pre-Training

  • Jinheng Xie
  • Kai Ye
  • Yudong Li
  • Yuexiang Li
  • Kevin Qinghong Lin
  • Yefeng Zheng
  • Linlin Shen
  • Mike Zheng Shou

Various stuff and things in visual data possess specific traits, which can be learned by deep neural networks and are implicitly represented as the visual prior, e.g., object location and shape, in the model. Such a prior potentially impacts many vision tasks. For example, in conditional image synthesis, spatial conditions that fail to adhere to the prior can result in visually inaccurate synthetic results. This work aims to explicitly learn the visual prior and enable the customization of sampling. Inspired by advances in language modeling, we propose to learn the Visual prior via Generative Pre-Training, dubbed VisorGPT. By discretizing visual locations, e.g., bounding boxes, human pose, and instance masks, into sequences, VisorGPT can model the visual prior through likelihood maximization. Besides, prompt engineering is investigated to unify various visual locations and enable customized sampling of sequential outputs from the learned prior. Experimental results demonstrate the effectiveness of VisorGPT in modeling the visual prior and extrapolating to novel scenes, suggesting that discrete visual locations can be integrated into the learning paradigm of current language models to further perceive the visual world. Code is available at https://sierkinhane.github.io/visor-gpt.
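
The core preprocessing step, turning visual locations into discrete sequences, can be illustrated with a small tokenizer for bounding boxes. The bin count, special tokens, and ordering below are assumptions for illustration rather than VisorGPT's actual vocabulary.

```python
def boxes_to_sequence(boxes, labels, num_bins=512, image_size=640):
    """Quantise box coordinates into discrete bins and serialise them as a token sequence that a
    GPT-style model can learn with likelihood maximisation."""
    seq = ["<boxes>"]
    for (x0, y0, x1, y1), name in zip(boxes, labels):
        bins = [int(round(v / image_size * (num_bins - 1))) for v in (x0, y0, x1, y1)]
        seq.append(name)
        seq.extend(f"<bin_{b}>" for b in bins)
    seq.append("</boxes>")
    return seq

tokens = boxes_to_sequence([(32.0, 48.5, 210.0, 330.0)], ["person"])
# e.g. ['<boxes>', 'person', '<bin_26>', '<bin_39>', '<bin_168>', '<bin_263>', '</boxes>']
```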

IJCAI Conference 2022 Conference Paper

Adaptive Convolutional Dictionary Network for CT Metal Artifact Reduction

  • Hong Wang
  • Yuexiang Li
  • Deyu Meng
  • Yefeng Zheng

Inspired by the great success of deep neural networks, learning-based methods have achieved promising performance for metal artifact reduction (MAR) in computed tomography (CT) images. However, most existing approaches put little emphasis on modelling and embedding the intrinsic prior knowledge underlying this specific MAR task into their network designs. To address this issue, we propose an adaptive convolutional dictionary network (ACDNet), which leverages both model-based and learning-based methods. Specifically, we explore the prior structures of metal artifacts, e.g., non-local repetitive streaking patterns, and encode them as an explicit weighted convolutional dictionary model. Then, a simple-yet-effective algorithm is carefully designed to solve the model. By unfolding every iterative sub-step of the proposed algorithm into a network module, we explicitly embed the prior structure into a deep network, yielding clear interpretability for the MAR task. Furthermore, our ACDNet can automatically learn the prior for artifact-free CT images from training data and adaptively adjust the representation kernels for each input CT image based on its content. Hence, our method inherits the clear interpretability of model-based methods and maintains the powerful representation ability of learning-based methods. Comprehensive experiments executed on synthetic and clinical datasets show the superiority of our ACDNet in terms of effectiveness and model generalization. Code and supplementary material are available at https://github.com/hongwang01/ACDNet.

JBHI Journal 2022 Journal Article

Mix-and-Interpolate: A Training Strategy to Deal With Source-Biased Medical Data

  • Yuexiang Li
  • Jiawei Chen
  • Dong Wei
  • Yanchun Zhu
  • Jianrong Wu
  • Junfeng Xiong
  • Yadong Gang
  • Wenbo Sun

As of March 31st, 2021, coronavirus disease 2019 (COVID-19) had reportedly infected more than 127 million people and caused over 2.5 million deaths worldwide. Timely diagnosis of COVID-19 is crucial for the management of individual patients as well as containment of the highly contagious disease. Having realized the clinical value of non-contrast chest computed tomography (CT) for diagnosing COVID-19, researchers have proposed deep learning (DL) based automated methods to aid radiologists in reading the huge quantities of CT exams generated during the pandemic. In this work, we address an overlooked problem in training deep convolutional neural networks for COVID-19 classification with real-world multi-source data, namely, the data source bias problem. It refers to the situation in which certain data sources comprise only a single class, and training with such source-biased data may make DL models learn to distinguish data sources instead of COVID-19. To overcome this problem, we propose MIx-aNd-Interpolate (MINI), a conceptually simple, easy-to-implement, efficient yet effective training strategy. The proposed MINI approach generates volumes of the absent class by combining samples collected from different hospitals, which enlarges the sample space of the original source-biased dataset. Experimental results on a large collection of real patient data (1,221 COVID-19 and 1,520 negative CT images, the latter consisting of 786 community-acquired pneumonia and 734 non-pneumonia cases) from eight hospitals and health institutions show that: 1) MINI can improve COVID-19 classification performance over the baseline (which does not deal with the source bias), and 2) MINI is superior to competing methods in terms of the extent of improvement.
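
MINI is described as combining samples from different hospitals to synthesize volumes of the class a biased source lacks. As a loose, mixup-style illustration of cross-source combination (an assumed scheme, not the exact MINI procedure):

```python
import numpy as np

def mix_and_interpolate(vol_a, vol_b, alpha=0.5):
    """Blend two CT volumes from different hospitals with a Beta-sampled coefficient."""
    lam = np.random.beta(alpha, alpha)
    return lam * vol_a + (1.0 - lam) * vol_b, lam

covid_from_hospital_a = np.random.rand(64, 128, 128)      # toy stand-ins for CT volumes
negative_from_hospital_b = np.random.rand(64, 128, 128)
mixed, lam = mix_and_interpolate(covid_from_hospital_a, negative_from_hospital_b)
# the mixed volume injects hospital-B appearance statistics into the COVID-19 class,
# discouraging the classifier from shortcutting on the data source
```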

JBHI Journal 2020 Journal Article

Efficient and Effective Training of COVID-19 Classification Networks With Self-Supervised Dual-Track Learning to Rank

  • Yuexiang Li
  • Dong Wei
  • Jiawei Chen
  • Shilei Cao
  • Hongyu Zhou
  • Yanchun Zhu
  • Jianrong Wu
  • Lan Lan

Coronavirus Disease 2019 (COVID-19) has spread rapidly worldwide since it was first reported. Timely diagnosis of COVID-19 is crucial both for disease control and patient care. Non-contrast thoracic computed tomography (CT) has been identified as an effective tool for the diagnosis, yet the disease outbreak has placed tremendous pressure on radiologists reading the exams and may potentially lead to fatigue-related mis-diagnosis. Reliable automatic classification algorithms can be very helpful; however, they usually require a considerable number of COVID-19 cases for training, which is difficult to acquire in a timely manner. Meanwhile, how to effectively utilize the existing archive of non-COVID-19 data (the negative samples) in the presence of severe class imbalance is another challenge. In addition, the sudden disease outbreak necessitates fast algorithm development. In this work, we propose a novel approach for effective and efficient training of COVID-19 classification networks using a small number of COVID-19 CT exams and an archive of negative samples. Concretely, a novel self-supervised learning method is proposed to extract features from the COVID-19 and negative samples. Then, two kinds of soft labels (‘difficulty’ and ‘diversity’) are generated for the negative samples by computing the earth mover's distances between the features of the negative and COVID-19 samples, from which the data ‘values’ of the negative samples can be assessed. A pre-set number of negative samples are selected accordingly and fed to the neural network for training. Experimental results show that our approach can achieve superior performance using about half of the negative samples, substantially reducing model training time.
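
The soft-label computation hinges on earth mover's distances between negative-sample features and the COVID-19 feature set. A toy per-dimension version with SciPy is shown below; the dimension-wise aggregation and the exact definitions of 'difficulty' and 'diversity' are assumptions for illustration.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def difficulty_scores(neg_feats, covid_feats):
    """For each negative sample, average the 1D earth mover's distances (per feature dimension)
    to the COVID-19 feature distribution; a smaller score means a more COVID-like, 'harder' negative."""
    scores = []
    for f in neg_feats:
        dims = [wasserstein_distance([f[d]], covid_feats[:, d]) for d in range(len(f))]
        scores.append(np.mean(dims))
    return np.asarray(scores)

neg = np.random.randn(100, 16)     # hypothetical self-supervised features of negative exams
pos = np.random.randn(30, 16)      # features of the COVID-19 exams
ranked = np.argsort(difficulty_scores(neg, pos))   # select a pre-set number of negatives from this ranking
```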

AAAI Conference 2020 Conference Paper

Generative Adversarial Networks for Video-to-Video Domain Adaptation

  • Jiawei Chen
  • Yuexiang Li
  • Kai Ma
  • Yefeng Zheng

Endoscopic videos from multiple centres often have different imaging conditions, e.g., color and illumination, which make models trained on one domain usually fail to generalize well to another. Domain adaptation is one potential solution to address this problem. However, few existing works have focused on the translation of video-based data. In this work, we propose a novel generative adversarial network (GAN), namely VideoGAN, to transfer video-based data across different domains. As the frames of a video share similar content and imaging conditions, the proposed VideoGAN has an X-shape generator to preserve intra-video consistency during translation. Furthermore, a loss function, namely the color histogram loss, is proposed to tune the color distribution of each translated frame. Two colonoscopic datasets from different centres, i.e., CVC-Clinic and ETIS-Larib, are adopted to evaluate the domain adaptation performance of our VideoGAN. Experimental results demonstrate that the adapted colonoscopic videos generated by our VideoGAN can significantly boost the segmentation accuracy of colorectal polyps, i.e., an improvement of 5%, on multicentre datasets. As our VideoGAN is a general network architecture, we also evaluate its performance on the cloudy-to-sunny translation task with the CamVid driving video dataset. Comprehensive experiments show that the domain gap can be substantially narrowed by our VideoGAN.
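
A color histogram loss of the kind described above can be approximated with a differentiable soft-binned histogram per channel, as in the PyTorch sketch below (bin count, kernel width, and the L1 comparison are assumed details, not VideoGAN's exact formulation):

```python
import torch

def soft_histogram(x, bins=64, sigma=0.02):
    """Differentiable per-channel colour histogram via Gaussian soft-binning; x is (C, H, W) in [0, 1]."""
    centers = torch.linspace(0.0, 1.0, bins, device=x.device)
    flat = x.flatten(1)                                                        # (C, H*W)
    weights = torch.exp(-0.5 * ((flat.unsqueeze(-1) - centers) / sigma) ** 2)  # (C, H*W, bins)
    hist = weights.sum(dim=1)
    return hist / hist.sum(dim=1, keepdim=True)

def color_histogram_loss(translated_frame, reference_frame):
    """L1 distance between the colour histograms of a translated frame and a reference frame."""
    return (soft_histogram(translated_frame) - soft_histogram(reference_frame)).abs().mean()

loss = color_histogram_loss(torch.rand(3, 128, 128), torch.rand(3, 128, 128))
```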