Arrow Research · Search

Author name cluster

Yizhou Yu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

23 papers
2 author rows

Possible papers (23)

AAAI 2025 · Conference Paper

Autoregressive Sequence Modeling for 3D Medical Image Representation

  • Siwen Wang
  • Churan Wang
  • Fei Gao
  • Lixian Su
  • Fandong Zhang
  • Yizhou Wang
  • Yizhou Yu

Three-dimensional (3D) medical images, such as Computed Tomography (CT) and Magnetic Resonance Imaging (MRI), are essential for clinical applications. However, the need for diverse and comprehensive representations is particularly pronounced when considering the variability across different organs, diagnostic tasks, and imaging modalities. How to effectively interpret the intricate contextual information and extract meaningful insights from these images remains an open challenge to the community. While current self-supervised learning methods have shown potential, they often consider an image as a whole, thereby overlooking the extensive, complex relationships among local regions from one or multiple images. In this work, we introduce a pioneering method for learning 3D medical image representations through an autoregressive pre-training framework. Our approach sequences various 3D medical images based on spatial, contrast, and semantic correlations, treating them as interconnected visual tokens within a token sequence. By employing an autoregressive sequence modeling task, we predict the next visual token in the sequence, which allows our model to deeply understand and integrate the contextual information inherent in 3D medical images. Additionally, we implement a random startup strategy to avoid overestimating token relationships and to enhance the robustness of learning. The effectiveness of our approach is demonstrated by its superior performance over competing methods on nine downstream tasks across public datasets.
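
The pre-training objective lends itself to a compact sketch. Below is a minimal, hypothetical rendering of the idea, assuming continuous patch embeddings, an MSE next-token loss, and a random starting offset standing in for the paper's random startup strategy; all module and variable names are illustrative.

```python
# Minimal sketch of autoregressive visual-token pre-training; not the
# paper's implementation.
import torch
import torch.nn as nn

class ARPretrainer(nn.Module):
    def __init__(self, token_dim=256, n_layers=4, n_heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(token_dim, n_heads, batch_first=True)
        self.decoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(token_dim, token_dim)  # predicts the next token

    def forward(self, tokens):
        # tokens: (batch, seq_len, token_dim), ordered by spatial/contrast/
        # semantic correlation as the abstract describes.
        seq_len = tokens.size(1)
        causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool),
                            diagonal=1)              # block future positions
        h = self.decoder(tokens, mask=causal)
        return self.head(h)

model = ARPretrainer()
tokens = torch.randn(2, 64, 256)          # e.g., embedded 3D patches
# "Random startup" (our reading): begin the loss at a random offset so early
# tokens are not always forced to predict from minimal context.
start = torch.randint(1, 32, ()).item()
pred = model(tokens)
loss = nn.functional.mse_loss(pred[:, start:-1], tokens[:, start + 1:])
loss.backward()
```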

AAAI 2025 · Conference Paper

SparX: A Sparse Cross-Layer Connection Mechanism for Hierarchical Vision Mamba and Transformer Networks

  • Meng Lou
  • Yunxiang Fu
  • Yizhou Yu

Due to the capability of dynamic state space models (SSMs) in capturing long-range dependencies with linear-time computational complexity, Mamba has shown notable performance in NLP tasks. This has inspired the rapid development of Mamba-based vision models, yielding promising results in visual recognition tasks. However, such models are not capable of distilling features across layers through feature aggregation, interaction, and selection. Moreover, existing cross-layer feature aggregation methods designed for CNNs or ViTs are not practical in Mamba-based models due to high computational costs. Therefore, this paper aims to introduce an efficient cross-layer feature aggregation mechanism for vision backbone networks. Inspired by the Retinal Ganglion Cells (RGCs) in the human visual system, we propose a new sparse cross-layer connection mechanism termed SparX to effectively improve cross-layer feature interaction and reuse. Specifically, we build two different types of network layers: ganglion layers and normal layers. The former has higher connectivity and complexity, enabling multi-layer feature aggregation and interaction in an input-dependent manner. In contrast, the latter has lower connectivity and complexity. By interleaving these two types of layers, we design a new family of vision backbone networks with sparsely cross-connected layers, achieving an excellent trade-off among model size, computational cost, memory cost, and accuracy in comparison to their counterparts. For instance, with fewer parameters, SparX-Mamba-T improves the top-1 accuracy of VMamba-T from 82.5% to 83.5%, while SparX-Swin-T achieves a 1.3% increase in top-1 accuracy compared to Swin-T. Extensive experimental results demonstrate that our new connection mechanism possesses both superior performance and generalization capabilities on various vision tasks.
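
The interleaving scheme can be illustrated directly. The sketch below, with hypothetical module names, alternates lightweight normal layers with higher-connectivity ganglion layers that fuse the outputs of earlier ganglion layers; the concat-plus-1x1-conv fusion is our simplification of the paper's input-dependent aggregation.

```python
# Schematic of SparX-style sparse cross-layer connections.
import torch
import torch.nn as nn

class NormalLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.block = nn.Sequential(nn.Conv2d(dim, dim, 3, padding=1),
                                   nn.BatchNorm2d(dim), nn.GELU())
    def forward(self, x):
        return x + self.block(x)

class GanglionLayer(nn.Module):
    def __init__(self, dim, n_inputs):
        super().__init__()
        self.fuse = nn.Conv2d(dim * n_inputs, dim, 1)  # cross-layer fusion
        self.block = NormalLayer(dim)
    def forward(self, x, skips):
        x = self.fuse(torch.cat([x] + skips, dim=1))
        return self.block(x)

class SparXStage(nn.Module):
    def __init__(self, dim, depth=8, interval=4):
        super().__init__()
        self.layers = nn.ModuleList()
        n_ganglion = 0
        for i in range(depth):
            if (i + 1) % interval == 0:          # every k-th layer is ganglion
                n_ganglion += 1
                self.layers.append(GanglionLayer(dim, n_ganglion))
            else:
                self.layers.append(NormalLayer(dim))
    def forward(self, x):
        skips = []                               # sparse memory of ganglion outputs
        for layer in self.layers:
            if isinstance(layer, GanglionLayer):
                x = layer(x, skips)
                skips.append(x)
            else:
                x = layer(x)
        return x

stage = SparXStage(dim=64)
out = stage(torch.randn(1, 64, 56, 56))          # same shape as input
```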

NeurIPS 2025 · Conference Paper

Vision Function Layer in Multimodal LLMs

  • Cheng Shi
  • Yizhou Yu
  • Sibei Yang

This study identifies that visual-related functional decoding is distributed across different decoder layers in Multimodal Large Language Models (MLLMs). Typically, each function, such as counting, grounding, or OCR, narrows down to two or three layers, which we define as Vision Function Layers (VFL). Additionally, the depth and relative order of different VFLs exhibit a consistent pattern across different MLLMs, which is well-aligned with human behaviors (e.g., recognition occurs first, followed by counting, and then grounding). These findings are derived from Visual Token Swapping, our novel analytical framework that modifies targeted KV cache entries to precisely elucidate layer-specific functions during decoding. Furthermore, these insights offer substantial utility in tailoring MLLMs for real-world downstream applications. For instance, when LoRA training is selectively applied to VFLs whose functions align with the training data, VFL-LoRA not only outperforms full-LoRA but also prevents out-of-domain function forgetting. Moreover, by analyzing the performance differential on training data when particular VFLs are ablated, VFL-select automatically classifies data by function, enabling highly efficient data selection to directly bolster corresponding capabilities. Consequently, VFL-select surpasses human experts in data selection, and achieves 98% of full-data performance with only 20% of the original dataset. This study delivers deeper comprehension of MLLM visual processing, fostering the creation of more efficient, interpretable, and robust models.
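
Visual Token Swapping is easiest to see on a toy attention stack. The snippet below is a schematic, not the paper's implementation: it caches per-layer keys/values for one input and splices them into another input's forward pass at a chosen layer and token positions; all names are hypothetical.

```python
# Toy illustration of swapping targeted KV cache entries between two inputs.
import torch

torch.manual_seed(0)
dim, n_layers, seq = 16, 4, 10
Wq = [torch.randn(dim, dim) for _ in range(n_layers)]
Wk = [torch.randn(dim, dim) for _ in range(n_layers)]
Wv = [torch.randn(dim, dim) for _ in range(n_layers)]

def run(x, kv_override=None):
    """kv_override: {layer_idx: (positions, K_src, V_src)} splices donor KV."""
    caches = []
    for l in range(n_layers):
        q, k, v = x @ Wq[l], x @ Wk[l], x @ Wv[l]
        if kv_override and l in kv_override:
            pos, k_src, v_src = kv_override[l]
            k[pos], v[pos] = k_src[pos], v_src[pos]   # targeted swap
        attn = torch.softmax(q @ k.T / dim ** 0.5, dim=-1)
        x = x + attn @ v
        caches.append((k, v))
    return x, caches

img_a, img_b = torch.randn(seq, dim), torch.randn(seq, dim)
_, cache_b = run(img_b)
visual_pos = torch.arange(0, 5)            # positions holding visual tokens
layer = 2                                  # probe this layer's function
k_b, v_b = cache_b[layer]
out_swapped, _ = run(img_a, {layer: (visual_pos, k_b, v_b)})
# Comparing out_swapped with the un-swapped run of img_a reveals which
# behaviours this layer's visual KV entries carry.
```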

AAAI 2024 · Conference Paper

FedDiv: Collaborative Noise Filtering for Federated Learning with Noisy Labels

  • Jichang Li
  • Guanbin Li
  • Hui Cheng
  • Zicheng Liao
  • Yizhou Yu

Federated Learning with Noisy Labels (F-LNL) aims at seeking an optimal server model via collaborative distributed learning by aggregating multiple client models trained with local noisy or clean samples. On the basis of a federated learning framework, recent advances primarily adopt label noise filtering to separate clean samples from noisy ones on each client, thereby mitigating the negative impact of label noise. However, these prior methods do not learn noise filters by exploiting knowledge across all clients, leading to sub-optimal noise filtering performance and thus harming training stability. In this paper, we present FedDiv to tackle the challenges of F-LNL. Specifically, we propose a global noise filter called Federated Noise Filter for effectively identifying samples with noisy labels on every client, thereby improving stability during local training sessions. Without sacrificing data privacy, this is achieved by modeling the global distribution of label noise across all clients. Then, in an effort to make the global model achieve higher performance, we introduce a Predictive Consistency based Sampler to identify more credible local data for local model training, thus preventing noise memorization and further boosting the training stability. Extensive experiments on CIFAR-10, CIFAR-100, and Clothing1M demonstrate that FedDiv achieves superior performance over state-of-the-art F-LNL methods under different label noise settings for both IID and non-IID data partitions. Source code is publicly available at https://github.com/lijichang/FLNL-FedDiv.
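
The abstract does not spell out the filter's parameterization; a common instantiation of loss-based noise filtering (our assumption, in the spirit of the Federated Noise Filter) is a two-component Gaussian mixture over per-sample losses whose parameters are aggregated on the server without sharing raw data:

```python
# Hedged sketch: per-client GMMs over training losses, server-side parameter
# averaging, and a global clean-sample posterior. The paper's exact
# parameterization may differ.
import numpy as np
from sklearn.mixture import GaussianMixture

def client_fit(losses):
    gmm = GaussianMixture(n_components=2, random_state=0).fit(losses.reshape(-1, 1))
    order = np.argsort(gmm.means_.ravel())          # component 0 = low-loss (clean)
    return (gmm.weights_[order], gmm.means_.ravel()[order],
            gmm.covariances_.ravel()[order])

def server_aggregate(stats, sizes):
    w = np.asarray(sizes, dtype=float); w /= w.sum()
    return tuple(sum(wi * s[j] for wi, s in zip(w, stats)) for j in range(3))

def clean_probability(losses, weights, means, variances):
    # posterior of the low-mean (clean) component under the global mixture
    pdf = lambda m, v: np.exp(-(losses - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)
    num = weights[0] * pdf(means[0], variances[0])
    den = num + weights[1] * pdf(means[1], variances[1])
    return num / den

rng = np.random.default_rng(0)
client_losses = [np.concatenate([rng.normal(0.2, 0.05, 80),    # clean samples
                                 rng.normal(1.5, 0.30, 20)])   # noisy samples
                 for _ in range(5)]
stats = [client_fit(l) for l in client_losses]
gw, gm, gv = server_aggregate(stats, [len(l) for l in client_losses])
keep = clean_probability(client_losses[0], gw, gm, gv) > 0.5   # clean mask
```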

JBHI 2023 · Journal Article

A Knowledge-Guided Framework for Fine-Grained Classification of Liver Lesions Based on Multi-Phase CT Images

  • Xingxin Xu
  • Qikui Zhu
  • Hanning Ying
  • Jiongcheng Li
  • Xiujun Cai
  • Shuo Li
  • Xiaoqing Liu
  • Yizhou Yu

Automatic and accurate differentiation of liver lesions from multi-phase computed tomography imaging is critical for the early detection of liver cancer. Multi-phase data can provide more diagnostic information than single-phase data, and the effective use of multi-phase data can significantly improve diagnostic accuracy. Current fusion methods usually fuse multi-phase information at the image level or feature level, ignoring the specificity of each modality; as a result, their information integration capacity is limited. In this paper, we propose a Knowledge-guided framework, named MCCNet, which adaptively integrates multi-phase liver lesion information from three different stages to fully utilize and fuse multi-phase liver information. Specifically, 1) a multi-phase self-attention module was designed to adaptively combine and integrate complementary information from three phases using multi-level phase features; 2) a cross-feature interaction module was proposed to further integrate multi-phase fine-grained features from a global perspective; 3) a cross-lesion correlation module was proposed for the first time to imitate the clinical diagnosis process by exploiting inter-lesion correlation in the same patient. By integrating the above three modules into a 3D backbone, we constructed a lesion classification network. The proposed lesion classification network was validated on an in-house dataset containing 3,683 lesions from 2,333 patients in 9 hospitals. Extensive experimental results and evaluations on real-world clinical applications demonstrate the effectiveness of the proposed modules in exploiting and fusing multi-phase information.

ICLR 2023 · Conference Paper

Advancing Radiograph Representation Learning with Masked Record Modeling

  • Hongyu Zhou
  • Chenyu Lian
  • Liansheng Wang
  • Yizhou Yu

Modern studies in radiograph representation learning (R$^2$L) rely on either self-supervision to encode invariant semantics or associated radiology reports to incorporate medical expertise, while the complementarity between them is barely noticed. To explore this, we formulate the self- and report-completion as two complementary objectives and present a unified framework based on masked record modeling (MRM). In practice, MRM reconstructs masked image patches and masked report tokens following a multi-task scheme to learn knowledge-enhanced semantic representations. With MRM pre-training, we obtain pre-trained models that transfer well to various radiography tasks. Specifically, we find that MRM offers superior performance in label-efficient fine-tuning. For instance, MRM achieves 88.5% mean AUC on CheXpert using 1% labeled data, outperforming previous R$^2$L methods with 100% labels. On NIH ChestX-ray, MRM outperforms the best-performing counterpart by about 3% under small labeling ratios. Besides, MRM surpasses self- and report-supervised pre-training in identifying the pneumonia type and the pneumothorax area, sometimes by large margins.
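
The multi-task objective reduces to two masked-reconstruction losses over a shared encoder. A toy sketch, with placeholder architectures and masking ratios, assuming a MAE-style pixel loss for patches and cross-entropy for report tokens:

```python
# Skeleton of an MRM-style masked-record objective; not the paper's model.
import torch
import torch.nn as nn

class MRMToy(nn.Module):
    def __init__(self, patch_dim=192, vocab=1000, dim=256):
        super().__init__()
        self.img_embed = nn.Linear(patch_dim, dim)
        self.txt_embed = nn.Embedding(vocab, dim)
        enc = nn.TransformerEncoderLayer(dim, 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, 2)       # shared backbone
        self.img_head = nn.Linear(dim, patch_dim)          # patch reconstruction
        self.txt_head = nn.Linear(dim, vocab)              # token prediction

    def forward(self, patches, tokens):
        h = self.encoder(torch.cat([self.img_embed(patches),
                                    self.txt_embed(tokens)], dim=1))
        n = patches.size(1)
        return self.img_head(h[:, :n]), self.txt_head(h[:, n:])

model = MRMToy()
patches = torch.randn(2, 49, 192)                 # flattened image patches
tokens = torch.randint(1, 1000, (2, 32))          # report tokens
img_mask = torch.rand(2, 49) < 0.75               # mask 75% of patches
txt_mask = torch.rand(2, 32) < 0.15               # mask 15% of tokens
masked_patches = patches.masked_fill(img_mask.unsqueeze(-1), 0.0)
masked_tokens = tokens.masked_fill(txt_mask, 0)   # 0 = [MASK] id here
img_pred, txt_pred = model(masked_patches, masked_tokens)
loss = (nn.functional.mse_loss(img_pred[img_mask], patches[img_mask]) +
        nn.functional.cross_entropy(txt_pred[txt_mask], tokens[txt_mask]))
loss.backward()
```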

YNIMG 2023 · Journal Article

CarveMix: A simple data augmentation method for brain lesion segmentation

  • Xinru Zhang
  • Chenghao Liu
  • Ni Ou
  • Xiangzhu Zeng
  • Zhizheng Zhuo
  • Yunyun Duan
  • Xiaoliang Xiong
  • Yizhou Yu

Brain lesion segmentation provides a valuable tool for clinical diagnosis and research, and convolutional neural networks (CNNs) have achieved unprecedented success in the segmentation task. Data augmentation is a widely used strategy to improve the training of CNNs. In particular, data augmentation approaches that mix pairs of annotated training images have been developed. These methods are easy to implement and have achieved promising results in various image processing tasks. However, existing data augmentation approaches based on image mixing are not designed for brain lesions and may not perform well for brain lesion segmentation. Thus, the design of this type of simple data augmentation method for brain lesion segmentation is still an open problem. In this work, we propose a simple yet effective data augmentation approach, dubbed CarveMix, for CNN-based brain lesion segmentation. Like other mixing-based methods, CarveMix stochastically combines two existing annotated images (annotated for brain lesions only) to obtain new labeled samples. To make our method more suitable for brain lesion segmentation, CarveMix is lesion-aware, where the image combination is performed with a focus on the lesions and preserves the lesion information. Specifically, from one annotated image we carve a region of interest (ROI) according to the lesion location and geometry with a variable ROI size. The carved ROI then replaces the corresponding voxels in a second annotated image to synthesize new labeled images for network training, and additional harmonization steps are applied for heterogeneous data where the two annotated images can originate from different sources. We further propose to model the mass effect that is unique to whole brain tumor segmentation during image mixing. To evaluate the proposed method, experiments were performed on multiple publicly available or private datasets, and the results show that our method improves the accuracy of brain lesion segmentation. The code of the proposed method is available at https://github.com/ZhangxinruBIT/CarveMix.git.
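
The core carve-and-paste operation is simple to sketch in numpy. The thresholding below approximates the variable ROI sampling via a signed distance to the lesion boundary; harmonization and the mass-effect model from the paper are omitted.

```python
# Minimal CarveMix-style mixing of two labeled 3D volumes.
import numpy as np
from scipy.ndimage import distance_transform_edt

def carvemix(img_a, lab_a, img_b, lab_b, rng=np.random.default_rng()):
    # signed distance to the lesion boundary; negative inside the lesion
    dist = np.where(lab_a > 0, -distance_transform_edt(lab_a > 0),
                    distance_transform_edt(lab_a == 0))
    # variable ROI size: carve everything within a sampled distance threshold,
    # scaled by the lesion "radius" (an approximation of the paper's sampling)
    t = rng.uniform(-0.5, 1.0) * abs(dist.min())
    roi = dist <= t
    img_new = np.where(roi, img_a, img_b)   # paste carved voxels into scan B
    lab_new = np.where(roi, lab_a, lab_b)
    return img_new, lab_new

rng = np.random.default_rng(0)
img_a, img_b = rng.normal(size=(32, 32, 32)), rng.normal(size=(32, 32, 32))
lab_a = np.zeros((32, 32, 32), dtype=np.int64); lab_a[12:18, 12:18, 12:18] = 1
lab_b = np.zeros_like(lab_a)
img_mix, lab_mix = carvemix(img_a, lab_a, img_b, lab_b, rng)
```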

NeurIPS 2023 · Conference Paper

CODA: Generalizing to Open and Unseen Domains with Compaction and Disambiguation

  • Chaoqi Chen
  • Luyao Tang
  • Yue Huang
  • Xiaoguang Han
  • Yizhou Yu

The generalization capability of machine learning systems degenerates notably when the test distribution drifts from the training distribution. Recently, Domain Generalization (DG) has been gaining momentum in enabling machine learning models to generalize to unseen domains. However, most DG methods assume that training and test data share an identical label space, ignoring the potential unseen categories in many real-world applications. In this paper, we delve into a more general but difficult problem termed Open Test-Time DG (OTDG), where both domain shift and open class may occur on the unseen test data. We propose Compaction and Disambiguation (CODA), a novel two-stage framework for learning compact representations and adapting to open classes in the wild. To meaningfully regularize the model's decision boundary, CODA introduces virtual unknown classes and optimizes a new training objective to insert unknowns into the latent space by compacting the embedding space of source known classes. To adapt target samples to the source model, we then disambiguate the decision boundaries between known and unknown classes with a test-time training objective, mitigating the adaptivity gap and catastrophic forgetting challenges. Experiments reveal that CODA can significantly outperform the previous best method on standard DG datasets and harmonize the classification accuracy between known and unknown classes.

AAAI 2023 · Conference Paper

Geometry-Aware Network for Domain Adaptive Semantic Segmentation

  • Yinghong Liao
  • Wending Zhou
  • Xu Yan
  • Zhen Li
  • Yizhou Yu
  • Shuguang Cui

Measuring and alleviating the discrepancies between the synthetic (source) and real scene (target) data is the core issue for domain adaptive semantic segmentation. Though recent works have introduced depth information in the source domain to reinforce the geometric and semantic knowledge transfer, they cannot extract the intrinsic 3D information of objects, including positions and shapes, merely based on 2D estimated depth. In this work, we propose a novel Geometry-Aware Network for Domain Adaptation (GANDA), leveraging more compact 3D geometric point cloud representations to shrink the domain gaps. In particular, we first utilize the auxiliary depth supervision from the source domain to obtain the depth prediction in the target domain to accomplish structure-texture disentanglement. Beyond depth estimation, we explicitly exploit 3D topology on the point clouds generated from RGB-D images for further coordinate-color disentanglement and pseudo-label refinement in the target domain. Moreover, to improve the 2D classifier in the target domain, we perform domain-invariant geometric adaptation from source to target and unify the 2D semantic and 3D geometric segmentation results in two domains. Note that our GANDA is plug-and-play in any existing UDA framework. Qualitative and quantitative results demonstrate that our model outperforms state-of-the-art methods on GTA5->Cityscapes and SYNTHIA->Cityscapes.

ICLR 2023 · Conference Paper

Learning Domain-Agnostic Representation for Disease Diagnosis

  • Churan Wang
  • Jing Li 0091
  • Xinwei Sun 0001
  • Fandong Zhang
  • Yizhou Yu
  • Yizhou Wang 0001

In clinical environments, image-based diagnosis is desired to achieve robustness on multi-center samples. Toward this goal, a natural way is to capture only clinically disease-related features. However, such disease-related features are often entangled with center-effect, preventing robust transfer to unseen centers/domains. To disentangle disease-related features, we first leverage structural causal modeling to explicitly model disease-related features and center-effects, which are provably disentangled from each other. Guided by this, we propose a novel Domain Agnostic Representation Model (DarMo) based on a variational Auto-Encoder. To facilitate disentanglement, we design domain-agnostic and domain-aware encoders to respectively capture disease-related features and varied center-effects by incorporating a domain-aware batch normalization layer. Besides, we constrain the disease-related features to well predict the disease label as well as clinical attributes, by incorporating a Graph Convolutional Network (GCN) into our decoder. The effectiveness and utility of our method are demonstrated by the superior performance over others on both public and in-house datasets.

ICLR 2023 · Conference Paper

Protein Representation Learning via Knowledge Enhanced Primary Structure Reasoning

  • Hongyu Zhou
  • Yunxiang Fu
  • Zhicheng Zhang
  • Cheng Bian
  • Yizhou Yu

Protein representation learning has primarily benefited from the remarkable development of language models (LMs). Accordingly, pre-trained protein models also suffer from a problem common to LMs: a lack of factual knowledge. A recent solution models the relationships between protein and associated knowledge terms as the knowledge encoding objective. However, it fails to explore the relationships at a more granular level, i.e., the token level. To mitigate this, we propose Knowledge-exploited Auto-encoder for Protein (KeAP), which performs token-level knowledge graph exploration for protein representation learning. In practice, non-masked amino acids iteratively query the associated knowledge tokens to extract and integrate helpful information for restoring masked amino acids via attention. We show that KeAP can consistently outperform the previous counterpart on 9 representative downstream applications, sometimes surpassing it by large margins. These results suggest that KeAP provides an alternative yet effective way to perform knowledge enhanced protein representation learning.
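
Token-level knowledge exploration amounts to cross-attention from residue tokens to knowledge tokens, with the fused representation predicting masked amino acids. A minimal sketch with illustrative dimensions:

```python
# Sketch of KeAP-style token-level knowledge querying; sizes are placeholders.
import torch
import torch.nn as nn

class KnowledgeCrossAttention(nn.Module):
    def __init__(self, dim=256, n_heads=4, n_aa=25):
        super().__init__()
        self.cross = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, n_aa)           # amino-acid vocabulary

    def forward(self, protein, knowledge):
        # protein: (B, Lp, D) residue embeddings; knowledge: (B, Lk, D)
        fused, _ = self.cross(protein, knowledge, knowledge)
        fused = self.norm(protein + fused)         # residual integration
        return self.head(fused)

model = KnowledgeCrossAttention()
protein = torch.randn(2, 100, 256)
knowledge = torch.randn(2, 40, 256)                # encoded knowledge terms
logits = model(protein, knowledge)
targets = torch.randint(0, 25, (2, 100))
mask = torch.rand(2, 100) < 0.15                   # masked residue positions
loss = nn.functional.cross_entropy(logits[mask], targets[mask])
loss.backward()
```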

AAAI 2023 · Conference Paper

RankDNN: Learning to Rank for Few-Shot Learning

  • Qianyu Guo
  • Gong Haotong
  • Xujun Wei
  • Yanwei Fu
  • Yizhou Yu
  • Wenqiang Zhang
  • Weifeng Ge

This paper introduces a new few-shot learning pipeline that casts relevance ranking for image retrieval as binary ranking relation classification. In comparison to image classification, ranking relation classification is sample efficient and domain agnostic. Besides, it provides a new perspective on few-shot learning and is complementary to state-of-the-art methods. The core component of our deep neural network is a simple MLP, which takes as input an image triplet encoded as the difference between two vector-Kronecker products, and outputs a binary relevance ranking order. The proposed RankMLP can be built on top of any state-of-the-art feature extractors, and our entire deep neural network is called the ranking deep neural network, or RankDNN. Meanwhile, RankDNN can be flexibly fused with other post-processing methods. During meta-testing, RankDNN ranks support images according to their similarity with the query samples, and each query sample is assigned the class label of its nearest neighbor. Experiments demonstrate that RankDNN can effectively improve the performance of its baselines based on a variety of backbones and it outperforms previous state-of-the-art algorithms on multiple few-shot learning benchmarks, including miniImageNet, tieredImageNet, Caltech-UCSD Birds, and CIFAR-FS. Furthermore, experiments on the cross-domain challenge demonstrate the superior transferability of RankDNN. The code is available at: https://github.com/guoqianyu-alberta/RankDNN.
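
The core RankDNN computation follows directly from the abstract: encode a triplet as the difference of two vector-Kronecker products and classify the binary ranking order with an MLP. A sketch with a stand-in feature extractor and dimension:

```python
# RankMLP over Kronecker-product triplet encodings; feature dim is illustrative.
import torch
import torch.nn as nn

class RankMLP(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim * feat_dim, 512),
                                 nn.ReLU(), nn.Linear(512, 1))

    def forward(self, q, a, b):
        # Kronecker product of the query feature with each candidate feature
        ka = torch.einsum('bi,bj->bij', q, a).flatten(1)
        kb = torch.einsum('bi,bj->bij', q, b).flatten(1)
        return self.mlp(ka - kb)       # >0: a ranks above b for query q

rank = RankMLP()
q, a, b = (torch.randn(8, 64) for _ in range(3))
logits = rank(q, a, b)
labels = torch.ones(8, 1)              # 1 if a is more relevant than b
loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
loss.backward()
```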

JBHI 2022 · Journal Article

3D Graph-Connectivity Constrained Network for Hepatic Vessel Segmentation

  • Ruikun Li
  • Yi-Jie Huang
  • Huai Chen
  • Xiaoqing Liu
  • Yizhou Yu
  • Dahong Qian
  • Lisheng Wang

Segmentation of hepatic vessels from 3D CT images is necessary for accurate diagnosis and preoperative planning for liver cancer. However, due to the low contrast and high noise of CT images, automatic hepatic vessel segmentation is a challenging task. Hepatic vessels are connected branches containing thick and thin blood vessels, exhibiting an important structural characteristic, or prior: the connectivity of blood vessels. However, this prior is rarely exploited in existing methods. In this paper, we segment hepatic vessels from 3D CT images by utilizing the connectivity prior. To this end, a graph neural network (GNN) used to describe the connectivity prior of hepatic vessels is integrated into a general convolutional neural network (CNN). Specifically, a graph attention network (GAT) is first used to model the graphical connectivity information of hepatic vessels, which can be trained with the vascular connectivity graph constructed directly from the ground truths. Second, the GAT is integrated with a lightweight 3D U-Net by an efficient mechanism called the plug-in mode, in which the GAT is incorporated into the U-Net as a multi-task branch and is only used to supervise the training procedure of the U-Net with the connectivity prior. The GAT will not be used in the inference stage, and thus will not increase the hardware and time costs of the inference stage compared with the U-Net. Therefore, hepatic vessel segmentation can be well improved in an efficient mode. Extensive experiments on two public datasets show that the proposed method is superior to related works in accuracy and connectivity of hepatic vessel segmentation.

AAAI 2022 · Conference Paper

A Causal Debiasing Framework for Unsupervised Salient Object Detection

  • Xiangru Lin
  • Ziyi Wu
  • Guanqi Chen
  • Guanbin Li
  • Yizhou Yu

Unsupervised Salient Object Detection (USOD) is a promising yet challenging task that aims to learn a salient object detection model without any ground-truth labels. Self-supervised learning based methods have achieved remarkable success recently and have become the dominant approach in USOD. However, we observed that two distribution biases of salient objects limit further performance improvement of the USOD methods, namely, contrast distribution bias and spatial distribution bias. Concretely, contrast distribution bias is essentially a confounder that makes images with similar high-level semantic contrast and/or low-level visual appearance contrast spuriously dependent, thus forming data-rich contrast clusters and biasing the training process towards the data-rich contrast clusters in the data. Spatial distribution bias means that the position distribution of all salient objects in a dataset is concentrated on the center of the image plane, which could be harmful to the prediction of off-center objects. This paper proposes a causal debiasing framework to disentangle the model from the impact of such biases. Specifically, we use causal intervention to perform deconfounded model training to minimize the contrast distribution bias and propose an image-level weighting strategy that softly weights each image’s importance according to the spatial distribution bias map. Extensive experiments on 6 benchmark datasets show that our method significantly outperforms previous unsupervised state-of-the-art methods and even surpasses some of the supervised methods, demonstrating our debiasing framework’s effectiveness.
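
The abstract leaves the weighting function unspecified; one plausible reading (an assumption on our part) estimates a dataset-level position prior from pseudo saliency masks and up-weights images whose objects fall in under-represented locations:

```python
# Hedged sketch of image-level weighting against spatial distribution bias.
import torch

def spatial_bias_map(masks):
    # masks: (N, H, W) binary pseudo-labels; the mean is the position prior
    return masks.float().mean(dim=0)

def image_weights(masks, bias_map, eps=1e-6):
    # expected bias under each image's own mask; rarer positions -> higher weight
    m = masks.float()
    exposure = (m * bias_map).sum(dim=(1, 2)) / (m.sum(dim=(1, 2)) + eps)
    w = 1.0 / (exposure + eps)
    return w / w.mean()                      # normalize around 1

masks = torch.rand(16, 64, 64) > 0.8         # stand-in pseudo saliency masks
bias = spatial_bias_map(masks)               # concentrated where objects cluster
w = image_weights(masks, bias)               # per-image loss weights
# training: loss = (w * per_image_loss).mean()
```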

AAAI 2022 · Conference Paper

A Causal Inference Look at Unsupervised Video Anomaly Detection

  • Xiangru Lin
  • Yuyang Chen
  • Guanbin Li
  • Yizhou Yu

Unsupervised video anomaly detection, a task that requires no labeled normal/abnormal training data in any form, is challenging yet of great importance to both industrial applications and academic research. Existing methods typically follow an iterative pseudo label generation process. However, they lack a principled analysis of the impact of such pseudo label generation on training. Furthermore, long-range temporal dependencies have also been overlooked, which is problematic since the definition of an abnormal event depends on long-range temporal context. To this end, first, we propose a causal graph to analyze the confounding effect of the pseudo label generation process. Then, we introduce a simple yet effective causal inference based framework to disentangle the noisy pseudo label’s impact. Finally, we perform counterfactual-based model ensembling that blends long-range temporal context with local image context in inference to produce the final anomaly detections. Extensive experiments on six standard benchmark datasets show that our proposed method significantly outperforms previous state-of-the-art methods, demonstrating our framework’s effectiveness.

ICML 2022 · Conference Paper

Disentangling Disease-related Representation from Obscure for Disease Prediction

  • Churan Wang
  • Fei Gao
  • Fandong Zhang
  • Fangwei Zhong
  • Yizhou Yu
  • Yizhou Wang 0001

Disease-related representations play a crucial role in image-based disease prediction such as cancer diagnosis, due to their considerable generalization capacity. However, it is still a challenge to identify lesion characteristics in obscured images, as many lesions are obscured by other tissues. In this paper, to learn the representations for identifying obscured lesions, we propose a disentanglement learning strategy under the guidance of alpha blending generation in an encoder-decoder framework (DAB-Net). Specifically, we take mammogram mass benign/malignant classification as an example. In our framework, composite obscured mass images are generated by alpha blending and then explicitly disentangled into disease-related mass features and interference glands features. To achieve disentanglement learning, features of these two parts are decoded to reconstruct the mass and the glands with corresponding reconstruction losses, and only disease-related mass features are fed into the classifier for disease prediction. Experimental results on one public dataset (DDSM) and three in-house datasets demonstrate that the proposed strategy can achieve state-of-the-art performance. DAB-Net achieves substantial improvements of 3.9%-4.4% AUC in obscured cases. Besides, the visualization analysis shows the model can better disentangle the mass and glands in the obscured image, suggesting the effectiveness of our solution in exploring the hidden characteristics in this challenging problem.
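
The alpha-blending generation and the disentangling reconstruction can be sketched compactly. Everything below is a schematic with placeholder sizes, not the paper's architecture:

```python
# Alpha-blended composite -> split encoding -> two reconstructions + classifier.
import torch
import torch.nn as nn

class DABToy(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 2 * dim))
        self.dec_mass = nn.Linear(dim, 64 * 64)
        self.dec_gland = nn.Linear(dim, 64 * 64)
        self.classifier = nn.Linear(dim, 2)      # benign / malignant

    def forward(self, x):
        z = self.encoder(x)
        z_mass, z_gland = z.chunk(2, dim=1)      # explicit feature split
        return (self.dec_mass(z_mass), self.dec_gland(z_gland),
                self.classifier(z_mass))         # classify from mass part only

mass, gland = torch.rand(4, 64, 64), torch.rand(4, 64, 64)
alpha = 0.6
composite = alpha * mass + (1 - alpha) * gland   # obscured training sample
rec_m, rec_g, logits = DABToy()(composite)
loss = (nn.functional.mse_loss(rec_m, mass.flatten(1)) +
        nn.functional.mse_loss(rec_g, gland.flatten(1)) +
        nn.functional.cross_entropy(logits, torch.randint(0, 2, (4,))))
loss.backward()
```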

JBHI 2022 · Journal Article

GREN: Graph-Regularized Embedding Network for Weakly-Supervised Disease Localization in X-Ray Images

  • Baolian Qi
  • Gangming Zhao
  • Xin Wei
  • Changde Du
  • Chengwei Pan
  • Yizhou Yu
  • Jinpeng Li

Locating diseases in chest X-ray images with few careful annotations saves substantial human effort. Recent works approached this task with innovative weakly-supervised algorithms such as multi-instance learning (MIL) and class activation maps (CAM); however, these methods often yield inaccurate or incomplete regions. One of the reasons is the neglect of the pathological implications hidden in the relationship across anatomical regions within each image and the relationship across images. In this paper, we argue that the cross-region and cross-image relationship, as contextual and compensating information, is vital to obtain more consistent and integral regions. To model the relationship, we propose the Graph Regularized Embedding Network (GREN), which leverages the intra-image and inter-image information to locate diseases on chest X-ray images. GREN uses a pre-trained U-Net to segment the lung lobes, and then models the intra-image relationship between the lung lobes using an intra-image graph to compare different regions. Meanwhile, the relationship between in-batch images is modeled by an inter-image graph to compare multiple images. This process mimics the training and decision-making process of a radiologist: comparing multiple regions and images for diagnosis. In order for the deep embedding layers of the neural network to retain structural information (important in the localization task), we use hash coding and the Hamming distance to compute the graphs, which are used as regularizers to facilitate training. By means of this, our approach achieves the state-of-the-art result on NIH chest X-ray dataset for weakly-supervised disease localization. Our codes are accessible online.

NeurIPS 2022 · Conference Paper

Mix and Reason: Reasoning over Semantic Topology with Data Mixing for Domain Generalization

  • Chaoqi Chen
  • Luyao Tang
  • Feng Liu
  • Gangming Zhao
  • Yue Huang
  • Yizhou Yu

Domain generalization (DG) enables generalizing a learning machine from multiple seen source domains to an unseen target one. The general objective of DG methods is to learn semantic representations that are independent of domain labels, which is theoretically sound but empirically challenged due to the complex mixture of common and domain-specific factors. Although disentangling the representations into two disjoint parts has been gaining momentum in DG, the strong presumption over the data limits its efficacy in many real-world scenarios. In this paper, we propose Mix and Reason (MiRe), a new DG framework that learns semantic representations via enforcing the structural invariance of semantic topology. MiRe consists of two key components, namely, Category-aware Data Mixing (CDM) and Adaptive Semantic Topology Refinement (ASTR). CDM mixes two images from different domains by virtue of activation maps generated by two complementary classification losses, making the classifier focus on the representations of semantic objects. ASTR introduces relation graphs to represent semantic topology, which is progressively refined via the interactions between local feature aggregation and global cross-domain relational reasoning. Experiments on multiple DG benchmarks validate the effectiveness and robustness of the proposed MiRe.

JBHI 2021 · Journal Article

Curriculum Feature Alignment Domain Adaptation for Epithelium-Stroma Classification in Histopathological Images

  • Qi Qi
  • Xin Lin
  • Chaoqi Chen
  • Weiping Xie
  • Yue Huang
  • Xinghao Ding
  • Xiaoqing Liu
  • Yizhou Yu

In recent years, deep learning methods have received more attention in epithelial-stroma (ES) classification tasks. Traditional deep learning methods assume that the training and test data have the same distribution, an assumption that is seldom satisfied in complex imaging procedures. Unsupervised domain adaptation (UDA) transfers knowledge from a labelled source domain to a completely unlabeled target domain, and is more suitable for ES classification tasks to avoid tedious annotation. However, existing UDA methods for this task ignore the semantic alignment across domains. In this paper, we propose a Curriculum Feature Alignment Network (CFAN) to gradually align discriminative features across domains through selecting effective samples from the target domain and minimizing intra-class differences. Specifically, we developed the Curriculum Transfer Strategy (CTS) and Adaptive Centroid Alignment (ACA) steps to train our model iteratively. We validated the method using three independent public ES datasets, and experimental results demonstrate that our method achieves better performance in ES classification compared with commonly used deep learning methods and existing deep domain adaptation methods.
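
The centroid-alignment step admits a short sketch. Confidence thresholding below stands in for the paper's curriculum sample selection; names and thresholds are illustrative:

```python
# Minimal centroid alignment between source features and confidently
# pseudo-labelled target features.
import torch

def centroid_alignment_loss(src_feat, src_y, tgt_feat, tgt_prob,
                            n_classes, thr=0.9):
    conf, tgt_y = tgt_prob.max(dim=1)
    keep = conf > thr                      # curriculum: easy samples first
    loss = src_feat.new_zeros(())
    for c in range(n_classes):
        s = src_feat[src_y == c]
        t = tgt_feat[keep & (tgt_y == c)]
        if len(s) and len(t):              # pull class centroids together
            loss = loss + (s.mean(0) - t.mean(0)).pow(2).sum()
    return loss

src_feat, tgt_feat = torch.randn(64, 32), torch.randn(64, 32)
src_y = torch.randint(0, 2, (64,))
tgt_prob = torch.softmax(torch.randn(64, 2), dim=1)
loss = centroid_alignment_loss(src_feat, src_y, tgt_feat, tgt_prob, n_classes=2)
```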

IJCAI 2021 · Conference Paper

Noise2Grad: Extract Image Noise to Denoise

  • Huangxing Lin
  • Yihong Zhuang
  • Yue Huang
  • Xinghao Ding
  • Xiaoqing Liu
  • Yizhou Yu

In many image denoising tasks, the difficulty of collecting noisy/clean image pairs limits the application of supervised CNNs. We consider such a case in which paired data and noise statistics are not accessible, but unpaired noisy and clean images are easy to collect. To form the necessary supervision, our strategy is to extract the noise from the noisy image to synthesize new data. To reduce interference from the image background, we use a noise removal module to aid noise extraction. The noise removal module first roughly removes noise from the noisy image, which is equivalent to excluding much background information. A noise approximation module can therefore easily extract a new noise map from the removed noise to match the gradient of the noisy input. This noise map is added to a random clean image to synthesize a new data pair, which is then fed back to the noise removal module to correct the noise removal process. These two modules cooperate to extract noise accurately. After convergence, the noise removal module can remove noise without damaging other background details, so we use it as our final denoising network. Experiments show that the denoising performance of the proposed method is competitive with other supervised CNNs.
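
The data-synthesis loop can be sketched end to end. The snippet below simplifies the gradient-matching objective to an L1 on residuals and uses placeholder architectures; it is a schematic of the described loop, not the paper's code:

```python
# Noise2Grad-style loop: rough denoise -> extract noise map -> synthesize pairs.
import torch
import torch.nn as nn

remover = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(16, 1, 3, padding=1))       # denoiser
approx = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                       nn.Conv2d(16, 1, 3, padding=1))        # noise extractor
opt = torch.optim.Adam(list(remover.parameters()) + list(approx.parameters()),
                       lr=1e-3)

noisy = torch.rand(4, 1, 32, 32)           # unpaired noisy images
clean = torch.rand(4, 1, 32, 32)           # unpaired clean images

for _ in range(2):                          # illustration only
    residual = noisy - remover(noisy)       # rough noise estimate
    noise_map = approx(residual.detach())   # refine the extracted noise
    synth_noisy = clean + noise_map         # synthesized supervised pair
    loss = (nn.functional.l1_loss(remover(synth_noisy), clean) +
            nn.functional.l1_loss(noise_map, residual.detach()))
    opt.zero_grad(); loss.backward(); opt.step()
# after convergence, `remover` serves as the final denoising network
```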

AAAI 2019 · Conference Paper

Non-Local Context Encoder: Robust Biomedical Image Segmentation against Adversarial Attacks

  • Xiang He
  • Sibei Yang
  • Guanbin Li
  • Haofeng Li
  • Huiyou Chang
  • Yizhou Yu

Recent progress in biomedical image segmentation based on deep convolutional neural networks (CNNs) has drawn much attention. However, its vulnerability towards adversarial samples cannot be overlooked. This paper is the first to show that state-of-the-art CNN-based biomedical image segmentation models are sensitive to adversarial perturbations. This limits the deployment of these methods in safety-critical biomedical fields. In this paper, we discover that global spatial dependencies and global contextual information in a biomedical image can be exploited to defend against adversarial attacks. To this end, a non-local context encoder (NLCE) is proposed to model short- and long-range spatial dependencies and encode global contexts for strengthening feature activations by channel-wise attention. The NLCE modules enhance the robustness and accuracy of the non-local context encoding network (NLCEN), which learns robust enhanced pyramid feature representations with NLCE modules, and then integrates the information across different levels. Experiments on both lung and skin lesion segmentation datasets have demonstrated that NLCEN outperforms other state-of-the-art biomedical image segmentation methods under adversarial attacks. In addition, NLCE modules can be applied to improve the robustness of other CNN-based biomedical image segmentation methods.
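
The NLCE pattern, as described, combines a non-local (self-attention) block over spatial positions with channel-wise attention. A generic toy version, with the paper's specific design details left out:

```python
# Non-local block + channel attention, the generic pattern behind NLCE.
import torch
import torch.nn as nn

class NLCE(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.theta = nn.Conv2d(ch, ch // 2, 1)
        self.phi = nn.Conv2d(ch, ch // 2, 1)
        self.g = nn.Conv2d(ch, ch // 2, 1)
        self.out = nn.Conv2d(ch // 2, ch, 1)
        self.se = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                nn.Conv2d(ch, ch // 4, 1), nn.ReLU(),
                                nn.Conv2d(ch // 4, ch, 1), nn.Sigmoid())

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)     # (B, HW, C/2)
        k = self.phi(x).flatten(2)                       # (B, C/2, HW)
        v = self.g(x).flatten(2).transpose(1, 2)         # (B, HW, C/2)
        attn = torch.softmax(q @ k / (c // 2) ** 0.5, dim=-1)
        ctx = (attn @ v).transpose(1, 2).reshape(b, c // 2, h, w)
        x = x + self.out(ctx)                            # global dependencies
        return x * self.se(x)                            # channel-wise attention

feat = torch.randn(2, 64, 16, 16)
out = NLCE(64)(feat)                                     # same shape as input
```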

NeurIPS 2019 · Conference Paper

Transductive Zero-Shot Learning with Visual Structure Constraint

  • Ziyu Wan
  • DongDong Chen
  • Yan Li
  • Xingguang Yan
  • Junge Zhang
  • Yizhou Yu
  • Jing Liao

To recognize objects of the unseen classes, most existing Zero-Shot Learning (ZSL) methods first learn a compatible projection function between the common semantic space and the visual space based on the data of source seen classes, then directly apply it to the target unseen classes. However, in real scenarios, the data distribution between the source and target domain might not match well, thus causing the well-known domain shift problem. Based on the observation that visual features of test instances can be separated into different clusters, we propose a new visual structure constraint on class centers for transductive ZSL, to improve the generality of the projection function (i.e., alleviate the above domain shift problem). Specifically, three different strategies (symmetric Chamfer distance, bipartite matching distance, and Wasserstein distance) are adopted to align the projected unseen semantic centers and visual cluster centers of test instances. We also propose a new training strategy to handle the real cases where many unrelated images exist in the test dataset, which is not considered in previous methods. Experiments on many widely used datasets demonstrate that the proposed visual structure constraint can bring substantial performance gain consistently and achieve state-of-the-art results.
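
The symmetric Chamfer-distance variant of the constraint is the most direct to sketch: align projected unseen-class semantic centers with cluster centers of the unlabeled test features. Scikit-learn k-means stands in for the paper's clustering step; dimensions are illustrative.

```python
# Visual structure constraint via symmetric Chamfer distance.
import torch
from sklearn.cluster import KMeans

def chamfer(a, b):
    # a: (n, d) projected semantic centers; b: (m, d) visual cluster centers
    d = torch.cdist(a, b)                        # pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

test_feats = torch.randn(500, 64)                # unlabeled test features
centers = KMeans(n_clusters=10, n_init=10, random_state=0) \
    .fit(test_feats.numpy()).cluster_centers_
visual_centers = torch.tensor(centers, dtype=torch.float32)

proj = torch.nn.Linear(32, 64)                   # semantic -> visual projection
semantic = torch.randn(10, 32)                   # unseen-class attributes
loss = chamfer(proj(semantic), visual_centers)   # structure constraint
loss.backward()
```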

IJCAI 2016 · Conference Paper

Protein Secondary Structure Prediction Using Cascaded Convolutional and Recurrent Neural Networks

  • Zhen Li
  • Yizhou Yu

Protein secondary structure prediction is an important problem in bioinformatics. Inspired by the recent successes of deep neural networks, in this paper, we propose an end-to-end deep network that predicts protein secondary structures from integrated local and global contextual features. Our deep architecture leverages convolutional neural networks with different kernel sizes to extract multiscale local contextual features. In addition, considering long-range dependencies existing in amino acid sequences, we set up a bidirectional neural network consisting of gated recurrent units to capture global contextual features. Furthermore, multi-task learning is utilized to predict secondary structure labels and amino-acid solvent accessibility simultaneously. Our proposed deep network demonstrates its effectiveness by achieving state-of-the-art performance, i.e., 69.7% Q8 accuracy on the public benchmark CB513, 76.9% Q8 accuracy on CASP10, and 73.1% Q8 accuracy on CASP11. Our model and results are publicly available.
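
The cascaded architecture maps naturally onto a few standard modules. A sketch with illustrative hyperparameters: multiscale 1-D convolutions for local context, a bidirectional GRU for global context, and two task heads for the multi-task objective.

```python
# Cascaded CNN + BiGRU with multi-task heads; all sizes are placeholders.
import torch
import torch.nn as nn

class CascadedSSP(nn.Module):
    def __init__(self, in_dim=21, conv_ch=64, hidden=128):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv1d(in_dim, conv_ch, k, padding=k // 2) for k in (3, 7, 11)])
        self.gru = nn.GRU(3 * conv_ch, hidden, batch_first=True,
                          bidirectional=True)
        self.ss8_head = nn.Linear(2 * hidden, 8)   # 8-state secondary structure
        self.sa_head = nn.Linear(2 * hidden, 2)    # solvent accessibility

    def forward(self, x):
        # x: (batch, length, in_dim) residue features (e.g., one-hot + PSSM)
        h = x.transpose(1, 2)
        local = torch.cat([torch.relu(c(h)) for c in self.convs], dim=1)
        glob, _ = self.gru(local.transpose(1, 2))
        return self.ss8_head(glob), self.sa_head(glob)

model = CascadedSSP()
seq = torch.randn(2, 120, 21)
ss8_logits, sa_logits = model(seq)
loss = (nn.functional.cross_entropy(ss8_logits.flatten(0, 1),
                                    torch.randint(0, 8, (240,))) +
        nn.functional.cross_entropy(sa_logits.flatten(0, 1),
                                    torch.randint(0, 2, (240,))))
loss.backward()
```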