Arrow Research search

Author name cluster

Xiaofeng Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

28 papers
2 author rows

Possible papers

28

AAAI Conference 2026 Conference Paper

CoFact: Dynamic Coordination of Attention Heads for Improving Factual Consistency in LLMs

  • Shike Li
  • Xiaokai Wang
  • Xiaofeng Liu
  • Xin Tong
  • Hu Zhang

Large language models (LLMs) frequently generate fluent yet factually inaccurate content, a phenomenon known as hallucination. Recent inference-time approaches aim to improve truthfulness by steering model activations toward semantically meaningful directions. While effective to some extent, these methods typically process activations independently, neglecting the internal coordination structure of multi-head attention (MHA), where attention heads interact to form semantic representations. In this work, we propose CoFact, an adaptive inference-time mechanism that improves factual consistency by dynamically coordinating attention head behaviors. Inspired by cooperative game theory, CoFact conceptualizes attention heads as collaborative agents. It models the semantic utility and redundancy of each head and adaptively modulates their contributions to the final attention output. Notably, rather than directly altering intermediate representations, CoFact performs token-level coordination to encourage diverse and complementary attention patterns across heads. CoFact is plug-and-play compatible with mainstream LLM architectures and requires no additional supervision or model retraining. Experimental results across multiple standard factuality benchmarks demonstrate that CoFact consistently enhances factual accuracy while maintaining generation fluency.

AAAI Conference 2026 Conference Paper

Explainable Depression Assessment from Face Videos by Weakly Supervised Learning

  • Rongfan Liao
  • Xiangyu Kong
  • Shiqing Tang
  • Lang He
  • Changzeng Fu
  • Weicheng Xie
  • Xiaofeng Liu
  • Lu Liu

Existing video-based automatic depression assessment (ADA) approaches frequently achieve video-level depression assessment by aggregating features or predictions of individual frames or equal-length segments within the given video. While their performance has been largely enhanced by recent advanced deep learning models, they typically fail to explicitly consider the varied importance of depression-related behavioural cues across different video segments, i.e., segments within one video may contain behaviours reflecting varying levels of depression. Underestimating segment-level variations can obscure the detection of facial behaviour cues associated with depression, thereby undermining the accuracy and interpretability of video-based depression detection systems. In this paper, we propose a novel video-based ADA approach that specifically identifies and differentiates video segments that exhibit depression-related facial behaviours across varying temporal durations, providing clear insights into how each segment contributes to the video-level depression prediction. To achieve this, a novel weakly supervised strategy is proposed to compare segment-level behaviours with the video-level depression label, enabling the model to assign depression-relevant scores to video segments at multiple temporal scales and attend selectively to those most indicative of depressive states. Extensive experiments on the AVEC 2013 and AVEC 2014 face video depression datasets demonstrate the effectiveness of our approach.

JBHI Journal 2026 Journal Article

Variance Extrapolated Class-Imbalance-Aware Domain Adaptive Myocardial Segmentation in Multi-Sequence Cardiac MRI

  • Fangxu Xing
  • Xiaofeng Liu
  • Iman Aganj
  • Georges El Fakhri
  • Panki Kim
  • Byoung Wook Choi
  • Jonghye Woo

Fully automated myocardial segmentation from cardiac magnetic resonance imaging (MRI) is vital for efficient diagnosis and treatment planning. Although numerous automated methods have been proposed, they typically focus on single MRI sequences and therefore have difficulties in generalizing across vendors and across cardiac MRI protocols. Simultaneous analysis of complementary cardiac MRI sequences, such as cine, T1 mapping, and late gadolinium enhancement (LGE) MRI, remains challenging due to their distinct image characteristics and scanner-specific variations. To address these issues, we propose an unsupervised domain adaptation approach that allows robust myocardial segmentation across multi-vendor cine, T1, and LGE MRI data. In particular, we introduce a class-imbalance self-training framework to transfer information learned from a source domain with labels to any unlabeled target domain, while maintaining consistent performance across different MRI sequences. Our framework iteratively refines segmentation accuracy by generating pseudo-labels for target data using a hardness-aware strategy, thus effectively addressing the problem of class imbalance in cardiac MRI segmentation. To mitigate data scarcity following pseudo-label selection, we employ a variance-guided vicinal feature extrapolation, which expands data points in the feature space into a probabilistic distribution. This, in turn, facilitates joint source–target training by generating a larger intersection in the feature space. Experimental results demonstrate that our framework outperforms existing methods when assessed using the Dice coefficient and Hausdorff distance. Our framework enables cardiac evaluation across MRI protocols without sequence-specific manual annotations.

IROS Conference 2025 Conference Paper

In-situ Value-aligned Human-Robot Interactions with Physical Constraints

  • Hongtao Li
  • Ziyuan Jiao
  • Xiaofeng Liu
  • Hangxin Liu
  • Zilong Zheng

Equipped with Large Language Models (LLMs), human-centered robots are now capable of performing a wide range of tasks that were previously deemed challenging or unattainable. However, merely completing tasks is insufficient for cognitive robots, which should learn and apply human preferences to future scenarios. In this work, we propose a framework that combines human preferences with physical constraints, requiring robots to complete tasks while considering both. First, we developed a benchmark of everyday household activities that are often evaluated against specific preferences. We then introduced In-Context Learning from Human Feedback (ICLHF), where human feedback comes from direct instructions and adjustments made intentionally or unintentionally in daily life. Extensive sets of experiments, testing the ability of ICLHF to generate task plans and balance physical constraints with preferences, demonstrate the effectiveness of our approach.

ICLR Conference 2025 Conference Paper

Progressive Compositionality in Text-to-Image Generative Models

  • Xu Han
  • Linghao Jin
  • Xiaofeng Liu
  • Paul Pu Liang

Despite the impressive text-to-image (T2I) synthesis capabilities of diffusion models, they often struggle to understand compositional relationships between objects and attributes, especially in complex settings. Existing approaches that build compositional architectures or generate difficult negative captions often assume a fixed, prespecified compositional structure, which limits generalization to new distributions. In this paper, we argue that curriculum training is crucial to equipping generative models with a fundamental understanding of compositionality. To achieve this, we leverage large language models (LLMs) to automatically compose complex scenarios and harness Visual Question Answering (VQA) checkers to automatically curate a contrastive dataset, ConPair, consisting of 15k pairs of high-quality contrastive images. These pairs feature minimal visual discrepancies and cover a wide range of attribute categories, especially complex and natural scenarios. To learn effectively from these error cases (i.e., hard negative images), we propose EvoGen, a new multi-stage curriculum for contrastive learning of diffusion models. Through extensive experiments across a wide range of compositional scenarios, we showcase the effectiveness of our proposed framework on compositional T2I benchmarks.

NeurIPS Conference 2025 Conference Paper

Rethinking Evaluation of Infrared Small Target Detection

  • Youwei Pang
  • Xiaoqi Zhao
  • Lihe Zhang
  • Huchuan Lu
  • Georges Fakhri
  • Xiaofeng Liu
  • Shijian Lu

As an essential vision task, infrared small target detection (IRSTD) has seen significant advancements through deep learning. However, critical limitations in current evaluation protocols impede further progress. First, existing methods rely on fragmented pixel-level and target-level metrics, which fail to provide a comprehensive view of model capabilities. Second, an excessive emphasis on overall performance scores obscures crucial error analysis, which is vital for identifying failure modes and improving real-world system performance. Third, the field predominantly adopts dataset-specific training-testing paradigms, hindering the understanding of model robustness and generalization across diverse infrared scenarios. This paper addresses these issues by introducing a hybrid-level metric incorporating pixel- and target-level performance, proposing a systematic error analysis method, and emphasizing the importance of cross-dataset evaluation. Together, these aim to offer a more thorough and rational hierarchical analysis framework, ultimately fostering the development of more effective and robust IRSTD models. An open-source toolkit has been released to facilitate standardized benchmarking.

IJCAI Conference 2025 Conference Paper

Towards Robust Deterministic and Probabilistic Modeling for Predictive Learning

  • Xuesong Nie
  • Haoyuan Jin
  • Vijayakumar Bhagavatula
  • Xiaofeng Liu

Predictive modeling of unannotated spatiotemporal data presents inherent challenges, primarily due to the highly entangled visual dynamics in real-world scenes. To tackle these complexities, we introduce a novel insight through Disentangling Deterministic and Probabilistic (DDP) modeling. We note a key observation in spatiotemporal data where low-level details typically remain stable, whereas high-level motion frequently exhibits dynamic variations. The core motivation involves constructing two distinct pathways in the latent space: a deterministic path and a probabilistic path. The probabilistic path begins by defining the motion flow, which explicitly describes complex many-to-many motion patterns between patches, and models its probabilistic distribution using a motion diffuser. The deterministic path incorporates a spectral-aware enhancer to retain and amplify visual details in the frequency domain. These designs ensure visual consistency while also capturing intricate long-term motion dynamics. Extensive experiments demonstrate the superiority of DDP across diverse scenario evaluations.

NeurIPS Conference 2025 Conference Paper

UniMRSeg: Unified Modality-Relax Segmentation via Hierarchical Self-Supervised Compensation

  • Xiaoqi Zhao
  • Youwei Pang
  • Chenyang Yu
  • Lihe Zhang
  • Huchuan Lu
  • Shijian Lu
  • Georges Fakhri
  • Xiaofeng Liu

Multi-modal image segmentation faces real-world deployment challenges from incomplete/corrupted modalities degrading performance. While existing methods address training-inference modality gaps via specialized per-combination models, they introduce high deployment costs by requiring exhaustive model subsets and model-modality matching. In this work, we propose a unified modality-relax segmentation network (UniMRSeg) through hierarchical self-supervised compensation (HSSC). Our approach hierarchically bridges representation gaps between complete and incomplete modalities across input, feature, and output levels. First, we adopt modality reconstruction with hybrid shuffled-masking augmentation, encouraging the model to learn the intrinsic modality characteristics and generate meaningful representations for missing modalities through cross-modal fusion. Next, modality-invariant contrastive learning implicitly compensates for the feature-space distance between incomplete and complete modality pairs. Furthermore, the proposed lightweight reverse attention adapter explicitly compensates for the weak perceptual semantics in the frozen encoder. Finally, UniMRSeg is fine-tuned under a hybrid consistency constraint to ensure stable prediction under all modality combinations without large performance fluctuations. Without bells and whistles, UniMRSeg significantly outperforms state-of-the-art methods under diverse missing-modality scenarios on MRI-based brain tumor segmentation, RGB-D semantic segmentation, and RGB-D/T salient object segmentation. The code will be released at \url{https://github.com/Xiaoqi-Zhao-DLUT/UniMRSeg}.

JBHI Journal 2025 Journal Article

Unsupervised Domain Adaptation With Synchronized Self-Training for Cross-Domain Motor Imagery Recognition

  • Peiyin Chen
  • Xiaofeng Liu
  • Chao Ma
  • He Wang
  • Xiong Yang
  • Celso Grebogi
  • Xiao Gu
  • Zhongke Gao

Robust decoding performance is essential for the practical deployment of brain-computer interface (BCI) systems. Existing EEG decoding models often rely on large amounts of annotated data collected through specific experimental setups, which fail to address the heterogeneity of data distributions across different domains. This limitation hinders BCI systems from effectively managing the complexity and variability of real-world data. To overcome these challenges, we propose Synchronized Self-Training Domain Adaptation (SSTDA) for cross-domain motor imagery classification. Specifically, SSTDA leverages labeled signals from a source domain and applies self-training to unlabeled signals from a target domain, enabling the simultaneous training of a more robust classifier. The raw EEG signals are mapped into a latent space by a feature extractor for discriminative representation learning. A domain-shared latent space is then learned by optimizing the feature extractor with both source and target samples, using an easy-to-hard self-training process. We validate the method with extensive experiments on two public motor imagery datasets: Dataset IIa of BCI Competition IV and the High Gamma dataset. In the inter-subject task, our method achieves classification accuracies of 64.43% and 80.40%, respectively. It also outperforms existing methods in the inter-session task. Moreover, we developed a new six-class motor imagery dataset and achieved test accuracies of 77.09% and 80.18% across different datasets. All experimental results demonstrate that our SSTDA outperforms existing algorithms in inter-session, inter-subject, and inter-dataset validation protocols, highlighting its capability to learn discriminative, domain-invariant representations that enhance EEG decoding performance.

NeurIPS Conference 2025 Conference Paper

When and How Unlabeled Data Provably Improve In-Context Learning

  • Yingcong Li
  • Xiangyu Chang
  • Muti Kara
  • Xiaofeng Liu
  • Amit Roy-Chowdhury
  • Samet Oymak

Recent research shows that in-context learning (ICL) can be effective even when demonstrations have missing or incorrect labels. To shed light on this capability, we examine a canonical setting where the demonstrations are drawn according to a binary Gaussian mixture model (GMM) and a certain fraction of the demonstrations have missing labels. We provide a comprehensive theoretical study to show that: (1) one-layer linear attention models recover the optimal fully-supervised estimator but completely fail to exploit unlabeled data; (2) in contrast, multilayer or looped transformers can effectively leverage unlabeled data by implicitly constructing estimators of the form $\sum_{i\ge 0} a_i (X^\top X)^i X^\top y$, with $X$ and $y$ denoting features and partially-observed labels (with missing entries set to zero). We characterize the class of polynomials that can be expressed as a function of depth and draw connections to Expectation Maximization, an iterative pseudo-labeling algorithm commonly used in semi-supervised learning. Importantly, the leading polynomial power is exponential in depth, so a mild amount of depth/looping suffices. As an application of the theory, we propose looping off-the-shelf tabular foundation models to enhance their semi-supervision capabilities. Extensive evaluations on real-world datasets show that our method significantly improves semi-supervised tabular learning performance over standard single-pass inference.
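As an illustrative aside (not the paper's implementation): the estimator family $\sum_{i\ge 0} a_i (X^\top X)^i X^\top y$ quoted in the abstract can be evaluated with a simple Horner-style recurrence. The coefficients and the synthetic data below are hypothetical, chosen only to show the mechanics.

```python
import numpy as np

def polynomial_estimator(X, y, coeffs):
    """Compute w = sum_i a_i (X^T X)^i X^T y for given coefficients a_i.

    X: (n, d) feature matrix; y: (n,) labels with missing entries set to
    zero, as in the abstract; coeffs: list of a_i (hypothetical values).
    """
    G = X.T @ X            # (d, d) Gram matrix
    w = np.zeros(X.shape[1])
    term = X.T @ y         # (X^T X)^0 X^T y
    for a in coeffs:
        w += a * term
        term = G @ term    # advance to the next power of X^T X
    return w

# Tiny synthetic demo: 2-D Gaussian-mixture-like data, half the labels
# "missing" (zeroed out), as in the semi-supervised setting above.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
y = np.sign(X[:, 0])
y[4:] = 0.0                # unlabeled demonstrations
w = polynomial_estimator(X, y, coeffs=[1.0, -0.1, 0.01])
print(w.shape)             # (2,)
```

A deeper (or looped) model corresponds to a longer `coeffs` list, i.e., higher powers of $X^\top X$ in the implicit estimator.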

AAAI Conference 2024 Conference Paper

Label-Efficient Few-Shot Semantic Segmentation with Unsupervised Meta-Training

  • Jianwu Li
  • Kaiyue Shi
  • Guo-Sen Xie
  • Xiaofeng Liu
  • Jian Zhang
  • Tianfei Zhou

The goal of this paper is to alleviate the training cost of few-shot semantic segmentation (FSS) models. Although FSS by nature improves model generalization to new concepts using only a handful of test exemplars, it relies on strong supervision from a considerable amount of labeled training data for base classes. However, collecting pixel-level annotations is notoriously expensive and time-consuming, and small-scale training datasets convey low information density that limits test-time generalization. To resolve the issue, we take a pioneering step towards label-efficient training of FSS models from fully unlabeled training data, optionally augmented with a few labeled samples to enhance performance. This motivates an approach based on a novel unsupervised meta-training paradigm. In particular, the approach first distills pre-trained unsupervised pixel embeddings into compact semantic clusters, from which a massive number of pseudo meta-tasks is constructed. To mitigate the noise in the pseudo meta-tasks, we further advocate a robust Transformer-based FSS model with a novel prototype-based cross-attention design. Extensive experiments have been conducted on two standard benchmarks, i.e., PASCAL-5i and COCO-20i, and the results show that our method produces impressive performance without any annotations, and is comparable to fully supervised competitors even when using only 20% of the annotations. Our code is available at: https://github.com/SSSKYue/UMTFSS.

JBHI Journal 2022 Journal Article

Brain MR Atlas Construction Using Symmetric Deep Neural Inpainting

  • Fangxu Xing
  • Xiaofeng Liu
  • C.-C. Jay Kuo
  • Georges El Fakhri
  • Jonghye Woo

Modeling statistical properties of anatomical structures using magnetic resonance imaging is essential for revealing common information of a target population and unique properties of specific subjects. In brain imaging, a statistical brain atlas is often constructed using a number of healthy subjects. When tumors are present, however, it is difficult to either provide a common space for various subjects or align their imaging data due to the unpredictable distribution of lesions. Here we propose a deep learning-based image inpainting method to replace the tumor regions with normal tissue intensities using only a patient population. Our framework has three major innovations: 1) incompletely distributed datasets with random tumor locations can be used for training; 2) irregularly-shaped tumor regions are properly learned, identified, and corrected; and 3) a symmetry constraint between the two brain hemispheres is applied to regularize inpainted regions. Henceforth, regular atlas construction and image registration methods can be applied using inpainted data to obtain tissue deformation, thereby achieving group-specific statistical atlases and patient-to-atlas registration. Our framework was tested using the public database from the Multimodal Brain Tumor Segmentation challenge. Results showed increased similarity scores as well as reduced reconstruction errors compared with three existing image inpainting methods. Patient-to-atlas registration also yielded better results with improved normalized cross-correlation and mutual information and a reduced amount of deformation over the tumor regions.

JBHI Journal 2022 Journal Article

Interpreting Depression From Question-Wise Long-Term Video Recording of SDS Evaluation

  • Wanqing Xie
  • Lizhong Liang
  • Yao Lu
  • Chen Wang
  • Jihong Shen
  • Hui Luo
  • Xiaofeng Liu

The Self-Rating Depression Scale (SDS) questionnaire has frequently been used for efficient preliminary depression screening. However, this uncontrolled self-administered measure can easily be affected by careless or deceptive answering, producing results that differ from the clinician-administered Hamilton Depression Rating Scale (HDRS) and the final diagnosis. Clinically, facial expressions (FEs) and actions play a vital role in clinician-administered evaluation, yet they remain underexplored for self-administered evaluations. In this work, we collect a novel dataset of 200 subjects to assess the validity of self-rating questionnaires using their corresponding question-wise video recordings. To automatically interpret depression from the SDS evaluation and the paired video, we propose an end-to-end hierarchical framework for long-term, variable-length video, which is also conditioned on the questionnaire results and the answering time. Specifically, we resort to a hierarchical model that utilizes a 3D CNN for local temporal pattern exploration and a redundancy-aware self-attention (RAS) scheme for question-wise global feature aggregation. Targeting redundant long-term FE video processing, our RAS is able to effectively exploit the correlations of each video clip within a question set to emphasize the discriminative information and eliminate redundancy based on pair-wise feature affinity. The question-wise video feature is then concatenated with the questionnaire scores for final depression detection. Our thorough evaluations show the validity of fusing the SDS evaluation with its video recording, and the superiority of our framework over conventional state-of-the-art temporal modeling methods.

JBHI Journal 2022 Journal Article

VoxelHop: Successive Subspace Learning for ALS Disease Classification Using Structural MRI

  • Xiaofeng Liu
  • Fangxu Xing
  • Chao Yang
  • Chung-Chieh Jay Kuo
  • Suma Babu
  • Georges El Fakhri
  • Thomas Jenkins
  • Jonghye Woo

Deep learning has great potential for accurate detection and classification of diseases with medical imaging data, but the performance is often limited by the number of training datasets and memory requirements. In addition, many deep learning models are considered a “black box,” thereby often limiting their adoption in clinical applications. To address this, we present a successive subspace learning model, termed VoxelHop, for accurate classification of Amyotrophic Lateral Sclerosis (ALS) using T2-weighted structural MRI data. Compared with popular convolutional neural network (CNN) architectures, VoxelHop has modular and transparent structures with fewer parameters and no backpropagation, so it is well-suited to small dataset sizes and 3D imaging data. Our VoxelHop has four key components, including (1) sequential expansion of near-to-far neighborhoods for multi-channel 3D data; (2) subspace approximation for unsupervised dimension reduction; (3) label-assisted regression for supervised dimension reduction; and (4) concatenation of features and classification between controls and patients. Our experimental results demonstrate that our framework, using a total of 20 controls and 26 patients, achieves an accuracy of 93.48% and an AUC score of 0.9394 in differentiating patients from controls, even with a relatively small number of datasets, showing its robustness and effectiveness. Our thorough evaluations also show its validity and superiority over state-of-the-art 3D CNN classification approaches. Our framework can easily be generalized to other classification tasks using different imaging modalities.

JBHI Journal 2021 Journal Article

A Hierarchical Graph Convolution Network for Representation Learning of Gene Expression Data

  • Kaiwen Tan
  • Weixian Huang
  • Xiaofeng Liu
  • Jinlong Hu
  • Shoubin Dong

The curse of dimensionality, caused by high dimensionality and low sample size, is a major challenge in gene expression data analysis. However, the real situation is even worse: labelling data is laborious and time-consuming, so only a small part of the limited samples will be labelled. Having so few labelled samples further increases the difficulty of training deep learning models. Interpretability is an important requirement in biomedicine. Many existing deep learning methods try to provide interpretability, but rarely apply to gene expression data. Recent semi-supervised graph convolution network methods try to address these problems by smoothing the label information over a graph. However, to the best of our knowledge, these methods only utilize graphs in either the feature space or the sample space, which restricts their performance. We propose a transductive semi-supervised representation learning method called a hierarchical graph convolution network (HiGCN) to aggregate the information of gene expression data in both feature and sample spaces. HiGCN first utilizes external knowledge to construct a feature graph and a similarity kernel to construct a sample graph. Then, two spatial-based GCNs are used to aggregate information on these graphs. To validate the model's performance, synthetic and real datasets are provided to lend empirical support. Compared with two recent models and three traditional models, HiGCN learns better representations of gene expression data, and these representations improve the performance of downstream tasks, especially when the model is trained on a few labelled samples. Important features can be extracted from our model to provide reliable interpretability.

AAAI Conference 2021 Conference Paper

Deep Verifier Networks: Verification of Deep Discriminative Models with Deep Generative Models

  • Tong Che
  • Xiaofeng Liu
  • Site Li
  • Yubin Ge
  • Ruixiang Zhang
  • Caiming Xiong
  • Yoshua Bengio

AI safety is a major concern in many deep learning applications such as autonomous driving. Given a trained deep learning model, an important natural problem is how to reliably verify the model's predictions. In this paper, we propose a novel framework, deep verifier networks (DVN), to detect unreliable inputs or predictions of deep discriminative models, using separately trained deep generative models. Our proposed model is based on concise conditional variational auto-encoders with disentanglement constraints to separate the label information from the latent representation. We give both intuitive and theoretical justifications for the model. Our verifier network is trained independently of the prediction model, which eliminates the need to retrain the verifier network for a new model. We test the verifier network on both out-of-distribution detection and adversarial example detection problems, as well as anomaly detection in structured prediction tasks such as image caption generation. We achieve state-of-the-art results on all of these problems.

IJCAI Conference 2021 Conference Paper

Domain Generalization under Conditional and Label Shifts via Variational Bayesian Inference

  • Xiaofeng Liu
  • Bo Hu
  • Linghao Jin
  • Xu Han
  • Fangxu Xing
  • Jinsong Ouyang
  • Jun Lu
  • Georges El Fakhri

In this work, we propose a domain generalization (DG) approach to learn on several labeled source domains and transfer knowledge to a target domain that is inaccessible in training. Considering the inherent conditional and label shifts, we would expect the alignment of p(x|y) and p(y). However, the widely used domain-invariant feature learning (IFL) methods rely on aligning the marginal concept shift w.r.t. p(x), which rests on the unrealistic assumption that p(y) is invariant across domains. We thereby propose a novel variational Bayesian inference framework to enforce the conditional distribution alignment w.r.t. p(x|y) via prior distribution matching in a latent space, which also takes the marginal label shift w.r.t. p(y) into consideration with the posterior alignment. Extensive experiments on various benchmarks demonstrate that our framework is robust to label shift and that the cross-domain accuracy is significantly improved, thereby achieving superior performance over conventional IFL counterparts.

IROS Conference 2021 Conference Paper

PNS: Population-Guided Novelty Search for Reinforcement Learning in Hard Exploration Environments

  • Qihao Liu
  • Yujia Wang
  • Xiaofeng Liu

Reinforcement Learning (RL) has made remarkable achievements, but it still suffers from inadequate exploration strategies, sparse reward signals, and deceptive reward functions. To alleviate these problems, a Population-guided Novelty Search (PNS) parallel learning method is proposed in this paper. In PNS, the population is divided into multiple sub-populations, each of which has one chief agent and several exploring agents. The chief agent evaluates the policies learned by the exploring agents and shares the optimal policy with all sub-populations. The exploring agents learn their policies collaboratively under the guidance of the optimal policy and, simultaneously, upload their policies to the chief agent. To balance exploration and exploitation, Novelty Search (NS) is employed in every chief agent to encourage policies with high novelty while maximizing per-episode performance. We apply PNS to the twin delayed deep deterministic (TD3) policy gradient algorithm. The effectiveness of PNS in promoting exploration and improving performance in continuous control domains is demonstrated in our experiments. Notably, PNS-TD3 achieves rewards that far exceed SOTA methods in environments with sparse or delayed reward signals. We also demonstrate that PNS enables robotic agents to learn control policies directly from pixels for sparse-reward manipulation in both simulated and real-world settings.

AAAI Conference 2021 Conference Paper

Subtype-aware Unsupervised Domain Adaptation for Medical Diagnosis

  • Xiaofeng Liu
  • Xiongchang Liu
  • Bo Hu
  • Wenxuan Ji
  • Fangxu Xing
  • Jun Lu
  • Jane You
  • C.-C. Jay Kuo

Recent advances in unsupervised domain adaptation (UDA) show that transferable prototypical learning presents a powerful means for class-conditional alignment, which encourages the closeness of cross-domain class centroids. However, cross-domain inner-class compactness and the underlying fine-grained subtype structure remain largely underexplored. In this work, we propose to adaptively carry out fine-grained subtype-aware alignment by explicitly enforcing class-wise separation and subtype-wise compactness with intermediate pseudo-labels. Our key insight is that the unlabeled subtypes of a class can be divergent from one another, with different conditional and label shifts, while inheriting the local proximity within a subtype. Cases with and without prior information on subtype numbers are investigated to discover the underlying subtype structure in an online fashion. The proposed subtype-aware dynamic UDA achieves promising results on a medical diagnosis task.

AAAI Conference 2020 Conference Paper

Importance-Aware Semantic Segmentation in Self-Driving with Discrete Wasserstein Training

  • Xiaofeng Liu
  • Yuzhuo Han
  • Song Bai
  • Yi Ge
  • Tianxing Wang
  • Xu Han
  • Site Li
  • Jane You

Semantic segmentation (SS) is an important perception capability for self-driving cars and robotics, classifying each pixel into a pre-determined class. Widely used cross-entropy (CE) loss-based deep networks have achieved significant progress w.r.t. the mean Intersection-over-Union (mIoU). However, the cross-entropy loss cannot take the differing importance of each class in a self-driving system into account. For example, pedestrians in the image should be much more important than the surrounding buildings when making driving decisions, so their segmentation results are expected to be as accurate as possible. In this paper, we propose to incorporate importance-aware inter-class correlation in a Wasserstein training framework by configuring its ground distance matrix. The ground distance matrix can be pre-defined a priori for a specific task, and previous importance-ignored methods become particular cases of it. From an optimization perspective, we also extend our ground metric to a linear, convex, or concave increasing function w.r.t. the pre-defined ground distance. We evaluate our method on the CamVid and Cityscapes datasets with different backbones (SegNet, ENet, FCN, and DeepLab) in a plug-and-play fashion. In our extensive experiments, the Wasserstein loss demonstrates superior segmentation performance on the predefined critical classes for safe driving.
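As a hedged illustration of the idea described in this abstract (not the paper's code): when the target is a one-hot class t, the discrete Wasserstein distance under a ground distance matrix D reduces to the expectation of D[t, j] under the predicted distribution, so class importance can be encoded directly in D. The class set and distance values below are hypothetical.

```python
import numpy as np

def onehot_wasserstein_loss(probs, target, D):
    """Wasserstein loss against a one-hot target: E_{j ~ probs}[ D[target, j] ].

    probs: (C,) predicted class distribution for one pixel;
    target: int ground-truth class index;
    D: (C, C) pre-defined ground distance matrix encoding class importance.
    """
    return float(probs @ D[target])

# Hypothetical 3-class example {road, building, pedestrian}: confusing a
# pedestrian with anything else is penalized far more than road/building
# mix-ups, mimicking an importance-aware ground metric.
D = np.array([
    [0.0, 1.0, 5.0],   # road
    [1.0, 0.0, 5.0],   # building
    [5.0, 5.0, 0.0],   # pedestrian
])
probs = np.array([0.1, 0.1, 0.8])  # prediction for a pedestrian pixel
loss = onehot_wasserstein_loss(probs, target=2, D=D)
print(loss)  # 0.1*5 + 0.1*5 + 0.8*0 = 1.0
```

Setting all off-diagonal entries of D to 1 recovers an importance-ignored loss, matching the abstract's remark that previous methods are particular cases.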