Arrow Research search

Author name cluster

Xin Gao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

53 papers
2 author rows

Possible papers (53)

TMLR Journal 2026 Journal Article

GGFlow: A Graph Flow Matching Method with Efficient Optimal Transport

  • Xiaoyang Hou
  • Tian Zhu
  • Milong Ren
  • Dongbo Bu
  • Xin Gao
  • Chunming Zhang
  • Shiwei Sun

Generating graph-structured data is crucial in various domains but remains challenging due to the complex interdependencies between nodes and edges. While diffusion models have demonstrated their superior generative capabilities, they often suffer from unstable training and inefficient sampling. To enhance generation performance and training stability, we propose GGFlow, a discrete flow matching generative model that incorporates efficient optimal transport for graph structures and an edge-augmented graph transformer to enable direct communication among edges. Additionally, GGFlow introduces a novel goal-guided generation framework to control the generative trajectory of our model towards desired properties. GGFlow demonstrates superior performance on both unconditional and conditional generation tasks, outperforming existing baselines and underscoring its effectiveness and potential for wider application.

JBHI Journal 2026 Journal Article

NoTAC: A Noise-Tolerance Automatic Cleaning Framework for Bone Marrow Karyotyping Data

  • Rihan Huang
  • Siyuan Chen
  • Yafei Li
  • Chunling Zhang
  • Yilan Zhang
  • Changchun Yang
  • Na Li
  • Jingdong Hu

Deep neural networks have advanced chromosome classification, a critical procedure in karyotyping for disease diagnosis. However, training an effective DNN requires clean and reliable data, whereas real-world clinical chromosome data often contain label errors and outliers, which degrade DNN performance and limit their clinical applicability. In this work, we propose a Noise-Tolerance Automatic Cleaning framework, named NoTAC, to address potential labeling errors and outliers to enhance the performance of chromosome classification. The framework consists of two branches: KaryoCleanse for label noise detection and KaryoDrift for outlier identification. First, it identifies potential label errors by leveraging the DNN’s self-confidence, estimating the latent label distribution, and ranking probabilities to prune mislabeled data. Second, it scores out-of-distribution samples based on the average K-nearest neighbor distances, enabling the identification and removal of outlier data. We conducted comprehensive comparative experiments against state-of-the-art noise-handling methods on a real-world R-band bone marrow chromosome dataset. Our results demonstrate that NoTAC achieves superior performance with an accuracy of 93.99%, which represents a 6.25% relative improvement over the baseline and outperforms the best competing method by 0.92%. Furthermore, our qualitative analysis of NoTAC revealed reliable data issues in a real-world R-band bone marrow chromosome dataset, offering insights into how these issues impair DNN prediction capabilities. These findings demonstrate NoTAC’s potential to enhance both the performance and reliability of DNNs in practical medical datasets. The proposed method has also been applied to assist clinical karyotype diagnosis.
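The outlier-scoring step KaryoDrift describes, ranking samples by their average distance to their K nearest neighbors, can be sketched as follows (a minimal illustration on synthetic features; the function name and parameter values are assumptions, not the authors' implementation):

```python
import numpy as np

def knn_outlier_scores(features: np.ndarray, k: int = 5) -> np.ndarray:
    """Score each sample by its average distance to its k nearest neighbors;
    large scores flag samples far from the rest of the data."""
    diff = features[:, None, :] - features[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))   # pairwise Euclidean distances
    np.fill_diagonal(dists, np.inf)             # ignore self-distance
    return np.sort(dists, axis=1)[:, :k].mean(axis=1)

rng = np.random.default_rng(0)
# 20 in-distribution points in a tight cluster plus one injected outlier.
X = np.vstack([rng.normal(0.0, 0.1, size=(20, 8)), np.full((1, 8), 5.0)])
scores = knn_outlier_scores(X, k=5)  # the injected point gets the top score
```

Samples whose score exceeds a chosen threshold would then be removed before training.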

JBHI Journal 2026 Journal Article

PASAformer: Cerebrovascular Disease Classification with Medical Prior-Guided Adapter and Pathology-Aware Sparse Attention

  • Baiming Chen
  • Xin Gao
  • Weiguo Zhang
  • Sue Cao
  • Si Li
  • Linhai Yan

Cerebrovascular diseases (CVDs) such as aneurysms, arteriovenous malformations, stenosis, and Moyamoya disease are major public health concerns. Accurate classification of these conditions is essential for timely intervention, yet current computer-aided methods often exhibit limited representational capacity, feature redundancy, and insufficient interpretability, restricting clinical applicability. We propose PASAformer, a Swin-Transformer-based framework for cerebrovascular disease classification on Digital Subtraction Angiography (DSA). PASAformer incorporates a Pathology-Aware Sparse Attention (PASA) module that emphasizes lesion-related regions while suppressing background redundancy. Inserted into the Swin backbone, PASA replaces dense window self-attention, improving computational efficiency while preserving the hierarchical architecture. We further employ the MiAMix data augmenter to increase sample diversity, and incorporate a CombinedAdapter encoder that injects anatomical priors from the frozen Medical Segment Anything Model (MED-SAM) into early-stage representations, strengthening discriminative power under limited supervision. To support research in this underexplored area, we curate CDSA-NEO, a proprietary DSA dataset comprising more than 1,700 static images across four major cerebrovascular disease categories, constituting the first large-scale benchmark of its kind. Furthermore, an external cohort of angiographic runs with sequential, unselected frames is used to assess robustness in realistic temporal workflows. Extensive experiments on CDSA-NEO and public vascular datasets demonstrate that PASAformer achieves competitive precision and balanced accuracy compared to representative state-of-the-art models, while providing more focused visual explanations. These results suggest that PASAformer can support automated cerebrovascular disease classification on angiography, and that CDSA-NEO provides a benchmark for future method development and evaluation.

AAAI Conference 2026 Conference Paper

Spatial-Frequency Spiking Neural Network for Underwater Object Detection

  • Long Chen
  • Wei Miao
  • Xin Gao
  • Yunzhi Zhuge
  • Hongming Xu
  • Yaxin Li
  • Qi Xu

Underwater object detection presents significant challenges due to the unique visual degradations in underwater environments, such as low contrast, poor visibility, and blurry object boundaries. While ANNs have achieved impressive detection accuracy, their high computational cost and power consumption limit their deployment in resource-constrained underwater platforms. In this work, we propose a Spatial-Frequency Spiking Neural Network (SFSNN) that combines the energy-efficient and event-driven nature of Spiking Neural Networks (SNNs) with the discriminative power of spatial-frequency analysis. SFSNN introduces a novel spatial-frequency spiking module that integrates spatial and frequency-domain representations, enhancing edge and texture features crucial for object detection in murky waters. Furthermore, we adapt the YOLOX architecture into a spike-based detector via ANN-to-SNN conversion using signed spiking neurons. Extensive experiments on the RUOD dataset demonstrate that SFSNN achieves superior performance over both SNN- and ANN-based detection models, offering a compelling solution for low-power underwater object detection.

AAAI Conference 2025 Conference Paper

A Trusted Lesion-assessment Network for Interpretable Diagnosis of Coronary Artery Disease in Coronary CT Angiography

  • Xinghua Ma
  • Xinyan Fang
  • Mingye Zou
  • Gongning Luo
  • Wei Wang
  • Kuanquan Wang
  • Zhaowen Qiu
  • Xin Gao

Coronary Artery Disease (CAD) poses a significant threat to cardiovascular patients worldwide, underscoring the critical importance of automated CAD diagnostic technologies in clinical practice. Previous technologies for lesion assessment in Coronary CT Angiography (CCTA) images have been insufficient in terms of interpretability, resulting in solutions that lack clinical reliability in both network architecture and prediction outcomes, even when diagnoses are accurate. To address the limitation of interpretability, we introduce the Trusted Lesion-Assessment Network (TLA-Net), which provides a clinically reliable solution for multi-view CAD diagnosis: (1) The causality-informed evidence collection constructs a causal graph for the diagnostic process and implements causal interventions, preventing confounders' interference and enhancing the transparency of the network architecture. (2) The clinically-aligned uncertainty integration hierarchically combines Dirichlet distributions from various views based on clinical priors, offering confidence coefficients for prediction outcomes that align with physicians' image analysis procedures. Experimental results on a dataset of 2,618 lesions demonstrate that TLA-Net, supported by its interpretable methodological design, exhibits superior performance with outstanding generalization, domain adaptability, and robustness.

NeurIPS Conference 2025 Conference Paper

ChromFound: Towards A Universal Foundation Model for Single-Cell Chromatin Accessibility Data

  • Yifeng Jiao
  • Yuchen Liu
  • Yu Zhang
  • Xin Guo
  • Yushuai Wu
  • Chen Jiang
  • Jiyang Li
  • Hongwei Zhang

The advent of single-cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq) offers an innovative perspective for deciphering regulatory mechanisms by assembling a vast repository of single-cell chromatin accessibility data. While foundation models have achieved significant success in single-cell transcriptomics, there is currently no foundation model for scATAC-seq that supports zero-shot high-quality cell identification and comprehensive multi-omics analysis simultaneously. Key challenges lie in the high dimensionality and sparsity of scATAC-seq data, as well as the lack of a standardized schema for representing open chromatin regions (OCRs). Here, we present ChromFound, a foundation model tailored for scATAC-seq. ChromFound utilizes a hybrid architecture and genome-aware tokenization to effectively capture genome-wide long contexts and regulatory signals from dynamic chromatin landscapes. Pretrained on 1.97 million cells from 30 tissues and 6 disease conditions, ChromFound demonstrates broad applicability across 6 diverse tasks. Notably, it achieves robust zero-shot performance in generating universal cell representations and exhibits excellent transferability in cell type annotation and cross-omics prediction. By uncovering enhancer-gene links undetected by existing computational methods, ChromFound offers a promising framework for understanding disease risk variants in the noncoding genome. The implementation of ChromFound is available via https://github.com/JohnsonKlose/ChromFound.

ICLR Conference 2025 Conference Paper

Deep Incomplete Multi-view Learning via Cyclic Permutation of VAEs

  • Xin Gao
  • Jian Pu

Multi-View Representation Learning (MVRL) aims to derive a unified representation from multi-view data by leveraging shared and complementary information across views. However, when views are irregularly missing, the incomplete data can lead to representations that lack sufficiency and consistency. To address this, we propose Multi-View Permutation of Variational Auto-Encoders (MVP), which excavates invariant relationships between views in incomplete data. MVP establishes inter-view correspondences in the latent space of Variational Auto-Encoders, enabling the inference of missing views and the aggregation of more sufficient information. To derive a valid Evidence Lower Bound (ELBO) for learning, we apply permutations to randomly reorder variables for cross-view generation and then partition them by views to maintain invariant meanings under permutations. Additionally, we enhance consistency by introducing an informational prior with cyclic permutations of posteriors, which turns the regularization term into a similarity measure across distributions. We demonstrate the effectiveness of our approach on seven diverse datasets with varying missing ratios, achieving superior performance in multi-view clustering and generation tasks.

ICLR Conference 2025 Conference Paper

DRESSing Up LLM: Efficient Stylized Question-Answering via Style Subspace Editing

  • Xinyu Ma
  • Yifeng Xu
  • Yang Lin
  • Tianlong Wang
  • Xu Chu
  • Xin Gao
  • Junfeng Zhao 0001
  • Yasha Wang

We introduce DRESS, a novel approach for generating stylized large language model (LLM) responses through representation editing. Existing methods like prompting and fine-tuning are either insufficient for complex style adaptation or computationally expensive, particularly in tasks like NPC creation or character role-playing. Our approach leverages the over-parameterized nature of LLMs to disentangle a style-relevant subspace within the model's representation space to conduct representation editing, ensuring a minimal impact on the original semantics. By applying adaptive editing strengths, we dynamically adjust the steering vectors in the style subspace to maintain both stylistic fidelity and semantic integrity. We develop two stylized QA benchmark datasets to validate the effectiveness of DRESS, and the results demonstrate significant improvements compared to baseline methods such as prompting and ITI. In short, DRESS is a lightweight, train-free solution for enhancing LLMs with flexible and effective style control, making it particularly useful for developing stylized conversational agents. Codes and benchmark datasets are available at https://github.com/ArthurLeoM/DRESS-LLM.
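The representation-editing idea behind DRESS, steering hidden states only within a disentangled style subspace, can be illustrated with a toy sketch (the orthonormal basis and editing strengths here are hypothetical stand-ins, not the paper's learned subspace):

```python
import numpy as np

def edit_representation(h, basis, strengths):
    """Shift a hidden state along style directions only.

    h: hidden state, shape (d,); basis: orthonormal rows spanning the
    style subspace, shape (k, d); strengths: per-direction edit sizes (k,).
    Components orthogonal to the subspace are untouched, which is what
    limits damage to the original semantics.
    """
    return h + strengths @ basis

rng = np.random.default_rng(0)
d, k = 8, 2
basis = np.eye(d)[:k]  # toy style directions: the first two coordinate axes
h = rng.normal(size=d)
h_edited = edit_representation(h, basis, np.array([0.5, -0.3]))
# Only the in-subspace components of h changed.
```

In the paper the strengths are set adaptively per token; here they are fixed constants for clarity.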

IROS Conference 2025 Conference Paper

Dynamic Residual Safe Reinforcement Learning for Multi-Agent Safety-Critical Scenarios Decision-Making

  • Kaifeng Wang
  • Yinsong Chen
  • Qi Liu
  • Xueyuan Li
  • Xin Gao

In multi-agent safety-critical scenarios, traditional autonomous driving frameworks face significant challenges in balancing safety constraints and task performance. These frameworks struggle to quantify dynamic interaction risks in real-time and depend heavily on manual rules, resulting in low computational efficiency and conservative strategies. To address these limitations, we propose a Dynamic Residual Safe Reinforcement Learning (DRS-RL) framework grounded in a safety-enhanced networked Markov decision process. This is the first time the weak-to-strong theory has been introduced into multi-agent decision-making, enabling lightweight dynamic calibration of safety boundaries via a weak-to-strong safety correction paradigm. Based on the multi-agent dynamic conflict zone model, our framework accurately captures spatiotemporal coupling risks among heterogeneous traffic participants and surpasses the static constraints of conventional geometric rules. Moreover, a risk-aware prioritized experience replay mechanism mitigates data distribution bias by mapping risk to sampling probability. Experimental results reveal that the proposed method significantly outperforms traditional RL algorithms in safety, efficiency, and comfort. Specifically, it reduces the collision rate by up to 92.17%, while the safety model accounts for merely 27% of the main model’s parameters.
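The risk-aware prioritized replay described above, mapping risk to sampling probability, might look like this in outline (function and parameter names are assumptions; the paper's exact risk-to-priority mapping may differ):

```python
import numpy as np

def risk_sampling_probs(risks: np.ndarray, alpha: float = 1.0,
                        eps: float = 1e-6) -> np.ndarray:
    """Map per-transition risk scores to replay sampling probabilities.

    Riskier transitions are replayed more often; alpha controls how
    strongly risk skews the distribution (alpha = 0 recovers uniform).
    """
    priorities = (risks + eps) ** alpha
    return priorities / priorities.sum()

rng = np.random.default_rng(1)
risks = np.array([0.05, 0.10, 0.90, 0.40])
probs = risk_sampling_probs(risks, alpha=1.0)
# Draw a replay minibatch of transition indices, biased toward high risk.
batch = rng.choice(len(risks), size=1000, p=probs)
```

Importance-sampling weights, which prioritized replay normally uses to correct the induced bias, are omitted here for brevity.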

JBHI Journal 2025 Journal Article

Effects of Different Preprocessing Pipelines on Motor Imagery-Based Brain-Computer Interfaces

  • Xin Gao
  • Kai Gui
  • Xiaolong Wu
  • Benjamin Metcalfe
  • Dingguo Zhang

In recent years, brain-computer interfaces (BCIs) leveraging electroencephalography (EEG) signals for the control of external devices have garnered increasing attention. The information transfer rate of BCIs has been significantly improved by many cutting-edge methods. The exploration of effective preprocessing in brain-computer interfaces, particularly in terms of identifying suitable preprocessing methods and determining the optimal sequence for their application, remains an area ripe for further investigation. To address this gap, this study explores a range of preprocessing techniques, including but not limited to independent component analysis, surface Laplacian, bandpass filtering, and baseline correction, examining their potential contributions and synergies in the context of BCI applications. In this extensive research, a variety of preprocessing pipelines were rigorously tested across four EEG data sets, all of which were pertinent to motor imagery-based BCIs. These tests incorporated five EEG machine learning models, working in tandem with the preprocessing methods discussed earlier. The study's results highlighted that baseline correction and bandpass filtering consistently provided the most beneficial preprocessing effects. From the perspective of online deployment, after testing and time complexity analysis, this study recommends baseline correction, bandpass filtering and the surface Laplacian as more suitable for online implementation. An interesting revelation of the study was the enhanced effectiveness of the surface Laplacian algorithm when used alongside algorithms that focus on spatial information. Using appropriate processing algorithms, we can even achieve results (92.91% and 88.11%) that exceed the SOTA feature extraction methods in some cases. Such findings are instrumental in offering critical insights for the selection of effective preprocessing pipelines in EEG signal decoding. This, in turn, contributes to the advancement and refinement of brain-computer interface technologies.
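The two steps the study found most consistently helpful, baseline correction followed by bandpass filtering, can be sketched for a single motor-imagery trial (an FFT-based illustration with an assumed sampling rate, band, and baseline window; not the study's exact pipeline):

```python
import numpy as np

def preprocess_trial(eeg, fs=250.0, band=(8.0, 30.0), baseline_samples=50):
    """Baseline-correct then bandpass a (channels, samples) EEG trial.

    Baseline correction subtracts the mean of an assumed pre-stimulus
    window; the bandpass keeps the mu/beta range relevant to motor imagery.
    """
    baseline = eeg[:, :baseline_samples].mean(axis=1, keepdims=True)
    x = eeg - baseline
    # Brick-wall FFT bandpass, a simple stand-in for e.g. a Butterworth filter.
    freqs = np.fft.rfftfreq(x.shape[1], d=1.0 / fs)
    spec = np.fft.rfft(x, axis=1)
    spec[:, (freqs < band[0]) | (freqs > band[1])] = 0.0
    return np.fft.irfft(spec, n=x.shape[1], axis=1)

# A 2-second synthetic trial: 12 Hz mu rhythm plus 2 Hz drift and a DC offset.
t = np.arange(500) / 250.0
trial = (np.sin(2 * np.pi * 12 * t) + np.sin(2 * np.pi * 2 * t) + 2.0)[None, :]
clean = preprocess_trial(trial)  # only the 12 Hz component survives
```

In an online setting a causal IIR filter would replace the FFT step, since the latter needs the whole trial before it can run.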

NeurIPS Conference 2025 Conference Paper

From Pretraining to Pathology: How Noise Leads to Catastrophic Inheritance in Medical Models

  • Hao Sun
  • Zhongyi Han
  • Hao Chen
  • Jindong Wang
  • Xin Gao
  • Yilong Yin

Foundation models pretrained on web-scale data drive contemporary transfer learning in vision, language, and multimodal tasks. Recent work shows that mild label noise in these corpora may lift in-distribution accuracy yet sharply reduce out-of-distribution generalization, an effect known as catastrophic inheritance. Medical data is especially sensitive because annotations are scarce, domain shifts are large, and pretraining sources are noisy. We present the first systematic analysis of catastrophic inheritance in medical models. Controlled label-corruption experiments expose a clear structural collapse: as noise rises, the skewness and kurtosis of feature and logit distributions decline, signaling a flattened representation space and diminished discriminative detail. These higher-order statistics form a compact, interpretable marker of degradation in fine-grained tasks such as histopathology. Guided by this finding, we introduce a fine-tuning objective that restores skewness and kurtosis through two scalar regularizers added to the task loss. The method leaves the backbone unchanged and incurs negligible overhead. Tests on PLIP models trained with Twitter pathology images, as well as other large-scale vision and language backbones, show consistent gains in robustness and cross-domain accuracy under varied noise levels.
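The higher-order-statistics marker the paper builds on, batch skewness and kurtosis, together with a scalar penalty pulling them toward target values, can be sketched as follows (a NumPy illustration; the target values, weighting, and function names are assumptions):

```python
import numpy as np

def skew_kurt(x):
    """Per-dimension skewness and (non-excess) kurtosis of a feature batch."""
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    return (z ** 3).mean(axis=0), (z ** 4).mean(axis=0)

def moment_penalty(features, target_skew, target_kurt):
    """Scalar regularizer to add to the task loss: pulls the batch's
    skewness and kurtosis toward targets, counteracting the flattening
    of the representation space that label noise induces."""
    skew, kurt = skew_kurt(features)
    return ((skew - target_skew) ** 2).mean() + ((kurt - target_kurt) ** 2).mean()

rng = np.random.default_rng(0)
feats = rng.normal(size=(10000, 4))  # Gaussian features: skew ~ 0, kurtosis ~ 3
```

In training, the penalty would be computed on intermediate features with a small weight relative to the task loss, leaving the backbone unchanged as the abstract describes.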

NeurIPS Conference 2025 Conference Paper

GOOD: Training-Free Guided Diffusion Sampling for Out-of-Distribution Detection

  • Xin Gao
  • Jiyao Liu
  • Guanghao Li
  • Yueming LYU
  • Jianxiong Gao
  • Weichen Yu
  • Ningsheng Xu
  • Liang Wang

Recent advancements have explored text-to-image diffusion models for synthesizing out-of-distribution (OOD) samples, substantially enhancing the performance of OOD detection. However, existing approaches typically rely on perturbing text-conditioned embeddings, resulting in semantic instability and insufficient shift diversity, which limit generalization to realistic OOD. To address these challenges, we propose GOOD, a novel and flexible framework that directly guides diffusion sampling trajectories towards OOD regions using off-the-shelf in-distribution (ID) classifiers. GOOD incorporates dual-level guidance: (1) Image-level guidance based on the gradient of log partition to reduce input likelihood, drives samples toward low-density regions in pixel space. (2) Feature-level guidance, derived from k-NN distance in the classifier’s latent space, promotes sampling in feature-sparse regions. Hence, this dual-guidance design enables more controllable and diverse OOD sample generation. Additionally, we introduce a unified OOD score that adaptively combines image and feature discrepancies, enhancing detection robustness. We perform thorough quantitative and qualitative analyses to evaluate the effectiveness of GOOD, demonstrating that training with samples generated by GOOD can notably enhance OOD detection performance.

NeurIPS Conference 2025 Conference Paper

Learning Spatial-Aware Manipulation Ordering

  • Yuxiang Yan
  • Zhiyuan Zhou
  • Xin Gao
  • Guanghao Li
  • Shenglin Li
  • Jiaqi Chen
  • Qunyan Pu
  • Jian Pu

Manipulation in cluttered environments is challenging due to spatial dependencies among objects, where an improper manipulation order can cause collisions or blocked access. Existing approaches often overlook these spatial relationships, limiting their flexibility and scalability. To address these limitations, we propose OrderMind, a unified spatial-aware manipulation ordering framework that directly learns object manipulation priorities based on spatial context. Our architecture integrates a spatial context encoder with a temporal priority structuring module. We construct a spatial graph using k-Nearest Neighbors to aggregate geometric information from the local layout and encode both object-object and object-manipulator interactions to support accurate manipulation ordering in real-time. To generate physically and semantically plausible supervision signals, we introduce a spatial prior labeling method that guides a vision-language model to produce reasonable manipulation orders for distillation. We evaluate OrderMind on our Manipulation Ordering Benchmark, comprising 163,222 samples of varying difficulty. Extensive experiments in both simulation and real-world environments demonstrate that our method significantly outperforms prior approaches in effectiveness and efficiency, enabling robust manipulation in cluttered scenes.

NeurIPS Conference 2025 Conference Paper

Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning

  • Honglin Lin
  • Qizhi Pei
  • Zhuoshi Pan
  • Yu Li
  • Xin Gao
  • Juntao Li
  • Conghui He
  • Lijun Wu

Reasoning capability is pivotal for Large Language Models (LLMs) to solve complex tasks, yet achieving reliable and scalable reasoning remains challenging. While Chain-of-Thought (CoT) prompting has become a mainstream approach, existing methods often suffer from uncontrolled generation, insufficient quality, and limited diversity in reasoning paths. Recent efforts leverage code to enhance CoT by grounding reasoning in executable steps, but such methods are typically constrained to predefined mathematical problems, hindering scalability and generalizability. In this work, we propose \texttt{Caco} (Code-Assisted Chain-of-ThOught), a novel framework that automates the synthesis of high-quality, verifiable, and diverse instruction-CoT reasoning data through code-driven augmentation. Unlike prior work, \texttt{Caco} first fine-tunes a code-based CoT generator on existing math and programming solutions in a unified code format, then scales the data generation to a large amount of diverse reasoning traces. Crucially, we introduce automated validation via code execution and rule-based filtering to ensure logical correctness and structural diversity, followed by reverse-engineering filtered outputs into natural language instructions and language CoTs to enrich task adaptability. This closed-loop process enables fully automated, scalable synthesis of reasoning data with guaranteed executability. Experiments on our created \texttt{Caco}-1.3M dataset demonstrate that \texttt{Caco}-trained models achieve strong competitive performance on mathematical reasoning benchmarks, outperforming existing strong baselines. Further analysis reveals that \texttt{Caco}’s code-anchored verification and instruction diversity contribute to superior generalization across unseen tasks. Our work establishes a paradigm for building self-sustaining, trustworthy reasoning systems without human intervention.

NeurIPS Conference 2024 Conference Paper

CausalStock: Deep End-to-end Causal Discovery for News-driven Multi-stock Movement Prediction

  • Shuqi Li
  • Yuebo Sun
  • Yuxin Lin
  • Xin Gao
  • Shuo Shang
  • Rui Yan

There are two issues in news-driven multi-stock movement prediction tasks that are not well solved in existing works. On the one hand, "relation discovery" is a pivotal part when leveraging the price information of other stocks to achieve accurate stock movement prediction. Given that stock relations are often unidirectional, such as the "supplier-consumer" relationship, causal relations are more appropriate to capture the impact between stocks. On the other hand, there is substantial noise in the news data, making it difficult to extract effective information. With these two issues in mind, we propose a novel framework called CausalStock for news-driven multi-stock movement prediction, which discovers the temporal causal relations between stocks. We design a lag-dependent temporal causal discovery mechanism to model the temporal causal graph distribution. Then a Functional Causal Model is employed to encapsulate the discovered causal relations and predict the stock movements. Additionally, we propose a Denoised News Encoder by taking advantage of the excellent text evaluation ability of large language models (LLMs) to extract useful information from massive news data. The experiment results show that CausalStock outperforms the strong baselines for both news-driven multi-stock movement prediction and multi-stock movement prediction tasks on six real-world datasets collected from the US, China, Japan, and UK markets. Moreover, benefiting from the causal relations, CausalStock can offer a clear prediction mechanism with good explainability.

ICML Conference 2024 Conference Paper

Parameter Efficient Quasi-Orthogonal Fine-Tuning via Givens Rotation

  • Xinyu Ma
  • Xu Chu
  • Zhibang Yang
  • Yang Lin
  • Xin Gao
  • Junfeng Zhao 0001

With the increasingly powerful performances and enormous scales of pretrained models, promoting parameter efficiency in fine-tuning has become a crucial need for effective and efficient adaptation to various downstream tasks. One representative line of fine-tuning methods is Orthogonal Fine-tuning (OFT), which rigorously preserves the angular distances within the parameter space to preserve the pretrained knowledge. Despite its empirical effectiveness, OFT still suffers from low parameter efficiency at $\mathcal{O}(d^2)$ and limited capability of downstream adaptation. Inspired by Givens rotations, in this paper we propose quasi-Givens Orthogonal Fine-Tuning (qGOFT) to address these problems. We first use $\mathcal{O}(d)$ Givens rotations to accomplish arbitrary orthogonal transformation in $SO(d)$ with provable equivalence, reducing parameter complexity from $\mathcal{O}(d^2)$ to $\mathcal{O}(d)$. Then we introduce flexible norm and relative angular adjustments under soft orthogonality regularization to enhance the adaptation capability to downstream semantic deviations. Extensive experiments on various tasks and pretrained models validate the effectiveness of our methods.
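A Givens rotation acts on a single coordinate plane and carries one angle, so composing a budget of $\mathcal{O}(d)$ of them keeps the trainable parameter count linear in d. A minimal sketch of this building block (the pair schedule below is an arbitrary example, not qGOFT's construction):

```python
import numpy as np

def givens(d, i, j, theta):
    """d x d rotation by angle theta in the (i, j) coordinate plane."""
    G = np.eye(d)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i] = c
    G[j, j] = c
    G[i, j] = -s
    G[j, i] = s
    return G

def compose_rotations(d, pair_angles):
    """Product of Givens rotations; each factor contributes exactly one
    trainable angle, versus d*(d-1)/2 parameters for a dense rotation."""
    R = np.eye(d)
    for (i, j), theta in pair_angles:
        R = givens(d, i, j, theta) @ R
    return R

d = 6
rng = np.random.default_rng(0)
# An example schedule touching every coordinate once via adjacent pairs.
params = [((i, i + 1), rng.uniform(-np.pi, np.pi)) for i in range(d - 1)]
R = compose_rotations(d, params)  # orthogonal, determinant +1
```

Any such product stays in $SO(d)$ by construction, which is the property OFT-style methods rely on to preserve angular distances.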

IROS Conference 2024 Conference Paper

V2I-Calib: A Novel Calibration Approach for Collaborative Vehicle and Infrastructure LiDAR Systems

  • Qianxin Qu
  • Yijin Xiong
  • Guipeng Zhang
  • Xin Wu
  • Xiaohan Gao
  • Xin Gao
  • Hanyu Li
  • Shichun Guo

Cooperative LiDAR systems integrating vehicles and road infrastructure, termed V2I calibration, exhibit substantial potential, yet their deployment encounters numerous challenges. A pivotal aspect of ensuring data accuracy and consistency across such systems involves the calibration of LiDAR units across heterogeneous vehicular and infrastructural endpoints. This necessitates the development of calibration methods that are both real-time and robust, particularly those that can ensure robust performance in urban canyon scenarios without relying on initial positioning values. Accordingly, this paper introduces a novel approach to V2I calibration, leveraging spatial association information among perceived objects. Central to this method is the innovative Overall Intersection over Union (oIoU) metric, which quantifies the correlation between targets identified by vehicle and infrastructure systems, thereby facilitating the real-time monitoring of calibration results. Our approach involves identifying common targets within the perception results of vehicle and infrastructure LiDAR systems through the construction of an affinity matrix. These common targets then form the basis for the calculation and optimization of extrinsic parameters. Comparative and ablation studies conducted using the DAIR-V2X dataset substantiate the superiority of our approach. For further insights and resources, our project repository is accessible at https://github.com/MassimoQu/v2i-calib.
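The affinity-matrix idea, pairwise overlap between targets detected by the two endpoints, can be illustrated with axis-aligned 2D boxes (a simplified stand-in; the paper operates on LiDAR detections and its oIoU definition may differ):

```python
import numpy as np

def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def affinity_matrix(vehicle_boxes, infra_boxes):
    """Pairwise overlap between vehicle-side and infrastructure-side
    detections; high-affinity pairs are candidate common targets."""
    return np.array([[box_iou(v, i) for i in infra_boxes]
                     for v in vehicle_boxes])

veh = [(0.0, 0.0, 2.0, 2.0), (5.0, 5.0, 7.0, 7.0)]
infra = [(0.5, 0.5, 2.5, 2.5), (10.0, 10.0, 12.0, 12.0)]
A = affinity_matrix(veh, infra)
overall = A.max(axis=1).mean()  # one possible "overall IoU" style summary
```

In the full method the infrastructure detections would first be transformed by the candidate extrinsics, and the summary score would be monitored to detect calibration drift.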

JBHI Journal 2023 Journal Article

Cross-Parametric Generative Adversarial Network-Based Magnetic Resonance Image Feature Synthesis for Breast Lesion Classification

  • Ming Fan
  • Guangyao Huang
  • Junhong Lou
  • Xin Gao
  • Tieyong Zeng
  • Lihua Li

Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) contains information on tumor morphology and physiology for breast cancer diagnosis and treatment. However, this technology requires contrast agent injection with more acquisition time than other parametric images, such as T2-weighted imaging (T2WI). Current image synthesis methods attempt to map the image data from one domain to another, whereas it is challenging or even infeasible to map the images with one sequence into images with multiple sequences. Here, we propose a new approach of cross-parametric generative adversarial network (GAN)-based feature synthesis (CPGANFS) to generate discriminative DCE-MRI features from T2WI with applications in breast cancer diagnosis. The proposed approach decodes the T2W images into latent cross-parameter features to reconstruct the DCE-MRI and T2WI features by balancing the information shared between the two. A Wasserstein GAN with a gradient penalty is employed to differentiate the T2WI-generated features from ground-truth features extracted from DCE-MRI. The synthesized DCE-MRI feature-based model achieved significantly (p = 0.036) higher prediction performance (AUC = 0.866) in breast cancer diagnosis than that based on T2WI (AUC = 0.815). Visualization of the model shows that our CPGANFS method enhances the predictive power by directing attention to the lesion and the surrounding parenchyma areas, which is driven by the interparametric information learned from T2WI and DCE-MRI. Our proposed CPGANFS provides a framework for cross-parametric MR image feature generation from a single-sequence image guided by an information-rich, time-series image with kinetic information. Extensive experimental results demonstrate its effectiveness with high interpretability and improved performance in breast cancer diagnosis.

AAAI Conference 2023 Conference Paper

Learning towards Selective Data Augmentation for Dialogue Generation

  • Xiuying Chen
  • Mingzhe Li
  • Jiayi Zhang
  • Xiaoqiang Xia
  • Chen Wei
  • Jianwei Cui
  • Xin Gao
  • Xiangliang Zhang

As it is cumbersome and expensive to acquire a huge amount of data for training neural dialog models, data augmentation is proposed to effectively utilize existing training samples. However, current data augmentation techniques on the dialog generation task mostly augment all cases in the training dataset without considering the intrinsic attributes between different cases. We argue that not all cases are beneficial for the augmentation task, and the cases suitable for augmentation should obey the following two attributes: (1) low-quality (the dialog model cannot generate a high-quality response for the case), (2) representative (the case should represent the property of the whole dataset). Herein, we explore this idea by proposing a Selective Data Augmentation framework (SDA) for the response generation task. SDA employs a dual adversarial network to select the lowest quality and most representative data points for augmentation in one stage. Extensive experiments conducted on two publicly available datasets, i.e., DailyDialog and OpenSubtitles, show that our framework can improve the response generation performance with respect to various metrics.

AAAI Conference 2023 Conference Paper

Towards Efficient and Domain-Agnostic Evasion Attack with High-Dimensional Categorical Inputs

  • Hongyan Bao
  • Yufei Han
  • Yujun Zhou
  • Xin Gao
  • Xiangliang Zhang

Our work targets searching for feasible adversarial perturbations to attack a classifier with high-dimensional categorical inputs in a domain-agnostic setting. This is intrinsically an NP-hard knapsack problem whose exploration space grows explosively as the feature dimension increases. Without the help of domain knowledge, solving this problem via heuristic methods, such as Branch-and-Bound, suffers from exponential complexity yet can yield arbitrarily bad attack results. We address the challenge through the lens of multi-armed bandit based combinatorial search. Our proposed method, namely FEAT, treats modifying each categorical feature as pulling an arm in a multi-armed bandit program. Our objective is to achieve a highly efficient and effective attack using an Orthogonal Matching Pursuit (OMP)-enhanced Upper Confidence Bound (UCB) exploration strategy. Our theoretical analysis bounding the regret gap of FEAT guarantees its practical attack performance. In empirical analysis, we compare FEAT with other state-of-the-art domain-agnostic attack methods over various real-world categorical datasets from different applications. Substantial experimental observations confirm the expected efficiency and attack effectiveness of FEAT in different application scenarios. Our work further hints at the applicability of FEAT for assessing the adversarial vulnerability of classification systems with high-dimensional categorical inputs.
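The UCB exploration the abstract relies on can be sketched in a few lines. This is an illustrative, generic UCB arm-selection rule, not the authors' FEAT implementation; the function name and the exploration constant `c` are our assumptions:

```python
import numpy as np

# Illustrative, generic UCB arm selection (our sketch, not the authors' FEAT
# code): each categorical feature is an arm; pick the arm whose empirical mean
# reward plus exploration bonus is largest, trying unpulled arms first.
def ucb_select(counts, rewards, t, c=2.0):
    counts = np.asarray(counts, dtype=float)
    means = np.asarray(rewards, dtype=float) / np.maximum(counts, 1.0)
    bonus = np.where(counts == 0.0, np.inf,
                     np.sqrt(c * np.log(t) / np.maximum(counts, 1.0)))
    return int(np.argmax(means + bonus))

# Arm 1 has the best empirical mean once all arms have been pulled equally.
print(ucb_select(counts=[5, 5, 5], rewards=[1.0, 3.0, 2.0], t=15))  # → 1
```

The bonus term shrinks as an arm's pull count grows, which is what trades off exploring rarely modified features against exploiting ones known to degrade the classifier.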

JBHI Journal 2022 Journal Article

A Framework for Deep Multitask Learning With Multiparametric Magnetic Resonance Imaging for the Joint Prediction of Histological Characteristics in Breast Cancer

  • Ming Fan
  • Chengcheng Yuan
  • Guangyao Huang
  • Maosheng Xu
  • Shiwei Wang
  • Xin Gao
  • Lihua Li

The clinical management and decision-making process related to breast cancer are based on multiple histological indicators. This study aims to jointly predict the Ki-67 expression level, luminal A subtype and histological grade molecular biomarkers using a new deep multitask learning method with multiparametric magnetic resonance imaging. A multitask learning network structure was proposed by introducing a common-task layer and task-specific layers to learn the high-level features that are common to all tasks and related to a specific task, respectively. A network pretrained with knowledge from the ImageNet dataset was used and fine-tuned with MRI data. Information from multiparametric MR images was fused using the strategy at the feature and decision levels. The area under the receiver operating characteristic curve (AUC) was used to measure model performance. For single-task learning using a single image series, the deep learning model generated AUCs of 0.752, 0.722, and 0.596 for the Ki-67, luminal A and histological grade prediction tasks, respectively. The performance was improved by freezing the first 5 convolutional layers, using 20% shared layers and fusing multiparametric series at the feature level, which achieved AUCs of 0.819, 0.799 and 0.747 for the Ki-67, luminal A and histological grade prediction tasks, respectively. Our study showed advantages in jointly predicting correlated clinical biomarkers using a deep multitask learning framework with an appropriate number of fine-tuned convolutional layers by taking full advantage of common and complementary imaging features. Multiparametric image series-based multitask learning could be a promising approach for the multiple clinical indicator-based management of breast cancer.

JBHI Journal 2022 Journal Article

Human-Guided Functional Connectivity Network Estimation for Chronic Tinnitus Identification: A Modularity View

  • Wei-Kai Li
  • Yu-Chen Chen
  • Xiao-Wen Xu
  • Xiao Wang
  • Xin Gao

The functional connectivity network (FCN) has been used to achieve several remarkable advancements in the diagnosis of neurodegenerative disorders. Therefore, it is imperative to accurately estimate biologically meaningful FCNs. Several efforts have been dedicated to this purpose by encoding biological priors. However, owing to the high complexity of the human brain, the estimation of an 'ideal' FCN remains an open problem. To the best of our knowledge, almost all existing studies lack the integration of domain expert knowledge, which limits their performance. In this study, we focused on incorporating domain expert knowledge into the FCN estimation from a modularity perspective. To achieve this, we presented a human-guided modular representation (MR) FCN estimation framework. Specifically, we designed an adversarial low-rank constraint to describe the module structure of FCNs under the guidance of domain expert knowledge (i.e., a predefined participant index). The chronic tinnitus (TIN) identification task based on the estimated FCNs was conducted to examine the proposed MR methods. Remarkably, MR significantly outperformed the baseline and state-of-the-art (SOTA) methods, achieving an accuracy of 92.11%. Moreover, post-hoc analysis revealed that the FCNs estimated by the proposed MR could highlight more biologically meaningful connections, which is beneficial for exploring the underlying mechanisms of TIN and diagnosing early TIN.

ICLR Conference 2022 Conference Paper

Learning Towards The Largest Margins

  • Xiong Zhou
  • Xianming Liu 0005
  • Deming Zhai
  • Junjun Jiang
  • Xin Gao
  • Xiangyang Ji

One of the main challenges for feature representation in deep learning-based classification is the design of appropriate loss functions that exhibit strong discriminative power. The classical softmax loss does not explicitly encourage discriminative learning of features. A popular direction of research is to incorporate margins into well-established losses in order to enforce extra intra-class compactness and inter-class separability; these margins, however, were developed through heuristic means rather than rigorous mathematical principles. In this work, we attempt to address this limitation by formulating the principled optimization objective as learning towards the largest margins. Specifically, we first propose to employ the class margin as the measure of inter-class separability and the sample margin as the measure of intra-class compactness. Accordingly, to encourage discriminative representation of features, the loss function should promote the largest possible margins for both classes and samples. Furthermore, we derive a generalized margin softmax loss to draw general conclusions for the existing margin-based losses. Not only does this principled framework offer new perspectives to understand and interpret existing margin-based losses, but it also provides new insights that can guide the design of new tools, including sample margin regularization and the largest margin softmax loss for class-balanced cases, and zero centroid regularization for class-imbalanced cases. Experimental results demonstrate the effectiveness of our strategy for multiple tasks including visual classification, imbalanced classification, person re-identification, and face verification.
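As a hedged illustration of the sample margin used above (the gap between the true-class logit and the largest competing logit; the helper name is ours, not from the paper's code):

```python
import numpy as np

# Hedged illustration of the sample margin: the true-class logit minus the
# largest competing logit (helper name is ours, not from the paper's code).
def sample_margins(logits, labels):
    logits = np.asarray(logits, dtype=float)
    rows = np.arange(len(labels))
    true_logit = logits[rows, labels]
    masked = logits.copy()
    masked[rows, labels] = -np.inf              # exclude the true class
    return true_logit - masked.max(axis=1)      # positive iff correctly classified

logits = np.array([[4.0, 1.0, 0.5],    # confidently correct sample
                   [1.0, 1.5, 0.0]])   # misclassified sample
print(sample_margins(logits, [0, 0]).tolist())  # → [3.0, -0.5]
```

Maximizing the smallest such margin over the training set is the intuition behind "learning towards the largest margins": a larger worst-case sample margin means tighter intra-class clusters relative to the decision boundaries.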

ICML Conference 2022 Conference Paper

Prototype-Anchored Learning for Learning with Imperfect Annotations

  • Xiong Zhou
  • Xianming Liu 0005
  • Deming Zhai
  • Junjun Jiang
  • Xin Gao
  • Xiangyang Ji

The success of deep neural networks greatly relies on the availability of large amounts of high-quality annotated data, which however are difficult or expensive to obtain. The resulting labels may be class imbalanced, noisy or human biased. It is challenging to learn unbiased classification models from imperfectly annotated datasets, on which we usually suffer from overfitting or underfitting. In this work, we thoroughly investigate the popular softmax loss and margin-based loss, and offer a feasible approach to tighten the generalization error bound by maximizing the minimal sample margin. We further derive the optimality condition for this purpose, which indicates how the class prototypes should be anchored. Motivated by theoretical analysis, we propose a simple yet effective method, namely prototype-anchored learning (PAL), which can be easily incorporated into various learning-based classification schemes to handle imperfect annotation. We verify the effectiveness of PAL on class-imbalanced learning and noise-tolerant learning by extensive experiments on synthetic and real-world datasets.

NeurIPS Conference 2022 Conference Paper

Towards Improving Faithfulness in Abstractive Summarization

  • Xiuying Chen
  • Mingzhe Li
  • Xin Gao
  • Xiangliang Zhang

Despite the success achieved in neural abstractive summarization based on pre-trained language models, one unresolved issue is that the generated summaries are not always faithful to the input document. There are two possible causes of the unfaithfulness problem: (1) the summarization model fails to understand or capture the gist of the input text, and (2) the model over-relies on the language model to generate fluent but inadequate words. In this work, we propose a Faithfulness Enhanced Summarization model (FES), which is designed for addressing these two problems and improving faithfulness in abstractive summarization. For the first problem, we propose to use question-answering (QA) to examine whether the encoder fully grasps the input document and can answer the questions on the key information in the input. The QA attention on the proper input words can also be used to stipulate how the decoder should attend to the source. For the second problem, we introduce a max-margin loss defined on the difference between the language and the summarization model, aiming to prevent the overconfidence of the language model. Extensive experiments on two benchmark summarization datasets, CNN/DM and XSum, demonstrate that our model significantly outperforms strong baselines. The evaluation of factual consistency also shows that our model generates more faithful summaries than baselines.

ICML Conference 2021 Conference Paper

Asymmetric Loss Functions for Learning with Noisy Labels

  • Xiong Zhou
  • Xianming Liu 0005
  • Junjun Jiang
  • Xin Gao
  • Xiangyang Ji

Robust loss functions are essential for training deep neural networks with better generalization power in the presence of noisy labels. Symmetric loss functions are confirmed to be robust to label noise. However, the symmetric condition is overly restrictive. In this work, we propose a new class of loss functions, namely asymmetric loss functions, which are robust to learning from noisy labels of arbitrary noise type. Subsequently, we investigate general theoretical properties of asymmetric loss functions, including classification-calibration, excess risk bounds, and noise tolerance. Meanwhile, we introduce the asymmetry ratio to measure the asymmetry of a loss function, and the empirical results show that a higher ratio provides better robustness. Moreover, we modify several common loss functions and establish the necessary and sufficient conditions for them to be asymmetric. Experiments on benchmark datasets demonstrate that asymmetric loss functions can outperform state-of-the-art methods.
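For context on the symmetric condition that asymmetric losses relax, here is a quick numerical check (our illustration; MAE and cross-entropy are chosen as standard examples, not taken from the paper): a loss is symmetric when its per-class losses sum to a constant over all predictions.

```python
import numpy as np

# Numerical check (our illustration) of the symmetric-loss condition that
# asymmetric losses relax: a loss is symmetric when sum_k L(p, k) is constant
# in the prediction p. MAE on probability vectors satisfies it; CE does not.
def mae(p, k):
    return np.abs(p - np.eye(len(p))[k]).sum()  # L1 gap to the one-hot target

def ce(p, k):
    return -np.log(p[k])                        # cross-entropy to class k

p1 = np.array([0.7, 0.2, 0.1])
p2 = np.array([0.4, 0.4, 0.2])
print(round(sum(mae(p1, k) for k in range(3)), 6))  # → 4.0  (always 2K - 2)
print(round(sum(mae(p2, k) for k in range(3)), 6))  # → 4.0
print(round(sum(ce(p1, k) for k in range(3)), 2) ==
      round(sum(ce(p2, k) for k in range(3)), 2))   # → False: CE is not symmetric
```

This constant-sum property is what makes symmetric losses noise-tolerant but also hard to optimize, which motivates relaxing it to an asymmetric condition.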

AAAI Conference 2021 Conference Paper

GRASP: Generic Framework for Health Status Representation Learning Based on Incorporating Knowledge from Similar Patients

  • Chaohe Zhang
  • Xin Gao
  • Liantao Ma
  • Yasha Wang
  • Jiangtao Wang
  • Wen Tang

Deep learning models have been applied to many healthcare tasks based on electronic medical records (EMR) data and have shown substantial performance. Existing methods commonly embed the records of a single patient into a representation for medical tasks. Such methods learn inadequate representations and lead to inferior performance, especially when the patient's data are sparse or low-quality. To address this problem, we propose GRASP, a generic framework for healthcare models. For a given patient, GRASP first finds patients in the dataset who have similar conditions and similar results (i.e., the similar patients), and then enhances the representation learning and prognosis of the given patient by leveraging knowledge extracted from these similar patients. GRASP defines similarities with different meanings between patients for different clinical tasks, finds similar patients with useful information accordingly, and then learns a cohort representation to extract the valuable knowledge contained in the similar patients. The cohort information is fused with the current patient's representation to conduct the final clinical tasks. Experimental evaluations on two real-world datasets show that GRASP can be seamlessly integrated into state-of-the-art models with consistent performance improvements. Besides, under the guidance of medical experts, we verified the findings extracted by GRASP, and the findings are consistent with existing medical knowledge, indicating that GRASP can generate useful insights for relevant predictions.
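The "find similar patients, then learn a cohort representation" step can be sketched as follows. This is a minimal illustration using plain Euclidean distance over patient embeddings; GRASP itself learns task-specific similarities, and the function name is our own:

```python
import numpy as np

# Minimal illustration of the "find similar patients" step with plain
# Euclidean distance (GRASP itself learns task-specific similarities):
# retrieve the K nearest patient representations and pool them into a cohort
# embedding that can be fused with the query patient's own representation.
def cohort_representation(query, patients, k=2):
    dists = np.linalg.norm(patients - query, axis=1)
    nearest = np.argsort(dists)[:k]              # indices of the K most similar
    return nearest, patients[nearest].mean(axis=0)

patients = np.array([[0.0, 0.0], [0.1, 0.1], [5.0, 5.0]])
idx, cohort = cohort_representation(np.array([0.05, 0.0]), patients, k=2)
print(sorted(idx.tolist()))  # → [0, 1]: the two nearby patients form the cohort
```

In GRASP the pooled cohort vector would then be concatenated or attended with the query patient's representation before the final prediction head.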

AAAI Conference 2020 Conference Paper

AdaCare: Explainable Clinical Health Status Representation Learning via Scale-Adaptive Feature Extraction and Recalibration

  • Liantao Ma
  • Junyi Gao
  • Yasha Wang
  • Chaohe Zhang
  • Jiangtao Wang
  • Wenjie Ruan
  • Wen Tang
  • Xin Gao

Deep learning-based health status representation learning and clinical prediction have raised much research interest in recent years. Existing models have shown superior performance, but there are still several major issues that have not been fully taken into consideration. First, the historical variation pattern of a biomarker on diverse time scales plays a vital role in indicating the health status, but it has not been explicitly extracted by existing works. Second, the key factors that strongly indicate the health risk differ among patients. It is still challenging to adaptively make use of the features for patients in diverse conditions. Third, using prediction models as black boxes will limit their reliability in clinical practice. However, none of the existing works can provide satisfying interpretability while achieving high prediction performance. In this work, we develop a general health status representation learning model, named AdaCare. It can capture the long- and short-term variations of biomarkers as clinical features to depict the health status on multiple time scales. It also models the correlation between clinical features to enhance the ones that strongly indicate the health status, and thus can maintain state-of-the-art prediction accuracy while providing qualitative interpretability. We conduct health risk prediction experiments on two real-world datasets. The results indicate that AdaCare outperforms state-of-the-art approaches and provides effective interpretability that is verifiable by clinical experts.

AAAI Conference 2020 Conference Paper

ConCare: Personalized Clinical Feature Embedding via Capturing the Healthcare Context

  • Liantao Ma
  • Chaohe Zhang
  • Yasha Wang
  • Wenjie Ruan
  • Jiangtao Wang
  • Wen Tang
  • Xinyu Ma
  • Xin Gao

Predicting a patient’s clinical outcome from historical electronic medical records (EMR) is a fundamental research problem in medical informatics. Most deep learning-based solutions for EMR analysis concentrate on learning clinical visit embeddings and exploring the relations between visits. Although those works have shown superior performance in healthcare prediction, they fail to thoroughly explore the personal characteristics during the clinical visits. Moreover, existing works usually assume that the more recent record weighs more in the prediction, but this assumption is not suitable for all conditions. In this paper, we propose ConCare to handle irregular EMR data and extract feature interrelationships to perform individualized healthcare prediction. Our solution can embed the feature sequences separately by modeling the time-aware distribution. ConCare further improves multi-head self-attention via cross-head decorrelation, so that the inter-dependencies among dynamic features and static baseline information can be effectively captured to form the personal health context. Experimental results on two real-world EMR datasets demonstrate the effectiveness of ConCare. The medical findings extracted by ConCare are also empirically confirmed by human experts and the medical literature.

ICML Conference 2020 Conference Paper

Distance Metric Learning with Joint Representation Diversification

  • Xu Chu
  • Yang Lin
  • Yasha Wang
  • Xiting Wang
  • Hailong Yu
  • Xin Gao
  • Qi Tong

Distance metric learning (DML) aims to learn a representation space equipped with a metric such that similar examples are closer than dissimilar examples with respect to the metric. The recent success of DNNs has motivated many DML losses that encourage intra-class compactness and inter-class separability. The trade-off between intra-class compactness and inter-class separability shapes the DML representation space by determining how much information from the original inputs to retain. In this paper, we propose Distance Metric Learning with Joint Representation Diversification (JRD), which allows a better balancing point between intra-class compactness and inter-class separability. Specifically, we propose a Joint Representation Similarity regularizer that captures different abstract levels of invariant features and diversifies the joint distributions of representations across multiple layers. Experiments on three deep DML benchmark datasets demonstrate the effectiveness of the proposed approach.

JBHI Journal 2020 Journal Article

Joint Prediction of Breast Cancer Histological Grade and Ki-67 Expression Level Based on DCE-MRI and DWI Radiomics

  • Ming Fan
  • Wei Yuan
  • Wenrui Zhao
  • Maosheng Xu
  • Shiwei Wang
  • Xin Gao
  • Lihua Li

Objective: Histologic grade and Ki-67 proliferation status are important clinical indicators for breast cancer prognosis and treatment. The purpose of this study is to improve the prediction accuracy of these clinical indicators based on tumor radiomic analysis. Methods: We jointly predicted Ki-67 and tumor grade with a multitask learning framework by separately utilizing radiomics from tumor MRI series. Additionally, we showed how multitask learning models (MTLs) could be extended to combine radiomics from the MRI series for a better prediction, based on the assumption that features from different sources of images share common patterns while providing complementary information. Tumor radiomic analysis was performed with morphological, statistical and textural features extracted on the DWI and dynamic contrast-enhanced MRI (DCE-MRI) series of the precontrast and subtraction images, respectively. Results: Joint prediction of Ki-67 status and tumor grade on MR images using the MTL achieved performance improvements over that of single-task-based predictive models. Similarly, for the prediction tasks of Ki-67 and tumor grade, the MTL for combined precontrast and apparent diffusion coefficient (ADC) images achieved AUCs of 0.811 and 0.816, which were significantly better than those of the single-task-based model, with p-values of 0.005 and 0.017, respectively. Conclusion: Mapping MRI radiomics to two related clinical indicators improves prediction performance for both the Ki-67 expression level and tumor grade. Significance: Joint prediction of indicators by multitask learning that combines correlations of MRI radiomics is important for optimal tumor therapy and treatment because clinical decisions are made by integrating multiple clinical indicators.

NeurIPS Conference 2020 Conference Paper

One-sample Guided Object Representation Disassembling

  • Zunlei Feng
  • Yongming He
  • Xinchao Wang
  • Xin Gao
  • Jie Lei
  • Cheng Jin
  • Mingli Song

The ability to disassemble the features of objects and background is crucial for many machine learning tasks, including image classification, image editing, and visual concept learning. However, existing (semi-)supervised methods all need a large amount of annotated samples, while unsupervised methods cannot handle real-world images with complicated backgrounds. In this paper, we introduce the One-sample Guided Object Representation Disassembling (One-GORD) method, which only requires one annotated sample for each object category to learn disassembled object representations from unannotated images. For the annotated one-sample, we first adopt data augmentation strategies to generate synthetic samples, which can guide the disassembling of the object features and background features. For the unannotated images, two self-supervised mechanisms, dual-swapping and fuzzy classification, are introduced to disassemble object features from the background with the guidance of the annotated one-sample. Moreover, we devise two metrics to evaluate the disassembling performance from the perspectives of representation and image, respectively. Experiments demonstrate that One-GORD achieves competitive disassembling performance and can handle natural scenes with complicated backgrounds.

IJCAI Conference 2019 Conference Paper

An Online Intelligent Visual Interaction System

  • Anxiang Zeng
  • Han Yu
  • Xin Gao
  • Kairi Ou
  • Zhenchuan Huang
  • Peng Hou
  • Mingli Song
  • Jingshu Zhang

This paper proposes an Online Intelligent Visual Interactive System (OIVIS), which can be applied to various live video broadcast and short video scenes to provide an interactive user experience. In a live video broadcast, the anchor can issue various commands using pre-defined gestures and can trigger real-time background replacement to create an immersive atmosphere. To support such dynamic interactivity, we implemented algorithms including real-time gesture recognition and real-time video portrait segmentation, and developed a deep network inference framework and a real-time rendering framework, AI Gender, at the front end to create a complete set of visual interaction solutions for use in resource-constrained mobile environments.

AAAI Conference 2019 Conference Paper

Approximate Kernel Selection with Strong Approximate Consistency

  • Lizhong Ding
  • Yong Liu
  • Shizhong Liao
  • Yu Li
  • Peng Yang
  • Yijie Pan
  • Chao Huang
  • Ling Shao

Kernel selection is fundamental to the generalization performance of kernel-based learning algorithms. Approximate kernel selection is an efficient kernel selection approach that exploits the convergence property of the kernel selection criteria and the computational virtue of kernel matrix approximation. The convergence property is measured by the notion of approximate consistency. For the existing Nyström approximations, whose sampling distributions are independent of the specific learning task at hand, it is difficult to establish the strong approximate consistency. They mainly focus on the quality of the low-rank matrix approximation, rather than the performance of the kernel selection criterion used in conjunction with the approximate matrix. In this paper, we propose a novel Nyström approximate kernel selection algorithm by customizing a criterion-driven adaptive sampling distribution for the Nyström approximation, which adaptively reduces the error between the approximate and accurate criteria. We theoretically derive the strong approximate consistency of the proposed Nyström approximate kernel selection algorithm. Finally, we empirically evaluate the approximate consistency of our algorithm as compared to state-of-the-art methods.
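For reference, the standard Nyström approximation that the paper's adaptive sampling improves upon looks like this; the uniform landmark sampling, kernel bandwidth, and data below are purely illustrative:

```python
import numpy as np

# Reference sketch of the standard Nyström approximation the paper builds on:
# pick m landmark points, form the landmark columns C and landmark block W,
# and approximate the full kernel matrix as K ≈ C W^+ C^T. Uniform sampling
# below is purely illustrative; the paper's contribution is a criterion-driven
# adaptive sampling distribution in its place.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
sqdist = np.sum((X[:, None] - X[None]) ** 2, axis=-1)
K = np.exp(-0.1 * sqdist)                       # RBF kernel matrix

idx = rng.choice(50, size=20, replace=False)    # m = 20 uniform landmarks
C, W = K[:, idx], K[np.ix_(idx, idx)]
K_approx = C @ np.linalg.pinv(W) @ C.T          # rank-m reconstruction

err = np.linalg.norm(K - K_approx) / np.linalg.norm(K)
print(err < 0.5)  # the low-rank reconstruction is close in Frobenius norm
```

The paper's point is that a sampling distribution driven by the kernel selection criterion, rather than this task-independent uniform choice, makes the approximate criterion provably consistent with the exact one.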

AAAI Conference 2019 Conference Paper

Confidence Weighted Multitask Learning

  • Peng Yang
  • Peilin Zhao
  • Jiayu Zhou
  • Xin Gao

Traditional online multitask learning only utilizes the first-order information of the data stream. To remedy this issue, we propose a confidence-weighted multitask learning algorithm, which maintains a Gaussian distribution over each task model to guide the online learning process. The mean (covariance) of the Gaussian distribution is a sum of a local component and a global component that is shared among all the tasks. In addition, this paper also addresses the challenge of active learning in the online multitask setting. Instead of requiring labels for all the instances, the proposed algorithm determines whether the learner should acquire a label by considering the confidence from its related tasks over the label prediction. Theoretical results show that the regret bounds can be significantly reduced. Empirical results demonstrate that the proposed algorithm is able to achieve promising learning efficacy while simultaneously minimizing the labeling cost.

AAAI Conference 2019 Conference Paper

Linear Kernel Tests via Empirical Likelihood for High-Dimensional Data

  • Lizhong Ding
  • Zhi Liu
  • Yu Li
  • Shizhong Liao
  • Yong Liu
  • Peng Yang
  • Ge Yu
  • Ling Shao

We propose a framework for analyzing and comparing distributions without imposing any parametric assumptions via empirical likelihood methods. Our framework is used to study two fundamental statistical test problems: the two-sample test and the goodness-of-fit test. For the two-sample test, we need to determine whether two groups of samples are from different distributions; for the goodness-of-fit test, we examine how likely it is that a set of samples is generated from a known target distribution. Specifically, we propose empirical likelihood ratio (ELR) statistics for the two-sample test and the goodness-of-fit test, both of which are of linear time complexity and show higher power (i.e., the probability of correctly rejecting the null hypothesis) than the existing linear statistics for high-dimensional data. We prove the nonparametric Wilks’ theorems for the ELR statistics, which illustrate that the limiting distributions of the proposed ELR statistics are chi-square distributions. With these limiting distributions, we can avoid bootstraps or simulations to determine the threshold for rejecting the null hypothesis, which makes the ELR statistics more efficient than the recently proposed linear statistic, finite set Stein discrepancy (FSSD). We also prove the consistency of the ELR statistics, which guarantees that the test power goes to 1 as the number of samples goes to infinity. In addition, we experimentally demonstrate and theoretically analyze that FSSD has poor performance or even fails to test for high-dimensional data. Finally, we conduct a series of experiments to evaluate the performance of our ELR statistics as compared to state-of-the-art linear statistics.

IJCAI Conference 2018 Conference Paper

Bandit Online Learning on Graphs via Adaptive Optimization

  • Peng Yang
  • Peilin Zhao
  • Xin Gao

Traditional online learning on graphs adapts the graph Laplacian into ridge regression, which may not guarantee reasonable accuracy when the data are adversarially generated. To solve this issue, we exploit an adaptive optimization framework for online classification on graphs. The derived model can achieve a min-max regret under an adversarial mechanism of data generation. To take advantage of the informative labels, we propose an adaptive large-margin update rule, which enjoys a lower regret than algorithms using error-driven update rules. However, this algorithm assumes that the full-information label is provided for each node, which is violated in many practical applications where labeling is expensive and the oracle may only tell whether the prediction is correct or not. To address this issue, we propose a bandit online algorithm on graphs. It derives a per-instance confidence region of the prediction, from which the model can be learned adaptively to minimize the online regret. Experiments on benchmark graph datasets show that the proposed bandit algorithm outperforms state-of-the-art competitors, and sometimes even beats algorithms using full-information label feedback.
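The graph-Laplacian ridge regression baseline mentioned in the first sentence has a simple closed form; here is a minimal sketch (the graph, labels, and the regularization weight 0.1 are illustrative choices of ours):

```python
import numpy as np

# Minimal sketch (our illustration) of the graph-Laplacian ridge regression
# baseline: minimize ||mask·(f - y)||^2 + c · f^T L f, whose minimizer solves
# the linear system (mask + c·L) f = mask·y.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)       # adjacency of a 4-node graph
L = np.diag(A.sum(axis=1)) - A                  # combinatorial graph Laplacian

y = np.array([1.0, 0.0, 0.0, 0.0])              # node 0 labeled 1, node 3 labeled 0
mask = np.diag([1.0, 0.0, 0.0, 1.0])            # only nodes 0 and 3 are labeled
f = np.linalg.solve(mask + 0.1 * L, mask @ y)   # closed-form minimizer

# Scores decay smoothly with distance from the positively labeled node.
print(f[0] > f[2] > f[3])  # → True
```

The quadratic `f^T L f` term penalizes disagreement across edges, which is exactly the smoothness prior that an adversarially generated stream can exploit, motivating the paper's adaptive min-max alternative.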

IJCAI Conference 2018 Conference Paper

Convergence Analysis of Gradient Descent for Eigenvector Computation

  • Zhiqiang Xu
  • Xin Cao
  • Xin Gao

We present a novel, simple and systematic convergence analysis of gradient descent for eigenvector computation. As a popular, practical, and provable approach to numerous machine learning problems, gradient descent has found successful applications in eigenvector computation as well. However, surprisingly, it lacks a thorough theoretical analysis for the underlying geodesically non-convex problem. In this work, the convergence of the gradient descent solver for the leading eigenvector computation is shown to be at a global rate O(min{(lambda_1/Delta_p)^2 log(1/epsilon), 1/epsilon}), where Delta_p = lambda_p - lambda_{p+1} > 0 is the generalized positive eigengap, which always exists without loss of generality, lambda_i is the i-th largest eigenvalue of the given real symmetric matrix, and p is the multiplicity of lambda_1. The rate is linear, (lambda_1/Delta_p)^2 log(1/epsilon), if (lambda_1/Delta_p)^2 = O(1); otherwise it is sub-linear, O(1/epsilon). We also show that the convergence depends only logarithmically, instead of quadratically, on the initial iterate. In particular, this is the first time that linear convergence has been established for the case where the conventionally considered eigengap Delta_1 = lambda_1 - lambda_2 = 0 but the generalized eigengap Delta_p satisfies (lambda_1/Delta_p)^2 = O(1), and the first time the logarithmic dependence on the initial iterate has been established for the gradient descent solver. We are also the first to leverage the log principal angle between the iterate and the space of globally optimal solutions for the analysis. Theoretical properties are verified in experiments.
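The solver analyzed above can be sketched as projected (Riemannian) gradient ascent on the Rayleigh quotient; the step size and iteration budget below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

# Sketch of the solver analyzed above: projected (Riemannian) gradient ascent
# on the Rayleigh quotient x^T A x over the unit sphere. Step size and the
# iteration budget are illustrative choices.
A = np.diag([3.0, 1.0, 0.5])            # leading eigenvalue 3 with eigenvector e_1
x = np.array([0.5, 0.7, 0.5])
x /= np.linalg.norm(x)                  # arbitrary unit-norm initial iterate

for _ in range(200):
    grad = A @ x - (x @ A @ x) * x      # Riemannian gradient on the sphere
    x += 0.1 * grad                     # gradient step
    x /= np.linalg.norm(x)              # retract back to the unit sphere

print(round(float(x @ A @ x), 6))  # → 3.0: the Rayleigh quotient reaches lambda_1
```

On this diagonal example the off-leading components contract by a constant factor per step, matching the linear-rate regime the abstract describes when the (generalized) eigengap is large relative to lambda_1.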

AAAI Conference 2018 Conference Paper

Randomized Kernel Selection With Spectra of Multilevel Circulant Matrices

  • Lizhong Ding
  • Shizhong Liao
  • Yong Liu
  • Peng Yang
  • Xin Gao

Kernel selection aims at choosing an appropriate kernel function for kernel-based learning algorithms to avoid either underfitting or overfitting of the resulting hypothesis. One of the main problems faced by kernel selection is the evaluation of the goodness of a kernel, which is typically difficult and computationally expensive. In this paper, we propose a randomized kernel selection approach to evaluate and select the kernel with the spectra of the specifically designed multilevel circulant matrices (MCMs), which is statistically sound and computationally efficient. Instead of constructing the kernel matrix, we construct the randomized MCM to encode the kernel function and all data points together with labels. We build a one-to-one correspondence between all candidate kernel functions and the spectra of the randomized MCMs by Fourier transform. We prove the statistical properties of the randomized MCMs and the randomized kernel selection criteria, which theoretically qualify the utility of the randomized criteria in kernel selection. With the spectra of the randomized MCMs, we derive a series of randomized criteria to conduct kernel selection, which can be computed in log-linear time and linear space complexity by fast Fourier transform (FFT). Experimental results demonstrate that our randomized kernel selection criteria are significantly more efficient than the existing classic and widely-used criteria while preserving similar predictive performance.
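The computational backbone of this approach is the classical fact that circulant matrices are diagonalized by the discrete Fourier transform, so their spectra are available in O(n log n). A quick check of this fact (our illustration, using an arbitrary symmetric first row):

```python
import numpy as np

# Quick check (our illustration) of the fact behind circulant-matrix spectra:
# the eigenvalues of a circulant matrix equal the FFT of its first row, so
# spectra can be computed in O(n log n) without an eigendecomposition.
c = np.array([4.0, 1.0, 0.0, 1.0])               # symmetric first row
C = np.array([np.roll(c, k) for k in range(4)])  # full circulant matrix

fft_spectrum = np.fft.fft(c)                     # FFT of the first row
eig_spectrum = np.linalg.eigvalsh(C)             # explicit eigendecomposition

print(np.allclose(sorted(fft_spectrum.real), eig_spectrum))  # → True
```

Multilevel circulant matrices generalize this to multidimensional FFTs, which is what lets the randomized criteria above run in log-linear time and linear space.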

AAAI Conference 2017 Conference Paper

FeaBoost: Joint Feature and Label Refinement for Semantic Segmentation

  • Yulei Niu
  • Zhiwu Lu
  • Songfang Huang
  • Xin Gao
  • Ji-Rong Wen

We propose a novel approach, called FeaBoost, to image semantic segmentation with only image-level labels taken as weakly-supervised constraints. Our approach is motivated by two observations: 1) each superpixel can be represented as a linear combination of basic components (e.g., predefined classes); 2) visually similar superpixels have a high probability of sharing the same set of labels, i.e., they tend to have a common combination of predefined classes. By taking these two observations into consideration, semantic segmentation is formulated as joint feature and label refinement over superpixels. Furthermore, we develop an efficient FeaBoost algorithm to solve this optimization problem. Extensive experiments on the MSRC and LabelMe datasets demonstrate the superior performance of our FeaBoost approach in comparison with state-of-the-art methods, especially when noisy labels are provided for semantic segmentation.

AAAI Conference 2016 Conference Paper

Optimizing Multivariate Performance Measures from Multi-View Data

  • Jim Jing-Yan Wang
  • Ivor Tsang
  • Xin Gao

Many machine learning applications involve multiple views of features, and different applications require specific multivariate performance measures, such as the F-score for retrieval. However, existing multivariate performance measure optimization methods are limited to single-view data, while traditional multi-view learning methods cannot optimize multivariate performance measures directly. To fill this gap, in this paper, we propose the problem of optimizing multivariate performance measures from multi-view data, and an effective method to solve it. We propose to learn linear discriminant functions for different views, and combine them to construct an overall multivariate mapping function for multi-view data. To learn the parameters of the linear discriminant functions of different views so as to optimize a given multivariate performance measure, we formulate an optimization problem. In this problem, we propose to minimize the complexity of the linear discriminant function of each view, promote the consistency of the responses of different views over the same data points, and minimize an upper bound on the loss of the given multivariate performance measure. To solve this problem, we develop an iterative cutting-plane algorithm. Experiments on four benchmark data sets show that it not only outperforms traditional single-view multivariate performance optimization methods, but also achieves better results than ordinary multi-view learning methods.
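For context, the F1-score is "multivariate" in the sense that it is a function of the entire prediction vector rather than a sum of per-example losses, which is why its direct optimization requires structured machinery such as a cutting-plane algorithm. A quick illustration with made-up labels:

```python
import numpy as np

# Invented ground truth and predictions over 8 examples.
y_true = np.array([1, 1, 1, -1, -1, -1, -1, -1])
y_pred = np.array([1, 1, -1, 1, -1, -1, -1, -1])

# F1 couples all predictions through the global counts below,
# so it cannot be decomposed into independent per-example terms.
tp = np.sum((y_pred == 1) & (y_true == 1))
precision = tp / np.sum(y_pred == 1)
recall = tp / np.sum(y_true == 1)
f1 = 2 * precision * recall / (precision + recall)
```

Flipping any single prediction changes the global counts and hence every example's contribution, which is the coupling a structured optimizer must handle.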

AAAI Conference 2015 Conference Paper

Efficient Active Learning of Halfspaces via Query Synthesis

  • Ibrahim Alabdulmohsin
  • Xin Gao
  • Xiangliang Zhang

Active learning is a subfield of machine learning that has been successfully used in many applications, including text classification and bioinformatics. One of the fundamental branches of active learning is query synthesis, where the learning agent constructs artificial queries from scratch in order to reveal information about the true decision boundary. Nevertheless, the existing literature on membership query synthesis has focused on finite concept classes, with limited extension to real-world applications. In this paper, we present an efficient spectral algorithm for membership query synthesis for halfspaces, whose sample complexity is experimentally shown to be near-optimal. Each iteration of the algorithm consists of two steps. First, a convex optimization problem is solved that provides an approximate characterization of the version space. Second, a principal component is extracted, which yields a synthetic query that shrinks the version space exponentially fast. Unlike traditional methods in active learning, the proposed method can be readily extended to the batch setting by solving for the top k eigenvectors in the second step. Experimentally, it exhibits a significant improvement over traditional approaches such as uncertainty sampling and representative sampling. For example, to learn a halfspace in 25-dimensional Euclidean space with an estimation error of 1E-4, the proposed algorithm uses less than 3% of the number of queries required by uncertainty sampling.
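The two-step loop, approximating the version space and then querying along its top principal component, can be caricatured in a few lines. Everything below is a toy stand-in: the paper characterizes the version space with a convex program, whereas this sketch keeps a sampled set of unit-norm hypotheses and filters it by rejection.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
w_true = rng.standard_normal(d)
w_true /= np.linalg.norm(w_true)

# Sampled stand-in for the version space: unit-norm hypotheses that
# remain consistent with every answered query.
hypotheses = rng.standard_normal((20000, d))
hypotheses /= np.linalg.norm(hypotheses, axis=1, keepdims=True)

for _ in range(8):
    # Synthesize the next query along the top principal component of
    # the surviving hypotheses; it cuts the version space roughly in half.
    centered = hypotheses - hypotheses.mean(axis=0)
    query = np.linalg.svd(centered, full_matrices=False)[2][0]
    label = np.sign(w_true @ query)              # oracle's answer
    hypotheses = hypotheses[np.sign(hypotheses @ query) == label]

# Estimate the halfspace from what remains of the version space.
w_hat = hypotheses.mean(axis=0)
w_hat /= np.linalg.norm(w_hat)
```

Each query discards roughly half of the surviving hypotheses, which is the exponential shrinkage the abstract refers to; the rejection-sampled version space here is only a visualization aid, not the paper's method.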

AAAI Conference 2015 Conference Paper

Noise-Robust Semi-Supervised Learning by Large-Scale Sparse Coding

  • Zhiwu Lu
  • Xin Gao
  • Liwei Wang
  • Ji-Rong Wen
  • Songfang Huang

This paper presents a large-scale sparse coding algorithm to deal with the challenging problem of noise-robust semi-supervised learning over very large data with only a few noisy initial labels. By giving an L1-norm formulation of Laplacian regularization directly based upon the manifold structure of the data, we transform noise-robust semi-supervised learning into a generalized sparse coding problem so that noise reduction can be imposed upon the noisy initial labels. Furthermore, to keep noise-robust semi-supervised learning scalable over very large data, we make use of both nonlinear approximation and dimension reduction techniques to solve this generalized sparse coding problem in linear time and space complexity. Finally, we evaluate the proposed algorithm on the challenging task of large-scale semi-supervised image classification with only a few noisy initial labels. The experimental results on several benchmark image datasets show the promising performance of the proposed algorithm.
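An L1-norm Laplacian regularizer of the kind described can be written as ||Bf||_1 for a weighted edge-incidence matrix B, which is what lets generic L1/sparse-coding machinery handle it. A minimal sketch on a made-up three-node graph (the graph and label vector are invented for illustration):

```python
import numpy as np

# Invented weighted graph: (node i, node j, edge weight w_ij).
edges = [(0, 1, 1.0), (1, 2, 0.5), (0, 2, 0.25)]
n = 3

# Weighted incidence matrix: one row per edge with entries +w and -w.
B = np.zeros((len(edges), n))
for r, (i, j, w) in enumerate(edges):
    B[r, i], B[r, j] = w, -w

# A candidate label vector over the nodes.
f = np.array([1.0, -1.0, 0.5])

# ||B f||_1 equals the L1 Laplacian penalty sum_ij w_ij |f_i - f_j|.
penalty = np.abs(B @ f).sum()
direct = sum(w * abs(f[i] - f[j]) for i, j, w in edges)
```

Unlike the quadratic (L2) Laplacian penalty, the L1 form tolerates a few large label disagreements, which is the source of the robustness to noisy initial labels.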

IJCAI Conference 2015 Conference Paper

Social Image Parsing by Cross-Modal Data Refinement

  • Zhiwu Lu
  • Xin Gao
  • Songfang Huang
  • Liwei Wang
  • Ji-Rong Wen

This paper presents a cross-modal data refinement algorithm for social image parsing, that is, segmenting all the objects within a social image and then identifying their categories. Unlike traditional fully supervised image parsing, which takes pixel-level labels as strong supervisory information, our social image parsing is initially provided only with the noisy tags of images (i.e., image-level labels) shared by social users. By oversegmenting each image into multiple regions, we formulate social image parsing as a cross-modal data refinement problem over a large set of regions, where the initial labels of each region are inferred from image-level labels. Furthermore, we develop an efficient algorithm to solve this cross-modal data refinement problem. The experimental results on several benchmark datasets show the effectiveness of our algorithm. More notably, our algorithm provides an alternative and natural way to address the challenging problem of image parsing, since image-level labels are much easier to obtain than pixel-level labels.

AAAI Conference 2014 Conference Paper

Supervised Transfer Sparse Coding

  • Maruan Al-Shedivat
  • Jim Jing-Yan Wang
  • Majed Alzahrani
  • Jianhua Huang
  • Xin Gao

A combination of sparse coding and transfer learning techniques has been shown to be accurate and robust in classification tasks where training and testing objects share a feature space but are sampled from different underlying distributions, i.e., belong to different domains. The key assumption in such cases is that, in spite of the domain disparity, samples from different domains share some common hidden factors. Previous methods often assumed that all the objects in the target domain are unlabeled, so that the training set solely comprised objects from the source domain. However, in real-world applications, the target domain often has some labeled objects, or one can always manually label a small number of them. In this paper, we explore this possibility and show how a small number of labeled data in the target domain can significantly improve the classification accuracy of state-of-the-art transfer sparse coding methods. We further propose a unified framework named supervised transfer sparse coding (STSC) which simultaneously optimizes sparse representation, domain transfer, and classification. Experimental results on three applications demonstrate that a little manual labeling, followed by learning the model in a supervised fashion, can significantly improve classification accuracy.