Arrow Research search

Author name cluster

Xiaohui Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

19 papers
2 author rows

Possible papers

19

AAAI Conference 2026 Conference Paper

GCA: Geometry-aware Conditional Alignment for Partial Domain Adaptation with Coding Rate Reduction

  • Xiaohui Chen
  • Chuan-Xian Ren

Partial Domain Adaptation (PDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain, where the target label space is a subset of the source label space. In the PDA scenario, existing methods typically achieve transferability through distribution alignment in a statistical framework, and discriminability through geometric modeling. These two aspects are often treated as separate frameworks, which severs the intrinsic connection between them. To bridge this gap, we propose a unified framework termed Geometry-aware Conditional Alignment (GCA), which is derived from theoretical insights of Maximum Coding Rate Reduction. GCA collaboratively achieves conditional alignment and orthogonal discriminability in a unified framework, making the learned features more interpretable in both statistical and geometric aspects. As a result, GCA effectively enhances both the transferability and discriminability of features. Extensive experiments on four benchmark datasets validate the effectiveness of GCA.
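
The coding-rate objective that GCA builds on can be illustrated numerically. Below is a minimal sketch of the rate-distortion measure $R(Z)$ from the Maximum Coding Rate Reduction literature; the precision parameter `eps` and the toy feature matrices are illustrative choices, not values from the paper.

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    # R(Z) = 1/2 logdet(I + d/(n*eps^2) Z Z^T), with features as columns of Z (d x n)
    d, n = Z.shape
    alpha = d / (n * eps ** 2)
    return 0.5 * np.linalg.slogdet(np.eye(d) + alpha * Z @ Z.T)[1]

Z_diverse = np.eye(2)                             # two orthogonal unit features
Z_collapsed = np.array([[1.0, 1.0], [0.0, 0.0]])  # two identical features
```

Orthogonal (discriminative) features attain a higher coding rate than collapsed ones, which is the geometric signal the objective rewards.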

JBHI Journal 2026 Journal Article

TinnitusLLM: A Multimodal Large Language Model Framework for Tinnitus Diagnosis Through EEG-fMRI Fusion Learning

  • Yipeng Du
  • Xiaohui Chen
  • Zewei Liu
  • Zhengwu Liu
  • Ngai Wong
  • Chi Zhang
  • Jian Chen
  • Zhiwei Ding

Accurate tinnitus diagnosis is crucial for enabling timely therapeutic intervention and longitudinal treatment monitoring. While non-invasive neuroimaging modalities, particularly electroencephalography (EEG) with millisecond temporal resolution and functional magnetic resonance imaging (fMRI) with millimeter spatial resolution, provide complementary neural features, existing diagnostic approaches remain constrained to unimodal analysis of EEG or fMRI data, inherently limiting diagnostic precision and clinical generalizability. This paper introduces TinnitusLLM, the first multimodal large language model (LLM) framework that synergistically integrates EEG and fMRI features for tinnitus diagnosis. To enable LLM-based interpretation of neural signals, this framework integrates three key components: (1) a neuroinspired positional encoding mechanism that injects neurophysiological priors into the embedding space, enabling neurologically grounded, dynamic positional mapping of EEG and fMRI tokens; (2) multimodal autoregressive pretraining on more than 500 hours of EEG and 250 hours of fMRI data to learn causally informed predictive representations; and (3) fine-tuning with a cross-modal, subject-invariant adversarial learning strategy that enforces subject-independent constraints in the shared cross-modal feature space, thereby substantially improving diagnostic robustness across subjects. We validate TinnitusLLM through comprehensive experiments on a rigorously collected multimodal dataset containing 20 participants. Quantitative evaluations demonstrate that TinnitusLLM achieves superior cross-subject diagnostic accuracy compared to state-of-the-art baseline methods. These results underscore TinnitusLLM's potential as a clinically viable framework for objective tinnitus assessment through multimodal neural decoding.

NeurIPS Conference 2025 Conference Paper

A Generalized Label Shift Perspective for Cross-Domain Gaze Estimation

  • Hao-Ran Yang
  • Xiaohui Chen
  • Chuan-Xian Ren

Aiming to generalize well-trained gaze estimation models to new target domains, Cross-domain Gaze Estimation (CDGE) is developed for real-world application scenarios. Existing CDGE methods typically extract domain-invariant features to mitigate domain shift in feature space, which is proven insufficient by Generalized Label Shift (GLS) theory. In this paper, we introduce a novel GLS perspective to CDGE and model the cross-domain problem as a label and conditional shift problem. A GLS correction framework is presented and a feasible realization is proposed, in which an importance reweighting strategy based on the truncated Gaussian distribution is introduced to overcome the continuity challenges in label shift correction. To embed the reweighted source distribution into conditional invariant learning, we further derive a probability-aware estimation of the conditional operator discrepancy. Extensive experiments on standard CDGE tasks with different backbone models validate the superior cross-domain generalization capability of the proposed method and its applicability to various models.
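
As a rough illustration of the reweighting idea, the sketch below computes importance weights from truncated Gaussian label densities; the distribution parameters and truncation range for the gaze labels are hypothetical, not taken from the paper.

```python
import math

def truncnorm_pdf(y, mu, sigma, lo, hi):
    """PDF of a Gaussian N(mu, sigma^2) truncated to [lo, hi]."""
    if y < lo or y > hi:
        return 0.0
    z = (y - mu) / sigma
    phi = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))
    def Phi(x):  # CDF of the untruncated Gaussian at x
        return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))
    return phi / (Phi(hi) - Phi(lo))

def importance_weights(labels, src, tgt, lo=-40.0, hi=40.0):
    """w(y) = p_target(y) / p_source(y) for continuous gaze labels (degrees)."""
    return [truncnorm_pdf(y, *tgt, lo, hi) / truncnorm_pdf(y, *src, lo, hi)
            for y in labels]

# Hypothetical (mean, std) label densities: target gaze is shifted and narrower.
weights = importance_weights([0.0, 10.0, 20.0], src=(0.0, 15.0), tgt=(10.0, 10.0))
```

Source samples whose labels are more probable under the target distribution receive larger weights, which is what lets reweighted source statistics stand in for target statistics.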

ICML Conference 2025 Conference Paper

Graph Generative Pre-trained Transformer

  • Xiaohui Chen
  • Yinkai Wang
  • Jiaxing He
  • Yuanqi Du
  • Soha Hassoun
  • Xiaolin Xu
  • Liping Liu 0001

Graph generation is a critical task in numerous domains, including molecular design and social network analysis, due to its ability to model complex relationships and structured data. While most modern graph generative models utilize adjacency matrix representations, this work revisits an alternative approach that represents graphs as sequences of node and edge sets. We advocate for this approach due to its efficient encoding of graphs and propose a novel representation. Based on this representation, we introduce the Graph Generative Pre-trained Transformer (G2PT), an auto-regressive model that learns graph structures via next-token prediction. To further exploit G2PT's capabilities as a general-purpose foundation model, we explore fine-tuning strategies for two downstream applications: goal-oriented generation and graph property prediction. We conduct extensive experiments across multiple datasets. Results indicate that G2PT achieves superior generative performance on both generic graph and molecule datasets. Furthermore, G2PT exhibits strong adaptability and versatility in downstream tasks from molecular design to property prediction.
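
A minimal sketch of the sequence view of a graph that next-token prediction operates on; the special tokens below are placeholders, not G2PT's actual vocabulary.

```python
def graph_to_sequence(num_nodes, edges):
    """Serialize a graph as a flat token list: node block, then edge block.
    Token names here are illustrative, not the paper's actual vocabulary."""
    tokens = ["<bog>"]                        # begin-of-graph
    tokens += [f"n{i}" for i in range(num_nodes)]
    tokens.append("<sep>")                    # switch from node set to edge set
    for u, v in edges:
        tokens += [f"n{u}", f"n{v}"]          # each edge as a pair of node tokens
    tokens.append("<eog>")                    # end-of-graph
    return tokens

seq = graph_to_sequence(3, [(0, 1), (1, 2)])  # a 3-node path graph
```

A transformer trained with next-token prediction on such sequences learns to emit first the node set and then the edge set, one token at a time.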

ICLR Conference 2025 Conference Paper

MADGEN: Mass-Spec attends to De Novo Molecular generation

  • Yinkai Wang
  • Xiaohui Chen
  • Liping Liu
  • Soha Hassoun

The annotation (assigning structural chemical identities) of MS/MS spectra remains a significant challenge due to the enormous molecular diversity in biological samples and the limited scope of reference databases. Currently, the vast majority of spectral measurements remain in the "dark chemical space" without structural annotations. To improve annotation, we propose MADGEN (Mass-spec Attends to De Novo Molecular GENeration), a scaffold-based method for de novo molecular structure generation guided by mass spectrometry data. MADGEN operates in two stages: scaffold retrieval and spectra-conditioned molecular generation starting with the scaffold. In the first stage, given an MS/MS spectrum, we formulate scaffold retrieval as a ranking problem and employ contrastive learning to align mass spectra with candidate molecular scaffolds. In the second stage, starting from the retrieved scaffold, we employ the MS/MS spectrum to guide an attention-based generative model to generate the final molecule. Our approach constrains the molecular generation search space, reducing its complexity and improving generation accuracy. We evaluate MADGEN on three datasets (NIST23, CANOPUS, and MassSpecGym) and evaluate MADGEN's performance with a predictive scaffold retriever and with an oracle retriever. We demonstrate the effectiveness of using attention to integrate spectral information throughout the generation process to achieve strong results with the oracle retriever.

TMLR Journal 2025 Journal Article

Multi-Modal Foundation Models for Computational Pathology: A Survey

  • Dong Li
  • Guihong Wan
  • Xintao Wu
  • Xinyu Wu
  • Xiaohui Chen
  • Yi He
  • Zhong Chen
  • Peter K Sorger

Foundation models have emerged as a powerful paradigm in computational pathology (CPath), enabling scalable and generalizable analysis of histopathological images. While early developments centered on uni-modal models trained solely on visual data, recent advances have highlighted the promise of multi-modal foundation models that integrate heterogeneous data sources such as textual reports, structured domain knowledge, and molecular profiles. In this survey, we provide a comprehensive and up-to-date review of multi-modal foundation models in CPath, with a particular focus on models built upon hematoxylin and eosin (H&E) stained whole slide images (WSIs) and tile-level representations. We categorize 34 state-of-the-art multi-modal foundation models into three major paradigms: vision-language, vision-knowledge graph, and vision-gene expression. We further divide vision-language models into non-LLM-based and LLM-based approaches. Additionally, we analyze 30 available multi-modal datasets tailored for pathology, grouped into image-text pairs, instruction datasets, and image-other modality pairs. Our survey also presents a taxonomy of downstream tasks, highlights training and evaluation strategies, and identifies key challenges and future directions. We aim for this survey to serve as a valuable resource for researchers and practitioners working at the intersection of pathology and AI.

ICML Conference 2025 Conference Paper

Optimal Transport Barycenter via Nonconvex-Concave Minimax Optimization

  • Kaheon Kim
  • Rentian Yao
  • Changbo Zhu
  • Xiaohui Chen

The optimal transport barycenter (a.k.a. Wasserstein barycenter) is a fundamental notion of averaging that extends from the Euclidean space to the Wasserstein space of probability distributions. Computation of the unregularized barycenter for discretized probability distributions on point clouds is a challenging task when the domain dimension $d > 1$. Most practical algorithms for approximating the barycenter problem are based on entropic regularization. In this paper, we introduce a nearly linear time $O(m \log{m})$ and linear space complexity $O(m)$ primal-dual algorithm, the Wasserstein-Descent $\dot{\mathbb{H}}^1$-Ascent (WDHA) algorithm, for computing the exact barycenter when the input probability density functions are discretized on an $m$-point grid. The key success of the WDHA algorithm hinges on alternating between two different yet closely related Wasserstein and Sobolev optimization geometries for the primal barycenter and dual Kantorovich potential subproblems. Under reasonable assumptions, we establish the convergence rate and iteration complexity of WDHA to its stationary point when the step size is appropriately chosen. Superior computational efficacy, scalability, and accuracy over the existing Sinkhorn-type algorithms are demonstrated on high-resolution (e.g., $1024 \times 1024$ images) 2D synthetic and real data.

UAI Conference 2024 Conference Paper

GeONet: a neural operator for learning the Wasserstein geodesic

  • Andrew Gracyk
  • Xiaohui Chen

Optimal transport (OT) offers a versatile framework to compare complex data distributions in a geometrically meaningful way. Traditional methods for computing the Wasserstein distance and geodesic between probability measures require mesh-specific domain discretization and suffer from the curse of dimensionality. We present GeONet, a mesh-invariant deep neural operator network that learns the non-linear mapping from the input pair of initial and terminal distributions to the Wasserstein geodesic connecting the two endpoint distributions. In the offline training stage, GeONet learns the saddle point optimality conditions for the dynamic formulation of the OT problem in the primal and dual spaces that are characterized by a coupled PDE system. The subsequent inference stage is instantaneous and can be deployed for real-time predictions in the online learning setting. We demonstrate that GeONet achieves comparable testing accuracy to the standard OT solvers on simulation examples and the MNIST dataset with considerably reduced inference-stage computational cost by orders of magnitude.

ICLR Conference 2024 Conference Paper

Statistically Optimal K-means Clustering via Nonnegative Low-rank Semidefinite Programming

  • Yubo Zhuang
  • Xiaohui Chen
  • Yun Yang
  • Richard Y. Zhang 0001

$K$-means clustering is a widely used machine learning method for identifying patterns in large datasets. Recently, semidefinite programming (SDP) relaxations have been proposed for solving the $K$-means optimization problem, which enjoy strong statistical optimality guarantees. However, the prohibitive cost of implementing an SDP solver renders these guarantees inaccessible to practical datasets. In contrast, nonnegative matrix factorization (NMF) is a simple clustering algorithm widely used by machine learning practitioners, but it lacks a solid statistical underpinning and theoretical guarantees. In this paper, we consider an NMF-like algorithm that solves a nonnegative low-rank restriction of the SDP-relaxed $K$-means formulation using a nonconvex Burer--Monteiro factorization approach. The resulting algorithm is as simple and scalable as state-of-the-art NMF algorithms while also enjoying the same strong statistical optimality guarantees as the SDP. In our experiments, we observe that our algorithm achieves significantly smaller mis-clustering errors compared to the existing state-of-the-art while maintaining scalability.
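
The nonnegative low-rank restriction replaces the SDP variable $Z$ with a factorization $Z = YY^\top$, $Y \ge 0$. The sketch below builds the normalized cluster co-membership factor for a fixed partition and checks that $Z$ satisfies the usual SDP-relaxed $K$-means constraints; the partition itself is a toy example, not the paper's optimization routine.

```python
import numpy as np

def membership_factor(labels, k):
    """Y with Y[i, c] = 1/sqrt(n_c) if point i is in cluster c, else 0,
    so that Z = Y @ Y.T is the normalized cluster co-membership matrix."""
    labels = np.asarray(labels)
    Y = np.zeros((len(labels), k))
    for c in range(k):
        idx = labels == c
        Y[idx, c] = 1.0 / np.sqrt(idx.sum())
    return Y

Y = membership_factor([0, 0, 1, 1, 1], k=2)
Z = Y @ Y.T   # feasible point of the SDP: Z 1 = 1, tr(Z) = k, Z >= 0, Z PSD
```

Optimizing over such factors $Y$ instead of the full matrix $Z$ is what makes the Burer-Monteiro approach as cheap as NMF while inheriting the SDP's statistical guarantees.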

JMLR Journal 2024 Journal Article

Wasserstein Proximal Coordinate Gradient Algorithms

  • Rentian Yao
  • Xiaohui Chen
  • Yun Yang

Motivated by approximate Bayesian computation using mean-field variational approximation and the computation of equilibria in multi-species systems with cross-interaction, this paper investigates the composite geodesically convex optimization problem over multiple distributions. The objective functional under consideration is composed of a convex potential energy on a product of Wasserstein spaces and a sum of convex self-interaction and internal energies associated with each distribution. To efficiently solve this problem, we introduce the Wasserstein Proximal Coordinate Gradient (WPCG) algorithms with parallel, sequential, and random update schemes. Under a quadratic growth (QG) condition that is weaker than the usual strong convexity requirement on the objective functional, we show that WPCG converges exponentially fast to the unique global optimum. In the absence of the QG condition, WPCG is still demonstrated to converge to the global optimal solution, albeit at a slower polynomial rate. Numerical results for both motivating examples are consistent with our theoretical findings.

ICML Conference 2023 Conference Paper

Efficient and Degree-Guided Graph Generation via Discrete Diffusion Modeling

  • Xiaohui Chen
  • Jiaxing He
  • Xu Han 0012
  • Liping Liu 0001

Diffusion-based generative graph models have been proven effective in generating high-quality small graphs. However, they lack the scalability to generate large graphs containing thousands of nodes while preserving the desired graph statistics. In this work, we propose EDGE, a new diffusion-based generative graph model that addresses generative tasks with large graphs. To improve computation efficiency, we encourage graph sparsity by using a discrete diffusion process that randomly removes edges at each time step and finally obtains an empty graph. EDGE only focuses on a portion of nodes in the graph at each denoising step. It makes much fewer edge predictions than previous diffusion-based models. Moreover, EDGE admits explicitly modeling the node degrees of the graphs, further improving the model performance. The empirical study shows that EDGE is much more efficient than competing methods and can generate large graphs with thousands of nodes. It also outperforms baseline models in generation quality: graphs generated by our approach have more similar graph statistics to those of the training graphs.
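
The forward (noising) direction of such an edge-removal diffusion can be sketched in a few lines; the linear removal schedule below is an illustrative choice, not the schedule used by EDGE.

```python
import random

def forward_edge_removal(edges, steps, rng):
    """Forward discrete diffusion: at step t, remove each surviving edge
    independently with probability t/steps (an illustrative schedule);
    the final step removes everything, reaching the empty graph."""
    trajectory = [set(edges)]
    current = set(edges)
    for t in range(1, steps + 1):
        p = t / steps
        current = {e for e in current if rng.random() >= p}
        trajectory.append(set(current))
    return trajectory

rng = random.Random(0)
traj = forward_edge_removal({(0, 1), (1, 2), (2, 3), (0, 3)}, steps=4, rng=rng)
```

The generative model is trained to reverse this trajectory, predicting which edges to add back at each step; because graphs are sparse, each reverse step touches far fewer entries than a dense adjacency-matrix model would.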

JMLR Journal 2023 Journal Article

Fitting Autoregressive Graph Generative Models through Maximum Likelihood Estimation

  • Xu Han
  • Xiaohui Chen
  • Francisco J. R. Ruiz
  • Li-Ping Liu

We consider the problem of fitting autoregressive graph generative models via maximum likelihood estimation (MLE). MLE is intractable for graph autoregressive models because the nodes in a graph can be arbitrarily reordered; thus the exact likelihood involves a sum over all possible node orders leading to the same graph. In this work, we fit the graph models by maximizing a variational bound, which is built by first deriving the joint probability over the graph and the node order of the autoregressive process. This approach avoids the need to specify ad-hoc node orders, since an inference network learns the most likely node sequences that have generated a given graph. We improve the approach by developing a graph generative model based on attention mechanisms and an inference network based on routing search. We demonstrate empirically that fitting autoregressive graph models via variational inference improves their qualitative and quantitative performance, and the improved model and inference network further boost the performance.

ICML Conference 2023 Conference Paper

Likelihood Adjusted Semidefinite Programs for Clustering Heterogeneous Data

  • Yubo Zhuang
  • Xiaohui Chen
  • Yun Yang

Clustering is a widely deployed unsupervised learning tool. Model-based clustering is a flexible framework to tackle data heterogeneity when the clusters have different shapes. Likelihood-based inference for mixture distributions often involves non-convex and high-dimensional objective functions, imposing difficult computational and statistical challenges. The classic expectation-maximization (EM) algorithm is a computationally thrifty iterative method that maximizes a surrogate function minorizing the log-likelihood of observed data in each iteration, which however suffers from bad local maxima even in the special case of the standard Gaussian mixture model with common isotropic covariance matrices. On the other hand, recent studies reveal that the unique global solution of a semidefinite programming (SDP) relaxed $K$-means achieves the information-theoretically sharp threshold for perfectly recovering the cluster labels under the standard Gaussian mixture model. In this paper, we extend the SDP approach to a general setting by integrating cluster labels as model parameters and propose an iterative likelihood adjusted SDP (iLA-SDP) method that directly maximizes the exact observed likelihood in the presence of data heterogeneity. By lifting the cluster assignment to group-specific membership matrices, iLA-SDP avoids centroid estimation, a key feature that allows exact recovery under well-separateness of centroids without being trapped by their adversarial configurations. Thus iLA-SDP is less sensitive than EM to initialization and more stable on high-dimensional data. Our numerical experiments demonstrate that iLA-SDP can achieve lower mis-clustering errors than several widely used clustering methods, including $K$-means, SDP and EM algorithms.

NeurIPS Conference 2023 Conference Paper

On Separate Normalization in Self-supervised Transformers

  • Xiaohui Chen
  • Yinkai Wang
  • Yuanqi Du
  • Soha Hassoun
  • Liping Liu

Self-supervised training methods for transformers have demonstrated remarkable performance across various domains. Previous transformer-based models, such as masked autoencoders (MAE), typically utilize a single normalization layer for both the [CLS] symbol and the tokens. We propose in this paper a simple modification that employs separate normalization layers for the tokens and the [CLS] symbol to better capture their distinct characteristics and enhance downstream task performance. Our method aims to alleviate the potential negative effects of using the same normalization statistics for both token types, which may not be optimally aligned with their individual roles. We empirically show that by utilizing a separate normalization layer, the [CLS] embeddings can better encode the global contextual information and are distributed more uniformly in the anisotropic embedding space. When replacing the conventional normalization layer with the two separate layers, we observe an average 2.7% performance improvement across the image, natural language, and graph domains.
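
A minimal numpy sketch of the idea, with hand-picked normalization parameters standing in for learned ones: the [CLS] row and the token rows are normalized with independent affine parameters rather than a shared layer.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Standard layer normalization over the feature dimension.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

def separate_norm(seq, cls_params, tok_params):
    """seq: (L, d) with seq[0] the [CLS] embedding. The [CLS] row and the
    token rows get independent normalization parameters."""
    cls_out = layer_norm(seq[:1], *cls_params)
    tok_out = layer_norm(seq[1:], *tok_params)
    return np.concatenate([cls_out, tok_out], axis=0)

d = 4
seq = np.arange(12, dtype=float).reshape(3, d)  # row 0 plays the [CLS] role
out = separate_norm(seq, (np.ones(d), np.zeros(d)), (2 * np.ones(d), np.zeros(d)))
```

With shared parameters the [CLS] embedding is forced onto the same scale as the tokens; the separate layer lets the two populations keep different statistics.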

TMLR Journal 2022 Journal Article

Interpretable Node Representation with Attribute Decoding

  • Xiaohui Chen
  • Xi Chen
  • Liping Liu

Variational Graph Autoencoders (VGAEs) are powerful models for unsupervised learning of node representations from graph data. In this work, we make a systematic analysis of modeling node attributes in VGAEs and show that attribute decoding is important for node representation learning. We further propose a new learning model, interpretable NOde Representation with Attribute Decoding (NORAD). The model encodes node representations in an interpretable manner: node representations capture community structures in the graph and the relationship between communities and node attributes. We further propose a rectifying procedure to refine node representations of isolated nodes, which improves the quality of the representations of these nodes. Our empirical results demonstrate the advantage of the proposed model when learning graph data in an interpretable manner.

NeurIPS Conference 2022 Conference Paper

Wasserstein $K$-means for clustering probability distributions

  • Yubo Zhuang
  • Xiaohui Chen
  • Yun Yang

Clustering is an important exploratory data analysis technique to group objects based on their similarity. The widely used $K$-means clustering method relies on some notion of distance to partition data into a small number of groups. In the Euclidean space, centroid-based and distance-based formulations of the $K$-means are equivalent. In modern machine learning applications, data often arise as probability distributions and a natural generalization to handle measure-valued data is to use the optimal transport metric. Due to the non-negative Alexandrov curvature of the Wasserstein space, barycenters suffer from regularity and non-robustness issues. The peculiar behaviors of Wasserstein barycenters may make the centroid-based formulation fail to represent the within-cluster data points, while the more direct distance-based $K$-means approach and its semidefinite program (SDP) relaxation are capable of recovering the true cluster labels. In the special case of clustering Gaussian distributions, we show that the SDP relaxed Wasserstein $K$-means can achieve exact recovery provided that the clusters are well separated under the $2$-Wasserstein metric. Our simulation and real data examples also demonstrate that distance-based $K$-means can achieve better classification performance over the standard centroid-based $K$-means for clustering probability distributions and images.
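
For 1D Gaussians the 2-Wasserstein distance has the closed form $W_2 = \sqrt{(\mu_1-\mu_2)^2 + (\sigma_1-\sigma_2)^2}$, which makes the distance-based formulation easy to illustrate. The toy distributions and the hand-picked medoids below are for illustration only; they are not the paper's SDP relaxation.

```python
import math

def w2_gaussian_1d(g1, g2):
    # Closed-form 2-Wasserstein distance between 1D Gaussians (mu, sigma).
    (m1, s1), (m2, s2) = g1, g2
    return math.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)

# Two well-separated groups of Gaussians under the W2 metric.
gaussians = [(0.0, 1.0), (0.1, 1.0), (5.0, 2.0), (5.2, 2.1)]
D = [[w2_gaussian_1d(a, b) for b in gaussians] for a in gaussians]

# Distance-based assignment to hand-picked medoids (toy stand-in for the SDP).
medoids = [0, 2]
labels = [min(range(len(medoids)), key=lambda c: D[i][medoids[c]])
          for i in range(len(gaussians))]
```

The assignment uses only pairwise distances, never a barycenter, which is exactly how the distance-based formulation sidesteps the regularity issues of Wasserstein barycenters.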

ICRA Conference 2021 Conference Paper

A Framework for Multisensory Foresight for Embodied Agents

  • Xiaohui Chen
  • Ramtin Hosseini
  • Karen Panetta
  • Jivko Sinapov

Predicting future sensory states is crucial for learning agents such as robots, drones, and autonomous vehicles. In this paper, we couple multiple sensory modalities with exploratory actions and propose a predictive neural network architecture to address this problem. Most existing approaches rely on large, manually annotated datasets, or only use visual data as a single modality. In contrast, the unsupervised method presented here uses multi-modal perceptions for predicting future visual frames. As a result, the proposed model is more comprehensive and can better capture the spatio-temporal dynamics of the environment, leading to more accurate visual frame prediction. The other novelty of our framework is the use of sub-networks dedicated to anticipating future haptic, audio, and tactile signals. The framework was tested and validated with a dataset containing 4 sensory modalities (vision, haptic, audio, and tactile) on a humanoid robot performing 9 behaviors multiple times on a large set of objects. While the visual information is the dominant modality, utilizing the additional non-visual modalities improves the accuracy of predictions.

AAAI Conference 2021 Conference Paper

GAN Ensemble for Anomaly Detection

  • Xu Han
  • Xiaohui Chen
  • Li-Ping Liu

When formulated as an unsupervised learning problem, anomaly detection often requires a model to learn the distribution of normal data. Previous works modify Generative Adversarial Networks (GANs) by using encoder-decoders as generators and then apply them to anomaly detection tasks. Related studies also indicate that GAN ensembles are often more stable than single GANs in image generation tasks. In this work, we propose to construct GAN ensembles for anomaly detection. In this new method, a group of generators interact with a group of discriminators, so every generator gets feedback from every discriminator, and vice versa. Compared to a single GAN, an ensemble of GANs can better model the distribution of normal data and thus better detect anomalies. We also make a theoretical analysis of GANs and GAN ensembles in the context of anomaly detection. The empirical study constructs ensembles based on four different types of detecting models, and the results show that the ensemble outperforms the single model for all four model types.
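
A sketch of the pairwise score aggregation, with toy stand-ins for trained generators and discriminators; the particular score form (reconstruction error plus a discriminator term) is a common choice in this line of work and is used here illustratively, not as the paper's exact definition.

```python
import math

def ensemble_anomaly_score(x, generators, discriminators, lam=0.5):
    """Average anomaly score over all generator/discriminator pairs:
    reconstruction error plus a discriminator-based term (illustrative form)."""
    scores = []
    for G in generators:
        for D in discriminators:
            recon = abs(x - G(x))                       # reconstruction error
            scores.append(lam * recon + (1 - lam) * (1.0 - D(x)))
    return sum(scores) / len(scores)

# Toy models standing in for networks trained on normal data near 0.
gens = [lambda x: 0.1 * x, lambda x: 0.2 * x]   # pull inputs toward 0
discs = [lambda x: math.exp(-x * x)]            # high output = looks normal
normal_score = ensemble_anomaly_score(0.1, gens, discs)
anomaly_score = ensemble_anomaly_score(5.0, gens, discs)
```

Averaging over every generator-discriminator pair is the ensemble's key move: a point only scores low if it looks normal to all pairs.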

ICML Conference 2021 Conference Paper

Order Matters: Probabilistic Modeling of Node Sequence for Graph Generation

  • Xiaohui Chen
  • Xu Han 0012
  • Jiajing Hu
  • Francisco J. R. Ruiz
  • Liping Liu 0001

A graph generative model defines a distribution over graphs. Typically, the model consists of a sequential process that creates and adds nodes and edges. Such a sequential process defines an ordering of the nodes in the graph. Computing the model's likelihood requires marginalizing over the node orderings; this makes maximum likelihood estimation (MLE) challenging due to the (factorial) number of possible permutations. In this work, we provide an expression for the likelihood of a graph generative model and show that its calculation is closely related to the problem of graph automorphism. In addition, we derive a variational inference (VI) algorithm for fitting a graph generative model that is based on the maximization of a variational bound of the log-likelihood. This allows the model to be trained with node orderings from the approximate posterior instead of ad-hoc orderings. Our experiments show that our log-likelihood bound is significantly tighter than the bound of previous schemes. The models fitted with the VI algorithm are able to generate high-quality graphs that match the structures of target graphs not seen during training.
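
The link to graph automorphism can be seen on a tiny example: for the 3-node path graph, the $3! = 6$ node orderings produce only $6/|\mathrm{Aut}(G)| = 3$ distinct labeled edge sets, each reached by $|\mathrm{Aut}(G)| = 2$ orderings. The brute-force enumeration below (feasible only for tiny graphs) illustrates why the exact likelihood sums over orderings in automorphism-sized groups.

```python
from itertools import permutations

def relabel(edges, perm):
    """Apply a node relabeling and return a canonical (undirected) edge-set key."""
    return frozenset(frozenset((perm[u], perm[v])) for u, v in edges)

def ordering_multiplicities(edges, n):
    """Count how many of the n! node orderings map the graph to each labeled
    form. Each form is hit |Aut(G)| times, so distinct forms = n!/|Aut(G)|."""
    counts = {}
    for p in permutations(range(n)):
        key = relabel(edges, p)
        counts[key] = counts.get(key, 0) + 1
    return counts

counts = ordering_multiplicities([(0, 1), (1, 2)], n=3)  # path graph P3
```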