Arrow Research search

Author name cluster

Xiaohui Xie

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

17 papers
2 author rows

Possible papers (17)

YNICL Journal 2026 Journal Article

Beyond the cerebral cortex: cerebellar language-related subregions contributions to fluency in post-stroke aphasia

  • Yuqian Zhan
  • Xiaohui Xie
  • Qiufang Ren
  • Xiaomin Pan
  • Zhishun Gao
  • Jin Li
  • Kai Wang
  • Tongjian Bai

Although the classical language cortex significantly contributes to post-stroke aphasia (PSA), non-language-specific cortex, such as the cerebellum, is increasingly implicated in language. However, the specific contributions of its subregions to PSA, particularly regarding distinct language dimensions, remain unclear. Given fluency as a core dimension, we investigated the functional and structural integrity of cerebellar language-related subregions to clarify their distinct roles in fluent (FA) versus non-fluent aphasia (nonFA). We enrolled a primary cohort of 81 PSA patients (46 nonFA, 35 FA), and 77 healthy controls (HCs), alongside an independent external validation cohort (Aphasia Recovery Cohort [ARC]; 23 nonFA, 22 FA). Using individualized functional connectivity (FC) and volumetric analyses based on the Multi-Domain Task Battery (MDTB) atlas, we found that nonFA patients exhibited significantly decreased FC between the classical language network (LN) and language-related cerebellar subregions (right MDTB 8 and 9; R_MDTB8/9-LN FC), alongside reduced right Crus II volume. Correlation analysis revealed that these neuroimaging indicators were positively associated with language scores in nonFA, while no such relationships were observed in FA. Furthermore, mediation analysis indicated that right Crus II volume statistically accounted for the observed association between R_MDTB8/9-LN FC and overall Aphasia Quotient (AQ). As the key findings were replicated in the ARC, our results provide compelling evidence that the functional connectivity strength and structural integrity of specific cerebellar subregions contribute to language fluency. Our findings support expanding models of PSA beyond cortical regions and suggest that cerebellar-targeted strategies may improve language rehabilitation outcomes.
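The mediation logic described above (connectivity → Crus II volume → AQ) can be sketched on synthetic data. This is a minimal product-of-coefficients illustration, not the study's actual procedure: the variable names echo the abstract, but the data, effect sizes, and sample are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 81                                              # size of the primary cohort
fc = rng.standard_normal(n)                         # stand-in for R_MDTB8/9-LN FC
volume = 0.6 * fc + 0.5 * rng.standard_normal(n)    # mediator: Crus II volume
aq = 0.7 * volume + 0.5 * rng.standard_normal(n)    # outcome: Aphasia Quotient

def slope(x, y):
    """OLS slope of y on x (single centered predictor)."""
    x, y = x - x.mean(), y - y.mean()
    return (x @ y) / (x @ x)

a = slope(fc, volume)                               # path: FC -> mediator
X = np.column_stack([np.ones(n), fc, volume])
coef = np.linalg.lstsq(X, aq, rcond=None)[0]        # outcome on FC and mediator
direct, bpath = coef[1], coef[2]                    # direct effect; mediator path
indirect = a * bpath                                # mediated (indirect) effect
total = slope(fc, aq)

# For linear OLS the decomposition is exact: total = direct + indirect
assert np.isclose(total, direct + indirect)
```

The exact total = direct + indirect identity holds only for linear models on the same sample; bootstrap confidence intervals would be the usual next step in a real analysis.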

AAAI Conference 2026 Conference Paper

CoMA: Compositional Human Motion Generation with Multi-modal Agents

  • Shanlin Sun
  • Jiaqi Xu
  • Gabriel de Araujo
  • Shenghan Zhou
  • Hanwen Zhang
  • Ziheng Huang
  • Chenyu You
  • Xiaohui Xie

3D human motion generation has seen substantial advancement in recent years. While state-of-the-art approaches have improved performance significantly, they still struggle with complex and detailed motions unseen in training data, largely due to the scarcity of motion datasets and the prohibitive cost of generating new training examples. To address these challenges, we introduce CoMA, an agent-based solution for complex human motion generation, editing, and comprehension. CoMA leverages multiple collaborative agents powered by large language and vision models, alongside a mask transformer-based motion generator featuring body part-specific encoders and codebooks for fine-grained control. Our framework enables generation of both short and long motion sequences with detailed instructions, text-guided motion editing, and self-correction for improved quality. Evaluations on the HumanML3D dataset demonstrate competitive performance against state-of-the-art methods. Additionally, we create a set of context-rich, compositional, and long text prompts, where user studies show our method significantly outperforms existing approaches.

AAAI Conference 2026 Conference Paper

CoRA: A Collaborative Robust Architecture with Hybrid Fusion for Efficient Perception

  • Gong Chen
  • Chaokun Zhang
  • Pengcheng Lv
  • Xiaohui Xie

Collaborative perception has garnered significant attention as a crucial technology to overcome the perceptual limitations of single-agent systems. Many state-of-the-art (SOTA) methods have achieved communication efficiency and high performance via intermediate fusion. However, they share a critical vulnerability: their performance degrades under adverse communication conditions due to the misalignment induced by data transmission, which severely hampers their practical deployment. To bridge this gap, we re-examine different fusion paradigms and discover that the strengths of intermediate and late fusion are not a trade-off, but a complementary pairing. Based on this key insight, we propose CoRA, a novel collaborative robust architecture with a hybrid approach to decouple performance from robustness at low communication cost. It is composed of two components: a feature-level fusion branch and an object-level correction branch. The first branch selects critical features and fuses them efficiently to ensure both performance and scalability. The second branch leverages semantic relevance to correct spatial displacements, guaranteeing resilience against pose errors. Experiments demonstrate the superiority of CoRA. Under extreme scenarios, CoRA improves upon its baseline performance by approximately 19% in AP@0.5 with more than 5x less communication volume, which makes it a promising solution for robust collaborative perception.

AAAI Conference 2026 Conference Paper

OPLoRA: Orthogonal Projection LoRA Prevents Catastrophic Forgetting During Parameter-Efficient Fine-Tuning

  • Yifeng Xiong
  • Xiaohui Xie

Low-Rank Adaptation (LoRA) enables efficient fine-tuning of large language models but suffers from catastrophic forgetting when learned updates interfere with the dominant singular directions that encode essential pre-trained knowledge. We propose Orthogonal Projection LoRA (OPLoRA), a theoretically grounded approach that prevents this interference through double-sided orthogonal projections. By decomposing frozen weights via SVD, OPLoRA constrains LoRA updates to lie entirely within the orthogonal complement of the top-k singular subspace using projections PL = I − Uk Ukᵀ and PR = I − Vk Vkᵀ. We prove that this construction exactly preserves the top-k singular triples, providing mathematical guarantees for knowledge retention. To quantify subspace interference, we introduce ρk, a metric measuring update alignment with dominant directions. Extensive experiments across commonsense reasoning, mathematics, and code generation demonstrate that OPLoRA significantly reduces forgetting while maintaining competitive task-specific performance on LLaMA-2 7B and Qwen2.5 7B, establishing orthogonal projection as an effective mechanism for knowledge preservation in parameter-efficient fine-tuning.
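The projection construction in the abstract is concrete enough to check numerically. Below is a minimal NumPy sketch (the dimensions, rank r, and random factors are illustrative, not from the paper): a LoRA-style update projected by P_L and P_R leaves the top-k singular triples of the frozen weight exactly intact.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 32, 4, 2                       # illustrative sizes

W = rng.standard_normal((d, d))          # frozen pre-trained weight
U, s, Vt = np.linalg.svd(W)              # W = U diag(s) Vt
Uk, Vk = U[:, :k], Vt[:k, :].T           # top-k singular subspaces

PL = np.eye(d) - Uk @ Uk.T               # P_L = I - U_k U_k^T
PR = np.eye(d) - Vk @ Vk.T               # P_R = I - V_k V_k^T

A = rng.standard_normal((d, r))          # raw LoRA factors
B = rng.standard_normal((r, d))
delta = PL @ (A @ B) @ PR                # double-sided projected update
W_new = W + delta

# Top-k singular triples of W are preserved exactly (up to float error):
for i in range(k):
    assert np.allclose(W_new @ Vt[i, :], s[i] * U[:, i], atol=1e-8)
    assert np.allclose(U[:, i] @ W_new, s[i] * Vt[i, :], atol=1e-8)
```

The asserts pass because P_R annihilates each top-k right singular vector and P_L annihilates each top-k left one, so the update contributes nothing along the dominant directions.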

NeurIPS Conference 2025 Conference Paper

UMAMI: Unifying Masked Autoregressive Models and Deterministic Rendering for View Synthesis

  • Thanh-Tung Le
  • Tuan Pham
  • Tung Nguyen
  • Deying Kong
  • Xiaohui Xie
  • Stephan Mandt

Novel view synthesis (NVS) seeks to render photorealistic, 3D‑consistent images of a scene from unseen camera poses given only a sparse set of posed views. Existing deterministic networks render observed regions quickly but blur unobserved areas, whereas stochastic diffusion‑based methods hallucinate plausible content yet incur heavy training‑ and inference‑time costs. In this paper, we propose a hybrid framework that unifies the strengths of both paradigms. A bidirectional transformer encodes multi‑view image tokens and Plücker‑ray embeddings, producing a shared latent representation. Two lightweight heads then act on this representation: (i) a feed‑forward regression head that renders pixels where geometry is well constrained, and (ii) a masked autoregressive diffusion head that completes occluded or unseen regions. The entire model is trained end‑to‑end with joint photometric and diffusion losses, without handcrafted 3D inductive biases, enabling scalability across diverse scenes. Experiments demonstrate that our method attains state‑of‑the‑art image quality while reducing rendering time by an order of magnitude compared with fully generative baselines.

ICLR Conference 2024 Conference Paper

Diffeomorphic Mesh Deformation via Efficient Optimal Transport for Cortical Surface Reconstruction

  • Thanh-Tung Le
  • Khai Nguyen
  • Shanlin Sun
  • Kun Han
  • Nhat Ho
  • Xiaohui Xie

Mesh deformation plays a pivotal role in many 3D vision tasks including dynamic simulations, rendering, and reconstruction. However, defining an efficient discrepancy between predicted and target meshes remains an open problem. A prevalent approach in current deep learning is the set-based approach, which measures the discrepancy between two surfaces by comparing two randomly sampled point-clouds from the two meshes with the Chamfer pseudo-distance. Nevertheless, the set-based approach still has limitations, such as the lack of a theoretical guarantee for choosing the number of points in the sampled point-clouds, and the pseudo-metricity and quadratic complexity of the Chamfer divergence. To address these issues, we propose a novel metric for learning mesh deformation. The metric is defined by the sliced Wasserstein distance on meshes represented as probability measures, which generalizes the set-based approach. By leveraging the space of probability measures, we gain flexibility in encoding meshes using diverse forms of probability measures, such as continuous, empirical, and discrete measures via the varifold representation. Having encoded meshes as probability measures, we can compare them using the sliced Wasserstein distance, an effective optimal transport distance with linear computational complexity that provides a fast statistical rate for approximating the surface of meshes. To this end, we employ a neural ordinary differential equation (ODE) to deform the input surface into the target shape by modeling the trajectories of the points on the surface. Our experiments on cortical surface reconstruction demonstrate that our approach surpasses other competing methods on multiple datasets and metrics.
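The core distance is easy to sketch. Below is a generic Monte-Carlo sliced 2-Wasserstein between two equal-size point clouds, exploiting the fact that 1D optimal transport reduces to matching sorted projections. This is the vanilla estimator, not the authors' varifold-based variant, and the point clouds stand in for samples from meshes.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_proj=100, seed=0):
    """Monte-Carlo sliced 2-Wasserstein distance between two equal-size
    point clouds (empirical measures). Generic illustration."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.standard_normal(d)
        theta /= np.linalg.norm(theta)        # random direction on the sphere
        # 1D optimal transport: match sorted projections
        px, py = np.sort(X @ theta), np.sort(Y @ theta)
        total += np.mean((px - py) ** 2)
    return np.sqrt(total / n_proj)

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))             # e.g. points sampled from a mesh
Y = rng.standard_normal((200, 3)) + np.array([2.0, 0.0, 0.0])

assert sliced_wasserstein(X, X) < 1e-12       # zero for identical clouds
assert sliced_wasserstein(X, Y) > 0.5         # shifted cloud is far away
```

Each projection costs O(n log n) for the sort, which is the linear-complexity advantage (in the number of pairwise interactions) over the quadratic Chamfer comparison mentioned in the abstract.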

AAAI Conference 2023 Conference Paper

Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training

  • Zhenglun Kong
  • Haoyu Ma
  • Geng Yuan
  • Mengshu Sun
  • Yanyue Xie
  • Peiyan Dong
  • Xin Meng
  • Xuan Shen

Vision transformers (ViTs) have recently obtained success in many applications, but their intensive computation and heavy memory usage at both training and inference time limit their generalization. Previous compression algorithms usually start from the pre-trained dense models and only focus on efficient inference, while time-consuming training is still unavoidable. In contrast, this paper points out that the million-scale training data is redundant, which is the fundamental reason for the tedious training. To address the issue, this paper aims to introduce sparsity into data and proposes an end-to-end efficient training framework from three sparse perspectives, dubbed Tri-Level E-ViT. Specifically, we leverage a hierarchical data redundancy reduction scheme, by exploring the sparsity under three levels: number of training examples in the dataset, number of patches (tokens) in each example, and number of connections between tokens that lie in attention weights. With extensive experiments, we demonstrate that our proposed technique can noticeably accelerate training for various ViT architectures while maintaining accuracy. Remarkably, under certain ratios, we are able to improve the ViT accuracy rather than compromising it. For example, we can achieve 15.2% speedup with 72.6% (+0.4) Top-1 accuracy on DeiT-T, and 15.7% speedup with 79.9% (+0.1) Top-1 accuracy on DeiT-S. This proves the existence of data redundancy in ViT. Our code is released at https://github.com/ZLKong/Tri-Level-ViT.
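One of the three sparsity levels, dropping uninformative patch tokens, can be illustrated generically. The scoring and the top-k selection below are invented stand-ins for the paper's actual criteria; the point is only the mechanics of keeping a fixed fraction of tokens while preserving their order.

```python
import numpy as np

def keep_top_tokens(tokens, scores, keep_ratio=0.5):
    """Keep the highest-scoring fraction of patch tokens (illustrative;
    the paper's actual selection criterion may differ)."""
    n_keep = max(1, int(len(tokens) * keep_ratio))
    idx = np.argsort(scores)[::-1][:n_keep]      # indices of the top scores
    return tokens[np.sort(idx)]                  # preserve original token order

tokens = np.arange(8).reshape(8, 1).astype(float)   # 8 dummy patch embeddings
scores = np.array([0.1, 0.9, 0.2, 0.8, 0.05, 0.7, 0.3, 0.6])

kept = keep_top_tokens(tokens, scores, keep_ratio=0.5)
print(kept.ravel())    # → [1. 3. 5. 7.] — tokens 1, 3, 5, 7 score highest
```

In a real training loop the scores would come from attention statistics or a learned predictor, and the same keep-ratio idea applies at the example level and the attention-connection level.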

ICRA Conference 2021 Conference Paper

Test-Time Training for Deformable Multi-Scale Image Registration

  • Wentao Zhu 0001
  • Yufang Huang
  • Daguang Xu
  • Zhen Qian
  • Wei Fan 0001
  • Xiaohui Xie

Registration is a fundamental task in medical robotics and is often a crucial step for many downstream tasks such as motion analysis, intra-operative tracking and image segmentation. Popular registration methods such as ANTs and NiftyReg optimize objective functions for each pair of images from scratch, which is time-consuming for 3D and sequential images with complex deformations. Recently, deep learning-based registration approaches such as VoxelMorph have been emerging and achieve competitive performance. In this work, we construct a test-time training scheme for deep deformable image registration to improve the generalization ability of conventional learning-based registration models. We design multi-scale deep networks to consecutively model the residual deformations, which is effective for highly variable deformations. Extensive experiments validate the effectiveness of multi-scale deep registration with test-time training, based on the Dice coefficient for image segmentation and on mean square error (MSE) and normalized local cross-correlation (NLCC) for dense tissue tracking tasks.
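The essence of test-time training here is to keep optimizing on the test pair itself. A toy version, with a single 1D rigid shift standing in for a deformation field (real methods adapt network weights, not one parameter), shows the mechanics of descending a similarity loss at test time:

```python
import numpy as np

# Toy test-time optimization for registration. Assumptions: 1D signals,
# a single shift parameter instead of a dense deformation, MSE as the
# similarity loss. Purely illustrative.
x = np.linspace(0, 2 * np.pi, 200)
fixed = np.sin(x)
moving_offset = 0.7                  # the moving image is sin(x - 0.7)

shift = 0.0
lr = 0.5
for _ in range(200):
    warped = np.sin(x - moving_offset + shift)
    # analytic gradient of mean((warped - fixed)^2) w.r.t. shift
    grad = np.mean(2 * (warped - fixed) * np.cos(x - moving_offset + shift))
    shift -= lr * grad

# Test-time descent recovers the true misalignment:
assert abs(shift - moving_offset) < 1e-3
```

A learning-based registration network would replace the analytic gradient with backpropagation through its warp, and the multi-scale design in the abstract would repeat this refinement coarse-to-fine.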

ICLR Conference 2021 Conference Paper

Undistillable: Making A Nasty Teacher That CANNOT teach students

  • Haoyu Ma
  • Tianlong Chen 0001
  • Ting-Kuei Hu
  • Chenyu You
  • Xiaohui Xie
  • Zhangyang Wang

Knowledge Distillation (KD) is a widely used technique to transfer knowledge from pre-trained teacher models to (usually more lightweight) student models. However, in certain situations, this technique is more of a curse than a blessing. For instance, KD poses a potential risk of exposing intellectual properties (IPs): even if a trained machine learning model is released as a "black box" (e.g., as executable software or APIs without open-sourcing code), it can still be replicated by KD through imitating input-output behaviors. To prevent this unwanted effect of KD, this paper introduces and investigates a concept called Nasty Teacher: a specially trained teacher network that yields nearly the same performance as a normal one, but would significantly degrade the performance of student models learned by imitating it. We propose a simple yet effective algorithm to build the nasty teacher, called self-undermining knowledge distillation. Specifically, we aim to maximize the difference between the output of the nasty teacher and that of a normal pre-trained network. Extensive experiments on several datasets demonstrate that our method is effective on both standard KD and data-free KD, providing the desirable KD-immunity to model owners for the first time. We hope our preliminary study can draw more awareness and interest in this new practical problem of both social and legal importance. Our code and pre-trained models can be found at https://github.com/VITA-Group/Nasty-Teacher.
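One plausible form of the self-undermining objective is cross-entropy for accuracy minus a weighted divergence from a frozen normal network, so that maximizing the divergence pushes the nasty teacher's distribution away from anything a student could usefully imitate. The tensors, the KL direction, and the weight omega below are illustrative, not the paper's training code:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-12):
    """Row-wise KL divergence KL(p || q) for stacked distributions."""
    return np.sum(p * np.log((p + eps) / (q + eps)), axis=-1)

rng = np.random.default_rng(0)
nasty_logits = rng.standard_normal((4, 10))     # nasty teacher outputs
normal_logits = rng.standard_normal((4, 10))    # frozen normal network
labels = np.array([3, 1, 4, 1])

p_nasty = softmax(nasty_logits)
p_normal = softmax(normal_logits)

# keep the nasty teacher accurate ...
ce = -np.mean(np.log(p_nasty[np.arange(4), labels] + 1e-12))
# ... while pushing its distribution away from the normal network
omega = 0.1                                     # illustrative adversarial weight
loss = ce - omega * np.mean(kl(p_normal, p_nasty))
```

Minimizing this loss trades a small amount of accuracy (the CE term) for a large divergence term, which is the "nearly the same performance, useless to distill from" behavior the abstract describes.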

ICLR Conference 2020 Conference Paper

Dynamically Pruned Message Passing Networks for Large-scale Knowledge Graph Reasoning

  • Xiaoran Xu
  • Wei Feng
  • Yunsheng Jiang
  • Xiaohui Xie
  • Zhiqing Sun
  • Zhi-Hong Deng 0001

We propose Dynamically Pruned Message Passing Networks (DPMPN) for large-scale knowledge graph reasoning. In contrast to existing models, embedding-based or path-based, we learn an input-dependent subgraph to explicitly model a sequential reasoning process. Each subgraph is dynamically constructed, expanding itself selectively under a flow-style attention mechanism. In this way, we can not only construct graphical explanations to interpret predictions, but also prune message passing in Graph Neural Networks (GNNs) to scale with the size of graphs. We take inspiration from the consciousness prior proposed by Bengio to design a two-GNN framework that encodes a global, input-invariant graph-structured representation and learns a local, input-dependent one, coordinated by an attention module. Experiments show the reasoning capability of our model: it provides clear graphical explanations while predicting results accurately, outperforming most state-of-the-art methods on knowledge base completion tasks.

AAAI Conference 2016 Conference Paper

Co-Occurrence Feature Learning for Skeleton Based Action Recognition Using Regularized Deep LSTM Networks

  • Wentao Zhu
  • Cuiling Lan
  • Junliang Xing
  • Wenjun Zeng
  • Yanghao Li
  • Li Shen
  • Xiaohui Xie

Skeleton based action recognition distinguishes human actions using the trajectories of skeleton joints, which provide a very good representation for describing actions. Considering that recurrent neural networks (RNNs) with Long Short-Term Memory (LSTM) can learn feature representations and model long-term temporal dependencies automatically, we propose an end-to-end fully connected deep LSTM network for skeleton based action recognition. Inspired by the observation that the co-occurrences of the joints intrinsically characterize human actions, we take the skeleton as the input at each time slot and introduce a novel regularization scheme to learn the co-occurrence features of skeleton joints. To train the deep LSTM network effectively, we propose a new dropout algorithm which simultaneously operates on the gates, cells, and output responses of the LSTM neurons. Experimental results on three human action recognition datasets consistently demonstrate the effectiveness of the proposed model.
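Co-occurrence regularization of this flavor groups input-to-hidden weights by joint. A generic group-sparse (L2,1-style) penalty over per-joint column groups is one plausible instantiation; the paper's exact mixed-norm scheme may differ, and the sizes below are invented.

```python
import numpy as np

def group_l21(W, group_size):
    """L2,1-style penalty over contiguous column groups of W (one group per
    joint): sum of per-group Frobenius norms. Generic illustration."""
    n_groups = W.shape[1] // group_size
    groups = W.reshape(W.shape[0], n_groups, group_size)
    return np.sum(np.sqrt(np.sum(groups ** 2, axis=(0, 2))))

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 15 * 3))   # 64 LSTM units, 15 joints x 3 coords
penalty = group_l21(W, group_size=3)    # added to the training loss, scaled
```

Penalties of this form drive whole joint groups of weights toward zero together, so the network learns which subsets of joints co-occur for an action rather than sparsifying individual weights.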

YNIMG Journal 2010 Journal Article

Identifying gene regulatory networks in schizophrenia

  • Steven G. Potkin
  • Fabio Macciardi
  • Guia Guffanti
  • James H. Fallon
  • Qi Wang
  • Jessica A. Turner
  • Anita Lakatos
  • Michael F. Miles

The imaging genetics approach to studying the genetic basis of disease leverages the individual strengths of both neuroimaging and genetic studies by visualizing and quantifying the brain activation patterns in the context of genetic background. Brain imaging as an intermediate phenotype can help clarify the functional link among genes, the molecular networks in which they participate, and brain circuitry and function. Integrating genetic data from a genome-wide association study (GWAS) with brain imaging as a quantitative trait (QT) phenotype can increase the statistical power to identify risk genes. A QT analysis using brain imaging (DLPFC activation during a working memory task) as a quantitative trait has identified unanticipated risk genes for schizophrenia. Several of these genes (RSRC1, ARHGAP18, ROBO1-ROBO2, GPC1, TNIK, and CTXN3-SLC12A2) have functions related to progenitor cell proliferation, migration, and differentiation, cytoskeleton reorganization, axonal connectivity, and development of forebrain structures. These genes, however, do not function in isolation but rather through gene regulatory networks. To obtain a deeper understanding of how the GWAS-identified genes participate in larger gene regulatory networks, we measured correlations among transcript levels in mouse and human postmortem tissue and performed a gene set enrichment analysis (GSEA) that identified several microRNAs associated with schizophrenia (448, 218, 137). The results of such computational approaches can be further validated in animal experiments in which the networks are experimentally studied and perturbed with specific compounds. Glypican 1 and FGF17 mouse models, for example, can be used to study such gene regulatory networks. The model demonstrates epistatic interactions between FGF and glypican on brain development and may be a useful model of negative symptom schizophrenia.

NeurIPS Conference 2001 Conference Paper

A theory of neural integration in the head-direction system

  • Richard Hahnloser
  • Xiaohui Xie
  • H. Seung

Integration in the head-direction system is a computation by which horizontal angular head velocity signals from the vestibular nuclei are integrated to yield a neural representation of head direction. In the thalamus, the postsubiculum and the mammillary nuclei, the head-direction representation has the form of a place code: neurons have a preferred head direction in which their firing is maximal [Blair and Sharp, 1995; Blair et al., 1998]. Integration is a difficult computation, given that head-velocities can vary over a large range. Previous models of the head-direction system relied on the assumption that the integration is achieved in a firing-rate-based attractor network with a ring structure. In order to correctly integrate head-velocity signals during high-speed head rotations, very fast synaptic dynamics had to be assumed. Here we address the question whether integration in the head-direction system is possible with slow synapses, for example excitatory NMDA and inhibitory GABA(B) type synapses. For neural networks with such slow synapses, rate-based dynamics are a good approximation of spiking neurons [Ermentrout, 1994]. We find that correct integration during high-speed head rotations imposes strong constraints on possible network architectures.

NeurIPS Conference 2001 Conference Paper

Generating velocity tuning by asymmetric recurrent connections

  • Xiaohui Xie
  • Martin Giese

Asymmetric lateral connections are one possible mechanism that can account for the direction selectivity of cortical neurons. We present a mathematical analysis for a class of these models. Contrasting with earlier theoretical work that has relied on methods from linear systems theory, we study the network's nonlinear dynamic properties that arise when the threshold nonlinearity of the neurons is taken into account. We show that such networks have stimulus-locked traveling pulse solutions that are appropriate for modeling the responses of direction selective cortical neurons. In addition, our analysis shows that outside a certain regime of stimulus speeds the stability of these solutions breaks down, giving rise to another class of solutions that are characterized by specific spatio-temporal periodicity. This predicts that if direction selectivity in the cortex is mainly achieved by asymmetric lateral connections, lurching activity waves might be observable in ensembles of direction selective cortical neurons within appropriate regimes of the stimulus speed.

NeurIPS Conference 2000 Conference Paper

Learning Winner-take-all Competition Between Groups of Neurons in Lateral Inhibitory Networks

  • Xiaohui Xie
  • Richard Hahnloser
  • H. Sebastian Seung

It has long been known that lateral inhibition in neural networks can lead to a winner-take-all competition, so that only a single neuron is active at a steady state. Here we show how to organize lateral inhibition so that groups of neurons compete to be active. Given a collection of potentially overlapping groups, the inhibitory connectivity is set by a formula that can be interpreted as arising from a simple learning rule. Our analysis demonstrates that such inhibition generally results in winner-take-all competition between the given groups, with the exception of some degenerate cases. In a broader context, the network serves as a particular illustration of the general distinction between permitted and forbidden sets, which was introduced recently. From this viewpoint, the computational function of our network is to store and retrieve memories as permitted sets of coactive neurons. In traditional winner-take-all networks, lateral inhibition is used to enforce a localized, or "grandmother cell" representation in which only a single neuron is active [1, 2, 3, 4]. When used for unsupervised learning, winner-take-all networks discover representations similar to those learned by vector quantization [5]. Recently many research efforts have focused on unsupervised learning algorithms for sparsely distributed representations [6, 7]. These algorithms lead to networks in which groups of multiple neurons are coactivated to represent an object. Therefore, it is of great interest to find ways of using lateral inhibition to mediate winner-take-all competition between groups of neurons, as this could be useful for learning sparsely distributed representations. In this paper, we show how winner-take-all competition between groups of neurons can be learned.
Given a collection of potentially overlapping groups, the inhibitory connectivity is set by a simple formula that can be interpreted as arising from an online learning rule. To show that the resulting network functions as advertised, we perform a stability analysis. If the strength of inhibition is sufficiently great, and the group organization satisfies certain conditions, we show that the only sets of neurons that can be coactivated at a stable steady state are the given groups and their subsets. Because of the competition between groups, only one group can be activated at a time. In general, the identity of the winning group depends on the initial conditions of the network dynamics. If the groups are ordered by the aggregate input that each receives, the possible winners are those above a cutoff that is set by inequalities to be specified.

1 Basic definitions

Let m groups of neurons be given, where group membership is specified by the matrix

ξᵢᵃ = 1 if the i-th neuron is in the a-th group, 0 otherwise. (1)

We will assume that every neuron belongs to at least one group, and every group contains at least one neuron. A neuron is allowed to belong to more than one group, so that the groups are potentially overlapping. The inhibitory synaptic connectivity of the network is defined in terms of the group membership.
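Under the simplifying assumption that inhibition of strength β acts exactly between neurons that share no group (a reading of the construction above, not the paper's verbatim connectivity formula), threshold-linear rate dynamics do let one whole group win:

```python
import numpy as np

# Illustrative group winner-take-all. Assumption (not the paper's exact
# formula): neurons inhibit one another iff they share no group.
xi = np.array([[1, 1, 0, 0],         # group 1: neurons 0, 1
               [0, 0, 1, 1]])        # group 2: neurons 2, 3
share = (xi.T @ xi) > 0              # [i, j] True if i and j share a group
beta = 2.0
J = beta * ~share                    # inhibition only across groups

# Threshold-linear rate dynamics: dx/dt = -x + [b - J x]_+
b = np.array([1.0, 1.0, 0.8, 0.8])   # group 1 receives the larger input
x = np.full(4, 0.1)
for _ in range(2000):
    x = x + 0.05 * (-x + np.maximum(b - J @ x, 0.0))

# Exactly one group remains coactive: the more strongly driven group 1
assert (x[:2] > 0.9).all() and (x[2:] < 1e-3).all()
```

With no inhibition inside a group, both members of the winning group stay active together, while the losing group is pushed below threshold, matching the "permitted sets = groups and their subsets" picture in the abstract.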

NeurIPS Conference 1999 Conference Paper

Spike-based Learning Rules and Stabilization of Persistent Neural Activity

  • Xiaohui Xie
  • H. Sebastian Seung

We analyze the conditions under which synaptic learning rules based on action potential timing can be approximated by learning rules based on firing rates. In particular, we consider a form of plasticity in which synapses depress when a presynaptic spike is followed by a postsynaptic spike, and potentiate with the opposite temporal ordering. Such differential anti-Hebbian plasticity can be approximated under certain conditions by a learning rule that depends on the time derivative of the postsynaptic firing rate. Such a learning rule acts to stabilize persistent neural activity patterns in recurrent neural networks.
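The rate-based approximation mentioned in the abstract can be sketched as dw/dt ∝ −r_pre · d(r_post)/dt (signs and the toy rates below are illustrative, not the paper's derivation). A telling consequence: over any trajectory where the postsynaptic rate returns to its starting value, the net weight change vanishes, which is the stabilizing character of the rule.

```python
import numpy as np

# Toy differential anti-Hebbian rule: dw/dt = -eta * r_pre * d(r_post)/dt
t = np.linspace(0, 1, 1000)
dt = t[1] - t[0]
r_pre = np.ones_like(t)                        # constant presynaptic rate
r_post = 1.0 + 0.5 * np.sin(2 * np.pi * t)     # one full oscillation period

eta = 0.1
w = 0.0
for i in range(1, len(t)):
    dr_post = (r_post[i] - r_post[i - 1]) / dt
    w -= eta * r_pre[i] * dr_post * dt

# The increments telescope to -eta * (r_post[-1] - r_post[0]) ~ 0 over a
# closed rate trajectory, so the rule leaves persistent activity untouched:
assert abs(w) < 1e-6
```

If the postsynaptic rate drifts upward instead, the same sum is negative, so the rule depresses the synapse and opposes the drift, the stabilization of persistent activity described above.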