Arrow Research search

Author name cluster

Xiao Cai

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers
1 author row

Possible papers

6

AAAI 2026 Conference Paper

RCMoE: A Communication-Efficient Random Compression Framework for Resource-Constrained Mixture-of-Experts Training

  • Donglei Wu
  • Xiao Cai
  • Jinglei Tan
  • Jinda Jia
  • Guangming Tan
  • Dingwen Tao
  • Wen Xia
  • Zhihong Tian

Mixture-of-Experts (MoE) architectures with expert parallelism scale LLMs efficiently by activating only a subset of experts per input, avoiding proportional training costs. However, intensive and heterogeneous communication substantially hinders the efficiency and scalability of MoE training in resource-constrained scenarios. Existing communication compression techniques fall short in MoE training because: (i) intensive training amplifies compression overhead, compromising training efficiency; and (ii) accumulated compression errors propagate through the network, degrading training quality. In this paper, we propose RCMoE, a communication-efficient Random Compression framework for MoE training with two core modules: (1) Local-Stochastic Quantization compresses the all-to-all communication by stochastically quantizing each row of an expert's intermediate computing results in parallel, effectively improving compression efficiency and reducing compression error; (2) Probabilistic Thresholding Sparsification compresses the all-reduce communication by sampling large gradients with high probability, thereby reducing computational complexity while maintaining convergence efficiency. Experiments on four typical MoE training tasks show that RCMoE achieves total communication compression ratios of 5.9x-8.1x and training speedups of 1.3x-10.1x compared with state-of-the-art compression techniques, while maintaining MoE training accuracy.
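The core idea behind stochastic row-wise quantization can be illustrated with a minimal numpy sketch. This is not RCMoE's implementation; the function names and the 4-bit setting are illustrative assumptions. The key property is unbiasedness: each value is rounded up or down with probability proportional to its fractional part, so the dequantized value equals the input in expectation.

```python
import numpy as np

def stochastic_quantize(row, bits=4):
    """Unbiased stochastic quantization of one row to 2**bits levels.

    Values are affinely mapped to [0, levels], then rounded up with
    probability equal to the fractional part, so E[dequantize(q)] == row.
    """
    levels = 2 ** bits - 1
    lo, hi = row.min(), row.max()
    scale = (hi - lo) / levels if hi > lo else 1.0
    normalized = (row - lo) / scale            # values now in [0, levels]
    floor = np.floor(normalized)
    prob_up = normalized - floor               # fractional part in [0, 1)
    q = floor + (np.random.rand(row.size) < prob_up)
    return q.astype(np.uint8), lo, scale       # compressed row + metadata

def dequantize(q, lo, scale):
    """Map quantized integer levels back to the original value range."""
    return q.astype(np.float64) * scale + lo
```

Because rounding moves each value by less than one quantization level, the per-element reconstruction error is bounded by `scale`, while the stochastic rounding keeps the estimator unbiased across training steps.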

AAAI 2015 Conference Paper

A Closed Form Solution to Multi-View Low-Rank Regression

  • Shuai Zheng
  • Xiao Cai
  • Chris Ding
  • Feiping Nie
  • Heng Huang

Real-life data often includes information from different channels. For example, in computer vision, we can describe an image using different image features, such as pixel intensity, color, HOG, GIST, and SIFT features. These different aspects of the same objects are often called multi-view (or multi-modal) data. The low-rank regression model has proven to be an effective learning mechanism that exploits the low-rank structure of real-life data, but previous low-rank regression models only work on single-view data. In this paper, we propose a multi-view low-rank regression model by imposing low-rank constraints on the multi-view regression model. Most importantly, we provide a closed-form solution to the multi-view low-rank regression model. Extensive experiments on four multi-view datasets show that the multi-view low-rank regression model outperforms single-view regression models and reveal that the multi-view low-rank structure is very helpful.
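For context, the classical single-view case (reduced-rank regression) already admits a closed form, which the multi-view model extends. A minimal numpy sketch of that single-view closed form follows; this is background illustration under standard least-squares assumptions, not the paper's multi-view solution:

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Classical reduced-rank regression closed form (single-view sketch).

    Solves min_B ||Y - X B||_F^2 subject to rank(B) <= rank by
    projecting the OLS fit onto its top right singular directions.
    """
    # Step 1: unconstrained ordinary least squares solution.
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # Step 2: SVD of the fitted values.
    Y_hat = X @ B_ols
    _, _, Vt = np.linalg.svd(Y_hat, full_matrices=False)
    Vk = Vt[:rank].T                  # top-`rank` right singular vectors
    # Step 3: project coefficients onto the rank-constrained subspace.
    return B_ols @ Vk @ Vk.T
```

When `rank` equals the number of outputs, the projection is the identity and the method reduces to ordinary least squares.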

IJCAI 2013 Conference Paper

Exact Top-k Feature Selection via ℓ2,0-Norm Constraint

  • Xiao Cai
  • Feiping Nie
  • Heng Huang

In this paper, we propose a novel robust and pragmatic feature selection approach. Unlike sparse-learning-based feature selection methods that tackle an approximate problem by imposing a sparsity regularization in the objective function, the proposed method has only one ℓ2,1-norm loss term with an explicit ℓ2,0-norm equality constraint. An efficient algorithm based on the augmented Lagrangian method is derived to solve this constrained optimization problem and find a stable local solution. Extensive experiments on four biological datasets show that, although our proposed model is not a convex problem, it outperforms its approximate convex counterparts and state-of-the-art feature selection methods in terms of classification accuracy with two popular classifiers. Moreover, since the regularization parameter of our method has an explicit meaning, i.e., the number of features selected, it avoids the burden of parameter tuning, making it a pragmatic feature selection method.
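The ℓ2,0 equality constraint requires exactly k nonzero rows in the weight matrix. The Euclidean projection onto that constraint set is simple to state and can be sketched in a few lines of numpy; this is an illustrative building block, not the paper's full augmented-Lagrangian algorithm:

```python
import numpy as np

def top_k_rows(W, k):
    """Euclidean projection of W onto the set {W : ||W||_{2,0} = k}.

    Keeps the k rows of W with the largest l2 norms and zeroes the
    rest, which is the closest matrix (in Frobenius norm) with
    exactly k nonzero rows.
    """
    row_norms = np.linalg.norm(W, axis=1)
    keep = np.argsort(row_norms)[-k:]      # indices of the k largest rows
    P = np.zeros_like(W)
    P[keep] = W[keep]
    return P, np.sort(keep)                # projected matrix, selected features
```

The indices of the surviving rows are exactly the selected features, which is why the constraint parameter k has the direct interpretation "number of features selected".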

IJCAI 2013 Conference Paper

Multi-View K-Means Clustering on Big Data

  • Xiao Cai
  • Feiping Nie
  • Heng Huang

In the past decade, more and more data have been collected from multiple sources or represented by multiple views, where different views describe distinct perspectives of the data. Although each view could be used individually to find patterns by clustering, clustering performance can be improved by exploiting the rich information shared among multiple views. Several multi-view clustering methods have been proposed to integrate different views of data in an unsupervised manner. However, they are graph-based approaches, e.g., based on spectral clustering, and thus cannot handle large-scale data. How to combine these heterogeneous features for unsupervised large-scale data clustering has become a challenging problem. In this paper, we propose a new robust large-scale multi-view clustering method to integrate heterogeneous representations of large-scale data. We evaluate the proposed methods on six benchmark datasets and compare their performance with several commonly used clustering approaches as well as baseline multi-view clustering methods. In all experiments, our proposed methods consistently achieve superior clustering performance.
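The basic shape of multi-view k-means, a single cluster assignment shared across views with per-view centroids, can be sketched in numpy. This is a deliberately naive illustration of the shared-assignment idea, not the paper's robust large-scale method; all names are assumptions:

```python
import numpy as np

def multiview_kmeans(views, k, iters=20, seed=0):
    """Naive shared-assignment k-means over multiple views (sketch).

    Each view is an (n, d_v) matrix describing the same n objects.
    One label vector is shared across views; each view keeps its own
    centroids. Assignment minimizes the squared distance summed over
    all views.
    """
    rng = np.random.default_rng(seed)
    n = views[0].shape[0]
    labels = rng.integers(0, k, n)
    for _ in range(iters):
        # M-step: per-view centroids under the shared assignment;
        # an empty cluster is re-seeded from a random point.
        centroids = []
        for V in views:
            c = np.stack([V[labels == j].mean(axis=0) if np.any(labels == j)
                          else V[rng.integers(n)] for j in range(k)])
            centroids.append(c)
        # E-step: assign each object by total squared distance across views.
        dist = sum(((V[:, None, :] - c[None]) ** 2).sum(-1)
                   for V, c in zip(views, centroids))
        labels = dist.argmin(axis=1)
    return labels
```

Because the assignment step sums distances over views, an object's cluster is decided jointly by all of its representations rather than by any single view.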

NeurIPS 2010 Conference Paper

Efficient and Robust Feature Selection via Joint ℓ2,1-Norms Minimization

  • Feiping Nie
  • Heng Huang
  • Xiao Cai
  • Chris Ding

Feature selection is an important component of many machine learning applications. Especially in many bioinformatics tasks, efficient and robust feature selection methods are desired to extract meaningful features and eliminate noisy ones. In this paper, we propose a new robust feature selection method emphasizing joint ℓ2,1-norm minimization on both the loss function and the regularization. The ℓ2,1-norm-based loss function is robust to outliers in the data points, and the ℓ2,1-norm regularization selects features across all data points with joint sparsity. An efficient algorithm is introduced with proven convergence. Our regression-based objective makes the feature selection process more efficient. Our method has been applied to both genomic and proteomic biomarker discovery. Extensive empirical studies on six datasets demonstrate the effectiveness of our feature selection method.
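A common way to minimize an ℓ2,1-regularized objective is iteratively reweighted least squares, where the nonsmooth row-norm penalty becomes a diagonal reweighting. The numpy sketch below uses a simplified Frobenius loss rather than the paper's ℓ2,1 loss, so it is an assumption-laden illustration of the reweighting idea, not the published algorithm:

```python
import numpy as np

def l21_feature_selection(X, Y, gamma=1.0, iters=50, eps=1e-8):
    """IRLS sketch for min_W ||X W - Y||_F^2 + gamma * ||W||_{2,1}.

    Alternates the closed-form update
        W <- (X^T X + gamma * D)^{-1} X^T Y
    with D = diag(1 / (2 * ||w_i||_2)), where w_i is the i-th row
    of W. Rows whose norms survive the shrinkage mark selected
    features; `eps` guards against division by zero.
    """
    d = X.shape[1]
    D = np.eye(d)
    for _ in range(iters):
        W = np.linalg.solve(X.T @ X + gamma * D, X.T @ Y)
        row_norms = np.linalg.norm(W, axis=1)
        D = np.diag(1.0 / (2.0 * np.maximum(row_norms, eps)))
    return W
```

As irrelevant rows shrink, their diagonal weights grow, which pushes those rows further toward zero on the next solve; the surviving large-norm rows are the jointly selected features.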