Arrow Research search

Author name cluster

Chen Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

28 papers
2 author rows

Possible papers

28

AAAI Conference 2026 Conference Paper

WaveEx: Accelerating Flow Matching-based Speech Generation via Wavelet-guided Extrapolation

  • Xiaoqian Liu
  • Xiyan Gui
  • Zhengkun Ge
  • Yuan Ge
  • Chang Zou
  • Jiacheng Liu
  • Zhikang Niu
  • Qixi Zheng

Flow matching-based generative models offer a principled approach to modeling continuous-time dynamics in speech generation. However, inference is often computationally expensive due to repeated neural network evaluations required by ODE solvers. We propose WaveEx, a training-free and plug-in acceleration framework which replaces portions of ODE integration with wavelet-guided extrapolation. By leveraging the multi-scale structure of latent trajectories, WaveEx predicts future states directly in the frequency domain without additional model evaluations or architectural changes. WaveEx consistently accelerates inference across diverse speech generation tasks. The gains are especially pronounced in tasks like speech synthesis (up to 5.73× speedup) and music generation (2.75×), where flow matching plays a central role in alignment modeling and dense ODE integration. Even in tasks with simpler input-output mappings such as speech enhancement (4.55×) and voice conversion (2.75×), WaveEx still achieves notable acceleration, demonstrating the robustness and generalizability of the approach. These results highlight wavelet-guided extrapolation as a lightweight and broadly applicable alternative to full ODE solving for flow matching-based speech generation.

ICML Conference 2025 Conference Paper

Global Optimization with a Power-Transformed Objective and Gaussian Smoothing

  • Chen Xu

We propose a novel method, namely Gaussian Smoothing with a Power-Transformed Objective (GS-PowerOpt), that solves global optimization problems in two steps: (1) apply an (exponential) power-$N$ transformation to the (not necessarily differentiable) objective $f: \mathbb{R}^d\rightarrow \mathbb{R}$ and get $f_N$, and (2) optimize the Gaussian-smoothed $f_N$ with stochastic approximations. Under mild conditions on $f$, for any $\delta>0$, we prove that with a sufficiently large power $N_\delta$, this method converges to a solution in the $\delta$-neighborhood of $f$'s global optimum point, at the iteration complexity of $O(d^4\varepsilon^{-2})$. If we require that $f$ is differentiable and further assume the Lipschitz condition on $f$ and its gradient, the iteration complexity reduces to $O(d^2\varepsilon^{-2})$, which is significantly faster than the standard homotopy method. In most of the experiments performed, our method produces better solutions than other algorithms that also apply the smoothing technique.
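
As a rough illustration of the two-step recipe above, the sketch below exponentially power-transforms a toy objective and runs a self-normalized stochastic search on its Gaussian smoothing. The function name, the self-normalized update, the toy objective, and all hyperparameters are assumptions for illustration, not the paper's estimator or settings.

```python
import numpy as np

def gs_power_search(f, x0, N=5.0, sigma=0.5, lr=0.3, iters=1000, batch=32, seed=0):
    """Illustrative sketch: power-transform f, then stochastically search its
    Gaussian smoothing (self-normalized update; not the paper's estimator)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        u = rng.standard_normal((batch, x.size))          # Gaussian perturbations
        fx = np.array([f(x + sigma * ui) for ui in u])
        w = np.exp(N * (fx - fx.max()))                   # exponential power transform, shifted for stability
        w /= w.sum()
        x = x + lr * (w[:, None] * (sigma * u)).sum(axis=0)  # step toward high-f_N perturbations
    return x

# toy usage: a multimodal 2-D objective whose global maximum sits near (1, 1)
f = lambda z: -np.sum((z - 1.0) ** 2) + 0.3 * np.cos(5.0 * np.sum(z))
print(gs_power_search(f, x0=[-2.0, 2.0]))
```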

ICLR Conference 2025 Conference Paper

Kernel-based Optimally Weighted Conformal Time-Series Prediction

  • Jonghyeok Lee
  • Chen Xu
  • Yao Xie 0002

Conformal prediction has been a popular distribution-free framework for uncertainty quantification. In this work, we present a novel conformal prediction method for time-series, which we call Kernel-based Optimally Weighted Conformal Prediction Intervals ($\texttt{KOWCPI}$). Specifically, $\texttt{KOWCPI}$ adapts the classic Reweighted Nadaraya-Watson (RNW) estimator for quantile regression on dependent data and learns optimal data-adaptive weights. Theoretically, we tackle the challenge of establishing a conditional coverage guarantee for non-exchangeable data under strong mixing conditions on the non-conformity scores. We demonstrate the superior performance of $\texttt{KOWCPI}$ on real time-series against state-of-the-art methods, where $\texttt{KOWCPI}$ achieves narrower confidence intervals without losing coverage.
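
A minimal sketch of the kernel-weighted quantile idea: weight past non-conformity scores by their similarity to the current state, then take a weighted quantile. The Gaussian kernel, fixed bandwidth, and function name are assumptions; the actual method learns optimal data-adaptive weights via the Reweighted Nadaraya-Watson estimator.

```python
import numpy as np

def kernel_weighted_quantile(scores, features, x_now, h=1.0, alpha=0.1):
    """Sketch only (not the authors' KOWCPI): Nadaraya-Watson-style weights on
    past non-conformity scores, followed by a weighted (1 - alpha) quantile."""
    scores = np.asarray(scores, float)
    features = np.asarray(features, float)
    d = np.linalg.norm(features - x_now, axis=1)   # similarity of past states to the present one
    w = np.exp(-(d / h) ** 2)
    w /= w.sum()
    order = np.argsort(scores)
    cdf = np.cumsum(w[order])
    idx = min(np.searchsorted(cdf, 1 - alpha), len(scores) - 1)
    return scores[order][idx]   # half-width: predict y_hat ± this for absolute-residual scores
```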

ICLR Conference 2025 Conference Paper

MambaQuant: Quantizing the Mamba Family with Variance Aligned Rotation Methods

  • Zukang Xu
  • Yuxuan Yue
  • Xing Hu 0010
  • Dawei Yang
  • Zhihang Yuan
  • Zixu Jiang
  • Zhixuan Chen
  • Jiangyong Yu

Mamba is an efficient sequence model that rivals Transformers and demonstrates significant potential as a foundational architecture for various tasks. Quantization is commonly used in neural networks to reduce model size and computational latency. However, applying quantization to Mamba remains underexplored, and existing quantization methods, which have been effective for CNN and Transformer models, appear inadequate for Mamba models (e.g., Quarot suffers a 21% accuracy drop on Vim-T$\dagger$ even under W8A8). We have pioneered the exploration of this issue and identified several key challenges. First, significant outliers are present in gate projections, output projections, and matrix multiplications. Second, Mamba's unique parallel scan further amplifies these outliers, leading to uneven and heavy-tailed data distributions. Third, even with the application of the Hadamard transform, the variance across channels in weights and activations still remains inconsistent. To address these issues, we propose MambaQuant, a post-training quantization (PTQ) framework consisting of: 1) Karhunen-Loève Transformation (KLT) enhanced rotation, rendering the rotation matrix adaptable to diverse channel distributions. 2) Smooth-Fused rotation, which equalizes channel variances and can merge additional parameters into model weights. Experiments show that MambaQuant can quantize both weights and activations into 8-bit with less than 1% accuracy loss for Mamba-based vision and language tasks. To our knowledge, MambaQuant is the first comprehensive PTQ design for the Mamba family, paving the way for further advancements in its application.
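
For intuition, here is a minimal sketch of the rotation idea only: build a Karhunen-Loève (eigenvector) rotation from calibration statistics and quantize the rotated tensors with symmetric 8-bit grids. The calibration batch, shapes, and per-tensor scales are assumptions; the Hadamard and Smooth-Fused components of the actual framework are omitted.

```python
import numpy as np

def klt_rotation(calib_acts):
    """Orthogonal rotation from the Karhunen-Loève transform: eigenvectors of the
    covariance of calibration activations (simplified stand-in, not the full method)."""
    cov = np.cov(calib_acts, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)
    return eigvecs                                  # R @ R.T == I

def quantize_int8(x):
    scale = np.abs(x).max() / 127.0                 # symmetric per-tensor 8-bit grid
    return np.round(x / scale).astype(np.int8), scale

# rotating activations by R and weights by R.T keeps the product unchanged,
# (X @ R) @ (R.T @ W) == X @ W, while the rotated tensors quantize more evenly
X_calib = np.random.randn(512, 64)                  # hypothetical calibration batch
R = klt_rotation(X_calib)
W = np.random.randn(64, 32)
Wq, w_scale = quantize_int8(R.T @ W)
Xq, x_scale = quantize_int8(X_calib @ R)
```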

ICML Conference 2025 Conference Paper

MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance

  • Zhixuan Chen
  • Xing Hu 0010
  • Dawei Yang
  • Zukang Xu
  • Chen Xu
  • Zhihang Yuan
  • Sifan Zhou
  • Jiangyong Yu

Mixture-of-Experts (MoE) large language models (LLMs), which leverage dynamic routing and sparse activation to enhance efficiency and scalability, have achieved higher performance while reducing computational costs. However, these models face significant memory overheads, limiting their practical deployment and broader adoption. Post-training quantization (PTQ), a widely used method for compressing LLMs, encounters severe accuracy degradation and diminished generalization performance when applied to MoE models. This paper investigates the impact of MoE's sparse and dynamic characteristics on quantization and identifies two primary challenges: (1) Inter-expert imbalance, referring to the uneven distribution of samples across experts, which leads to insufficient and biased calibration for less frequently utilized experts; (2) Intra-expert imbalance, arising from MoE's unique aggregation mechanism, which leads to varying degrees of correlation between different samples and their assigned experts. To address these challenges, we propose MoEQuant, a novel quantization framework tailored for MoE LLMs. MoEQuant includes two novel techniques: 1) Expert-Balanced Self-Sampling (EBSS), a sampling method that efficiently constructs a calibration set with balanced expert distributions by leveraging the cumulative probabilities of tokens and expert balance metrics as guiding factors; 2) Affinity-Guided Quantization (AGQ), which incorporates affinities between experts and samples into the quantization process, thereby accurately assessing the impact of individual samples on different experts within the MoE layer. Experiments demonstrate that MoEQuant achieves substantial performance gains (a more than 10-point accuracy gain on HumanEval for DeepSeekMoE-16B under 4-bit quantization) and boosts efficiency.

ICLR Conference 2025 Conference Paper

OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting

  • Xing Hu 0010
  • Yuan Cheng
  • Dawei Yang
  • Zhixuan Chen
  • Zukang Xu
  • Jiangyong Yu
  • Chen Xu
  • Zhihang Yuan

Post-training quantization (PTQ) has emerged as a widely adopted technique for compressing and accelerating Large Language Models (LLMs). The major challenge in LLM quantization is that uneven and heavy-tailed data distributions can expand the quantization range, thereby reducing bit precision for most values. Recent methods attempt to eliminate outliers and balance inter-channel differences by employing linear transformations; however, they remain heuristic and often overlook optimizing the data distribution across the entire quantization space. In this paper, we introduce the Quantization Space Utilization Rate (QSUR), a novel metric that effectively assesses the quantizability of transformed data by measuring the space utilization of the data in the quantization space. We complement QSUR with mathematical derivations that examine the effects and limitations of various transformations, guiding our development of Orthogonal and Scaling Transformation-based Quantization (OSTQuant). OSTQuant employs a learnable equivalent transformation, consisting of an orthogonal transformation and a scaling transformation, to optimize the distributions of weights and activations across the entire quantization space. Furthermore, we propose the KL-Top loss function, designed to mitigate noise during optimization while retaining richer semantic information within the limited calibration data imposed by PTQ. OSTQuant outperforms existing work on various LLMs and benchmarks. In the W4-only setting, it retains 99.5\% of the floating-point accuracy. In the more challenging W4A4KV4 configuration, OSTQuant reduces the performance gap by 32\% on the LLaMA-3-8B model compared to state-of-the-art methods. Code will be available.

ICML Conference 2025 Conference Paper

RWKVQuant: Quantizing the RWKV Family with Proxy Guided Hybrid of Scalar and Vector Quantization

  • Chen Xu
  • Yuxuan Yue
  • Zukang Xu
  • Xing Hu 0010
  • Jiangyong Yu
  • Zhixuan Chen
  • Sifan Zhou
  • Zhihang Yuan

RWKV is a modern RNN architecture with performance comparable to Transformers, but it still faces challenges when deployed on resource-constrained devices. Post-Training Quantization (PTQ), an essential technique for reducing model size and inference latency, has been widely used in Transformer models. However, it suffers significant performance degradation when applied to RWKV. This paper investigates and identifies two key constraints inherent in the properties of RWKV: (1) Non-linear operators hinder the parameter fusion of both smooth- and rotation-based quantization, introducing extra computation overhead. (2) The large amount of uniformly distributed weights poses challenges for cluster-based quantization, leading to reduced accuracy. To this end, we propose RWKVQuant, a PTQ framework tailored for RWKV models, consisting of two novel techniques: (1) a coarse-to-fine proxy capable of adaptively selecting different quantization approaches by assessing the uniformity and identifying outliers in the weights, and (2) a codebook optimization algorithm that enhances the performance of cluster-based quantization methods for element-wise multiplication in RWKV. Experiments show that RWKVQuant can quantize RWKV-6-14B to about 3 bits with less than 1% accuracy loss and a 2.14$\times$ speedup.

AAAI Conference 2024 Conference Paper

Beyond Traditional Threats: A Persistent Backdoor Attack on Federated Learning

  • Tao Liu
  • Yuhang Zhang
  • Zhu Feng
  • Zhiqin Yang
  • Chen Xu
  • Dapeng Man
  • Wu Yang

Backdoors injected into federated learning are diluted by subsequent benign updates. This is reflected in a significant reduction of the attack success rate as iterations increase, until the attack ultimately fails. We use a new metric to quantify the degree of this weakened backdoor effect, called attack persistence. Given that research to improve this property has not been widely noted, we propose a Full Combination Backdoor Attack (FCBA) method. It aggregates more combined trigger information for a more complete backdoor pattern in the global model. The trained backdoored global model is more resilient to benign updates, leading to a higher attack success rate on the test set. We test on three datasets and evaluate with two models across various settings. FCBA's persistence outperforms SOTA federated learning backdoor attacks. On GTSRB, 120 rounds post-attack, our attack success rate exceeds the baseline by over 50%. The core code of our method is available at https://github.com/PhD-TaoLiu/FCBA.
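
As a rough illustration of the "full combination" idea, the snippet below enumerates every non-empty combination of a set of local trigger patches; the patch names and their placement are hypothetical, and the paper's trigger design and client assignment are not reproduced.

```python
from itertools import combinations

def full_trigger_combinations(trigger_patches):
    """Enumerate every non-empty combination of local trigger patches (illustration only)."""
    combos = []
    for r in range(1, len(trigger_patches) + 1):
        combos.extend(combinations(trigger_patches, r))
    return combos

# e.g. 4 local patches -> 2**4 - 1 = 15 combined trigger patterns
patches = ["top_left", "top_right", "bottom_left", "bottom_right"]
print(len(full_trigger_combinations(patches)))
```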

ICML Conference 2024 Conference Paper

Conformal prediction for multi-dimensional time series by ellipsoidal sets

  • Chen Xu
  • Hanyang Jiang
  • Yao Xie 0002

Conformal prediction (CP) has been a popular method for uncertainty quantification because it is distribution-free, model-agnostic, and theoretically sound. For forecasting problems in supervised learning, most CP methods focus on building prediction intervals for univariate responses. In this work, we develop a sequential CP method called $\texttt{MultiDimSPCI}$ that builds prediction $\textit{regions}$ for a multivariate response, especially in the context of multivariate time series, which are not exchangeable. Theoretically, we estimate $\textit{finite-sample}$ high-probability bounds on the conditional coverage gap. Empirically, we demonstrate that $\texttt{MultiDimSPCI}$ maintains valid coverage on a wide range of multivariate time series while producing smaller prediction regions than CP and non-CP baselines.
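
A minimal sketch of an ellipsoidal region calibrated on past multivariate residuals: take the empirical covariance and set the radius from a quantile of Mahalanobis distances. The function name is illustrative, and the sequential re-estimation and coverage machinery of the actual method are omitted.

```python
import numpy as np

def ellipsoid_region(residuals, alpha=0.1):
    """Sketch: calibrate an ellipsoid {e : (e-mu)^T S_inv (e-mu) <= r2} on past
    multivariate residuals so that a 1-alpha fraction of them fall inside."""
    residuals = np.asarray(residuals, float)
    mu = residuals.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(residuals, rowvar=False))
    d2 = np.einsum('ij,jk,ik->i', residuals - mu, S_inv, residuals - mu)
    r2 = np.quantile(d2, 1 - alpha)
    return mu, S_inv, r2   # region around the point forecast y_hat: y_hat + {e satisfying the inequality}
```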

NeurIPS Conference 2024 Conference Paper

CriticEval: Evaluating Large-scale Language Model as Critic

  • Tian Lan
  • Wenwei Zhang
  • Chen Xu
  • Heyan Huang
  • Dahua Lin
  • Kai Chen
  • Xian-Ling Mao

Critique ability, i.e., the capability of Large Language Models (LLMs) to identify and rectify flaws in responses, is crucial for their applications in self-improvement and scalable oversight. While numerous studies have been proposed to evaluate critique ability of LLMs, their comprehensiveness and reliability are still limited. To overcome this problem, we introduce CriticEval, a novel benchmark designed to comprehensively and reliably evaluate critique ability of LLMs. Specifically, to ensure the comprehensiveness, CriticEval evaluates critique ability from four dimensions across nine diverse task scenarios. It evaluates both scalar-valued and textual critiques, targeting responses of varying quality. To ensure the reliability, a large number of critiques are annotated to serve as references, enabling GPT-4 to evaluate textual critiques reliably. Extensive evaluations of open-source and closed-source LLMs first validate the reliability of evaluation in CriticEval. Then, experimental results demonstrate the promising potential of open-source LLMs, the effectiveness of critique datasets and several intriguing relationships between the critique ability and some critical factors, including task types, response qualities and critique dimensions.

IJCAI Conference 2024 Conference Paper

Recent Advances in End-to-End Simultaneous Speech Translation

  • Xiaoqian Liu
  • Guoqiang Hu
  • Yangfan Du
  • Erfeng He
  • Yingfeng Luo
  • Chen Xu
  • Tong Xiao
  • Jingbo Zhu

Simultaneous speech translation (SimulST) is a demanding task that involves generating translations in real-time while continuously processing speech input. This paper offers a comprehensive overview of the recent developments in SimulST research, focusing on four major challenges. Firstly, the complexities associated with processing lengthy and continuous speech streams pose significant hurdles. Secondly, satisfying real-time requirements presents inherent difficulties due to the need for immediate translation output. Thirdly, striking a balance between translation quality and latency constraints remains a critical challenge. Finally, the scarcity of annotated data adds another layer of complexity to the task. Through our exploration of these challenges and the proposed solutions, we aim to provide valuable insights into the current landscape of SimulST research and suggest promising directions for future exploration.

NeurIPS Conference 2024 Conference Paper

Resfusion: Denoising Diffusion Probabilistic Models for Image Restoration Based on Prior Residual Noise

  • Zhenning Shi
  • Haoshuai Zheng
  • Chen Xu
  • Changsheng Dong
  • Bin Pan
  • Xueshuo Xie
  • Along He
  • Tao Li

Recently, research on denoising diffusion models has expanded its application to the field of image restoration. Traditional diffusion-based image restoration methods utilize degraded images as conditional input to effectively guide the reverse generation process, without modifying the original denoising diffusion process. However, since the degraded images already include low-frequency information, starting from Gaussian white noise results in an increased number of sampling steps. We propose Resfusion, a general framework that incorporates the residual term into the diffusion forward process, starting the reverse process directly from the noisy degraded images. The form of our inference process is consistent with the DDPM. We introduce a weighted residual noise, named resnoise, as the prediction target and explicitly provide the quantitative relationship between the residual term and the noise term in resnoise. By leveraging a smooth equivalence transformation, Resfusion determines the optimal acceleration step and maintains the integrity of existing noise schedules, unifying the training and inference processes. The experimental results demonstrate that Resfusion exhibits competitive performance on the ISTD, LOL, and Raindrop datasets with only five sampling steps. Furthermore, Resfusion can be easily applied to image generation and shows strong versatility. Our code and model are available at https://github.com/nkicsl/Resfusion.

AAAI Conference 2023 Conference Paper

A Generative Approach for Script Event Prediction via Contrastive Fine-Tuning

  • Fangqi Zhu
  • Jun Gao
  • Changlong Yu
  • Wei Wang
  • Chen Xu
  • Xin Mu
  • Min Yang
  • Ruifeng Xu

Script event prediction aims to predict the subsequent event given the context. This requires the capability to infer the correlations between events. Recent works have attempted to improve event correlation reasoning by using pretrained language models and incorporating external knowledge (e.g., discourse relations). Though promising results have been achieved, some challenges still remain. First, the pretrained language models adopted by current works ignore event-level knowledge, resulting in an inability to capture the correlations between events well. Second, modeling correlations between events with discourse relations is limited because it can only capture explicit correlations between events with discourse markers, and cannot capture many implicit correlations. To this end, we propose a novel generative approach for this task, in which a pretrained language model is fine-tuned with an event-centric pretraining objective and predicts the next event within a generative paradigm. Specifically, we first introduce a novel event-level blank infilling strategy as the learning objective to inject event-level knowledge into the pretrained language model, and then design a likelihood-based contrastive loss for fine-tuning the generative model. Instead of using an additional prediction layer, we perform prediction by using sequence likelihoods generated by the generative model. Our approach models correlations between events in a soft way without any external knowledge. The likelihood-based prediction eliminates the need to use additional networks to make predictions and is somewhat interpretable since it scores each word in the event. Experimental results on the multi-choice narrative cloze (MCNC) task demonstrate that our approach achieves better results than other state-of-the-art baselines. Our code will be available at https://github.com/zhufq00/mcnc.
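
A sketch of the likelihood-based prediction step only, using an off-the-shelf GPT-2 from Hugging Face Transformers as a stand-in (assumed installed); the event-centric fine-tuning and contrastive loss described above are not shown.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")              # stand-in model, not the paper's
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sequence_log_likelihood(text):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss                  # mean per-token negative log-likelihood
    return -loss.item() * (ids.shape[1] - 1)             # approximate total log-likelihood

def predict_next_event(context, candidates):
    # pick the candidate event whose continuation the generative model scores most likely
    return max(candidates, key=lambda c: sequence_log_likelihood(context + " " + c))
```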

AAAI Conference 2023 Conference Paper

Improving End-to-End Speech Translation by Leveraging Auxiliary Speech and Text Data

  • Yuhao Zhang
  • Chen Xu
  • Bojie Hu
  • Chunliang Zhang
  • Tong Xiao
  • Jingbo Zhu

We present a method for introducing a text encoder into pre-trained end-to-end speech translation systems. It enhances the ability to adapt one modality (i.e., source-language speech) to another (i.e., source-language text). Thus, the speech translation model can learn from both unlabeled and labeled data, especially when the source-language text data is abundant. Beyond this, we present a denoising method to build a robust text encoder that can deal with both normal and noisy text data. Our system establishes new state-of-the-art results on the MuST-C En-De, En-Fr, and LibriSpeech En-Fr tasks.

NeurIPS Conference 2023 Conference Paper

Normalizing flow neural networks by JKO scheme

  • Chen Xu
  • Xiuyuan Cheng
  • Yao Xie

Normalizing flow is a class of deep generative models for efficient sampling and likelihood estimation, which achieves attractive performance, particularly in high dimensions. The flow is often implemented using a sequence of invertible residual blocks. Existing works adopt special network architectures and regularization of flow trajectories. In this paper, we develop a neural ODE flow network called JKO-iFlow, inspired by the Jordan-Kinderlehrer-Otto (JKO) scheme, which unfolds the discrete-time dynamics of the Wasserstein gradient flow. The proposed method stacks residual blocks one after another, allowing efficient block-wise training of the residual blocks, avoiding sampling SDE trajectories and score matching or variational learning, thus reducing the memory load and difficulty in end-to-end training. We also develop adaptive time reparameterization of the flow network with a progressive refinement of the induced trajectory in probability space to improve the model accuracy further. Experiments with synthetic and real data show that the proposed JKO-iFlow network achieves competitive performance compared with existing flow and diffusion models at a significantly reduced computational and memory cost.

IJCAI Conference 2023 Conference Paper

Recent Advances in Direct Speech-to-text Translation

  • Chen Xu
  • Rong Ye
  • Qianqian Dong
  • Chengqi Zhao
  • Tom Ko
  • Mingxuan Wang
  • Tong Xiao
  • Jingbo Zhu

Recently, speech-to-text translation has attracted more and more attention, and many studies have emerged rapidly. In this paper, we present a comprehensive survey on direct speech translation aiming to summarize the current state-of-the-art techniques. First, we categorize the existing research work into three directions based on the main challenges --- modeling burden, data scarcity, and application issues. To tackle the problem of modeling burden, two main structures have been proposed: the encoder-decoder framework (Transformer and its variants) and multitask frameworks. For the challenge of data scarcity, recent work resorts to many sophisticated techniques, such as data augmentation, pre-training, knowledge distillation, and multilingual modeling. We analyze and summarize the application issues, which include real-time processing, segmentation, named entities, gender bias, and code-switching. Finally, we discuss some promising directions for future work.

ICML Conference 2023 Conference Paper

Sequential Predictive Conformal Inference for Time Series

  • Chen Xu
  • Yao Xie 0002

We present a new distribution-free conformal prediction algorithm for sequential data (e.g., time series), called the sequential predictive conformal inference (SPCI). We specifically account for the fact that time-series data are non-exchangeable, and thus many existing conformal prediction algorithms are not applicable. The main idea is to adaptively re-estimate the conditional quantile of non-conformity scores (e.g., prediction residuals), upon exploiting the temporal dependence among them. More precisely, we cast the problem of constructing the conformal prediction interval as predicting the quantile of a future residual, given a user-specified point prediction algorithm. Theoretically, we establish asymptotically valid conditional coverage upon extending consistency analyses in quantile regression. Using simulation and real-data experiments, we demonstrate a significant reduction in interval width of SPCI compared to other existing methods under the desired empirical coverage.
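
A simplified sketch of the core step: regress quantiles of the next residual on a window of past residuals and use them as interval offsets. Gradient-boosted quantile regression stands in for the quantile random forests used in the paper, and the window size and the search over interval offsets are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def spci_style_interval(residuals, w=20, alpha=0.1):
    """Sketch: predict low/high quantiles of the next residual from a window of
    past residuals; add the outputs to the point forecast to get an interval."""
    residuals = np.asarray(residuals, float)
    X = np.array([residuals[i:i + w] for i in range(len(residuals) - w)])
    y = residuals[w:]
    lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X, y)
    hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X, y)
    x_now = residuals[-w:].reshape(1, -1)
    return lo.predict(x_now)[0], hi.predict(x_now)[0]
```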

AAAI Conference 2021 Short Paper

Change or Not: A Simple Approach for Plug and Play Language Models on Sentiment Control

  • Chen Xu
  • Jianyu Zhao
  • Rang Li
  • Changjian Hu
  • Chuangbai Xiao

Text generation with sentiment control is difficult without fine-tuning or modifying the model architecture. Plug and Play Language Model (PPLM) utilizes an external sentiment classifier to update the hidden states of GPT-2 at each time step. It does not change the parameters but achieves competitive performance. However, fluency is impaired due to the instability of the hidden states. Moreover, the classifier is not strong because it is trained on partial texts, making it difficult to guide the generation process. To solve the above problems, in this paper, we first propose a fixed-threshold method based on the Valence-Arousal-Dominance (VAD) lexicon to decide whether to change a word, which preserves the fluency of the original LM to the greatest extent. Furthermore, to improve sentiment alignment, we propose a dynamic-threshold method that utilizes a VAD-based loss to make the threshold dynamic. Experiments demonstrate that our methods outperform the baseline by a large margin in both fluency and sentiment accuracy.
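
For intuition only, a toy version of the fixed-threshold rule: keep the language model's word when its lexicon valence is close enough to the target sentiment, and flag it for replacement otherwise. The lexicon values, threshold, and placeholder token are invented for illustration; the actual VAD lexicon and replacement mechanism are more involved.

```python
# toy valence values in [0, 1]; a real VAD lexicon also carries arousal and dominance
VAD_VALENCE = {"terrible": 0.05, "bad": 0.15, "fine": 0.55, "good": 0.80, "great": 0.92}

def keep_or_change(word, target_valence=0.9, threshold=0.4):
    v = VAD_VALENCE.get(word)
    if v is None or abs(v - target_valence) <= threshold:
        return word, False        # keep the LM's word, preserving fluency
    return "<resample>", True     # flag the word for sentiment-guided replacement
```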

ICML Conference 2021 Conference Paper

Conformal prediction interval for dynamic time-series

  • Chen Xu
  • Yao Xie 0002

We develop a method to construct distribution-free prediction intervals for dynamic time-series, called $\texttt{EnbPI}$, which wraps around any bootstrap ensemble estimator to construct sequential prediction intervals. $\texttt{EnbPI}$ is closely related to the conformal prediction (CP) framework but does not require data exchangeability. Theoretically, these intervals attain finite-sample, \textit{approximately valid} marginal coverage for broad classes of regression functions and time-series with strongly mixing stochastic errors. Computationally, $\texttt{EnbPI}$ avoids overfitting and requires neither data-splitting nor training multiple ensemble estimators; it efficiently aggregates bootstrap estimators that have already been trained. In general, $\texttt{EnbPI}$ is easy to implement, scalable to producing arbitrarily many prediction intervals sequentially, and well-suited to a wide range of regression functions. We perform extensive real-data analyses to demonstrate its effectiveness.
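
A compact sketch of the wrapper idea: fit a bootstrap ensemble, form leave-one-out residuals from out-of-bag models, and use a residual quantile as the interval width. The decision-tree base learner, ensemble size, and the fixed symmetric width are assumptions; the actual method additionally optimizes the quantile split and updates residuals online.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def enbpi_style_interval(X, y, x_new, B=30, alpha=0.1, seed=0):
    """Simplified EnbPI-style sketch (fixed symmetric width; illustration only)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    models, bags = [], []
    for _ in range(B):
        idx = rng.integers(0, n, n)                       # bootstrap sample
        models.append(DecisionTreeRegressor().fit(X[idx], y[idx]))
        bags.append(set(idx.tolist()))
    resid = []
    for i in range(n):                                    # leave-one-out residuals from out-of-bag models
        preds = [m.predict(X[i:i + 1])[0] for m, bag in zip(models, bags) if i not in bag]
        if preds:
            resid.append(abs(y[i] - np.mean(preds)))
    width = np.quantile(resid, 1 - alpha)
    center = np.mean([m.predict(np.atleast_2d(x_new))[0] for m in models])
    return center - width, center + width
```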

JMLR Journal 2020 Journal Article

Distributed Feature Screening via Componentwise Debiasing

  • Xingxiang Li
  • Runze Li
  • Zhiming Xia
  • Chen Xu

Feature screening is a powerful tool in processing high-dimensional data. When the sample size N and the number of features p are both large, the implementation of classic screening methods can be numerically challenging. In this paper, we propose a distributed screening framework for the big data setting. In the spirit of 'divide-and-conquer', the proposed framework expresses a correlation measure as a function of several component parameters, each of which can be distributively estimated using a natural U-statistic from data segments. With the component estimates aggregated, we obtain a final correlation estimate that can be readily used for screening features. This framework enables distributed storage and parallel computing and thus is computationally attractive. Due to the unbiased distributive estimation of the component parameters, the final aggregated estimate achieves a high accuracy that is insensitive to the number of data segments m. Under mild conditions, we show that the aggregated correlation estimator is as efficient as the centralized estimator in terms of the probability convergence bound and the mean squared error rate; the corresponding screening procedure enjoys the sure screening property for a wide range of correlation measures. The promising performance of the new method is supported by extensive numerical examples.
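
A minimal illustration of the componentwise idea using plain moment components and equal-sized segments: estimate each component on every data segment, average the component estimates, then combine them into the final correlation. The paper uses U-statistic component estimates and covers a wide range of correlation measures.

```python
import numpy as np

def distributed_pearson(x_segments, y_segments):
    """Sketch of componentwise aggregation for the Pearson correlation
    (assumes equal-sized segments; illustrates the framework, not the paper's estimator)."""
    comps = np.array([
        [np.mean(x * y), np.mean(x), np.mean(y), np.mean(x ** 2), np.mean(y ** 2)]
        for x, y in zip(x_segments, y_segments)
    ]).mean(axis=0)                                   # aggregate the per-segment component estimates
    exy, ex, ey, ex2, ey2 = comps
    return (exy - ex * ey) / np.sqrt((ex2 - ex ** 2) * (ey2 - ey ** 2))
```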

NeurIPS Conference 2020 Conference Paper

MMA Regularization: Decorrelating Weights of Neural Networks by Maximizing the Minimal Angles

  • Zhennan Wang
  • Canqun Xiang
  • Wenbin Zou
  • Chen Xu

The strong correlation between neurons or filters can significantly weaken the generalization ability of neural networks. Inspired by the well-known Tammes problem, we propose a novel diversity regularization method to address this issue, which makes the normalized weight vectors of neurons or filters distributed on a hypersphere as uniformly as possible, through maximizing the minimal pairwise angles (MMA). This method can easily exert its effect by plugging the MMA regularization term into the loss function with negligible computational overhead. The MMA regularization is simple, efficient, and effective. Therefore, it can be used as a basic regularization method in neural network training. Extensive experiments demonstrate that MMA regularization is able to enhance the generalization ability of various modern models and achieves considerable performance improvements on the CIFAR100 and TinyImageNet datasets. In addition, experiments on face verification show that MMA regularization is also effective for feature learning. Code is available at: https://github.com/wznpub/MMA_Regularization.
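
A sketch of an MMA-style penalty under the assumption that penalizing each normalized weight vector's largest cosine similarity to the others is an acceptable stand-in for maximizing the minimal pairwise angles; the exact form and weighting used in the paper may differ.

```python
import torch

def mma_regularizer(W):
    """Diversity penalty sketch: for each row of W (a neuron/filter weight vector),
    penalize its largest cosine similarity to the other rows."""
    Wn = torch.nn.functional.normalize(W, dim=1)
    cos = Wn @ Wn.t() - 2.0 * torch.eye(W.shape[0], device=W.device, dtype=W.dtype)  # mask the diagonal
    return cos.max(dim=1).values.mean()   # add lambda * this term to the training loss
```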

NeurIPS Conference 2020 Conference Paper

Multi-task Additive Models for Robust Estimation and Automatic Structure Discovery

  • Yingjie Wang
  • Hong Chen
  • Feng Zheng
  • Chen Xu
  • Tieliang Gong
  • Yanhong Chen

Additive models have attracted much attention for high-dimensional regression estimation and variable selection. However, the existing models are usually limited to the single-task learning framework under the mean squared error (MSE) criterion, where the utilization of variable structure depends heavily on a priori knowledge among variables. For high-dimensional observations in real environments, e.g., Coronal Mass Ejections (CMEs) data, the learning performance of previous methods may be seriously degraded due to complex non-Gaussian noise and the insufficiency of prior knowledge on variable structure. To tackle this problem, we propose a new class of additive models, called Multi-task Additive Models (MAM), by integrating the mode-induced metric, the structure-based regularizer, and additive hypothesis spaces into a bilevel optimization framework. Our approach does not require any a priori knowledge of variable structure and is suited to high-dimensional data with complex noise, e.g., skewed noise, heavy-tailed noise, and outliers. A smooth iterative optimization algorithm with convergence guarantees is provided to implement MAM efficiently. Experiments on simulations and the CMEs analysis demonstrate the competitive performance of our approach for robust estimation and automatic structure discovery.

NeurIPS Conference 2020 Conference Paper

Predictive inference is free with the jackknife+-after-bootstrap

  • Byol Kim
  • Chen Xu
  • Rina Barber

Ensemble learning is widely used in applications to make predictions in complex decision problems---for example, averaging models fitted to a sequence of samples bootstrapped from the available training data. While such methods offer more accurate, stable, and robust predictions and model estimates, much less is known about how to perform valid, assumption-lean inference on the output of these types of procedures. In this paper, we propose the jackknife+-after-bootstrap (J+aB), a procedure for constructing a predictive interval, which uses only the available bootstrapped samples and their corresponding fitted models, and is therefore "free" in terms of the cost of model fitting. The J+aB offers a predictive coverage guarantee that holds with no assumptions on the distribution of the data, the nature of the fitted model, or the way in which the ensemble of models is aggregated---at worst, the failure rate of the predictive interval is inflated by a factor of 2. Our numerical experiments verify the coverage and accuracy of the resulting predictive intervals on real data.
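
A sketch of the construction under stated assumptions (decision-tree base learner, small ensemble, a simplified quantile convention): the bootstrap models are reused, and for each training point only the models whose bootstrap sample excluded it are aggregated.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def jab_interval(X, y, x_new, B=30, alpha=0.1, seed=0):
    """Jackknife+-after-bootstrap style interval (illustration only)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    fits = []
    for _ in range(B):
        idx = rng.integers(0, n, n)
        fits.append((DecisionTreeRegressor().fit(X[idx], y[idx]), set(idx.tolist())))
    lo, hi = [], []
    x_new = np.atleast_2d(x_new)
    for i in range(n):
        oob = [m for m, bag in fits if i not in bag]       # models that never saw point i
        if not oob:
            continue
        mu_train = np.mean([m.predict(X[i:i + 1])[0] for m in oob])
        mu_test = np.mean([m.predict(x_new)[0] for m in oob])
        r = abs(y[i] - mu_train)
        lo.append(mu_test - r)
        hi.append(mu_test + r)
    return np.quantile(lo, alpha), np.quantile(hi, 1 - alpha)
```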

AAAI Conference 2020 Conference Paper

Robust Gradient-Based Markov Subsampling

  • Tieliang Gong
  • Quanhan Xi
  • Chen Xu

Subsampling is a widely used and effective method to deal with the challenges brought by big data. Most subsampling procedures are designed based on the importance sampling framework, where samples with high importance measures are given corresponding sampling probabilities. However, in the highly noisy case, these samples may yield an unstable estimator, which could lead to misleading results. To tackle this issue, we propose a gradient-based Markov subsampling (GMS) algorithm to achieve robust estimation. The core idea is to construct a subset which allows us to conservatively correct a crude initial estimate towards the true signal. Specifically, GMS selects samples with small gradients via a probabilistic procedure, constructing a subset that is likely to exclude noisy samples and provide a safe improvement over the initial estimate. We show that the GMS estimator is statistically consistent at a rate that matches the minimax-optimal rate. The promising performance of GMS is supported by simulation studies and real data examples.
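
A toy least-squares sketch of the "favor small-gradient samples" idea: score each sample by the norm of its gradient at a crude initial fit and subsample with probabilities that decay in that score. The selection rule here is an assumption; the paper's Markov-chain acceptance procedure and guarantees are not reproduced.

```python
import numpy as np

def small_gradient_subsample(X, y, m, seed=0):
    """Sketch: refit least squares on a subset biased toward small-gradient samples."""
    rng = np.random.default_rng(seed)
    beta0 = np.linalg.lstsq(X, y, rcond=None)[0]                   # crude initial estimate
    grad_norm = np.abs(X @ beta0 - y) * np.linalg.norm(X, axis=1)  # per-sample gradient norms
    p = np.exp(-grad_norm / grad_norm.mean())
    idx = rng.choice(len(y), size=m, replace=False, p=p / p.sum())
    return np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]          # refined estimate on the subset
```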

AAAI Conference 2017 Conference Paper

A Unified Convex Surrogate for the Schatten-p Norm

  • Chen Xu
  • Zhouchen Lin
  • Hongbin Zha

The Schatten-$p$ norm ($0 < p < 1$) is a non-convex surrogate for the matrix rank. We show that for any $p_1, p_2 > 0$ satisfying $1/p = 1/p_1 + 1/p_2$, there is an equivalence between the Schatten-$p$ norm of one matrix and the Schatten-$p_1$ and Schatten-$p_2$ norms of its two factor matrices. We further extend the equivalence to multiple factor matrices and show that all the factor norms can be convex and smooth for any $p > 0$. In contrast, the original Schatten-$p$ norm for $0 < p < 1$ is non-convex and non-smooth. As an example we conduct experiments on matrix completion. To utilize the convexity of the factor matrix norms, we adopt the accelerated proximal alternating linearized minimization algorithm and establish its sequence convergence. Experiments on both synthetic and real datasets exhibit its superior performance over the state-of-the-art methods. Its speed is also highly competitive.
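
For reference, the most familiar special case of such a factorization identity is the nuclear norm ($p = 1$, $p_1 = p_2 = 2$, i.e., Frobenius factor norms): $\|X\|_* = \min_{X = UV^{\top}} \tfrac{1}{2}\big(\|U\|_F^2 + \|V\|_F^2\big)$; the paper generalizes this equivalence to arbitrary $p_1, p_2 > 0$ with $1/p = 1/p_1 + 1/p_2$.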

AAAI Conference 2016 Conference Paper

Relaxed Majorization-Minimization for Non-Smooth and Non-Convex Optimization

  • Chen Xu
  • Zhouchen Lin
  • Zhenyu Zhao
  • Hongbin Zha

We propose a new majorization-minimization (MM) method for non-smooth and non-convex programs, which is general enough to include the existing MM methods. Besides the local majorization condition, we only require that the difference between the directional derivatives of the objective function and its surrogate function vanishes when the number of iterations approaches infinity, which is a very weak condition. So our method can use a surrogate function that directly approximates the non-smooth objective function. In comparison, all the existing MM methods construct the surrogate function by approximating the smooth component of the objective function. We apply our relaxed MM methods to the robust matrix factorization (RMF) problem with different regularizations, where our locally majorant algorithm shows great advantages over the state-of-the-art approaches for RMF. This is the first algorithm for RMF ensuring, without extra assumptions, that any limit point of the iterates is a stationary point.