Arrow Research search

Author name cluster

Boxiang Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers
2 author rows

Possible papers

4

NeurIPS Conference 2025 Conference Paper

QuanDA: Quantile-Based Discriminant Analysis for High-Dimensional Imbalanced Classification

  • Qian Tang
  • Yuwen Gu
  • Boxiang Wang

Binary classification with imbalanced classes is a common and fundamental task, where standard machine learning methods often struggle to provide reliable predictive performance. Although numerous approaches have been proposed to address this issue, classification in low-sample-size, high-dimensional settings remains particularly challenging. The abundance of noisy features in high-dimensional data limits the effectiveness of classical methods due to overfitting, and the minority class is especially difficult to detect because of its severe underrepresentation in small samples. To address this challenge, we introduce Quantile-based Discriminant Analysis (QuanDA), which builds upon a novel connection with quantile regression and naturally accounts for class imbalance through appropriately chosen quantile levels. We provide a comprehensive theoretical analysis to validate QuanDA in ultra-high-dimensional settings. Through extensive simulation studies and high-dimensional benchmark data analysis, we demonstrate that QuanDA overall outperforms existing classification methods for imbalanced data, including cost-sensitive large-margin classifiers, random forests, and SMOTE.
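As a hedged illustration of the quantile regression connection the abstract mentions (this is not the QuanDA estimator itself), minimizing the total check (pinball) loss over a constant recovers a tau-th sample quantile; the function names and the grid search below are illustrative only:

```python
import numpy as np

def pinball_loss(u, tau):
    # check (pinball) loss of quantile regression:
    # tau * u for u >= 0, (tau - 1) * u for u < 0
    return np.where(u >= 0, tau * u, (tau - 1) * u)

def sample_quantile_via_pinball(x, tau):
    # minimizing the total pinball loss over a constant c recovers
    # a tau-th sample quantile; a grid search over the data points
    # suffices because the objective is piecewise linear in c
    grid = np.sort(x)
    losses = [pinball_loss(x - c, tau).sum() for c in grid]
    return grid[int(np.argmin(losses))]

x = np.arange(1, 101, dtype=float)
print(sample_quantile_via_pinball(x, 0.9))  # a 90th-percentile value
```

Shifting tau away from 0.5 reweights the two sides of the loss asymmetrically, which is the generic mechanism by which a quantile level can encode class imbalance.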

ICML Conference 2024 Conference Paper

Finite Smoothing Algorithm for High-Dimensional Support Vector Machines and Quantile Regression

  • Qian Tang
  • Yikai Zhang 0007
  • Boxiang Wang

This paper introduces a finite smoothing algorithm (FSA), a novel approach to tackle computational challenges in applying support vector machines (SVM) and quantile regression to high-dimensional data. The critical issue with these methods is the non-smooth nature of their loss functions, which traditionally limits the use of highly efficient coordinate descent techniques in high-dimensional settings. FSA innovatively addresses this issue by transforming these loss functions into their smooth counterparts, thereby facilitating more efficient computation. A distinctive feature of FSA is its theoretical foundation: FSA can yield exact solutions, not just approximations, despite the smoothing approach. Our simulation and benchmark tests demonstrate that FSA significantly outpaces its competitors in speed, often by orders of magnitude, while improving or at least maintaining precision. We have implemented FSA in two open-source R packages: hdsvm for high-dimensional SVM and hdqr for high-dimensional quantile regression.
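The FSA construction itself (which, per the abstract, yields exact solutions) is not reproduced here; a generic Huber-type smoothed hinge loss sketches the kind of transformation involved, with `delta` a hypothetical smoothing bandwidth:

```python
import numpy as np

def hinge(t):
    # SVM hinge loss max(0, 1 - t) on the margin t = y * f(x)
    return np.maximum(0.0, 1.0 - t)

def smoothed_hinge(t, delta=0.5):
    # Huber-type smoothing of the hinge kink at t = 1:
    # zero for t >= 1, quadratic on (1 - delta, 1), and a shifted
    # linear piece for t <= 1 - delta; the result is continuously
    # differentiable, which is what coordinate descent needs
    t = np.asarray(t, dtype=float)
    return np.where(
        t >= 1.0, 0.0,
        np.where(t <= 1.0 - delta,
                 1.0 - t - delta / 2.0,
                 (1.0 - t) ** 2 / (2.0 * delta)))
```

The smoothed loss matches the hinge exactly for t >= 1 and differs only by the constant delta/2 on the linear piece, so its gradient agrees with the hinge subgradient outside the smoothing band.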

NeurIPS Conference 2022 Conference Paper

A Consolidated Cross-Validation Algorithm for Support Vector Machines via Data Reduction

  • Boxiang Wang
  • Archer Yang

We propose a consolidated cross-validation (CV) algorithm for training and tuning the support vector machines (SVM) on reproducing kernel Hilbert spaces. Our consolidated CV algorithm utilizes a recently proposed exact leave-one-out formula for the SVM and accelerates the SVM computation via a data reduction strategy. In addition, to compute the SVM with the bias term (intercept), which is not handled by the existing data reduction methods, we propose a novel two-stage consolidated CV algorithm. With numerical studies, we demonstrate that our algorithm is about an order of magnitude faster than the two mainstream SVM solvers, kernlab and LIBSVM, with almost the same accuracy.
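The exact leave-one-out formula for the SVM used by the paper is not reproduced here; ridge regression admits a classical exact LOO shortcut of the same flavor, shown below as a hedged analogue of what such a formula buys over naive refitting:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 40, 5, 0.1
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + 0.1 * rng.normal(size=n)

# exact LOO residuals from a single fit via the ridge hat matrix
# H = X (X'X + lam I)^{-1} X':  e_{-i} = e_i / (1 - H_ii)
H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
loo_shortcut = (y - H @ y) / (1.0 - np.diag(H))

# naive LOO for comparison: refit the model n times
loo_naive = np.empty(n)
for i in range(n):
    mask = np.arange(n) != i
    Xi, yi = X[mask], y[mask]
    beta = np.linalg.solve(Xi.T @ Xi + lam * np.eye(p), Xi.T @ yi)
    loo_naive[i] = y[i] - X[i] @ beta

print(np.max(np.abs(loo_shortcut - loo_naive)))  # agrees to machine precision
```

One fit replaces n refits; the paper's contribution is an analogous exactness result for SVMs combined with data reduction, which this ridge toy does not attempt.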

JMLR Journal 2021 Journal Article

Sparse Tensor Additive Regression

  • Botao Hao
  • Boxiang Wang
  • Pengyuan Wang
  • Jingfei Zhang
  • Jian Yang
  • Will Wei Sun

Tensors are becoming prevalent in modern applications such as medical imaging and digital marketing. In this paper, we propose a sparse tensor additive regression (STAR) that models a scalar response as a flexible nonparametric function of tensor covariates. The proposed model effectively exploits the sparse and low-rank structures in the tensor additive regression. We formulate the parameter estimation as a non-convex optimization problem, and propose an efficient penalized alternating minimization algorithm. We establish a non-asymptotic error bound for the estimator obtained from each iteration of the proposed algorithm, which reveals an interplay between the optimization error and the statistical rate of convergence. We demonstrate the efficacy of STAR through extensive comparative simulation studies, and an application to the click-through-rate prediction in online advertising.
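As a hedged sketch of penalized alternating minimization in its simplest form (a sparse rank-1 matrix toy, not the STAR algorithm or its tensor model), each block update is a separable lasso problem with a closed-form soft-threshold solution:

```python
import numpy as np

def soft(z, k):
    # soft-thresholding operator, the proximal map of the l1 penalty
    return np.sign(z) * np.maximum(np.abs(z) - k, 0.0)

def sparse_rank1_als(Y, lam=0.01, iters=50, seed=0):
    # alternate over u and v for
    #   min ||Y - u v^T||_F^2 + lam * (||u||_1 + ||v||_1);
    # fixing one factor makes the other subproblem separable with a
    # closed-form soft-threshold update (toy: lam must be small
    # enough that the iterates never shrink to zero)
    rng = np.random.default_rng(seed)
    u = rng.normal(size=Y.shape[0])
    v = rng.normal(size=Y.shape[1])
    for _ in range(iters):
        u = soft(Y @ v, lam / 2.0) / (v @ v)
        v = soft(Y.T @ u, lam / 2.0) / (u @ u)
    return u, v
```

On a sparse rank-1 input the iterates recover the factors up to scaling, with mild shrinkage from the penalty and exact zeros where the true factors vanish.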