Arrow Research search

Author name cluster

Rahul Mazumder

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

23 papers
2 author rows

Possible papers

23

NeurIPS Conference 2025 Conference Paper

3BASiL: An Algorithmic Framework for Sparse plus Low-Rank Compression of LLMs

  • Mehdi Makni
  • Xiang Meng
  • Rahul Mazumder

Sparse plus Low-Rank $(\mathbf{S} + \mathbf{L}\mathbf{R})$ decomposition of Large Language Models (LLMs) has emerged as a promising direction in $\textit{model compression}$, aiming to decompose pre-trained model weights into a sum of sparse and low-rank matrices $\mathbf{W} \approx \mathbf{S} + \mathbf{LR}$. Despite recent progress, existing methods often suffer from substantial performance degradation compared to dense models. In this work, we introduce $\texttt{3BASiL-TM}$, an efficient one-shot post-training method for $(\mathbf{S} + \mathbf{L}\mathbf{R})$ decomposition of LLMs that addresses this gap. Our approach first introduces a novel 3-Block Alternating Direction Method of Multipliers (ADMM) method, termed $\texttt{3BASiL}$, to minimize the layer-wise reconstruction error with convergence guarantees. We then design a transformer-matching ($\texttt{TM}$) refinement step that jointly optimizes the sparse and low-rank components across transformer layers. This step minimizes a novel memory-efficient loss that aligns outputs at the transformer level. Notably, the $\texttt{TM}$ procedure is universal as it can enhance any $(\mathbf{S} + \mathbf{L}\mathbf{R})$ decomposition, including pure sparsity. Our numerical experiments show that $\texttt{3BASiL-TM}$ reduces the WikiText2 perplexity gap to the dense LLaMA-8B model by over 30% under a (2:4 Sparse + 64 LR) configuration, compared to prior methods. Moreover, our method achieves over 2.5x faster compression runtime on an A100 GPU compared to the SOTA $(\mathbf{S} + \mathbf{L}\mathbf{R})$ method. Our code is available at https://github.com/mazumder-lab/3BASiL.
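
For orientation, the layer-wise reconstruction problem that such $(\mathbf{S} + \mathbf{L}\mathbf{R})$ methods typically target can be sketched as follows (a standard formulation in our notation; 3BASiL's exact splitting and constraints may differ). Given calibration activations $\mathbf{X}$ and dense weights $\mathbf{W}$:

$$
\min_{\mathbf{S},\,\mathbf{L},\,\mathbf{R}} \; \big\| \mathbf{X}\mathbf{W} - \mathbf{X}(\mathbf{S} + \mathbf{L}\mathbf{R}) \big\|_F^2
\quad \text{s.t.} \quad \mathbf{S} \text{ sparse (e.g., 2:4)}, \;\; \mathbf{L} \in \mathbb{R}^{d \times r}, \;\; \mathbf{R} \in \mathbb{R}^{r \times d},
$$

where $r$ is a small rank budget.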

NeurIPS Conference 2025 Conference Paper

Differentially Private High-dimensional Variable Selection via Integer Programming

  • Petros Prastakos
  • Kayhan Behdin
  • Rahul Mazumder

Sparse variable selection improves interpretability and generalization in high-dimensional learning by selecting a small subset of informative features. Recent advances in Mixed Integer Programming (MIP) have enabled solving large-scale non-private sparse regression—known as Best Subset Selection (BSS)—with millions of variables in minutes. However, extending these algorithmic advances to the setting of Differential Privacy (DP) has remained largely unexplored. In this paper, we introduce two new differentially private estimators for sparse variable selection, leveraging modern MIP techniques. Our framework is general and applies broadly to problems like sparse regression or classification, and we provide theoretical support recovery guarantees in the case of BSS. Inspired by the exponential mechanism, we develop structured sampling procedures that efficiently explore the non-convex objective landscape, avoiding the exhaustive combinatorial search in the exponential mechanism. We complement our theoretical findings with extensive numerical experiments, using both least squares and hinge loss for our objective function, and demonstrate that our methods achieve state-of-the-art empirical support recovery, outperforming competing algorithms in settings with up to $p=10^4$.
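
As background, the non-private Best Subset Selection problem referenced above is commonly written as (a standard formulation, not specific to this paper):

$$
\min_{\beta \in \mathbb{R}^p} \; \|y - X\beta\|_2^2 \quad \text{subject to} \quad \|\beta\|_0 \le k,
$$

where $\|\beta\|_0$ counts the nonzero coefficients; the differentially private estimators here replace the exhaustive exponential-mechanism search over supports with structured sampling while keeping this combinatorial core.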

ICLR Conference 2025 Conference Paper

Preserving Deep Representations in One-Shot Pruning: A Hessian-Free Second-Order Optimization Framework

  • Ryan Lucas
  • Rahul Mazumder

We present SNOWS, a one-shot post-training pruning framework aimed at reducing the cost of vision network inference without retraining. Current leading one-shot pruning methods minimize layer-wise least squares reconstruction error which does not take into account deeper network representations. We propose to optimize a more global reconstruction objective. This objective accounts for nonlinear activations deep in the network to obtain a better proxy for the network loss. This nonlinear objective leads to a more challenging optimization problem---we demonstrate it can be solved efficiently using a specialized second-order optimization framework. A key innovation of our framework is the use of Hessian-free optimization to compute exact Newton descent steps without needing to compute or store the full Hessian matrix. A distinct advantage of SNOWS is that it can be readily applied on top of any sparse mask derived from prior methods, readjusting their weights to preserve deep feature representations. SNOWS obtains state-of-the-art results on various one-shot pruning benchmarks including residual networks and Vision Transformers (ViT/B-16 and ViT/L-16, with 86M and 304M parameters, respectively). Our open-source implementation is available at https://github.com/mazumder-lab/SNOWS.
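
To illustrate the Hessian-free ingredient (Newton-type steps computed from Hessian-vector products alone, without forming the Hessian), here is a minimal conjugate-gradient sketch on a toy quadratic; the function and variable names are ours and this is not the SNOWS implementation:

```python
import numpy as np

def cg_solve(hvp, grad, iters=50, tol=1e-8):
    """Approximately solve H d = -grad using only Hessian-vector products."""
    d = np.zeros_like(grad)
    r = -grad.copy()          # residual of H d + grad = 0 at d = 0
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Hp = hvp(p)
        alpha = rs / (p @ Hp)
        d += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return d

# Toy quadratic f(w) = 0.5 w^T A w - b^T w, so the Hessian is A and the gradient is A w - b.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5)); A = A @ A.T + np.eye(5)
b = rng.normal(size=5)
w = np.zeros(5)
step = cg_solve(lambda v: A @ v, A @ w - b)   # Newton step without forming or storing H
print(np.allclose(A @ (w + step), b, atol=1e-5))
```

In practice the hvp callable would come from automatic differentiation (a double-backward pass), so the full Hessian is never materialized.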

JMLR Journal 2025 Journal Article

Randomization Can Reduce Both Bias and Variance: A Case Study in Random Forests

  • Brian Liu
  • Rahul Mazumder

We study the often overlooked phenomenon, first noted in Breiman (2001), that random forests appear to reduce bias compared to bagging. Motivated by an interesting paper by Mentch and Zhou (2020), where the authors explain the success of random forests in low signal-to-noise ratio (SNR) settings through regularization, we explore how random forests can capture patterns in the data that bagging ensembles fail to capture. We empirically demonstrate that in the presence of such patterns, random forests reduce bias along with variance and can increasingly outperform bagging ensembles when SNR is high. Our observations offer insights into the real-world success of random forests across a range of SNRs and enhance our understanding of the difference between random forests and bagging ensembles. Our investigations also yield practical insights into the importance of tuning $mtry$ in random forests.
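
On the practical point about tuning $mtry$, a small scikit-learn sketch (our own toy setup, not the paper's experiments) where max_features plays the role of $mtry$ and max_features=1.0 corresponds to a bagging ensemble:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=500)   # toy response with an interaction pattern

# max_features=1.0 considers every feature at each split (bagging);
# smaller values inject the extra randomization used by random forests.
for max_features in [1.0, 0.5, 0.3]:
    model = RandomForestRegressor(n_estimators=200, max_features=max_features, random_state=0)
    score = cross_val_score(model, X, y, cv=3).mean()
    print(f"max_features={max_features}: CV R^2 = {score:.3f}")
```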

NeurIPS Conference 2025 Conference Paper

TSENOR: Highly-Efficient Algorithm for Finding Transposable N:M Sparse Masks

  • Xiang Meng
  • Mehdi Makni
  • Rahul Mazumder

Network pruning reduces the computational requirements of large neural networks, with N:M sparsity—retaining only N out of every M consecutive weights—offering a compelling balance between compressed model quality and hardware acceleration. However, N:M sparsity only accelerates forward-pass computations, as N:M patterns are not preserved during matrix transposition, limiting efficiency during training where both passes are computationally intensive. While transposable N:M sparsity has been proposed to address this limitation, existing methods for finding transposable N:M sparse masks either fail to scale to large models or are restricted to M=4, which results in a suboptimal compression-accuracy trade-off. We introduce an efficient solver for transposable N:M masks that scales to billion-parameter models. We formulate mask generation as optimal transport problems and solve them through entropy regularization and Dykstra's algorithm, followed by a rounding procedure. Our tensor-based implementation exploits GPU parallelism, achieving up to 100× speedup with only 1-10\% error compared to existing methods. Our approach can be integrated with layer-wise N:M pruning frameworks including Wanda, SparseGPT and ALPS to produce transposable N:M sparse models with arbitrary N:M values. Experiments show that LLaMA3.2-8B with transposable 16:32 sparsity maintains performance close to its standard N:M counterpart and outperforms the standard 2:4 sparse model, showing the practical value of our approach. Our code is available at https://github.com/mazumder-lab/TSENOR.
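
To make the transposability requirement concrete, a small helper (our own code, not part of TSENOR) that checks whether a binary mask satisfies the N:M pattern along rows both before and after transposition:

```python
import numpy as np

def is_nm_sparse(mask, n, m):
    """True if every group of m consecutive entries in each row has at most n ones."""
    rows, cols = mask.shape
    assert cols % m == 0
    groups = mask.reshape(rows, cols // m, m)
    return bool((groups.sum(axis=-1) <= n).all())

def is_transposable_nm(mask, n, m):
    """A transposable N:M mask satisfies the N:M pattern for the matrix and its transpose."""
    return is_nm_sparse(mask, n, m) and is_nm_sparse(mask.T, n, m)

mask = np.zeros((8, 8), dtype=int)
mask[np.arange(8), np.arange(8)] = 1              # diagonal entries
mask[np.arange(8), (np.arange(8) + 4) % 8] = 1    # plus a shifted diagonal
print(is_transposable_nm(mask, n=2, m=4))
```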

NeurIPS Conference 2024 Conference Paper

ALPS: Improved Optimization for Highly Sparse One-Shot Pruning for Large Language Models

  • Xiang Meng
  • Kayhan Behdin
  • Haoyue Wang
  • Rahul Mazumder

The impressive performance of Large Language Models (LLMs) across various natural language processing tasks comes at the cost of vast computational resources and storage requirements. One-shot pruning techniques offer a way to alleviate these burdens by removing redundant weights without the need for retraining. Yet, the massive scale of LLMs often forces current pruning approaches to rely on heuristics instead of optimization-based techniques, potentially resulting in suboptimal compression. In this paper, we introduce ALPS, an optimization-based framework that tackles the pruning problem using the operator splitting technique and a preconditioned conjugate gradient-based post-processing step. Our approach incorporates novel techniques to accelerate and theoretically guarantee convergence while leveraging vectorization and GPU parallelism for efficiency. ALPS substantially outperforms state-of-the-art methods in terms of the pruning objective and perplexity reduction, particularly for highly sparse models. On the LLaMA3-8B model with 70\% sparsity, ALPS achieves a 29\% reduction in test perplexity on the WikiText dataset and an 8\% improvement in zero-shot benchmark performance compared to existing methods. Our code is available at https://github.com/mazumder-lab/ALPS.
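
As background, operator-splitting methods of this kind typically rewrite the layer-wise pruning problem with a copy variable and apply ADMM (a generic scaled-form sketch in our notation; ALPS's exact splitting, acceleration, and post-processing differ):

$$
\min_{\widehat{W},\,Z}\; \|XW - X\widehat{W}\|_F^2 + \mathbb{1}\{\|Z\|_0 \le k\}
\quad \text{s.t.}\quad \widehat{W} = Z,
$$

with updates $\widehat{W}^{t+1} = \arg\min_{\widehat{W}} \|XW - X\widehat{W}\|_F^2 + \tfrac{\rho}{2}\|\widehat{W} - Z^{t} + U^{t}\|_F^2$ (a least-squares step), $Z^{t+1}$ the projection of $\widehat{W}^{t+1} + U^{t}$ onto the sparsity constraint (keep the $k$ largest-magnitude entries), and $U^{t+1} = U^{t} + \widehat{W}^{t+1} - Z^{t+1}$.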

ICML Conference 2024 Conference Paper

OSSCAR: One-Shot Structured Pruning in Vision and Language Models with Combinatorial Optimization

  • Xiang Meng 0004
  • Shibal Ibrahim
  • Kayhan Behdin
  • Hussein Hazimeh 0001
  • Natalia Ponomareva 0001
  • Rahul Mazumder

Structured pruning is a promising approach for reducing the inference costs of large vision and language models. By removing carefully chosen structures, e.g., neurons or attention heads, the improvements from this approach can be realized on standard deep learning hardware. In this work, we focus on structured pruning in the one-shot (post-training) setting, which does not require model retraining after pruning. We propose a novel combinatorial optimization framework for this problem, based on a layer-wise reconstruction objective and a careful reformulation that allows for scalable optimization. Moreover, we design a new local combinatorial optimization algorithm, which exploits low-rank updates for efficient local search. Our framework is time and memory-efficient and considerably improves upon state-of-the-art one-shot methods on vision models (e.g., ResNet50, MobileNet) and language models (e.g., OPT-1.3B – OPT-30B). For language models, e.g., OPT-2.7B, OSSCAR can lead to $125\times$ lower test perplexity on WikiText with $2\times$ inference time speedup in comparison to the state-of-the-art ZipLM approach. Our framework is also $6\times$–$8\times$ faster. Notably, our work considers models with tens of billions of parameters, which is up to $100\times$ larger than what has been previously considered in the structured pruning literature. Our code is available at https://github.com/mazumder-lab/OSSCAR.
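
For intuition, the layer-wise objective in the structured setting selects which structures to keep and refits the remaining weights (a generic statement in our notation; OSSCAR's reformulation and its low-rank-update local search are detailed in the paper):

$$
\min_{S:\;|S| \le k}\;\; \min_{W_S}\; \big\| X W - X_{\cdot, S}\, W_S \big\|_F^2 ,
$$

where $X_{\cdot,S}$ keeps the columns of the activations indexed by the retained structures $S$; swapping one structure in or out changes $X_{\cdot,S}^{\top} X_{\cdot,S}$ only by a low-rank term, which is what makes efficient local search possible.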

JMLR Journal 2024 Journal Article

Sparse NMF with Archetypal Regularization: Computational and Robustness Properties

  • Kayhan Behdin
  • Rahul Mazumder

We consider the problem of sparse nonnegative matrix factorization (NMF) using archetypal regularization. The goal is to represent a collection of data points as nonnegative linear combinations of a few nonnegative sparse factors with appealing geometric properties, arising from the use of archetypal regularization. We generalize the notion of robustness studied in Javadi and Montanari (2019) (without sparsity) to the notions of (a) strong robustness that implies each estimated archetype is close to the underlying archetypes and (b) weak robustness that implies there exists at least one recovered archetype that is close to the underlying archetypes. Our theoretical results on robustness guarantees hold under minimal assumptions on the underlying data, and applies to settings where the underlying archetypes need not be sparse. We present theoretical results and illustrative examples to strengthen the insights underlying the notions of robustness. We propose new algorithms for our optimization problem; and present numerical experiments on synthetic and real data sets that shed further insights into our proposed framework and theoretical developments. [abs] [ pdf ][ bib ] [ code ] &copy JMLR 2024. ( edit, beta )

ICML Conference 2023 Conference Paper

Fast as CHITA: Neural Network Pruning with Combinatorial Optimization

  • Riade Benbaki
  • Wenyu Chen 0003
  • Xiang Meng 0004
  • Hussein Hazimeh 0001
  • Natalia Ponomareva 0001
  • Zhe Zhao 0001
  • Rahul Mazumder

The sheer size of modern neural networks makes model serving a serious computational challenge. A popular class of compression techniques overcomes this challenge by pruning or sparsifying the weights of pretrained networks. While useful, these techniques often face serious tradeoffs between computational requirements and compression quality. In this work, we propose a novel optimization-based pruning framework that considers the combined effect of pruning (and updating) multiple weights subject to a sparsity constraint. Our approach, CHITA, extends the classical Optimal Brain Surgeon framework and results in significant improvements in speed, memory, and performance over existing optimization-based approaches for network pruning. CHITA’s main workhorse performs combinatorial optimization updates on a memory-friendly representation of local quadratic approximation(s) of the loss function. On a standard benchmark of pretrained models and datasets, CHITA leads to superior sparsity-accuracy tradeoffs than competing methods. For example, for MLPNet with only 2% of the weights retained, our approach improves the accuracy by 63% relative to the state of the art. Furthermore, when used in conjunction with fine-tuning SGD steps, our method achieves significant accuracy gains over state-of-the-art approaches. Our code is publicly available at: https://github.com/mazumder-lab/CHITA.
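
For context, the classical Optimal Brain Surgeon viewpoint that CHITA extends works with a local quadratic model of the loss around the pretrained weights $\bar{w}$ (a standard formulation; CHITA's memory-friendly representation and joint combinatorial updates build on top of it):

$$
\min_{\delta}\;\; \nabla \mathcal{L}(\bar{w})^{\top} \delta + \tfrac{1}{2}\, \delta^{\top} H \delta
\quad \text{s.t.} \quad \|\bar{w} + \delta\|_0 \le k ,
$$

which jointly decides which weights to zero out and how to update the survivors, based on the approximation $\mathcal{L}(\bar{w}+\delta) \approx \mathcal{L}(\bar{w}) + \nabla\mathcal{L}(\bar{w})^{\top}\delta + \tfrac{1}{2}\,\delta^{\top} H \delta$.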

NeurIPS Conference 2023 Conference Paper

GRAND-SLAMIN’ Interpretable Additive Modeling with Structural Constraints

  • Shibal Ibrahim
  • Gabriel Afriat
  • Kayhan Behdin
  • Rahul Mazumder

Generalized Additive Models (GAMs) are a family of flexible and interpretable models with old roots in statistics. GAMs are often used with pairwise interactions to improve model accuracy while still retaining flexibility and interpretability but lead to computational challenges as we are dealing with on the order of $p^2$ terms. It is desirable to restrict the number of components (i.e., encourage sparsity) for easier interpretability, and better computational and statistical properties. Earlier approaches, considering sparse pairwise interactions, have limited scalability, especially when imposing additional structural interpretability constraints. We propose a flexible GRAND-SLAMIN framework that can learn GAMs with interactions under sparsity and additional structural constraints in a differentiable end-to-end fashion. We customize first-order gradient-based optimization to perform sparse backpropagation to exploit sparsity in additive effects for any differentiable loss function in a GPU-compatible manner. Additionally, we establish novel non-asymptotic prediction bounds for our estimators with tree-based shape functions. Numerical experiments on real-world datasets show that our toolkit performs favorably in terms of performance, variable selection and scalability when compared with popular toolkits to fit GAMs with interactions. Our work expands the landscape of interpretable modeling while maintaining prediction accuracy competitive with non-interpretable black-box models. Our code is available at https://github.com/mazumder-lab/grandslamin.
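
For reference, the model class in question has the standard GAM-with-pairwise-interactions form (our notation):

$$
f(x) \;=\; \beta_0 + \sum_{j=1}^{p} f_j(x_j) \;+\; \sum_{j < k} f_{jk}(x_j, x_k),
$$

where sparsity means most components $f_j$ and $f_{jk}$ are exactly zero, and structural constraints (e.g., hierarchy: an interaction $f_{jk}$ may enter only if its main effects do) further restrict the admissible supports.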

JMLR Journal 2023 Journal Article

L0Learn: A Scalable Package for Sparse Learning using L0 Regularization

  • Hussein Hazimeh
  • Rahul Mazumder
  • Tim Nonet

We present L0Learn: an open-source package for sparse linear regression and classification using $\ell_0$ regularization. L0Learn implements scalable, approximate algorithms, based on coordinate descent and local combinatorial optimization. The package is built using C++ and has user-friendly R and Python interfaces. L0Learn can address problems with millions of features, achieving competitive run times and statistical performance with state-of-the-art sparse learning packages. L0Learn is available on both CRAN and GitHub.

NeurIPS Conference 2023 Conference Paper

On the Convergence of CART under Sufficient Impurity Decrease Condition

  • Rahul Mazumder
  • Haoyue Wang

The decision tree is a flexible machine-learning model that finds its success in numerous applications. It is usually fitted in a recursively greedy manner using CART. In this paper, we study the convergence rate of CART under a regression setting. First, we prove an upper bound on the prediction error of CART under a sufficient impurity decrease (SID) condition \cite{chi2020asymptotic} -- our result is an improvement over the known result by \cite{chi2020asymptotic} under a similar assumption. We show via examples that this error bound cannot be further improved by more than a constant or a log factor. Second, we introduce a few easy-to-check sufficient conditions of the SID condition. In particular, we show that the SID condition can be satisfied by an additive model when the component functions satisfy a "locally reverse Poincaré inequality". We discuss a few familiar function classes in non-parametric estimation to demonstrate the usefulness of this conception.

NeurIPS Conference 2022 Conference Paper

Pushing the limits of fairness impossibility: Who's the fairest of them all?

  • Brian Hsu
  • Rahul Mazumder
  • Preetam Nandy
  • Kinjal Basu

The impossibility theorem of fairness is a foundational result in the algorithmic fairness literature. It states that outside of special cases, one cannot exactly and simultaneously satisfy all three common and intuitive definitions of fairness - demographic parity, equalized odds, and predictive rate parity. This result has driven most works to focus on solutions for one or two of the metrics. Rather than follow suit, in this paper we present a framework that pushes the limits of the impossibility theorem in order to satisfy all three metrics to the best extent possible. We develop an integer-programming based approach that can yield a certifiably optimal post-processing method for simultaneously satisfying multiple fairness criteria under small violations. We show experiments demonstrating that our post-processor can improve fairness across the different definitions simultaneously with minimal model performance reduction. We also discuss applications of our framework for model selection and fairness explainability, thereby attempting to answer the question: Who's the fairest of them all?
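
For a binary classifier $\hat{Y}$, label $Y$, and protected attribute $A$, the three criteria referenced above are commonly stated as follows (standard definitions, summarized here for convenience):

$$
\begin{aligned}
\text{Demographic parity:}\quad & P(\hat{Y}=1 \mid A=a) \ \text{is equal across groups } a,\\
\text{Equalized odds:}\quad & P(\hat{Y}=1 \mid Y=y,\, A=a) \ \text{is equal across } a \ \text{for } y \in \{0,1\},\\
\text{Predictive rate parity:}\quad & P(Y=1 \mid \hat{Y}=1,\, A=a) \ \text{is equal across } a .
\end{aligned}
$$

The impossibility result says these cannot all hold exactly outside of special cases (e.g., equal base rates across groups or a perfect classifier), which is why the framework above aims for small, certifiable violations of all three.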

ICML Conference 2022 Conference Paper

Quant-BnB: A Scalable Branch-and-Bound Method for Optimal Decision Trees with Continuous Features

  • Rahul Mazumder
  • Xiang Meng 0004
  • Haoyue Wang

Decision trees are one of the most useful and popular methods in the machine learning toolbox. In this paper, we consider the problem of learning optimal decision trees, a combinatorial optimization problem that is challenging to solve at scale. A common approach in the literature is to use greedy heuristics, which may not be optimal. Recently there has been significant interest in learning optimal decision trees using various approaches (e.g., based on integer programming, dynamic programming)—to achieve computational scalability, most of these approaches focus on classification tasks with binary features. In this paper, we present a new discrete optimization method based on branch-and-bound (BnB) to obtain optimal decision trees. Different from existing customized approaches, we consider both regression and classification tasks with continuous features. The basic idea underlying our approach is to split the search space based on the quantiles of the feature distribution—leading to upper and lower bounds for the underlying optimization problem along the BnB iterations. Our proposed algorithm Quant-BnB shows significant speedups compared to existing approaches for shallow optimal trees on various real datasets.
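
To illustrate the quantile idea on a single continuous feature, a minimal sketch (our own example, not the Quant-BnB code): candidate thresholds are placed at feature quantiles, which partitions the continuum of possible split points into a small number of intervals that the branch-and-bound can bound and refine.

```python
import numpy as np

def quantile_split_candidates(x, num_quantiles=4):
    """Candidate split thresholds at evenly spaced interior quantiles of a feature column."""
    qs = np.linspace(0, 1, num_quantiles + 2)[1:-1]     # drop the 0% and 100% quantiles
    return np.unique(np.quantile(x, qs))

rng = np.random.default_rng(1)
x = rng.exponential(size=1000)
print(quantile_split_candidates(x, num_quantiles=4))    # 4 candidate thresholds
```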

JMLR Journal 2022 Journal Article

Solving L1-regularized SVMs and Related Linear Programs: Revisiting the Effectiveness of Column and Constraint Generation

  • Antoine Dedieu
  • Rahul Mazumder
  • Haoyue Wang

The linear Support Vector Machine (SVM) is a classic classification technique in machine learning. Motivated by applications in high dimensional statistics, we consider penalized SVM problems involving the minimization of a hinge-loss function with a convex sparsity-inducing regularizer such as: the L1-norm on the coefficients, its grouped generalization and the sorted L1-penalty (aka Slope). Each problem can be expressed as a Linear Program (LP) and is computationally challenging when the number of features and/or samples is large---the current state of algorithms for these problems is rather nascent when compared to the usual L2-regularized linear SVM. To this end, we propose new computational algorithms for these LPs by bringing together techniques from (a) classical column (and constraint) generation methods and (b) first-order methods for non-smooth convex optimization---techniques that appear to be rarely used together for solving large scale LPs. These components have their respective strengths; and while they are found to be useful as separate entities, they appear to be more powerful in practice when used together in the context of solving large-scale LPs such as the ones studied herein. Our approach complements the strengths of (a) and (b)---leading to a scheme that seems to significantly outperform commercial solvers as well as specialized implementations for these problems. We present numerical results on a series of real and synthetic data sets demonstrating the surprising effectiveness of classic column/constraint generation methods in the context of challenging LP-based machine learning tasks.
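
As a concrete example of the LP structure being exploited, the L1-regularized hinge-loss problem becomes a linear program after splitting $\beta = \beta^{+} - \beta^{-}$ and introducing slack variables (a standard reformulation, intercept omitted, in our notation):

$$
\min_{\beta^{+},\,\beta^{-},\,\xi \,\ge\, 0}\;\; \sum_{i=1}^{n} \xi_i \;+\; \lambda \sum_{j=1}^{p} \big(\beta_j^{+} + \beta_j^{-}\big)
\quad \text{s.t.}\quad y_i\, x_i^{\top}\big(\beta^{+} - \beta^{-}\big) \;\ge\; 1 - \xi_i, \quad i = 1, \dots, n .
$$

Column generation then grows the set of active feature variables (columns) lazily, while constraint generation adds violated sample constraints (rows) on demand.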

NeurIPS Conference 2021 Conference Paper

DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning

  • Hussein Hazimeh
  • Zhe Zhao
  • Aakanksha Chowdhery
  • Maheswaran Sathiamoorthy
  • Yihua Chen
  • Rahul Mazumder
  • Lichan Hong
  • Ed Chi

The Mixture-of-Experts (MoE) architecture is showing promising results in improving parameter sharing in multi-task learning (MTL) and in scaling high-capacity neural networks. State-of-the-art MoE models use a trainable "sparse gate" to select a subset of the experts for each input example. While conceptually appealing, existing sparse gates, such as Top-k, are not smooth. The lack of smoothness can lead to convergence and statistical performance issues when training with gradient-based methods. In this paper, we develop DSelect-k: a continuously differentiable and sparse gate for MoE, based on a novel binary encoding formulation. The gate can be trained using first-order methods, such as stochastic gradient descent, and offers explicit control over the number of experts to select. We demonstrate the effectiveness of DSelect-k on both synthetic and real MTL datasets with up to 128 tasks. Our experiments indicate that DSelect-k can achieve statistically significant improvements in prediction and expert selection over popular MoE gates. Notably, on a real-world, large-scale recommender system, DSelect-k achieves over 22% improvement in predictive performance compared to Top-k. We provide an open-source implementation of DSelect-k.
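
For context, a minimal sketch of what a classic Top-k MoE gate computes (our own toy numpy code, shown only to fix ideas; DSelect-k replaces this non-smooth selection with a differentiable, sparse gate built on a binary encoding):

```python
import numpy as np

def top_k_gate(logits, k):
    """Classic Top-k gate: keep the k largest logits per example, renormalize, zero the rest."""
    gates = np.zeros_like(logits)
    top = np.argpartition(logits, -k, axis=-1)[:, -k:]      # indices of the k largest logits
    rows = np.arange(logits.shape[0])[:, None]
    kept = logits[rows, top]
    kept = np.exp(kept - kept.max(axis=-1, keepdims=True))  # softmax over the kept experts
    gates[rows, top] = kept / kept.sum(axis=-1, keepdims=True)
    return gates                                            # non-smooth: the selected set can jump

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                     # 4 examples, 8 features
expert_outs = rng.normal(size=(4, 16, 1))       # outputs of 16 experts per example
gate_logits = x @ rng.normal(size=(8, 16))
g = top_k_gate(gate_logits, k=2)                # (4, 16), two nonzeros per row
y = (g[..., None] * expert_outs).sum(axis=1)    # gated combination of expert outputs
print(y.shape)
```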

JMLR Journal 2021 Journal Article

Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives

  • Antoine Dedieu
  • Hussein Hazimeh
  • Rahul Mazumder

We consider a discrete optimization formulation for learning sparse classifiers, where the outcome depends upon a linear combination of a small subset of features. Recent work has shown that mixed integer programming (MIP) can be used to solve (to optimality) $\ell_0$-regularized regression problems at scales much larger than what was conventionally considered possible. Despite their usefulness, MIP-based global optimization approaches are significantly slower than the relatively mature algorithms for $\ell_1$-regularization and heuristics for nonconvex regularized problems. We aim to bridge this gap in computation times by developing new MIP-based algorithms for $\ell_0$-regularized classification. We propose two classes of scalable algorithms: an exact algorithm that can handle $p\approx 50,000$ features in a few minutes, and approximate algorithms that can address instances with $p\approx 10^6$ in times comparable to the fast $\ell_1$-based algorithms. Our exact algorithm is based on the novel idea of $\textit{integrality generation}$, which solves the original problem (with $p$ binary variables) via a sequence of mixed integer programs that involve a small number of binary variables. Our approximate algorithms are based on coordinate descent and local combinatorial search. In addition, we present new estimation error bounds for a class of $\ell_0$-regularized estimators. Experiments on real and synthetic data demonstrate that our approach leads to models with considerably improved statistical performance (especially variable selection) compared to competing methods.
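
For concreteness, the $\ell_0$-regularized classification problems in question admit a standard mixed-integer formulation with big-M constraints (a common encoding; the paper's integrality generation solves such problems via a sequence of MIPs with few binary variables):

$$
\min_{\beta,\, z}\;\; \sum_{i=1}^{n} \ell\big(y_i,\, x_i^{\top}\beta\big) \;+\; \lambda \sum_{j=1}^{p} z_j
\quad \text{s.t.}\quad -M z_j \le \beta_j \le M z_j,\;\; z_j \in \{0,1\},
$$

where $\ell$ is, e.g., the logistic or hinge loss and $z_j = 0$ forces $\beta_j = 0$.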

ICML Conference 2020 Conference Paper

ECLIPSE: An Extreme-Scale Linear Program Solver for Web-Applications

  • Kinjal Basu 0001
  • Amol Ghoting
  • Rahul Mazumder
  • Yao Pan

Key problems arising in web applications (with millions of users and thousands of items) can be formulated as linear programs involving billions to trillions of decision variables and constraints. Despite the appeal of linear program (LP) formulations, solving problems at these scales appears to be well beyond the capabilities of existing LP solvers. Often ad-hoc decomposition rules are used to approximately solve these LPs, which have limited optimality guarantees and may lead to sub-optimal performance in practice. In this work, we propose a distributed solver that solves a perturbation of the LP problems at scale via a gradient-based algorithm on the smooth dual of the perturbed LP. The main workhorses of our algorithm are distributed matrix-vector multiplications (with load balancing) and efficient projection operations on distributed machines. Experiments on real-world data show that our proposed LP solver, ECLIPSE, can solve problems with $10^{12}$ decision variables – well beyond the capabilities of current solvers.
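
A hedged sketch of the perturbation idea (a generic construction in our notation; ECLIPSE's exact perturbation, constraint structure, and projections may differ): adding a small strongly convex term to the LP makes its Lagrangian dual smooth, so the dual can be maximized by gradient methods whose main costs are distributed matrix-vector products and projections.

$$
\min_{x \in \mathcal{X}}\; c^{\top} x + \frac{\gamma}{2}\|x\|_2^2 \;\; \text{s.t.}\;\; Ax \le b
\qquad\Longrightarrow\qquad
g(\lambda) \;=\; \min_{x \in \mathcal{X}} \Big\{ c^{\top} x + \frac{\gamma}{2}\|x\|_2^2 + \lambda^{\top}(Ax - b) \Big\},\;\; \lambda \ge 0,
$$

where $g$ is differentiable with $\nabla g(\lambda) = A x^{*}(\lambda) - b$, and evaluating the minimizer $x^{*}(\lambda)$ reduces to a projection onto the simple set $\mathcal{X}$.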

ICML Conference 2020 Conference Paper

The Tree Ensemble Layer: Differentiability meets Conditional Computation

  • Hussein Hazimeh 0001
  • Natalia Ponomareva 0001
  • Petros Mol
  • Zhenyu Tan
  • Rahul Mazumder

Neural networks and tree ensembles are state-of-the-art learners, each with its unique statistical and computational advantages. We aim to combine these advantages by introducing a new layer for neural networks, composed of an ensemble of differentiable decision trees (a.k.a. soft trees). While differentiable trees demonstrate promising results in the literature, they are typically slow in training and inference as they do not support conditional computation. We mitigate this issue by introducing a new sparse activation function for sample routing, and implement true conditional computation by developing specialized forward and backward propagation algorithms that exploit sparsity. Our efficient algorithms pave the way for jointly training over deep and wide tree ensembles using first-order methods (e.g., SGD). Experiments on 23 classification datasets indicate over 10x speed-ups compared to the differentiable trees used in the literature and over 20x reduction in the number of parameters compared to gradient boosted trees, while maintaining competitive performance. Moreover, experiments on CIFAR, MNIST, and Fashion MNIST indicate that replacing dense layers in CNNs with our tree layer reduces the test loss by 7-53% and the number of parameters by 8x. We provide an open-source TensorFlow implementation with a Keras API.
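
To illustrate what a differentiable (soft) tree computes, a minimal depth-2 routing sketch in numpy (our own toy code, not the paper's TensorFlow implementation; the paper additionally uses a sparse routing activation so that most samples reach only a few leaves):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_tree_predict(x, W, b, leaf_values):
    """Depth-2 soft tree: 3 internal nodes, 4 leaves.

    Each internal node routes a sample left with probability sigmoid(w.x + b);
    a leaf's weight is the product of routing probabilities along its path.
    """
    p0, p1, p2 = (sigmoid(x @ W[k] + b[k]) for k in range(3))   # P(go left) at each node
    leaf_probs = np.stack([p0 * p1,            # left, left
                           p0 * (1 - p1),      # left, right
                           (1 - p0) * p2,      # right, left
                           (1 - p0) * (1 - p2)], axis=-1)
    return leaf_probs @ leaf_values            # smooth mixture of leaf outputs

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))                    # 5 samples, 3 features
W, b = rng.normal(size=(3, 3)), rng.normal(size=3)
print(soft_tree_predict(x, W, b, leaf_values=np.array([0.0, 1.0, 2.0, 3.0])))
```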

JMLR Journal 2017 Journal Article

Certifiably Optimal Low Rank Factor Analysis

  • Dimitris Bertsimas
  • Martin S. Copenhaver
  • Rahul Mazumder

Factor Analysis (FA) is a technique of fundamental importance that is widely used in classical and modern multivariate statistics, psychometrics, and econometrics. In this paper, we revisit the classical rank-constrained FA problem which seeks to approximate an observed covariance matrix ($\boldsymbol{\Sigma}$) by the sum of a Positive Semidefinite (PSD) low-rank component ($\boldsymbol{\Theta}$) and a diagonal matrix ($\boldsymbol{\Phi}$) (with nonnegative entries) subject to $\boldsymbol{\Sigma} - \boldsymbol{\Phi}$ being PSD. We propose a flexible family of rank-constrained, nonlinear Semidefinite Optimization based formulations for this task. We introduce a reformulation of the problem as a smooth optimization problem with convex, compact constraints and propose a unified algorithmic framework, utilizing state of the art techniques in nonlinear optimization to obtain high-quality feasible solutions for our proposed formulation. At the same time, by using a variety of techniques from discrete and global optimization, we show that these solutions are certifiably optimal in many cases, even for problems with thousands of variables. Our techniques are general and make no assumption on the underlying problem data. The estimator proposed herein aids statistical interpretability and provides computational scalability and significantly improved accuracy when compared to current, publicly available popular methods for rank-constrained FA. We demonstrate the effectiveness of our proposal on an array of synthetic and real-life datasets. To our knowledge, this is the first paper that demonstrates how a previously intractable rank-constrained optimization problem can be solved to provable optimality by coupling developments in convex analysis and in global and discrete optimization.
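
Stated in display form, the rank-constrained problem described above is (restating the abstract's setup; the paper studies a family of such formulations with different discrepancy measures):

$$
\min_{\boldsymbol{\Theta},\, \boldsymbol{\Phi}}\;\; \mathrm{dist}\big(\boldsymbol{\Sigma},\; \boldsymbol{\Theta} + \boldsymbol{\Phi}\big)
\quad \text{s.t.}\quad \boldsymbol{\Theta} \succeq 0,\;\; \mathrm{rank}(\boldsymbol{\Theta}) \le r,\;\; \boldsymbol{\Phi}\ \text{diagonal},\ \boldsymbol{\Phi} \ge 0,\;\; \boldsymbol{\Sigma} - \boldsymbol{\Phi} \succeq 0 .
$$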

JMLR Journal 2015 Journal Article

Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares

  • Trevor Hastie
  • Rahul Mazumder
  • Jason D. Lee
  • Reza Zadeh

The matrix-completion problem has attracted a lot of attention, largely as a result of the celebrated Netflix competition. Two popular approaches for solving the problem are nuclear-norm-regularized matrix approximation (Candes and Tao, 2009; Mazumder et al., 2010), and maximum-margin matrix factorization (Srebro et al., 2005). These two procedures are in some cases solving equivalent problems, but with quite different algorithms. In this article we bring the two approaches together, leading to an efficient algorithm for large matrix factorization and completion that outperforms both of these. We develop a software package softImpute in R for implementing our approaches, and a distributed version for very large matrices using the Spark cluster programming environment.
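
The bridge between the two approaches rests on a well-known variational identity for the nuclear norm (a standard fact, stated in our notation):

$$
\|M\|_{*} \;=\; \min_{A,\,B:\; M = A B^{\top}}\; \tfrac{1}{2}\Big(\|A\|_F^2 + \|B\|_F^2\Big),
$$

so nuclear-norm-regularized completion and maximum-margin matrix factorization optimize closely related objectives, and the fast alternating least squares scheme of the title alternates between updates of $A$ and $B$.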

JMLR Journal 2012 Journal Article

Exact Covariance Thresholding into Connected Components for Large-Scale Graphical Lasso

  • Rahul Mazumder
  • Trevor Hastie

We consider the sparse inverse covariance regularization problem, or graphical lasso, with regularization parameter λ. Suppose the sample covariance graph formed by thresholding the entries of the sample covariance matrix at λ is decomposed into connected components. We show that the vertex-partition induced by the connected components of the thresholded sample covariance graph (at λ) is exactly equal to that induced by the connected components of the estimated concentration graph, obtained by solving the graphical lasso problem for the same λ. This characterizes a very interesting property of a path of graphical lasso solutions. Furthermore, this simple rule, when used as a wrapper around existing algorithms for the graphical lasso, leads to enormous performance gains. For a range of values of λ, our proposal splits a large graphical lasso problem into smaller tractable problems, making it possible to solve an otherwise infeasible large-scale problem. We illustrate the graceful scalability of our proposal via synthetic and real-life microarray examples.
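
A minimal sketch of how the thresholding rule can wrap an existing solver (our own illustration using scipy and scikit-learn, not code from the paper; note that solver conventions for isolated nodes and diagonal penalization vary):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from sklearn.covariance import graphical_lasso

def blockwise_graphical_lasso(S, lam):
    """Threshold |S_ij| at lam, split into connected components, solve each block separately."""
    p = S.shape[0]
    adj = (np.abs(S) > lam) & ~np.eye(p, dtype=bool)        # thresholded sample covariance graph
    n_comp, labels = connected_components(csr_matrix(adj), directed=False)
    precision = np.zeros_like(S)
    for c in range(n_comp):
        idx = np.where(labels == c)[0]
        if len(idx) == 1:                                    # isolated node: diagonal entry only
            precision[idx[0], idx[0]] = 1.0 / S[idx[0], idx[0]]
            continue
        block = S[np.ix_(idx, idx)]
        _, prec_block = graphical_lasso(block, alpha=lam)    # solve the small subproblem
        precision[np.ix_(idx, idx)] = prec_block
    return precision

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
S = np.cov(X, rowvar=False)
Theta = blockwise_graphical_lasso(S, lam=0.25)
print(np.count_nonzero(Theta))
```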

JMLR Journal 2010 Journal Article

Spectral Regularization Algorithms for Learning Large Incomplete Matrices

  • Rahul Mazumder
  • Trevor Hastie
  • Robert Tibshirani

We use convex relaxation techniques to provide a sequence of regularized low-rank solutions for large-scale matrix completion problems. Using the nuclear norm as a regularizer, we provide a simple and very efficient convex algorithm for minimizing the reconstruction error subject to a bound on the nuclear norm. Our algorithm SOFT-IMPUTE iteratively replaces the missing elements with those obtained from a soft-thresholded SVD. With warm starts this allows us to efficiently compute an entire regularization path of solutions on a grid of values of the regularization parameter. The computationally intensive part of our algorithm is in computing a low-rank SVD of a dense matrix. Exploiting the problem structure, we show that the task can be performed with a complexity of order linear in the matrix dimensions. Our semidefinite-programming algorithm is readily scalable to large matrices; for example SOFT-IMPUTE takes a few hours to compute low-rank approximations of a $10^6 \times 10^6$ incomplete matrix with $10^7$ observed entries, and fits a rank-95 approximation to the full Netflix training set in 3.3 hours. Our methods achieve good training and test errors and exhibit superior timings when compared to other competitive state-of-the-art techniques.
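
A minimal numpy sketch of the iteration described above (our own toy implementation; the actual SOFT-IMPUTE exploits problem structure to compute low-rank SVDs of large matrices efficiently and uses warm starts along a regularization path):

```python
import numpy as np

def soft_impute(X_obs, mask, lam, iters=100):
    """Fill missing entries, soft-threshold the singular values, and repeat."""
    Z = np.where(mask, X_obs, 0.0)                 # initialize missing entries at zero
    for _ in range(iters):
        filled = np.where(mask, X_obs, Z)          # keep observed entries, impute the rest
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        Z = (U * np.maximum(s - lam, 0.0)) @ Vt    # soft-threshold the singular values
    return Z

rng = np.random.default_rng(0)
true = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))   # rank-2 ground truth
mask = rng.random(true.shape) < 0.5                          # roughly half the entries observed
Z = soft_impute(true, mask, lam=1.0)
print(np.linalg.norm((Z - true)[~mask]) / np.linalg.norm(true[~mask]))   # relative error on missing entries
```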