Arrow Research search

Author name cluster

Stephen Becker

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
2 author rows

Possible papers (13)

ICML 2025 Conference Paper

A Unified Framework for Entropy Search and Expected Improvement in Bayesian Optimization

  • Nuojin Cheng
  • Leonard Papenmeier
  • Stephen Becker
  • Luigi Nardi

Bayesian optimization is a widely used method for optimizing expensive black-box functions, with Expected Improvement (EI) being one of the most commonly used acquisition functions. In contrast, information-theoretic acquisition functions aim to reduce uncertainty about the function’s optimum and are often considered fundamentally distinct from EI. In this work, we challenge this prevailing perspective by introducing a unified theoretical framework, Variational Entropy Search, which reveals that EI and information-theoretic acquisition functions are more closely related than previously recognized. We demonstrate that EI can be interpreted as a variational inference approximation of the popular information-theoretic acquisition function Max-value Entropy Search (MES). Building on this insight, we propose VES-Gamma, a novel acquisition function that balances the strengths of EI and MES. Extensive empirical evaluations across both low- and high-dimensional synthetic and real-world benchmarks demonstrate that VES-Gamma is competitive with state-of-the-art acquisition functions and in many cases outperforms EI and MES.
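
As background for the quantities the paper connects, here is a minimal sketch of the textbook EI acquisition under a Gaussian-process posterior; `mu`, `sigma`, and `f_best` are assumed posterior quantities, and this is the classical EI, not the proposed VES-Gamma.

```python
# Minimal sketch: classical Expected Improvement (EI) for maximization
# under a Gaussian-process posterior. Illustrative only.
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best, xi=0.0):
    """EI at points with posterior mean `mu` and std `sigma`,
    relative to the incumbent best value `f_best` (maximization)."""
    sigma = np.maximum(sigma, 1e-12)  # guard against zero variance
    z = (mu - f_best - xi) / sigma
    return sigma * (z * norm.cdf(z) + norm.pdf(z))
```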

UAI 2025 Conference Paper

Exploring Exploration in Bayesian Optimization

  • Leonard Papenmeier
  • Nuojin Cheng
  • Stephen Becker
  • Luigi Nardi

A well-balanced exploration-exploitation trade-off is crucial for successful acquisition functions in Bayesian optimization. However, there is a lack of quantitative measures for exploration, making it difficult to analyze and compare different acquisition functions. This work introduces two novel approaches, observation traveling salesman distance and observation entropy, to quantify the exploration characteristics of acquisition functions based on their selected observations. Using these measures, we examine the explorative nature of several well-known acquisition functions across a diverse set of black-box problems, uncover links between exploration and empirical performance, and reveal new relationships among existing acquisition functions. Beyond enabling a deeper understanding of acquisition functions, these measures also provide a foundation for guiding their design in a more principled and systematic manner.
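
A hedged illustration of what such measures could look like: a greedy nearest-neighbor tour length as a cheap stand-in for an observation traveling salesman distance, and a histogram entropy over observations. The paper's exact definitions may differ; both functions below are assumptions for illustration.

```python
# Hypothetical stand-ins for the two exploration measures; the paper's
# exact definitions are not reproduced here.
import numpy as np
from scipy.spatial.distance import cdist

def greedy_tour_length(X):
    """Greedy nearest-neighbor tour length over observation locations X (n, d)."""
    D = cdist(X, X)
    unvisited = set(range(1, len(X)))
    cur, total = 0, 0.0
    while unvisited:
        nxt = min(unvisited, key=lambda j: D[cur, j])  # nearest unvisited point
        total += D[cur, nxt]
        unvisited.remove(nxt)
        cur = nxt
    return total

def observation_entropy(X, bins=10):
    """Shannon entropy of a histogram of observations (1-D illustration)."""
    counts, _ = np.histogram(X[:, 0], bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))
```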

TMLR 2025 Journal Article

Stochastic Subspace Descent Accelerated via Bi-fidelity Line Search

  • Nuojin Cheng
  • Alireza Doostan
  • Stephen Becker

Efficient optimization remains a fundamental challenge across numerous scientific and engineering domains, particularly when objective function evaluations are computationally expensive and gradient information is inaccessible. While zeroth-order optimization methods address the lack of gradients, their performance often suffers due to the high cost of repeated function queries. This work introduces a bi-fidelity line search scheme tailored for zeroth-order optimization. Our method constructs a temporary surrogate model by strategically combining inexpensive low-fidelity (LF) evaluations with accurate high-fidelity (HF) evaluations of the objective function. This surrogate enables an efficient backtracking line search for step size selection, significantly reducing the required number of costly HF queries. We provide theoretical convergence guarantees for this scheme under standard assumptions. Furthermore, we integrate this bi-fidelity strategy into the stochastic subspace descent framework, proposing the bi-fidelity stochastic subspace descent (BF-SSD) algorithm. A comprehensive empirical evaluation of BF-SSD is conducted across four distinct problems: a synthetic optimization benchmark, dual-form kernel ridge regression, black-box adversarial attacks on machine learning models, and transformer-based black-box language model fine-tuning. The numerical results consistently demonstrate that BF-SSD achieves superior optimization performance, particularly in terms of solution quality obtained per HF function evaluation, when compared against relevant baseline methods. This study highlights the efficacy of integrating bi-fidelity strategies within zeroth-order optimization frameworks, positioning BF-SSD as a promising and computationally efficient approach for tackling large-scale, high-dimensional optimization problems encountered in various real-world applications.
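
A minimal sketch of the bi-fidelity backtracking idea as described above, assuming illustrative callables `f_lf` and `f_hf` and standard Armijo-style constants; it is not the authors' exact BF-SSD implementation.

```python
# Hedged sketch: cheap low-fidelity (LF) calls drive the backtracking,
# and high-fidelity (HF) calls only validate the chosen step.
import numpy as np

def bifidelity_backtracking(f_lf, f_hf, x, d, t0=1.0, beta=0.5, c=1e-4,
                            max_iter=20):
    """Pick a step size along direction d using LF evaluations,
    then validate it with HF evaluations."""
    f0 = f_lf(x)
    t = t0
    for _ in range(max_iter):
        if f_lf(x + t * d) <= f0 - c * t * np.dot(d, d):
            break
        t *= beta  # shrink the step on LF failure
    # final HF check; fall back to a more conservative step if the
    # LF-selected step does not transfer to the true objective
    if f_hf(x + t * d) >= f_hf(x):
        t *= beta
    return t
```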

JMLR 2024 Journal Article

High Probability Convergence Bounds for Non-convex Stochastic Gradient Descent with Sub-Weibull Noise

  • Liam Madden
  • Emiliano Dall'Anese
  • Stephen Becker

Stochastic gradient descent is one of the most common iterative algorithms used in machine learning and its convergence analysis is a rich area of research. Understanding its convergence properties can help inform what modifications of it to use in different settings. However, most theoretical results either assume convexity or only provide convergence results in mean. This paper, on the other hand, proves convergence bounds in high probability without assuming convexity. Assuming strong smoothness, we prove high probability convergence bounds in two settings: (1) assuming the Polyak-Łojasiewicz inequality and norm sub-Gaussian gradient noise and (2) assuming norm sub-Weibull gradient noise. In the second setting, as an intermediate step to proving convergence, we prove a sub-Weibull martingale difference sequence self-normalized concentration inequality of independent interest. It extends Freedman-type concentration beyond the sub-exponential threshold to heavier-tailed martingale difference sequences. We also provide a post-processing method that picks a single iterate with a provable convergence guarantee as opposed to the usual bound for the unknown best iterate. Our convergence result for sub-Weibull noise extends the regime where stochastic gradient descent has equal or better convergence guarantees than stochastic gradient descent with modifications such as clipping, momentum, and normalization.
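
A hedged sketch of the kind of post-processing the abstract mentions: re-estimate gradient norms at stored iterates with fresh samples and return the smallest. The selection rule and sampling budget here are illustrative assumptions, not the paper's exact procedure.

```python
# Illustrative post-processing: pick one iterate from an SGD run by
# re-estimated gradient norm. Not the paper's exact selection rule.
import numpy as np

def pick_iterate(iterates, stoch_grad, n_samples=32, rng=None):
    """Return the stored iterate whose re-estimated gradient norm is smallest.
    `stoch_grad(x, rng)` is an assumed unbiased stochastic-gradient oracle."""
    rng = rng or np.random.default_rng(0)
    best_x, best_norm = None, np.inf
    for x in iterates:
        g = np.mean([stoch_grad(x, rng) for _ in range(n_samples)], axis=0)
        if np.linalg.norm(g) < best_norm:
            best_x, best_norm = x, np.linalg.norm(g)
    return best_x
```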

ICML 2021 Conference Paper

A Sampling-Based Method for Tensor Ring Decomposition

  • Osman Asif Malik
  • Stephen Becker

We propose a sampling-based method for computing the tensor ring (TR) decomposition of a data tensor. The method uses leverage score sampled alternating least squares to fit the TR cores in an iterative fashion. By taking advantage of the special structure of TR tensors, we can efficiently estimate the leverage scores and attain a method which has complexity sublinear in the number of input tensor entries. We provide high-probability relative-error guarantees for the sampled least squares problems. We compare our proposal to existing methods in experiments on both synthetic and real data. Our method achieves substantial speedup—sometimes two or three orders of magnitude—over competing methods, while maintaining good accuracy. We also provide an example of how our method can be used for rapid feature extraction.
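
For context, a minimal sketch of leverage-score row sampling for an overdetermined least-squares problem, the primitive the TR method builds on. Computing exact scores via a thin QR, as below, is for illustration; the paper's contribution is estimating the scores efficiently from TR structure.

```python
# Background sketch: leverage-score row sampling for least squares.
import numpy as np

def leverage_sampled_lstsq(A, b, m, rng=None):
    """Solve min ||Ax - b|| approximately from m leverage-sampled rows."""
    rng = rng or np.random.default_rng(0)
    Q, _ = np.linalg.qr(A)                  # thin QR of the design matrix
    scores = np.sum(Q**2, axis=1)           # exact leverage scores
    p = scores / scores.sum()
    idx = rng.choice(len(A), size=m, p=p)   # sample rows w.p. proportional to leverage
    w = 1.0 / np.sqrt(m * p[idx])           # importance-sampling rescaling
    x, *_ = np.linalg.lstsq(w[:, None] * A[idx], w * b[idx], rcond=None)
    return x
```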

NeurIPS 2018 Conference Paper

Low-Rank Tucker Decomposition of Large Tensors Using TensorSketch

  • Osman Asif Malik
  • Stephen Becker

We propose two randomized algorithms for low-rank Tucker decomposition of tensors. The algorithms, which incorporate sketching, only require a single pass of the input tensor and can handle tensors whose elements are streamed in any order. To the best of our knowledge, ours are the only algorithms which can do this. We test our algorithms on sparse synthetic data and compare them to multiple other methods. We also apply one of our algorithms to a real dense 38 GB tensor representing a video and use the resulting decomposition to correctly classify frames containing disturbances.
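
A minimal sketch of the CountSketch primitive that TensorSketch composes; the streamed, single-pass Tucker algorithms themselves are not reproduced here.

```python
# Background sketch: CountSketch, the building block of TensorSketch.
import numpy as np

def countsketch(x, m, seed=0):
    """Apply a CountSketch of size m to a vector x."""
    rng = np.random.default_rng(seed)
    h = rng.integers(0, m, size=len(x))       # bucket hash
    s = rng.choice([-1.0, 1.0], size=len(x))  # sign hash
    out = np.zeros(m)
    np.add.at(out, h, s * x)                  # streamable: one pass over x
    return out
```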

AAAI 2018 Conference Paper

Randomized Clustered Nystrom for Large-Scale Kernel Machines

  • Farhad Pourkamali-Anaraki
  • Stephen Becker
  • Michael Wakin

The Nyström method is a popular technique for generating low-rank approximations of kernel matrices that arise in many machine learning problems. The approximation quality of the Nyström method depends crucially on the number of selected landmark points and the selection procedure. In this paper, we introduce a randomized algorithm for generating landmark points that is scalable to large high-dimensional data sets. The proposed method performs K-means clustering on low-dimensional random projections of a data set and thus leads to significant savings for high-dimensional data sets. Our theoretical results characterize the tradeoffs between accuracy and efficiency of the proposed method. Moreover, numerical experiments on classification and regression tasks demonstrate the superior performance and efficiency of our proposed method compared with existing approaches.
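
A hedged sketch of the pipeline as described in the abstract: k-means on randomly projected data yields landmarks, which drive a standard Nyström approximation. The RBF kernel and k-means settings below are illustrative assumptions.

```python
# Hedged sketch: cluster a random projection, use cluster means of the
# original data as Nystrom landmarks. Kernel choice is illustrative.
import numpy as np
from sklearn.cluster import KMeans

def clustered_nystrom(X, n_landmarks, proj_dim, gamma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    G = rng.normal(size=(X.shape[1], proj_dim)) / np.sqrt(proj_dim)
    km = KMeans(n_clusters=n_landmarks, n_init=10, random_state=seed)
    km.fit(X @ G)                            # cluster the *projected* data
    # landmarks: per-cluster means of the original (unprojected) points
    Z = np.vstack([X[km.labels_ == j].mean(0) for j in range(n_landmarks)])
    rbf = lambda A, B: np.exp(-gamma * ((A**2).sum(1)[:, None]
                              + (B**2).sum(1)[None] - 2 * A @ B.T))
    C, W = rbf(X, Z), rbf(Z, Z)
    return C, np.linalg.pinv(W)              # K is approximated by C @ pinv(W) @ C.T
```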

AAAI 2017 Conference Paper

Robust Partially-Compressed Least-Squares

  • Stephen Becker
  • Ban Kawas
  • Marek Petrik

Randomized matrix compression techniques, such as the Johnson-Lindenstrauss transform, have emerged as an effective and practical way for solving large-scale problems efficiently. The gain in computational efficiency, however, comes at the cost of solution quality and accuracy. In this paper, we investigate compressed least-squares problems and propose new models and algorithms that address the issue of error and noise introduced by compression. While maintaining computational efficiency, our models provide robust solutions that are more accurate than those of classical compressed variants. We introduce tools from robust optimization together with a form of partial compression to improve the error-time trade-offs of compressed least-squares solvers. We develop an efficient solution algorithm for our Robust Partially-Compressed (RPC) model based on a reduction to a one-dimensional search.
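
For contrast, a minimal sketch of plain (fully) compressed least squares with a Gaussian sketch, the baseline the RPC model improves on; the robust, partially compressed formulation and its one-dimensional search are not reproduced here.

```python
# Baseline sketch: fully compressed least squares via a Gaussian
# (Johnson-Lindenstrauss-style) sketch of both A and b.
import numpy as np

def compressed_lstsq(A, b, m, seed=0):
    rng = np.random.default_rng(seed)
    S = rng.normal(size=(m, A.shape[0])) / np.sqrt(m)  # sketching matrix
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x
```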

UAI 2014 Conference Paper

A variational approach to stable principal component pursuit

  • Aleksandr Y. Aravkin
  • Stephen Becker
  • Volkan Cevher
  • Peder A. Olsen

We introduce a new convex formulation for stable principal component pursuit (SPCP) to decompose noisy signals into low-rank and sparse representations. For numerical solutions of our SPCP formulation, we first develop a convex variational framework and then accelerate it with quasi-Newton methods. We show, via synthetic and real data experiments, that our approach offers advantages over the classical SPCP formulations in scalability and practical parameter selection.
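
A hedged sketch of a classical SPCP-style alternating scheme (singular-value thresholding for the low-rank part, soft-thresholding for the sparse part), the kind of formulation the paper's variational, quasi-Newton-accelerated approach improves on; penalty weights are illustrative.

```python
# Classical SPCP-flavored sketch: decompose Y into low-rank L plus sparse S.
import numpy as np

def spcp_alternating(Y, lam=0.1, mu=1.0, n_iter=100):
    L, S = np.zeros_like(Y), np.zeros_like(Y)
    for _ in range(n_iter):
        U, sig, Vt = np.linalg.svd(Y - S, full_matrices=False)
        L = U @ np.diag(np.maximum(sig - 1.0 / mu, 0)) @ Vt   # SVT step
        R = Y - L
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0)  # soft-threshold
    return L, S
```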

NeurIPS 2014 Conference Paper

QUIC & DIRTY: A Quadratic Approximation Approach for Dirty Statistical Models

  • Cho-Jui Hsieh
  • Inderjit Dhillon
  • Pradeep Ravikumar
  • Stephen Becker
  • Peder Olsen

In this paper, we develop a family of algorithms for optimizing “superposition-structured” or “dirty” statistical estimators for high-dimensional problems involving the minimization of the sum of a smooth loss function with a hybrid regularization. Most of the current approaches are first-order methods, including proximal gradient and the Alternating Direction Method of Multipliers (ADMM). We propose a new family of second-order methods where we approximate the loss function using a quadratic approximation. The superposition-structured regularizer then leads to a subproblem that can be efficiently solved by alternating minimization. We propose a general active subspace selection approach to speed up the solver by utilizing the low-dimensional structure given by the regularizers, and provide convergence guarantees for our algorithm. Empirically, we show that our approach is more than 10 times faster than state-of-the-art first-order approaches for latent variable graphical model selection problems and multi-task learning problems when there is more than one regularizer. For these problems, our approach appears to be the first algorithm that can extend active subspace ideas to multiple regularizers.
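
A hedged sketch of the “dirty” superposition setup: the parameter is a sum theta1 + theta2, each block with its own proximal operator, updated alternately. The paper uses genuine second-order quadratic approximations and active subspace selection; the first-order alternation below is a simplified stand-in.

```python
# Simplified stand-in: alternating proximal-gradient updates on the two
# blocks of a superposition-structured ("dirty") estimator.
import numpy as np

def soft(x, t):
    """Soft-thresholding: the prox of an l1 regularizer, usable as prox1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0)

def dirty_alternating(grad_f, prox1, prox2, theta1, theta2,
                      step=0.1, n_iter=200):
    for _ in range(n_iter):
        g = grad_f(theta1 + theta2)                 # shared smooth-loss gradient
        theta1 = prox1(theta1 - step * g, step)     # e.g. soft() for an l1 block
        theta2 = prox2(theta2 - step * g, step)     # e.g. a trace-norm prox block
    return theta1, theta2
```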

NeurIPS 2014 Conference Paper

Time–Data Tradeoffs by Aggressive Smoothing

  • John Bruer
  • Joel Tropp
  • Volkan Cevher
  • Stephen Becker

This paper proposes a tradeoff between sample complexity and computation time that applies to statistical estimators based on convex optimization. As the amount of data increases, we can smooth optimization problems more and more aggressively to achieve accurate estimates more quickly. This work provides theoretical and experimental evidence of this tradeoff for a class of regularized linear inverse problems.
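
A hedged illustration of the smoothing mechanism: replace the nonsmooth ℓ1 penalty by its Moreau envelope (the Huber function) at smoothing level mu and run plain gradient descent. The paper's point is that larger sample sizes tolerate larger mu, hence faster solves; the constants below are arbitrary.

```python
# Illustrative smoothing: Huber (Moreau-envelope) relaxation of the l1
# penalty in a lasso-type problem, solved by plain gradient descent.
import numpy as np

def huber_grad(x, mu):
    """Gradient of the Moreau envelope of |.| at smoothing level mu."""
    return np.clip(x / mu, -1.0, 1.0)

def smoothed_lasso(A, b, lam, mu, step, n_iter=500):
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x -= step * (A.T @ (A @ x - b) + lam * huber_grad(x, mu))
    return x
```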

ICML 2013 Conference Paper

Sparse projections onto the simplex

  • Anastasios Kyrillidis
  • Stephen Becker
  • Volkan Cevher
  • Christoph Koch 0001

Most learning methods with rank or sparsity constraints use convex relaxations, which lead to optimization with the nuclear norm or the ℓ1-norm. However, several important learning applications cannot benefit from this approach as they feature these convex norms as constraints in addition to the non-convex rank and sparsity constraints. In this setting, we derive efficient sparse projections onto the simplex and its extension, and illustrate how to use them to solve high-dimensional learning problems in quantum tomography, sparse density estimation and portfolio selection with non-convex constraints.
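
A sketch of the sparse simplex projection suggested by the abstract: keep the k largest entries and project them onto the standard simplex via the classical sort-based projection. The reduction to top-k selection is assumed here from the abstract; see the paper for the general extension.

```python
# Sketch: projection onto the k-sparse simplex via top-k selection
# followed by the classical sort-based simplex projection.
import numpy as np

def project_simplex(v, z=1.0):
    """Euclidean projection of v onto {x >= 0, sum(x) = z}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - z
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0)

def sparse_simplex_projection(w, k, z=1.0):
    idx = np.argsort(w)[::-1][:k]   # greedy top-k support selection
    x = np.zeros_like(w)
    x[idx] = project_simplex(w[idx], z)
    return x
```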

NeurIPS 2012 Conference Paper

A quasi-Newton proximal splitting method

  • Stephen Becker
  • Jalal Fadili

We describe efficient implementations of the proximity calculation for a useful class of functions; the implementations exploit the piecewise linear nature of the dual problem. The second part of the paper applies this result to the acceleration of convex minimization problems and leads to an elegant quasi-Newton method. The optimization method compares favorably against state-of-the-art alternatives. The algorithm has extensive applications, including signal processing, sparse regression and recovery, and machine learning and classification.
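
For context, a minimal sketch of the forward-backward (proximal splitting) baseline the paper accelerates, with soft-thresholding as the prox of an ℓ1 term; the quasi-Newton scaling and the efficient scaled prox are the paper's contribution and are not reproduced here.

```python
# Baseline sketch: forward-backward splitting (proximal gradient) for
# min f(x) + lam * ||x||_1, with soft-thresholding as the prox step.
import numpy as np

def proximal_gradient(grad_f, lam, x0, step, n_iter=300):
    x = x0.copy()
    for _ in range(n_iter):
        y = x - step * grad_f(x)                                # forward step
        x = np.sign(y) * np.maximum(np.abs(y) - step * lam, 0)  # backward/prox step
    return x
```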