SODA Conference 2025 Conference Paper
Competitive strategies to use "warm start" algorithms with predictions
- Avrim Blum
- Vaidehi Srinivas
NeurIPS Conference 2025 Conference Paper
Tensor decomposition is a canonical non-convex optimization problem that is computationally challenging, yet important due to applications in factor analysis and parameter estimation of latent variable models. In practice, scalable iterative methods, particularly Alternating Least Squares (ALS), remain the workhorse for tensor decomposition despite the lack of global convergence guarantees. A popular approach to tackling challenging non-convex optimization problems is overparameterization: on input an $n \times n \times n$ tensor of rank $r$, the algorithm may output a decomposition of rank $k$, potentially larger than $r$. On the theoretical side, overparameterization for iterative methods is challenging to reason about and requires new techniques. The work of Wang et al. (NeurIPS 2020) makes progress by showing that a variant of gradient descent converges globally when overparameterized to $k = O(r^{7.5} \log n)$. Our main result shows that overparameterization provably enables global convergence of ALS: on input a third-order $n \times n \times n$ tensor with a decomposition of rank $r \ll n$, ALS overparameterized with rank $k = O(r^2)$ achieves global convergence with high probability under random initialization. Moreover, our analysis also gives guarantees for the more general low-rank approximation problem. The analysis introduces new techniques for understanding iterative methods in the overparameterized regime, based on new matrix anticoncentration arguments.
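To make the method under analysis concrete, here is a minimal NumPy sketch of overparameterized ALS for CP decomposition of a third-order tensor. The function name, unfolding layout, fixed iteration count, and use of `lstsq` are choices made for this illustration, not the paper's implementation.

```python
import numpy as np

def als_cp(T, k, iters=50, seed=0):
    """ALS sketch for CP decomposition of an (n, n, n) tensor T.

    k is the (possibly overparameterized) target rank; returns factors
    A, B, C with T[i,j,l] approx sum_r A[i,r] * B[j,r] * C[l,r].
    """
    rng = np.random.default_rng(seed)
    n = T.shape[0]
    A, B, C = (rng.standard_normal((n, k)) for _ in range(3))
    T1 = T.reshape(n, -1)                     # mode-1 unfolding
    T2 = T.transpose(1, 0, 2).reshape(n, -1)  # mode-2 unfolding
    T3 = T.transpose(2, 0, 1).reshape(n, -1)  # mode-3 unfolding
    for _ in range(iters):
        # Each update is a linear least-squares solve in one factor,
        # with the Khatri-Rao product of the other two as design matrix.
        KR = np.einsum('jr,lr->jlr', B, C).reshape(-1, k)
        A = np.linalg.lstsq(KR, T1.T, rcond=None)[0].T
        KR = np.einsum('ir,lr->ilr', A, C).reshape(-1, k)
        B = np.linalg.lstsq(KR, T2.T, rcond=None)[0].T
        KR = np.einsum('ir,jr->ijr', A, B).reshape(-1, k)
        C = np.linalg.lstsq(KR, T3.T, rcond=None)[0].T
    return A, B, C
```

Running this with $k > r$ on an exact rank-$r$ tensor illustrates the overparameterized regime the paper analyzes: the extra columns give the least-squares subproblems room to escape bad configurations.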
ICML Conference 2025 Conference Paper
Conformal Prediction is a widely studied technique for constructing prediction sets of future observations. Most conformal prediction methods focus on achieving the necessary coverage guarantees, but do not provide formal guarantees on the size (volume) of the prediction sets. We first prove the impossibility of volume optimality: any distribution-free method can only find a trivial solution. We then introduce a new notion of volume optimality by restricting the prediction sets to belong to a set family of finite VC-dimension, specifically a union of $k$ intervals. Our main contribution is an efficient distribution-free algorithm based on dynamic programming (DP) to find a union of $k$ intervals that is guaranteed, for any distribution, to have near-optimal volume among all unions of $k$ intervals satisfying the desired coverage property. By adopting the framework of distributional conformal prediction (Chernozhukov et al., 2021), the new DP-based conformity score can also be applied to achieve approximate conditional coverage and conditional restricted volume optimality, as long as a reasonable estimator of the conditional CDF is available. While the theoretical results already establish volume-optimality guarantees, they are complemented by experiments demonstrating that our method can significantly outperform existing methods in many settings.
STOC Conference 2024 Conference Paper
STOC Conference 2022 Conference Paper
Online learning with expert advice is a fundamental problem of sequential prediction. In this problem, the algorithm has access to a set of n “experts” who make predictions on each day. The goal on each day is to process these predictions and make a prediction with the minimum cost. After making a prediction, the algorithm sees the actual outcome on that day, updates its state, and then moves on to the next day. An algorithm is judged by how well it does compared to the best expert in the set. The classical algorithm for this problem is the multiplicative weights algorithm, which has been well-studied in many fields since as early as the 1950s. Variations of this algorithm have been applied to and optimized for a broad range of problems, including boosting an ensemble of weak learners in machine learning, and approximately solving linear and semi-definite programs. However, every application, to our knowledge, relies on storing weights for every expert, and uses Ω(n) memory. There is little work on understanding the memory required to solve the online learning with expert advice problem (or to run standard sequential prediction algorithms, such as multiplicative weights) in natural streaming models, which is especially important when the number of experts and the number of days are both large. We initiate the study of the learning with expert advice problem in the streaming setting, and show lower and upper bounds. Our lower bound for i.i.d., random order, and adversarial order streams uses a reduction to a custom-built problem with a novel masking technique, to show a smooth trade-off for regret versus memory. Our upper bounds show new ways to run standard sequential prediction algorithms in rounds on small “pools” of experts, thus reducing the necessary memory. For random-order streams, we show that our upper bound is tight up to low order terms.
We hope that these results and techniques will have broad applications in online learning, and can inspire algorithms based on standard sequential prediction techniques, like multiplicative weights, for a wide range of other problems in the memory-constrained setting.
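For reference, the classical multiplicative weights algorithm discussed above can be sketched in a few lines; note the weight vector of length n, which is exactly the Ω(n) memory the streaming question targets. The function name, learning rate, and cost model here are illustrative choices, not taken from the paper.

```python
import math

def multiplicative_weights(costs, eta=0.5):
    """Multiplicative weights over n experts.

    costs: list of days, each a list of n expert costs in [0, 1].
    Returns (algorithm's total expected cost, best expert's total cost).
    The weight vector w uses Omega(n) memory -- the bottleneck that
    motivates the streaming study.
    """
    n = len(costs[0])
    w = [1.0] * n
    alg_cost = 0.0
    for day in costs:
        total = sum(w)
        # Expected cost of sampling expert i with probability w[i] / total.
        alg_cost += sum(w[i] * day[i] for i in range(n)) / total
        # Exponentially downweight each expert by its cost today.
        w = [w[i] * math.exp(-eta * day[i]) for i in range(n)]
    best = min(sum(day[i] for day in costs) for i in range(n))
    return alg_cost, best
```

On a stream where one expert is always right, the algorithm's regret stays bounded while the horizon grows, which is the guarantee the memory-constrained variants try to preserve with smaller state.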
NeurIPS Conference 2022 Conference Paper
The most widely used technique for solving large-scale semidefinite programs (SDPs) in practice is the non-convex Burer-Monteiro method, which explicitly maintains a low-rank SDP solution for memory efficiency. There has been much recent interest in obtaining a better theoretical understanding of the Burer-Monteiro method. When the maximum allowed rank $p$ of the SDP solution is above the Barvinok-Pataki bound (where a globally optimal solution of rank at most $p$ is guaranteed to exist), a recent line of work established convergence to a global optimum for generic or smoothed instances of the problem. However, it was open whether there even exists an instance in this regime where the Burer-Monteiro method fails. We prove that the Burer-Monteiro method can fail for the Max-Cut SDP on $n$ vertices when the rank is above the Barvinok-Pataki bound ($p \ge \sqrt{2n}$). We provide a family of instances that have spurious local minima even when the rank $p = n/2$. Combined with existing guarantees, this settles the question of the existence of spurious local minima for the Max-Cut formulation in all ranges of the rank and justifies the use of beyond worst-case paradigms like smoothed analysis to obtain guarantees for the Burer-Monteiro method.
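To make the setup concrete, here is a generic sketch of the Burer-Monteiro factorization for the Max-Cut SDP: maximize $\langle L/4, VV^\top\rangle$ over $V \in \mathbb{R}^{n \times p}$ with unit-norm rows, via projected (Riemannian) gradient ascent. This is a standard illustrative implementation under assumed names and step sizes, not the paper's construction of hard instances; the paper's point is that for some instances this landscape has spurious local minima even at ranks as large as $p = n/2$.

```python
import numpy as np

def burer_monteiro_maxcut(W, p, steps=1000, lr=0.1, seed=0):
    """Burer-Monteiro factorization of the Max-Cut SDP.

    W: symmetric (n, n) weight matrix. Maximizes <L/4, V V^T> over
    V in R^{n x p} with unit-norm rows (so diag(V V^T) = 1), using
    projected gradient ascent on the product of spheres.
    """
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W          # graph Laplacian
    V = rng.standard_normal((n, p))
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    for _ in range(steps):
        G = (L / 4.0) @ V                   # Euclidean gradient direction
        # Project each row's gradient onto the tangent space of its sphere.
        G -= np.sum(G * V, axis=1, keepdims=True) * V
        V += lr * G
        V /= np.linalg.norm(V, axis=1, keepdims=True)  # retract to spheres
    return V, float(np.sum((L / 4.0) * (V @ V.T)))
```

On benign instances (e.g., a small unweighted cycle) with $p$ above the Barvinok-Pataki bound $\sqrt{2n}$, random initialization typically reaches the SDP optimum; the paper exhibits Max-Cut instances where this fails.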