Arrow Research

Author name cluster

Aaron Sidford

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

92 papers
2 author rows

Possible papers (92)

NeurIPS Conference 2025 Conference Paper

Balancing Gradient and Hessian Queries in Non-Convex Optimization

  • Deeksha Adil
  • Brian Bullins
  • Aaron Sidford
  • Chenyi Zhang

We develop optimization methods which offer new trade-offs between the number of gradient and Hessian computations needed to compute a critical point of a non-convex function. We provide a method that, for a twice-differentiable $f\colon \mathbb{R}^d \rightarrow \mathbb{R}$ with $L_2$-Lipschitz Hessian, an initial point with $\Delta$-bounded sub-optimality, and sufficiently small $\epsilon > 0$, outputs an $\epsilon$-critical point, i.e., a point $x$ such that $\|\nabla f(x)\| \leq \epsilon$, using $\tilde{O}(\Delta L_2^{1/4} n_H^{-1/2}\epsilon^{-9/4})$ queries to a gradient oracle and $n_H$ queries to a Hessian oracle. As a consequence, we obtain an improved gradient query complexity of $\tilde{O}(d^{1/3}L_2^{1/2}\Delta\epsilon^{-3/2})$ in the case of bounded dimension and of $\tilde{O}(\Delta^{3/2} L_2^{3/4}\epsilon^{-9/4})$ in the case where we are allowed only a single Hessian query. We obtain these results through a more general algorithm which can handle approximate Hessian computations and which, under the additional assumption that $f$ has an $L_1$-Lipschitz gradient, recovers the known prior state-of-the-art bound of $O(\Delta L_2^{1/4}\epsilon^{-7/4})$ gradient queries for computing an $\epsilon$-critical point.
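
To make the goal concrete, below is a minimal gradient-descent baseline that queries only a gradient oracle until it reaches an $\epsilon$-critical point in the sense above; it is an illustration of the query model, not the paper's method (which mixes gradient and Hessian queries), and the test function and step size are hypothetical.

```python
import numpy as np

def gd_until_critical(grad, x0, eps, step, max_iters=1_000_000):
    """Plain gradient descent run until ||grad f(x)|| <= eps, i.e. until x is
    an eps-critical point. Baseline only: the paper's methods trade some of
    these gradient queries for a budget of Hessian queries."""
    x = np.asarray(x0, dtype=float)
    for t in range(max_iters):
        g = grad(x)                      # one gradient-oracle query
        if np.linalg.norm(g) <= eps:
            return x, t
        x = x - step * g
    return x, max_iters

# Toy non-convex function f(x) = sum(x_i^2 + sin(x_i)); its gradient is below.
grad = lambda x: 2 * x + np.cos(x)
x, queries = gd_until_critical(grad, x0=np.ones(5), eps=1e-4, step=0.1)
```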

FOCS Conference 2025 Conference Paper

Generalized Flow in Nearly-linear Time on Moderately Dense Graphs

  • Shunhua Jiang
  • Michael Kapralov
  • Lawrence Li
  • Aaron Sidford

In this paper we consider generalized flow problems where there is an $m$-edge $n$-node directed graph $G=(V, E)$ and each edge $e \in E$ has a loss factor $\gamma_{e} > 0$ governing whether the flow is increased or decreased as it crosses edge $e$. We provide a randomized $\widetilde{O}\left(\left(m+n^{1.5}\right) \cdot \operatorname{polylog}\left(\frac{W}{\delta}\right)\right)$ time algorithm for solving the generalized maximum flow and generalized minimum cost flow problems in this setting, where $\delta$ is the target accuracy and $W$ is the maximum of all costs, capacities, and loss factors and their inverses. This improves upon the previous state-of-the-art $\widetilde{O}\left(m \sqrt{n} \cdot \log^{2}\left(\frac{W}{\delta}\right)\right)$ time algorithm, obtained by combining the algorithm of [17] with techniques from [29]. To obtain this result we provide new dynamic data structures and spectral results regarding the matrices associated to generalized flows and apply them through the interior point method framework of [39]. The full version of this paper is available at https://arxiv.org/abs/2510.17740.

NeurIPS Conference 2025 Conference Paper

Isotropic Noise in Stochastic and Quantum Convex Optimization

  • Annie Marsden
  • Liam O'Carroll
  • Aaron Sidford
  • Chenyi Zhang

We consider the problem of minimizing a $d$-dimensional Lipschitz convex function using a stochastic gradient oracle. We introduce and motivate a setting where the noise of the stochastic gradient is isotropic in that it is bounded in every direction with high probability. We then develop an algorithm for this setting which improves upon prior results by a factor of $d$ in certain regimes, and as a corollary, achieves a new state-of-the-art complexity for sub-exponential noise. We give matching lower bounds (up to polylogarithmic factors) for both results. Additionally, we develop an efficient quantum isotropifier, a quantum algorithm which converts a variance-bounded quantum sampling oracle into one that outputs an unbiased estimate with isotropic error. Combining our results, we obtain improved dimension-dependent rates for quantum stochastic convex optimization.

FOCS Conference 2025 Conference Paper

Solving Zero-Sum Games with Fewer Matrix-Vector Products

  • Ishani Karmarkar
  • Liam O'Carroll
  • Aaron Sidford

In this paper we consider the problem of computing an $\epsilon$-approximate Nash equilibrium of a zero-sum game with payoff matrix $A \in \mathbb{R}^{m \times n}$ with $O(1)$-bounded entries, given access to a matrix-vector product oracle for $A$ and its transpose $A^{\top}$. We provide a deterministic algorithm that solves the problem using $\tilde{O}\left(\epsilon^{-8/9}\right)$ oracle queries, where $\tilde{O}(\cdot)$ hides factors polylogarithmic in $m$, $n$, and $\epsilon^{-1}$. Our result improves upon the state-of-the-art query complexity of $\tilde{O}\left(\epsilon^{-1}\right)$ established by [Nemirovski, 2004] and [Nesterov, 2005]. We obtain this result through a general framework that yields improved deterministic query complexities for solving a broader class of minimax optimization problems, which includes computing a linear classifier (hard-margin support vector machine) as well as linear regression. This paper is an extended abstract; the full paper can be accessed at https://arxiv.org/abs/2509.04426.
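
For context, the sketch below shows the classical deterministic baseline this result improves upon: Nemirovski-style entropic mirror prox, which touches $A$ only through matrix-vector products and needs roughly $\epsilon^{-1}$ such queries. The function names, step-size choice, and demo matrix are illustrative assumptions.

```python
import numpy as np

def mirror_prox_nash(matvec, rmatvec, m, n, max_abs_entry, T):
    """Entropic mirror prox for min_x max_y y^T A x over simplices, using A
    only through matvec(x) = A x and rmatvec(y) = A^T y. A sketch of the
    classical O~(1/eps)-query baseline, not the paper's method."""
    def softmax(z):
        z = z - z.max()                          # numerically stable
        e = np.exp(z)
        return e / e.sum()

    eta = 1.0 / (2.0 * max_abs_entry)            # standard l1/l1 step size
    lx, ly = np.zeros(n), np.zeros(m)            # logits of the iterates
    x_avg, y_avg = np.zeros(n), np.zeros(m)
    for _ in range(T):
        x, y = softmax(lx), softmax(ly)
        xh = softmax(lx - eta * rmatvec(y))      # extrapolation (prox) step
        yh = softmax(ly + eta * matvec(x))
        lx -= eta * rmatvec(yh)                  # update at extrapolated point
        ly += eta * matvec(xh)
        x_avg += xh / T                          # averaged iterates converge
        y_avg += yh / T
    return x_avg, y_avg

A = np.array([[0.0, 1.0], [1.0, 0.0]])           # matching-pennies-style game
x, y = mirror_prox_nash(lambda v: A @ v, lambda v: A.T @ v, 2, 2, 1.0, 500)
```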

NeurIPS Conference 2024 Conference Paper

Semi-Random Matrix Completion via Flow-Based Adaptive Reweighting

  • Jonathan A. Kelner
  • Jerry Li
  • Allen Liu
  • Aaron Sidford
  • Kevin Tian

We consider the well-studied problem of completing a rank-$r$, $\mu$-incoherent matrix $\mathbf{M} \in \mathbb{R}^{d \times d}$ from incomplete observations. We focus on this problem in the semi-random setting where each entry is independently revealed with probability at least $p = \frac{\textup{poly}(r, \mu, \log d)}{d}$. Whereas multiple nearly-linear time algorithms have been established in the more specialized fully-random setting where each entry is revealed with probability exactly $p$, the only known nearly-linear time algorithm in the semi-random setting is due to [CG18], whose sample complexity has a polynomial dependence on the inverse accuracy and condition number and thus cannot achieve high-accuracy recovery. Our main result is the first high-accuracy nearly-linear time algorithm for solving semi-random matrix completion, and an extension to the noisy observation setting. Our result builds upon the recent short-flat decomposition framework of [KLLST23a, KLLST23b] and leverages fast algorithms for flow problems on graphs to solve adaptive reweighting subproblems efficiently.

STOC Conference 2024 Conference Paper

Sparsifying Generalized Linear Models

  • Arun Jambulapati
  • James R. Lee
  • Yang P. Liu
  • Aaron Sidford

We consider the sparsification of sums $F : \mathbb{R}^n \to \mathbb{R}_+$ where $F(x) = f_1(\langle a_1, x\rangle) + \cdots + f_m(\langle a_m, x\rangle)$ for vectors $a_1, \ldots, a_m \in \mathbb{R}^n$ and functions $f_1, \ldots, f_m : \mathbb{R} \to \mathbb{R}_+$. We show that $(1+\varepsilon)$-approximate sparsifiers of $F$ with support size $\frac{n}{\varepsilon^2} (\log \frac{n}{\varepsilon})^{O(1)}$ exist whenever the functions $f_1, \ldots, f_m$ are symmetric, monotone, and satisfy natural growth bounds. Additionally, we give efficient algorithms to compute such a sparsifier assuming each $f_i$ can be evaluated efficiently. Our results generalize the classical case of $\ell_p$ sparsification, where $f_i(z) = |z|^p$ for $p \in (0, 2]$, and give the first near-linear size sparsifiers in the well-studied setting of the Huber loss function and its generalizations, e.g., $f_i(z) = \min\{|z|^p, |z|^2\}$ for $0 < p \leq 2$. Our sparsification algorithm can be applied to give near-optimal reductions for optimizing a variety of generalized linear models including $\ell_p$ regression for $p \in (1, 2]$ to high accuracy, via solving $(\log n)^{O(1)}$ sparse regression instances with $m \leq n (\log n)^{O(1)}$, plus runtime proportional to the number of nonzero entries in the vectors $a_1, \ldots, a_m$.

NeurIPS Conference 2024 Conference Paper

Truncated Variance Reduced Value Iteration

  • Yujia Jin
  • Ishani Karmarkar
  • Aaron Sidford
  • Jiayi Wang

We provide faster randomized algorithms for computing an $\epsilon$-optimal policy in a discounted Markov decision process with $A_{\text{tot}}$ state-action pairs, bounded rewards, and discount factor $\gamma$. We provide an $\tilde{O}(A_{\text{tot}}[(1 - \gamma)^{-3}\epsilon^{-2} + (1 - \gamma)^{-2}])$-time algorithm in the sampling setting, where the probability transition matrix is unknown but accessible through a generative model which can be queried in $\tilde{O}(1)$-time, and an $\tilde{O}(s + (1-\gamma)^{-2})$-time algorithm in the offline setting where the probability transition matrix is known and $s$-sparse. These results improve upon the prior state-of-the-art which either ran in $\tilde{O}(A_{\text{tot}}[(1 - \gamma)^{-3}\epsilon^{-2} + (1 - \gamma)^{-3}])$ time [Sidford, Wang, Wu, Ye 2018] in the sampling setting, $\tilde{O}(s + A_{\text{tot}} (1-\gamma)^{-3})$ time [Sidford, Wang, Wu, Yang, Ye 2018] in the offline setting, or time at least quadratic in the number of states using interior point methods for linear programming. We achieve our results by building upon prior stochastic variance-reduced value iteration methods [Sidford, Wang, Wu, Yang, Ye 2018]. We provide a variant that carefully truncates the progress of its iterates to improve the variance of the new variance-reduced sampling procedures that we introduce to implement the steps. Our method is essentially model-free and can be implemented in $\tilde{O}(A_{\text{tot}})$-space when given generative model access. Consequently, our results take a step in closing the sample-complexity gap between model-free and model-based methods.
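
A minimal sketch of the underlying variance-reduction idea, assuming a generative-model interface `sample(s, a, k)`: estimate $P\bar{v}$ accurately once per epoch, then correct it cheaply with single-sample estimates of $P(v - \bar{v})$. This follows the earlier [SWWY 2018] template; the truncation refinement that is this paper's contribution is omitted.

```python
import numpy as np

def vr_value_iteration(sample, R, gamma, m0=2000, inner=100, epochs=15):
    """Sketch of variance-reduced value iteration with a generative model.
    sample(s, a, k) returns k i.i.d. next states drawn from P(. | s, a);
    R is the (n_states, n_actions) reward matrix."""
    n_states, n_actions = R.shape
    v = np.zeros(n_states)
    for _ in range(epochs):
        v_bar = v.copy()
        # Expensive anchor: estimate (P v_bar)(s, a) once per epoch, m0 samples.
        Pv_bar = np.array([[v_bar[sample(s, a, m0)].mean()
                            for a in range(n_actions)] for s in range(n_states)])
        for _ in range(inner):
            # Cheap steps: estimate P(v - v_bar)(s, a) from a single sample;
            # the difference has small variance since v stays close to v_bar.
            Pdiff = np.array([[(v - v_bar)[sample(s, a, 1)].mean()
                               for a in range(n_actions)] for s in range(n_states)])
            v = np.max(R + gamma * (Pv_bar + Pdiff), axis=1)
    return v

# Toy 2-state, 2-action MDP; P[s, a] is the next-state distribution.
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.5, 0.5], [0.2, 0.8]]])
R = np.array([[1.0, 0.0], [0.0, 1.0]])
rng = np.random.default_rng(0)
sample = lambda s, a, k: rng.choice(2, size=k, p=P[s, a])
v = vr_value_iteration(sample, R, gamma=0.9)
```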

FOCS Conference 2023 Conference Paper

A Deterministic Almost-Linear Time Algorithm for Minimum-Cost Flow

  • Jan van den Brand
  • Li Chen 0028
  • Richard Peng
  • Rasmus Kyng
  • Yang P. Liu
  • Maximilian Probst Gutenberg
  • Sushant Sachdeva
  • Aaron Sidford

We give a deterministic $m^{1+o(1)}$ time algorithm that computes exact maximum flows and minimum-cost flows on directed graphs with $m$ edges and polynomially bounded integral demands, costs, and capacities. As a consequence, we obtain the first running time improvement for deterministic algorithms that compute maximum flow in graphs with polynomially bounded capacities since the work of Goldberg-Rao [J. ACM ’98]. Our algorithm builds on the framework of Chen-Kyng-Liu-Peng-Gutenberg-Sachdeva [FOCS ’22] that computes an optimal flow by computing a sequence of $m^{1+o(1)}$ approximate undirected minimum-ratio cycles. We develop a deterministic dynamic graph data structure to compute such a sequence of minimum-ratio cycles in an amortized $m^{o(1)}$ time per edge update. Our key technical contributions are deterministic analogues of the vertex sparsification and edge sparsification components of the data structure from Chen et al. For the vertex sparsification component, we give a method to avoid the randomness in Chen et al., which involved sampling random trees to recurse on. For the edge sparsification component, we design a deterministic algorithm that maintains an embedding of a dynamic graph into a sparse spanner. We also show how our dynamic spanner can be applied to give a deterministic data structure that maintains a fully dynamic low-stretch spanning tree on graphs with polynomially bounded edge lengths, with subpolynomial average stretch and subpolynomial amortized time per edge update.

STOC Conference 2023 Conference Paper

Chaining, Group Leverage Score Overestimates, and Fast Spectral Hypergraph Sparsification

  • Arun Jambulapati
  • Yang P. Liu
  • Aaron Sidford

We present an algorithm that given any $n$-vertex, $m$-edge, rank-$r$ hypergraph constructs a spectral sparsifier with $O(n \varepsilon^{-2} \log n \log r)$ hyperedges in nearly-linear $O(mr)$ time. This improves in both size and efficiency over a line of work [Bansal-Svensson-Trevisan 2019, Kapralov-Krauthgamer-Tardos-Yoshida 2021] for which the previous best size was $O(\min\{n \varepsilon^{-4} \log^3 n,\; n r^3 \varepsilon^{-2} \log n\})$ and runtime was $O(mr + n^{O(1)})$.

IJCAI Conference 2023 Conference Paper

Efficient Convex Optimization Requires Superlinear Memory (Extended Abstract)

  • Annie Marsden
  • Vatsal Sharan
  • Aaron Sidford
  • Gregory Valiant

Minimizing a convex function with access to a first-order oracle---which returns the function evaluation and a (sub)gradient at a query point---is a canonical optimization problem and a fundamental primitive in machine learning. Gradient-based methods are the most popular approaches for solving the problem, owing to their simplicity and computational efficiency. These methods, however, do not achieve the information-theoretically optimal query complexity for minimizing the underlying function to small error, which is achieved by more expensive techniques based on cutting-plane methods. Is it possible to achieve the information-theoretically optimal query complexity without using these more complex and computationally expensive methods? In this work, we use memory as a lens to understand this, and show that it is not possible to achieve optimal query complexity without using significantly more memory than that used by gradient descent.

SODA Conference 2023 Conference Paper

Improved girth approximation in weighted undirected graphs

  • Avi Kadria
  • Liam Roditty
  • Aaron Sidford
  • Virginia Vassilevska Williams
  • Uri Zwick

Let $G = (V, E, \ell)$ be an $n$-node $m$-edge weighted undirected graph, where $\ell: E \to (0, \infty)$ is a real length function defined on its edges, and let $g$ be the length of the shortest cycle in $G$. We present an algorithm that, for every integer $k \geq 1$, in $O(kn^{1+1/k}\log n + m(k + \log n))$ expected running time finds a cycle of length at most $\frac{4k}{3}g$. This improves upon the previous best algorithm that in $O((n^{1+1/k}\log n + m)\log(nM))$ time, where $\ell: E \to [1, M]$ is an integral length function, finds a cycle of length at most $2kg$ [KRS+22]. For $k = 1$ our algorithm also improves the result of Roditty and Tov [RT13].

FOCS Conference 2023 Conference Paper

Matrix Completion in Almost-Verification Time

  • Jonathan A. Kelner
  • Jerry Li 0001
  • Allen Liu
  • Aaron Sidford
  • Kevin Tian

We give a new framework for solving the fundamental problem of low-rank matrix completion, i.e., approximating a rank-$r$ matrix $\mathbf{M} \in \mathbb{R}^{m \times n}$ (where $m \geq n$) from random observations. First, we provide an algorithm which completes $\mathbf{M}$ on $99\%$ of rows and columns under no further assumptions on $\mathbf{M}$ from $\approx mr$ samples and using $\approx mr^{2}$ time. Then, assuming the row and column spans of $\mathbf{M}$ satisfy additional regularity properties, we show how to boost this partial completion guarantee to a full matrix completion algorithm by aggregating solutions to regression problems involving the observations. In the well-studied setting where $\mathbf{M}$ has incoherent row and column spans, our algorithms complete $\mathbf{M}$ to high precision from $mr^{2+o(1)}$ observations in $mr^{3+o(1)}$ time (omitting logarithmic factors in problem parameters), improving upon the prior state-of-the-art [JN15] which used $\approx mr^{5}$ samples and $\approx mr^{7}$ time. Under an assumption on the row and column spans of $\mathbf{M}$ we introduce (which is satisfied by random subspaces with high probability), our sample complexity improves to an almost information-theoretically optimal $mr^{1+o(1)}$, and our runtime improves to $mr^{2+o(1)}$. Our runtimes have the appealing property of matching the best known runtime to verify that a rank-$r$ decomposition $\mathbf{U}\mathbf{V}^{\top}$ agrees with the sampled observations. We also provide robust variants of our algorithms that, given random observations from $\mathbf{M}+\mathbf{N}$ with $\|\mathbf{N}\|_{\mathrm{F}} \leq \Delta$, complete $\mathbf{M}$ to Frobenius norm distance $\approx r^{1.5}\Delta$ in the same runtimes as the noiseless setting. Prior noisy matrix completion algorithms [CP10] only guaranteed a distance of $\approx \sqrt{n}\Delta$.

NeurIPS Conference 2023 Conference Paper

Parallel Submodular Function Minimization

  • Deeparnab Chakrabarty
  • Andrei Graur
  • Haotian Jiang
  • Aaron Sidford

We consider the parallel complexity of submodular function minimization (SFM). We provide a pair of methods which obtain two new query versus depth trade-offs for minimizing a submodular function defined on subsets of $n$ elements that has integer values between $-M$ and $M$. The first method has depth $2$ and query complexity $n^{O(M)}$ and the second method has depth $\widetilde{O}(n^{1/3} M^{2/3})$ and query complexity $O(\mathrm{poly}(n, M))$. Despite a line of work on improved parallel lower bounds for SFM, prior to our work the only known algorithms for parallel SFM either followed from more general methods for sequential SFM or from highly-parallel minimization of convex $\ell_2$-Lipschitz functions. Interestingly, to obtain our second result we provide the first highly-parallel algorithm for minimizing an $\ell_\infty$-Lipschitz function over the hypercube which obtains near-optimal depth for achieving constant accuracy.

NeurIPS Conference 2023 Conference Paper

Quantum speedups for stochastic optimization

  • Aaron Sidford
  • Chenyi Zhang

We consider the problem of minimizing a continuous function given access to a natural quantum generalization of a stochastic gradient oracle. We provide two new methods for the special case of minimizing a Lipschitz convex function. Each method obtains a dimension versus accuracy trade-off which is provably unachievable classically, and we prove that one method is asymptotically optimal in low-dimensional settings. Additionally, we provide quantum algorithms for computing a critical point of a smooth non-convex function at rates not known to be achievable classically. To obtain these results we build upon the quantum multivariate mean estimation result of Cornelissen et al. and provide a general quantum variance reduction technique of independent interest.

ICML Conference 2023 Conference Paper

Quantum Speedups for Zero-Sum Games via Improved Dynamic Gibbs Sampling

  • Adam Bouland
  • Yosheb M. Getachew
  • Yujia Jin
  • Aaron Sidford
  • Kevin Tian

We give a quantum algorithm for computing an $\epsilon$-approximate Nash equilibrium of a zero-sum game in an $m \times n$ payoff matrix with bounded entries. Given a standard quantum oracle for accessing the payoff matrix, our algorithm runs in time $\widetilde{O}(\sqrt{m + n}\cdot \epsilon^{-2.5} + \epsilon^{-3})$ and outputs a classical representation of the $\epsilon$-approximate Nash equilibrium. This improves upon the best prior quantum runtime of $\widetilde{O}(\sqrt{m + n} \cdot \epsilon^{-3})$ obtained by [van Apeldoorn, Gilyen ’19] and the classical $\widetilde{O}((m + n) \cdot \epsilon^{-2})$ runtime due to [Grigoriadis, Khachiyan ’95] whenever $\epsilon = \Omega((m + n)^{-1})$. We obtain this result by designing new quantum data structures for efficiently sampling from a slowly-changing Gibbs distribution.
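
For orientation, the classical counterpart being sped up maintains exactly such slowly-changing Gibbs distributions. Below is a hedged sketch in the style of the [Grigoriadis, Khachiyan ’95] sublinear method: each round both players sample pure strategies from Gibbs weights over cumulative payoffs, then update those payoffs with a single row and column of $A$. Parameter choices are illustrative.

```python
import numpy as np

def stochastic_gibbs_mwu(A, T, eta, rng=np.random.default_rng(0)):
    """Classical sampling-based multiplicative weights for zero-sum games.
    Per round: O(m + n) work, one row and one column of A touched."""
    m, n = A.shape
    sx = np.zeros(n)   # cumulative losses seen by the column (min) player
    sy = np.zeros(m)   # cumulative gains seen by the row (max) player
    cx, cy = np.zeros(n), np.zeros(m)   # empirical play counts
    for _ in range(T):
        p = np.exp(-eta * (sx - sx.min())); p /= p.sum()  # Gibbs over columns
        q = np.exp(eta * (sy - sy.max())); q /= q.sum()   # Gibbs over rows
        j = rng.choice(n, p=p)
        i = rng.choice(m, p=q)
        sx += A[i]         # column player's losses against sampled row i
        sy += A[:, j]      # row player's gains against sampled column j
        cx[j] += 1; cy[i] += 1
    return cx / T, cy / T  # empirical mixed strategies

A = np.array([[0.0, 1.0], [1.0, 0.0]])
x, y = stochastic_gibbs_mwu(A, T=20000, eta=0.05)
```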

FOCS Conference 2023 Conference Paper

ReSQueing Parallel and Private Stochastic Convex Optimization

  • Yair Carmon
  • Arun Jambulapati
  • Yujia Jin
  • Yin Tat Lee
  • Daogao Liu
  • Aaron Sidford
  • Kevin Tian

We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density. Combining ReSQue with recent advances in ball oracle acceleration [CJJ+20], [ACJ+21], we develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings. For a SCO objective constrained to the unit ball in $\mathbb{R}^{d}$, we obtain the following results (up to polylogarithmic factors). 1) We give a parallel algorithm obtaining optimization error $\epsilon_{\text{opt}}$ with $d^{1/3} \epsilon_{\text{opt}}^{-2/3}$ gradient oracle query depth and $d^{1/3} \epsilon_{\text{opt}}^{-2/3} + \epsilon_{\text{opt}}^{-2}$ gradient queries in total, assuming access to a bounded-variance stochastic gradient estimator. For $\epsilon_{\text{opt}} \in [d^{-1}, d^{-1/4}]$, our algorithm matches the state-of-the-art oracle depth of [BJL+19] while maintaining the optimal total work of stochastic gradient descent. 2) Given $n$ samples of Lipschitz loss functions, prior works [BFTT19], [BFGT20], [AFKT21], [KLL21] established that if $n \gtrsim d \epsilon_{\mathrm{dp}}^{-2}$, then $(\epsilon_{\mathrm{dp}}, \delta)$-differential privacy is attained at no asymptotic cost to the SCO utility. However, these prior works all required a superlinear number of gradient queries. We close this gap for sufficiently large $n \gtrsim d^{2} \epsilon_{\mathrm{dp}}^{-3}$, by using ReSQue to design an algorithm with near-linear gradient query complexity in this regime.

FOCS Conference 2023 Conference Paper

Singular Value Approximation and Sparsifying Random Walks on Directed Graphs

  • AmirMahdi Ahmadinejad
  • John Peebles
  • Edward Pyne
  • Aaron Sidford
  • Salil P. Vadhan

In this paper, we introduce a new, spectral notion of approximation between directed graphs, which we call singular value (SV) approximation. SV-approximation is stronger than previous notions of spectral approximation considered in the literature, including spectral approximation of Laplacians for undirected graphs [ST04], standard approximation for directed graphs [CKP+17], and unit-circle (UC) approximation for directed graphs [AKM+20]. Further, SV-approximation enjoys several useful properties not possessed by previous notions of approximation, e.g., it is preserved under products of random-walk matrices and bounded matrices. We provide a nearly linear-time algorithm for SV-sparsifying (and hence UC-sparsifying) Eulerian directed graphs, as well as $\ell$-step random walks on such graphs, for any $\ell \leq \operatorname{poly}(n)$. Combined with the Eulerian scaling algorithms of [CKK+18], given an arbitrary (not necessarily Eulerian) directed graph and a set $S$ of vertices, we can approximate the stationary probability mass of the $\left(S, S^{c}\right)$ cut in an $\ell$-step random walk to within a multiplicative error of $1/\operatorname{polylog}(n)$ and an additive error of $1/\operatorname{poly}(n)$ in nearly linear time. As a starting point for these results, we provide a simple black-box reduction from SV-sparsifying Eulerian directed graphs to SV-sparsifying undirected graphs; such a directed-to-undirected reduction was not known for previous notions of spectral approximation.

FOCS Conference 2023 Conference Paper

Sparse Submodular Function Minimization

  • Andrei Graur
  • Haotian Jiang
  • Aaron Sidford

In this paper we study the problem of minimizing a submodular function $f: 2^{V} \rightarrow \mathbb{R}$ that is guaranteed to have a $k$-sparse minimizer. We give a deterministic algorithm that computes an additive $\epsilon$-approximate minimizer of such $f$ in $\widetilde{O}(\operatorname{poly}(k) \log (|f|/\epsilon))$ parallel depth using a polynomial number of queries to an evaluation oracle of $f$, where $|f| = \max_{S \subseteq V}|f(S)|$. Further, we give a randomized algorithm that computes an exact minimizer of $f$ with high probability using $\widetilde{O}(|V| \cdot \operatorname{poly}(k))$ queries and polynomial time. When $k = \widetilde{O}(1)$, our algorithms use either nearly-constant parallel depth or a nearly-linear number of evaluation oracle queries. All previous algorithms for this problem either use $\Omega(|V|)$ parallel depth or $\Omega\left(|V|^{2}\right)$ queries. In contrast to state-of-the-art weakly-polynomial and strongly-polynomial time algorithms for SFM, our algorithms use first-order optimization methods, e.g., mirror descent and follow-the-regularized-leader. We introduce what we call sparse dual certificates, which encode information on the structure of sparse minimizers, and both our parallel and sequential algorithms provide new algorithmic tools for allowing first-order optimization methods to efficiently compute them. Correspondingly, our algorithms do not invoke fast matrix multiplication or general linear system solvers and in this sense are more combinatorial than previous state-of-the-art methods.

FOCS Conference 2023 Conference Paper

Sparsifying Sums of Norms

  • Arun Jambulapati
  • James R. Lee
  • Yang P. Liu
  • Aaron Sidford

For any norms $N_{1}, \ldots, N_{m}$ on $\mathbb{R}^{n}$ and $N(x) := N_{1}(x)+\cdots+N_{m}(x)$, we show there is a sparsified norm $\tilde{N}(x) = w_{1} N_{1}(x)+\cdots+w_{m} N_{m}(x)$ such that $|N(x)-\tilde{N}(x)| \leq \varepsilon N(x)$ for all $x \in \mathbb{R}^{n}$, where $w_{1}, \ldots, w_{m}$ are non-negative weights, of which only $O\left(\varepsilon^{-2} n \log (n/\varepsilon)(\log n)^{2.5}\right)$ are non-zero. Additionally, we show that such weights can be found with high probability in time $O\left(m(\log n)^{O(1)} + \mathrm{poly}(n)\right) T$, where $T$ is the time required to evaluate a norm $N_{i}(x)$, assuming that $N(x)$ is $\mathrm{poly}(n)$-equivalent to the Euclidean norm. This immediately yields analogous statements for sparsifying sums of symmetric submodular functions. More generally, we show how to sparsify sums of $p$-th powers of norms when the sum is $p$-uniformly smooth.

NeurIPS Conference 2023 Conference Paper

Structured Semidefinite Programming for Recovering Structured Preconditioners

  • Arun Jambulapati
  • Jerry Li
  • Christopher Musco
  • Kirankumar Shiragur
  • Aaron Sidford
  • Kevin Tian

We develop a general framework for finding approximately-optimal preconditioners for solving linear systems. Leveraging this framework we obtain improved runtimes for fundamental preconditioning and linear system solving problems, including the following. Diagonal preconditioning: we give an algorithm which, given positive definite $\mathbf{K} \in \mathbb{R}^{d \times d}$ with $\mathrm{nnz}(\mathbf{K})$ nonzero entries, computes an $\epsilon$-optimal diagonal preconditioner in time $\widetilde{O}(\mathrm{nnz}(\mathbf{K}) \cdot \mathrm{poly}(\kappa^\star, \epsilon^{-1}))$, where $\kappa^\star$ is the optimal condition number of the rescaled matrix. Structured linear systems: we give an algorithm which, given $\mathbf{M} \in \mathbb{R}^{d \times d}$ that is either the pseudoinverse of a graph Laplacian matrix or a constant spectral approximation of one, solves linear systems in $\mathbf{M}$ in $\widetilde{O}(d^2)$ time. Our diagonal preconditioning results improve state-of-the-art runtimes of $\Omega(d^{3.5})$ attained by general-purpose semidefinite programming, and our solvers improve state-of-the-art runtimes of $\Omega(d^{\omega})$ where $\omega > 2.3$ is the current matrix multiplication constant. We attain our results via new algorithms for a class of semidefinite programs (SDPs) we call matrix-dictionary approximation SDPs, which we leverage to solve an associated problem we call matrix-dictionary recovery.

NeurIPS Conference 2023 Conference Paper

Towards Optimal Effective Resistance Estimation

  • Rajat Vadiraj Dwaraknath
  • Ishani Karmarkar
  • Aaron Sidford

We provide new algorithms and conditional hardness for the problem of estimating effective resistances in $n$-node $m$-edge undirected, expander graphs. We provide an $\widetilde{O}(m\epsilon^{-1})$-time algorithm that produces, with high probability, an $\widetilde{O}(n\epsilon^{-1})$-bit sketch from which the effective resistance between any pair of nodes can be estimated, to $(1 \pm \epsilon)$-multiplicative accuracy, in $\widetilde{O}(1)$-time. Consequently, we obtain an $\widetilde{O}(m\epsilon^{-1})$-time algorithm for estimating the effective resistance of all edges in such graphs, improving (for sparse graphs) on the previous fastest runtimes of $\widetilde{O}(m\epsilon^{-3/2})$ [Chu et al. 2018] and $\widetilde{O}(n^2\epsilon^{-1})$ [Jambulapati, Sidford, 2018] for general graphs and $\widetilde{O}(m + n\epsilon^{-2})$ for expanders [Li, Sachdeva 2022]. We complement this result by showing a conditional lower bound that a broad set of algorithms for computing such estimates of the effective resistances between all pairs of nodes require $\widetilde{\Omega}(n^2 \epsilon^{-1/2})$-time, improving upon the previous best such lower bound of $\widetilde{\Omega}(n^2 \epsilon^{-1/13})$ [Musco et al. 2017]. Further, we leverage the tools underlying these results to obtain improved algorithms and conditional hardness for more general problems of sketching the pseudoinverse of positive semidefinite matrices and estimating functions of their eigenvalues.
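
As a point of reference, the quantity being sketched admits a simple but dense, roughly $O(n^3)$-time, exact computation via the Laplacian pseudoinverse, $R(u,v) = (e_u - e_v)^\top L^{+} (e_u - e_v)$; this is the naive baseline the $\widetilde{O}(m\epsilon^{-1})$-time sketch avoids. A minimal version:

```python
import numpy as np

def effective_resistances(edges, n):
    """Exact all-edge effective resistances via the pseudoinverse of the
    graph Laplacian. Dense baseline only: cubic time, quadratic space."""
    L = np.zeros((n, n))
    for u, v in edges:                      # unit-weight edges assumed
        L[u, u] += 1; L[v, v] += 1
        L[u, v] -= 1; L[v, u] -= 1
    Lp = np.linalg.pinv(L)                  # L^+ (Moore-Penrose pseudoinverse)
    return {(u, v): Lp[u, u] + Lp[v, v] - 2 * Lp[u, v] for u, v in edges}

# Example: path on 3 nodes; each edge has effective resistance 1.
print(effective_resistances([(0, 1), (1, 2)], n=3))
```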

FOCS Conference 2022 Conference Paper

Improved Lower Bounds for Submodular Function Minimization

  • Deeparnab Chakrabarty
  • Andrei Graur
  • Haotian Jiang
  • Aaron Sidford

We provide a generic technique for constructing families of submodular functions to obtain lower bounds for submodular function minimization (SFM). Applying this technique, we prove that any deterministic SFM algorithm on a ground set of $n$ elements requires at least $\Omega(n\log n)$ queries to an evaluation oracle. This is the first super-linear query complexity lower bound for SFM and improves upon the previous best lower bound of $2n$ given by [Graur et al., ITCS 2020]. Using our construction, we also prove that any (possibly randomized) parallel SFM algorithm, which can make up to $\mathrm{poly}(n)$ queries per round, requires at least $\Omega(n/\log n)$ rounds to minimize a submodular function. This improves upon the previous best lower bound of $\tilde{\Omega}(n^{1/3})$ rounds due to [Chakrabarty et al., FOCS 2021] and, together with a recent advance in [Jiang, SODA 2021], settles the parallel complexity of query-efficient SFM up to logarithmic factors.

NeurIPS Conference 2022 Conference Paper

On the Efficient Implementation of High Accuracy Optimality of Profile Maximum Likelihood

  • Moses Charikar
  • Zhihao Jiang
  • Kirankumar Shiragur
  • Aaron Sidford

We provide an efficient unified plug-in approach for estimating symmetric properties of distributions given $n$ independent samples. Our estimator is based on profile-maximum-likelihood (PML) and is sample optimal for estimating various symmetric properties when the estimation error $\epsilon \gg n^{-1/3}$. This result improves upon the previous best accuracy threshold of $\epsilon \gg n^{-1/4}$ achievable by polynomial time computable PML-based universal estimators \cite{ACSS20, ACSS20b}. Our estimator reaches a theoretical limit for universal symmetric property estimation as \cite{Han20} shows that a broad class of universal estimators (containing many well known approaches including ours) cannot be sample optimal for every $1$-Lipschitz property when $\epsilon \ll n^{-1/3}$.

NeurIPS Conference 2022 Conference Paper

Optimal and Adaptive Monteiro-Svaiter Acceleration

  • Yair Carmon
  • Danielle Hausler
  • Arun Jambulapati
  • Yujia Jin
  • Aaron Sidford

We develop a variant of the Monteiro-Svaiter (MS) acceleration framework that removes the need to solve an expensive implicit equation at every iteration. Consequently, for any $p\ge 2$ we improve the complexity of convex optimization with Lipschitz $p$th derivative by a logarithmic factor, matching a lower bound. We also introduce an MS subproblem solver that requires no knowledge of problem parameters, and implement it as either a second- or first-order method by solving linear systems or applying MinRes, respectively. On logistic regression problems our method outperforms previous accelerated second-order methods, but under-performs Newton's method; simply iterating our first-order adaptive subproblem solver is competitive with L-BFGS.

ICML Conference 2022 Conference Paper

RECAPP: Crafting a More Efficient Catalyst for Convex Optimization

  • Yair Carmon
  • Arun Jambulapati
  • Yujia Jin
  • Aaron Sidford

The accelerated proximal point method (APPA), also known as "Catalyst", is a well-established reduction from convex optimization to approximate proximal point computation (i.e., regularized minimization). This reduction is conceptually elegant and yields strong convergence rate guarantees. However, these rates feature an extraneous logarithmic term arising from the need to compute each proximal point to high accuracy. In this work, we propose a novel Relaxed Error Criterion for Accelerated Proximal Point (RECAPP) that eliminates the need for high accuracy subproblem solutions. We apply RECAPP to two canonical problems: finite-sum and max-structured minimization. For finite-sum problems, we match the best known complexity, previously obtained by carefully-designed problem-specific algorithms. For minimizing $\max_y f(x, y)$ where $f$ is convex in $x$ and strongly-concave in $y$, we improve on the best known (Catalyst-based) bound by a logarithmic factor.

SODA Conference 2022 Conference Paper

Semi-Streaming Bipartite Matching in Fewer Passes and Optimal Space

  • Sepehr Assadi
  • Arun Jambulapati
  • Yujia Jin
  • Aaron Sidford
  • Kevin Tian

We provide $\widetilde{O}(\epsilon^{-1})$-pass semi-streaming algorithms for computing $(1-\epsilon)$-approximate maximum cardinality matchings in bipartite graphs. Our most efficient methods are deterministic and use optimal, $O(n)$, space, improving upon the space complexity of the previous state-of-the-art $\widetilde{O}(\epsilon^{-1})$-pass algorithm of [AG18]. To obtain our results we provide semi-streaming adaptations of more general continuous optimization tools. Further, we leverage these techniques to obtain improvements for streaming variants of approximate linear programming, optimal transport, exact matching, transshipment, and shortest path problems.

NeurIPS Conference 2021 Conference Paper

Stochastic Bias-Reduced Gradient Methods

  • Hilal Asi
  • Yair Carmon
  • Arun Jambulapati
  • Yujia Jin
  • Aaron Sidford

We develop a new primitive for stochastic optimization: a low-bias, low-cost estimator of the minimizer $x_\star$ of any Lipschitz strongly-convex function $f$. In particular, we use a multilevel Monte-Carlo approach due to Blanchet and Glynn to turn any optimal stochastic gradient method into an estimator of $x_\star$ with bias $\delta$, variance $O(\log(1/\delta))$, and an expected sampling cost of $O(\log(1/\delta))$ stochastic gradient evaluations. As an immediate consequence, we obtain cheap and nearly unbiased gradient estimators for the Moreau envelope of any Lipschitz convex function. We demonstrate the potential of our estimator through four applications. First, we develop a method for minimizing the maximum of $N$ functions, improving on recent results and matching a lower bound up to logarithmic factors. Second and third, we recover state-of-the-art rates for projection-efficient and gradient-efficient optimization using simple algorithms with a transparent analysis. Finally, we show that an improved version of our estimator would yield a nearly linear-time, optimal-utility, differentially-private non-smooth stochastic optimization method.
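
A minimal sketch of the multilevel Monte-Carlo construction on a toy quadratic, assuming a simplified averaged-SGD subroutine (the paper couples an optimal stochastic gradient method; the step sizes and averaging here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x_star = np.array([1.0, -2.0])   # minimizer of f(x) = 0.5 * ||x - x_star||^2

def sgd_averages(T, noise=1.0):
    """Averaged SGD on f starting from 0; returns the running-average iterate
    after each step. One shared trajectory couples the estimator's levels."""
    x, avg, avgs = np.zeros(2), np.zeros(2), []
    for t in range(1, T + 1):
        g = (x - x_star) + noise * rng.standard_normal(2)  # stochastic gradient
        x -= g / (t + 1)
        avg += (x - avg) / t
        avgs.append(avg.copy())
    return avgs

def mlmc_estimate():
    """One draw of a Blanchet-Glynn-style multilevel estimator of x_star.
    With P(J = j) = 2^{-j}, the 2^J reweighting telescopes the level
    differences, keeping bias small at O(log) expected gradient cost."""
    J = int(rng.geometric(0.5))
    avgs = sgd_averages(2 ** J)
    coarse = avgs[2 ** (J - 1) - 1]        # average after 2^(J-1) steps
    return avgs[0] + (avgs[-1] - coarse) * float(2 ** J)

estimate = np.mean([mlmc_estimate() for _ in range(2000)], axis=0)
```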

ICML Conference 2021 Conference Paper

Towards Tight Bounds on the Sample Complexity of Average-reward MDPs

  • Yujia Jin
  • Aaron Sidford

We prove new upper and lower bounds on the sample complexity of finding an $\epsilon$-optimal policy of an infinite-horizon average-reward Markov decision process (MDP) given access to a generative model. When the mixing time of the probability transition matrix of all policies is at most $t_\mathrm{mix}$, we provide an algorithm that solves the problem using $\widetilde{O}(t_\mathrm{mix} \epsilon^{-3})$ (oblivious) samples per state-action pair. Further, we provide a lower bound showing that a linear dependence on $t_\mathrm{mix}$ is necessary in the worst case for any algorithm which computes oblivious samples. We obtain our results by establishing connections between infinite-horizon average-reward MDPs and discounted MDPs, which may be of further utility.

SODA Conference 2021 Conference Paper

Ultrasparse Ultrasparsifiers and Faster Laplacian System Solvers

  • Arun Jambulapati
  • Aaron Sidford

In this paper we provide an $O(m \log\log^{O(1)} n \log(1/\epsilon))$-expected time algorithm for solving Laplacian systems on $n$-node $m$-edge graphs, improving upon the previous best expected runtime of $O(m \log^{1/2} n \log\log^{O(1)} n \log(1/\epsilon))$ achieved by (Cohen, Kyng, Miller, Pachocki, Peng, Rao, Xu 2014). To obtain this result we provide efficient constructions of $\ell_p$-stretch graph approximations with improved stretch and sparsity bounds. Additionally, as motivation for this work, we show that for every set of vectors in $\mathbb{R}^d$ (not just those induced by graphs) and all $k > 0$ there exist ultra-sparsifiers with $d - 1 + O(d/k)$ re-weighted vectors of relative condition number at most $k^2$. For small $k$, this improves upon the previous best known multiplicative factor of $k \cdot \widetilde{O}(\log d)$, which is only known for the graph case.

NeurIPS Conference 2020 Conference Paper

Acceleration with a Ball Optimization Oracle

  • Yair Carmon
  • Arun Jambulapati
  • Qijia Jiang
  • Yujia Jin
  • Yin Tat Lee
  • Aaron Sidford
  • Kevin Tian

Consider an oracle which takes a point $x$ and returns the minimizer of a convex function $f$ in an $\ell_2$ ball of radius $r$ around $x$. It is straightforward to show that roughly $r^{-1}\log(1/\epsilon)$ calls to the oracle suffice to find an $\epsilon$-approximate minimizer of $f$ in an $\ell_2$ unit ball. Perhaps surprisingly, this is not optimal: we design an accelerated algorithm which attains an $\epsilon$-approximate minimizer with roughly $r^{-2/3}\log(1/\epsilon)$ oracle queries, and give a matching lower bound. Further, we implement ball optimization oracles for functions with a locally stable Hessian using a variant of Newton's method and, in certain cases, stochastic first-order methods. The resulting algorithms apply to a number of problems of practical and theoretical import, improving upon previous results for logistic and $\ell_\infty$ regression and achieving guarantees comparable to the state-of-the-art for $\ell_p$ regression.
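
The "straightforward" scheme mentioned above is just repeated re-centering of the ball at the oracle's answer. A toy sketch with an exactly solvable quadratic oracle (the oracle construction and test function are assumptions for illustration; the paper's contribution is the accelerated $r^{-2/3}$-query scheme, not this loop):

```python
import numpy as np

x_star = np.array([3.0, -4.0])   # minimizer of f(x) = 0.5 * ||x - x_star||^2

def ball_oracle(x, r):
    """Exact minimizer of this quadratic f over the l2 ball of radius r
    around x: a step of length at most r straight toward x_star."""
    d = x_star - x
    dist = np.linalg.norm(d)
    return x_star.copy() if dist <= r else x + (r / dist) * d

def naive_ball_method(x0, r, iters):
    """Repeatedly re-center the ball at the oracle's answer; the abstract's
    r^{-1} log(1/eps)-type baseline."""
    x = x0
    for _ in range(iters):
        x = ball_oracle(x, r)
    return x

x = naive_ball_method(np.zeros(2), r=0.5, iters=20)
```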

FOCS Conference 2020 Conference Paper

Bipartite Matching in Nearly-linear Time on Moderately Dense Graphs

  • Jan van den Brand
  • Yin Tat Lee
  • Danupon Nanongkai
  • Richard Peng
  • Thatchaphol Saranurak
  • Aaron Sidford
  • Zhao Song 0002
  • Di Wang 0005

We present an $\tilde{O}(m+n^{1.5})$-time randomized algorithm for maximum cardinality bipartite matching and related problems (e.g., transshipment, negative-weight shortest paths, and optimal transport) on $m$-edge, $n$-node graphs. For maximum cardinality bipartite matching on moderately dense graphs, i.e., $m=\Omega(n^{1.5})$, our algorithm runs in time nearly linear in the input size and constitutes the first improvement over the classic $O(m\sqrt{n})$-time [Dinic 1970; Hopcroft-Karp 1971; Karzanov 1973] and $\widetilde{O}(n^{\omega})$-time algorithms [Ibarra-Moran 1981] (where currently $\omega\approx 2.373$). On sparser graphs, i.e., when $m=n^{9/8+\delta}$ for any constant $\delta > 0$, our result improves upon the recent advances of [Madry 2013] and [Liu-Sidford 2020b, 2020a] which achieve an $\widetilde{O}(m^{4/3+o(1)})$ runtime. We obtain these results by combining and advancing recent lines of research in interior point methods (IPMs) and dynamic graph algorithms. First, we simplify and improve the IPM of [v.d. Brand-Lee-Sidford-Song 2020], providing a general primal-dual IPM framework and new sampling-based techniques for handling infeasibility induced by approximate linear system solvers. Second, we provide a simple sublinear-time algorithm for detecting and sampling high-energy edges in electric flows on expanders and show that when combined with recent advances in dynamic expander decompositions, this yields efficient data structures for maintaining the iterates of both [v.d. Brand et al.] and our new IPMs. Combining this general machinery yields a simpler $\widetilde{O}(n\sqrt{m})$ time algorithm for matching based on the logarithmic barrier function, and our state-of-the-art $\widetilde{O}(m+n^{1.5})$ time algorithm for matching based on the [Lee-Sidford 2014] barrier (as regularized in [v.d. Brand et al.]).

FOCS Conference 2020 Conference Paper

Coordinate Methods for Matrix Games

  • Yair Carmon
  • Yujia Jin
  • Aaron Sidford
  • Kevin Tian

We develop primal-dual coordinate methods for solving bilinear saddle-point problems of the form $\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} y^\top A x$ which contain linear programming, classification, and regression as special cases. Our methods push existing fully stochastic sublinear methods and variance-reduced methods towards their limits in terms of per-iteration complexity and sample complexity. We obtain nearly-constant per-iteration complexity by designing efficient data structures leveraging Taylor approximations to the exponential and a binomial heap. We improve sample complexity via low-variance gradient estimators using dynamic sampling distributions that depend on both the iterates and the magnitude of the matrix entries. Our runtime bounds improve upon those of existing primal-dual methods by a factor depending on sparsity measures of the $m \times n$ matrix $A$. For example, when rows and columns have constant $\ell_1/\ell_2$ norm ratios, we offer improvements by a factor of $m+n$ in the fully stochastic setting and $\sqrt{m+n}$ in the variance-reduced setting. We apply our methods to computational geometry problems, i.e., minimum enclosing ball, maximum inscribed ball, and linear regression, and obtain improved complexity bounds. For linear regression with an elementwise nonnegative matrix, our guarantees improve on exact gradient methods by a factor of $\sqrt{\mathrm{nnz}(A)/(m+n)}$.

ICML Conference 2020 Conference Paper

Efficiently Solving MDPs with Stochastic Mirror Descent

  • Yujia Jin
  • Aaron Sidford

We present a unified framework based on primal-dual stochastic mirror descent for approximately solving infinite-horizon Markov decision processes (MDPs) given a generative model. When applied to an average-reward MDP with $A_{tot}$ total actions and mixing time bound $t_{mix}$ our method computes an $\epsilon$-optimal policy with an expected $\widetilde{O}(t_{mix}^2 A_{tot} \epsilon^{-2})$ samples from the state-transition matrix, removing the ergodicity dependence of prior art. When applied to a $\gamma$-discounted MDP with $A_{tot}$ total actions our method computes an $\epsilon$-optimal policy with an expected $\widetilde{O}((1-\gamma)^{-4} A_{tot} \epsilon^{-2})$ samples, improving over the best-known primal-dual methods while matching the state-of-the-art up to a $(1-\gamma)^{-1}$ factor. Both methods are model-free, update state values and policies simultaneously, and run in time linear in the number of samples taken. We achieve these results through a more general stochastic mirror descent framework for solving bilinear saddle-point problems with simplex and box domains and we demonstrate the flexibility of this framework by providing further applications to constrained MDPs.
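
A minimal sketch of one building block, primal-dual stochastic mirror descent over a simplex-times-box geometry: an entropic (multiplicative-weights) step for the simplex variable and a projected step for the box variable. The interfaces and step sizes are illustrative assumptions, and the MDP-specific gradient estimators are abstracted into `grad_x`/`grad_y`.

```python
import numpy as np

def smd_bilinear(grad_x, grad_y, n, m, box_radius, eta_x, eta_y, T):
    """Averaged primal-dual stochastic mirror descent for a bilinear saddle
    point: x lives on the n-simplex, y in an l_inf box of the given radius."""
    x = np.full(n, 1.0 / n)              # simplex iterate (policy-like)
    y = np.zeros(m)                      # box iterate (value-like)
    x_avg, y_avg = np.zeros(n), np.zeros(m)
    for _ in range(T):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x = x * np.exp(-eta_x * gx)      # entropic (multiplicative) descent...
        x /= x.sum()                     # ...then renormalize onto the simplex
        y = np.clip(y + eta_y * gy, -box_radius, box_radius)  # projected ascent
        x_avg += x / T; y_avg += y / T
    return x_avg, y_avg

A = np.array([[1.0, -1.0], [-1.0, 1.0]])
gx = lambda x, y: A.T @ y                # exact gradients here; sampled in general
gy = lambda x, y: A @ x
x, y = smd_bilinear(gx, gy, n=2, m=2, box_radius=1.0, eta_x=0.1, eta_y=0.1, T=2000)
```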

STOC Conference 2020 Conference Paper

Faster energy maximization for faster maximum flow

  • Yang P. Liu
  • Aaron Sidford

In this paper we provide an algorithm which, given any $m$-edge $n$-vertex directed graph with integer capacities at most $U$, computes a maximum $s$-$t$ flow for any vertices $s$ and $t$ in $m^{11/8+o(1)} U^{1/4}$ time with high probability. This running time improves upon the previous best of $\widetilde{O}(m^{10/7} U^{1/7})$ (Mądry 2016), $\widetilde{O}(m\sqrt{n}\log U)$ (Lee Sidford 2014), and $O(mn)$ (Orlin 2013) when the graph is not too dense or has large capacities. We achieve this result by leveraging recent advances in solving undirected flow problems on graphs. We show that in the maximum flow framework of (Mądry 2016) the problem of optimizing the amount of perturbation of the central path needed to maximize energy and thereby reduce congestion can be efficiently reduced to a smoothed $\ell_2$-$\ell_p$ flow optimization problem, which can be solved approximately via recent work (Kyng, Peng, Sachdeva, Wang 2019). Leveraging this new primitive, we provide a new interior point method for maximum flow with faster convergence and simpler analysis that no longer needs global potential functions involving energy as in previous methods (Mądry 2013, Mądry 2016).

FOCS Conference 2020 Conference Paper

High-precision Estimation of Random Walks in Small Space

  • AmirMahdi Ahmadinejad
  • Jonathan A. Kelner
  • Jack Murtagh
  • John Peebles
  • Aaron Sidford
  • Salil P. Vadhan

In this paper, we provide a deterministic $\tilde{O}(\log N)$-space algorithm for estimating random walk probabilities on undirected graphs, and more generally Eulerian directed graphs, to within inverse polynomial additive error ($\epsilon=1/\text{poly}(N)$) where $N$ is the length of the input. Previously, this problem was known to be solvable by a randomized algorithm using space $O(\log N)$ (following Aleliunas et al., FOCS '79) and by a deterministic algorithm using space $O(\log^{3/2}N)$ (Saks and Zhou, FOCS '95 and JCSS '99), both of which held for arbitrary directed graphs but had not been improved even for undirected graphs. We also give improvements on the space complexity of both of these previous algorithms for non-Eulerian directed graphs when the error is negligible ($\epsilon=1/N^{\omega(1)}$), generalizing what Hoza and Zuckerman (FOCS '18) recently showed for the special case of distinguishing whether a random walk probability is 0 or greater than $\epsilon$. We achieve these results by giving new reductions between powering Eulerian random-walk matrices and inverting Eulerian Laplacian matrices, providing a new notion of spectral approximation for Eulerian graphs that is preserved under powering, and giving the first deterministic $\tilde{O}(\log N)$-space algorithm for inverting Eulerian Laplacian matrices. The latter algorithm builds on the work of Murtagh et al. (FOCS '17) that gave a deterministic $\tilde{O}(\log N)$-space algorithm for inverting undirected Laplacian matrices, and the work of Cohen et al. (FOCS '19) that gave a randomized $\tilde{O}(N)$-time algorithm for inverting Eulerian Laplacian matrices. A running theme throughout these contributions is an analysis of "cycle-lifted graphs," where we take a graph and "lift" it to a new graph whose adjacency matrix is the tensor product of the original adjacency matrix and a directed cycle (or variants of one).

NeurIPS Conference 2020 Conference Paper

Instance Based Approximations to Profile Maximum Likelihood

  • Nima Anari
  • Moses Charikar
  • Kirankumar Shiragur
  • Aaron Sidford

In this paper we provide a new efficient algorithm for approximately computing the profile maximum likelihood (PML) distribution, a prominent quantity in symmetric property estimation. We provide an algorithm which matches the previous best known efficient algorithms for computing approximate PML distributions and improves when the number of distinct observed frequencies in the given instance is small. We achieve this result by exploiting new sparsity structure in approximate PML distributions and providing a new matrix rounding algorithm, of independent interest. Leveraging this result, we obtain the first provable computationally efficient implementation of PseudoPML, a general framework for estimating a broad class of symmetric properties. Additionally, we obtain efficient PML-based estimators for distributions with small profile entropy, a natural instance-based complexity measure. Further, we provide a simpler and more practical PseudoPML implementation that matches the best-known theoretical guarantees of such an estimator and evaluate this method empirically.

NeurIPS Conference 2020 Conference Paper

Large-Scale Methods for Distributionally Robust Optimization

  • Daniel Levy
  • Yair Carmon
  • John C. Duchi
  • Aaron Sidford

We propose and analyze algorithms for distributionally robust optimization of convex losses with conditional value at risk (CVaR) and $\chi^2$ divergence uncertainty sets. We prove that our algorithms require a number of gradient evaluations independent of training set size and number of parameters, making them suitable for large-scale applications. For $\chi^2$ uncertainty sets these are the first such guarantees in the literature, and for CVaR our guarantees scale linearly in the uncertainty level rather than quadratically as in previous work. We also provide lower bounds proving the worst-case optimality of our algorithms for CVaR and a penalized version of the $\chi^2$ problem. Our primary technical contributions are novel bounds on the bias of batch robust risk estimation and the variance of a multilevel Monte Carlo gradient estimator due to [Blanchet & Glynn, 2015]. Experiments on MNIST and ImageNet confirm the theoretical scaling of our algorithms, which are 9-36 times more efficient than full-batch methods.
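
To illustrate the CVaR objective, here is a hedged sketch of the plug-in batch estimator: the level-$\alpha$ robust risk of a batch is the mean of its worst $\alpha$-fraction of losses, and a subgradient averages the corresponding per-example gradients. The toy least-squares setup, function names, and step size are assumptions, and the bias/variance control that is the paper's contribution is not modeled.

```python
import numpy as np

def cvar_batch(losses, grads, alpha):
    """Plug-in batch CVaR_alpha: mean of the worst alpha-fraction of losses,
    with the matching averaged per-example subgradient."""
    k = max(1, int(np.ceil(alpha * len(losses))))
    worst = np.argsort(losses)[-k:]          # indices of the k largest losses
    return losses[worst].mean(), grads[worst].mean(axis=0)

rng = np.random.default_rng(0)
theta = np.zeros(3)
X, yv = rng.standard_normal((100, 3)), rng.standard_normal(100)
for _ in range(200):                         # subgradient descent on CVaR risk
    res = X @ theta - yv
    obj, g = cvar_batch(0.5 * res**2, res[:, None] * X, alpha=0.1)
    theta -= 0.05 * g
```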

STOC Conference 2020 Conference Paper

Solving tall dense linear programs in nearly linear time

  • Jan van den Brand
  • Yin Tat Lee
  • Aaron Sidford
  • Zhao Song 0002

In this paper we provide an $O(nd + d^3)$ time randomized algorithm for solving linear programs with $d$ variables and $n$ constraints with high probability. To obtain this result we provide a robust, primal-dual $O(\sqrt{d})$-iteration interior point method inspired by the methods of Lee and Sidford (2014, 2019) and show how to efficiently implement this method using new data structures based on heavy-hitters, the Johnson–Lindenstrauss lemma, and inverse maintenance. Interestingly, we obtain this running time without using fast matrix multiplication and consequently, barring a major advance in linear system solving, our running time is near optimal for solving dense linear programs among algorithms that do not use fast matrix multiplication.

FOCS Conference 2020 Conference Paper

Unit Capacity Maxflow in Almost $O(m^{4/3})$ Time

  • Tarun Kathuria
  • Yang P. Liu
  • Aaron Sidford

We present an algorithm which, given any $m$-edge $n$-vertex directed graph with positive integer capacities at most $U$, computes a maximum $s$-$t$ flow for any vertices $s$ and $t$ in $O(m^{4/3+o(1)} U^{1/3})$ time. This improves upon the previous best running times of $O(m^{11/8+o(1)} U^{1/4})$ [1], $\widetilde{O}(m\sqrt{n}\log U)$ [2], and $O(mn)$ [3] when the graph is not too dense and does not have large capacities. We build upon advances for sparse maxflow based on interior point methods [1], [4], [5]. Whereas these methods increase the energy of local $\ell_2$-norm minimizing electrical flows, we instead increase the Bregman divergence value of flows which minimize the Bregman divergence with respect to a weighted log barrier. This allows us to trace the central path with progress depending only on $\ell_\infty$-norm bounds on the congestion vector, as opposed to the $\ell_4$ norm which arises in these prior works. Further, we show that smoothed $\ell_2$-$\ell_p$ flows [6], [7], which were used to maximize energy [1], can also be used to efficiently maximize divergence, thereby yielding our desired runtimes. We believe our approach towards Bregman divergences of barriers may be of further interest.

NeurIPS Conference 2019 Conference Paper

A Direct Õ(1/ε) Iteration Parallel Algorithm for Optimal Transport

  • Arun Jambulapati
  • Aaron Sidford
  • Kevin Tian

Optimal transportation, or computing the Wasserstein or ``earth mover's'' distance between two $n$-dimensional distributions, is a fundamental primitive which arises in many learning and statistical settings. We give an algorithm which solves the problem to additive $\epsilon$ accuracy with $\tilde{O}(1/\epsilon)$ parallel depth and $\tilde{O}\left(n^2/\epsilon\right)$ work. \cite{BlanchetJKS18, Quanrud19} obtained this runtime through reductions to positive linear programming and matrix scaling. However, these reduction-based algorithms use subroutines which may be impractical due to requiring solvers for second-order iterations (matrix scaling) or non-parallelizability (positive LP). Our methods match the previous-best work bounds by \cite{BlanchetJKS18, Quanrud19} while either improving parallelization or removing the need for linear system solves, and improve upon the previous best first-order methods running in time $\tilde{O}(\min(n^2/\epsilon^2, n^{2.5}/\epsilon))$ \cite{DvurechenskyGK18, LinHJ19}. We obtain our results by a primal-dual extragradient method, motivated by recent theoretical improvements to maximum flow \cite{Sherman17}.

NeurIPS Conference 2019 Conference Paper

A General Framework for Symmetric Property Estimation

  • Moses Charikar
  • Kirankumar Shiragur
  • Aaron Sidford

In this paper we provide a general framework for estimating symmetric properties of distributions from i.i.d. samples. For a broad class of symmetric properties we identify the {\em easy} region where empirical estimation works and the {\em difficult} region where more complex estimators are required. We show that by approximately computing the profile maximum likelihood (PML) distribution \cite{ADOS16} in this difficult region we obtain a symmetric property estimation framework that is sample complexity optimal for many properties in a broader parameter regime than previous universal estimation approaches based on PML. The resulting algorithms based on these \emph{pseudo PML distributions} are also more practical.

NeurIPS Conference 2019 Conference Paper

Complexity of Highly Parallel Non-Smooth Convex Optimization

  • Sebastien Bubeck
  • Qijia Jiang
  • Yin-Tat Lee
  • Yuanzhi Li
  • Aaron Sidford

A landmark result of non-smooth convex optimization is that gradient descent is an optimal algorithm whenever the number of computed gradients is smaller than the dimension $d$. In this paper we study the extension of this result to the parallel optimization setting. Namely, we consider optimization algorithms interacting with a highly parallel gradient oracle, that is, one that can answer $\mathrm{poly}(d)$ gradient queries in parallel. We show that in this case gradient descent is optimal only up to $\tilde{O}(\sqrt{d})$ rounds of interactions with the oracle. The lower bound improves upon a decades-old construction by Nemirovski which proves optimality only up to $d^{1/3}$ rounds (as recently observed by Balkanski and Singer), and the suboptimality of gradient descent after $\sqrt{d}$ rounds was already observed by Duchi, Bartlett and Wainwright. In the latter regime we propose a new method with improved complexity, which we conjecture to be optimal. The analysis of this new method is based upon a generalized version of the recent results on optimal acceleration for highly smooth convex optimization.

FOCS Conference 2019 Conference Paper

Faster Matroid Intersection

  • Deeparnab Chakrabarty
  • Yin Tat Lee
  • Aaron Sidford
  • Sahil Singla 0001
  • Sam Chiu-wai Wong

In this paper we consider the classic matroid intersection problem: given two matroids $M_1 = (V, \mathcal{I}_1)$ and $M_2 = (V, \mathcal{I}_2)$ defined over a common ground set $V$, compute a set $S \in \mathcal{I}_1 \cap \mathcal{I}_2$ of largest possible cardinality, denoted by $r$. We consider this problem both in the setting where each $M_i$ is accessed through an independence oracle, i.e., a routine which returns whether or not a set $S \in \mathcal{I}_i$ in $\mathcal{T}_{\mathrm{ind}}$ time, and the setting where each $M_i$ is accessed through a rank oracle, i.e., a routine which returns the size of the largest independent subset of $S$ in $M_i$ in $\mathcal{T}_{\mathrm{rank}}$ time. In each setting we provide faster exact and approximate algorithms. Given an independence oracle, we provide an exact $O(nr\log r \cdot \mathcal{T}_{\mathrm{ind}})$ time algorithm. This improves upon previous best known running times of $O(nr^{1.5} \cdot \mathcal{T}_{\mathrm{ind}})$ due to Cunningham in 1986 and $O(n^2 \cdot \mathcal{T}_{\mathrm{ind}} + n^3)$ due to Lee, Sidford, and Wong in 2015. We also provide two algorithms which compute a $(1-\epsilon)$-approximate solution to matroid intersection running in times $O(n^{1.5}/\epsilon^{1.5} \cdot \mathcal{T}_{\mathrm{ind}})$ and $O((n^2 r^{-1}\epsilon^{-2} + r^{1.5}\epsilon^{-4.5}) \cdot \mathcal{T}_{\mathrm{ind}})$, respectively. These results improve upon the $O(nr/\epsilon \cdot \mathcal{T}_{\mathrm{ind}})$-time algorithm of Cunningham (noted recently by Chekuri and Quanrud). Given a rank oracle, we provide algorithms with even better dependence on $n$ and $r$. We provide an $O(n\sqrt{r}\log n \cdot \mathcal{T}_{\mathrm{rank}})$-time exact algorithm and an $O(n\epsilon^{-1}\log n \cdot \mathcal{T}_{\mathrm{rank}})$-time algorithm which obtains a $(1-\epsilon)$-approximation to the matroid intersection problem. The former result improves over the $O(nr \cdot \mathcal{T}_{\mathrm{rank}} + n^3)$-time algorithm by Lee, Sidford, and Wong. The rank oracle is of particular interest as the matroid intersection problem with this oracle is a special case (via Edmonds' minimax characterization of matroid intersection) of the submodular function minimization (SFM) problem with an evaluation oracle, and understanding SFM query complexity is an outstanding open question.

STOC Conference 2019 Conference Paper

Memory-sample tradeoffs for linear regression with small error

  • Vatsal Sharan
  • Aaron Sidford
  • Gregory Valiant

We consider the problem of performing linear regression over a stream of $d$-dimensional examples, and show that any algorithm that uses a subquadratic amount of memory exhibits a slower rate of convergence than can be achieved without memory constraints. Specifically, consider a sequence of labeled examples $(a_1, b_1), (a_2, b_2), \ldots$, with $a_i$ drawn independently from a $d$-dimensional isotropic Gaussian, and where $b_i = \langle a_i, x\rangle + \eta_i$, for a fixed $x \in \mathbb{R}^d$ with $\|x\|_2 = 1$ and with independent noise $\eta_i$ drawn uniformly from the interval $[-2^{-d/5}, 2^{-d/5}]$. We show that any algorithm with at most $d^2/4$ bits of memory requires at least $\Omega(d\log\log(1/\epsilon))$ samples to approximate $x$ to $\ell_2$ error $\epsilon$ with probability of success at least $2/3$, for $\epsilon$ sufficiently small as a function of $d$. In contrast, for such $\epsilon$, $x$ can be recovered to error $\epsilon$ with probability $1-o(1)$ with memory $O(d^2\log(1/\epsilon))$ using $d$ examples. This represents the first nontrivial lower bounds for regression with super-linear memory, and may open the door for strong memory/sample tradeoffs for continuous optimization.
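
The data model and the unconstrained-memory endpoint are easy to reproduce; the sketch below generates the sequence described above and recovers $x$ from just $d$ examples by solving the $d \times d$ least-squares system (the high-memory regime; the dimension here is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20
x = rng.standard_normal(d)
x /= np.linalg.norm(x)                              # ||x||_2 = 1
A = rng.standard_normal((d, d))                     # d isotropic Gaussian examples
eta = rng.uniform(-2.0**(-d / 5), 2.0**(-d / 5), size=d)
b = A @ x + eta                                     # labels b_i = <a_i, x> + eta_i
# Unconstrained-memory recovery: store all d examples and solve the system.
x_hat = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(x_hat - x))                    # error on the 2^{-d/5} scale
```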

FOCS Conference 2019 Conference Paper

Parallel Reachability in Almost Linear Work and Square Root Depth

  • Yang P. Liu
  • Arun Jambulapati
  • Aaron Sidford

In this paper we provide a parallel algorithm that, given any $n$-node $m$-edge directed graph and a source vertex $s$, computes all vertices reachable from $s$ with $\tilde{O}(m)$ work and $n^{1/2+o(1)}$ depth with high probability in $n$. This algorithm also computes a set of $\tilde{O}(n)$ edges which, when added to the graph, preserves reachability and ensures that the diameter of the resulting graph is at most $n^{1/2+o(1)}$. Our result improves upon the previous best known almost-linear-work reachability algorithm due to Fineman [1], which had depth $\tilde{O}(n^{2/3})$. Further, we show how to leverage this algorithm to achieve improved distributed algorithms for single-source reachability in the CONGEST model. In particular, we provide a distributed algorithm that, given an $n$-node digraph of undirected hop-diameter $D$, solves the single-source reachability problem with $\tilde{O}(n^{1/2} + n^{1/3+o(1)} D^{2/3})$ rounds of communication in the CONGEST model with high probability in $n$. Our algorithm is nearly optimal whenever $D = O(n^{1/4-\epsilon})$ for any constant $\epsilon > 0$ and is the first nearly optimal algorithm for general graphs whose diameter is $\Omega(n^\delta)$ for any constant $\delta$.

NeurIPS Conference 2019 Conference Paper

Principal Component Projection and Regression in Nearly Linear Time through Asymmetric SVRG

  • Yujia Jin
  • Aaron Sidford

Given an $n \times d$ data matrix $A$, principal component projection (PCP) and principal component regression (PCR), i.e. projection and regression restricted to the top eigenspace of $A$, are fundamental problems in machine learning, optimization, and numerical analysis. In this paper we provide the first algorithms that solve these problems in nearly linear time for fixed eigenvalue distribution and large $n$. This improves upon previous methods, which had superlinear running times when either the number of top eigenvalues or the gap between the eigenspaces was large. We achieve our results by applying rational polynomial approximations to reduce the problem to solving asymmetric linear systems, which we solve by a variant of SVRG. We corroborate these findings with preliminary empirical experiments.

NeurIPS Conference 2019 Conference Paper

Variance Reduction for Matrix Games

  • Yair Carmon
  • Yujia Jin
  • Aaron Sidford
  • Kevin Tian

We present a randomized primal-dual algorithm that solves the problem $\min_x \max_y y^\top A x$ to additive error $\epsilon$ in time $\mathrm{nnz}(A) + \sqrt{\mathrm{nnz}(A)\,n}/\epsilon$, for a matrix $A$ with larger dimension $n$ and $\mathrm{nnz}(A)$ nonzero entries. This improves the best known exact gradient methods by a factor of $\sqrt{\mathrm{nnz}(A)/n}$ and is faster than fully stochastic gradient methods in the accurate and/or sparse regime $\epsilon < \sqrt{n/\mathrm{nnz}(A)}$. Our results hold for $x, y$ in the simplex (matrix games, linear programming) and for $x$ in an $\ell_2$ ball and $y$ in the simplex (perceptron / SVM, minimum enclosing ball). Our algorithm combines Nemirovski's "conceptual prox-method" and a novel reduced-variance gradient estimator based on "sampling from the difference" between the current iterate and a reference point.

SODA Conference 2018 Conference Paper

Approximating Cycles in Directed Graphs: Fast Algorithms for Girth and Roundtrip Spanners

  • Jakub Pachocki
  • Liam Roditty
  • Aaron Sidford
  • Roei Tov
  • Virginia Vassilevska Williams

The girth of a graph, i.e. the length of its shortest cycle, is a fundamental graph parameter. Unfortunately all known algorithms for computing, even approximately, the girth and girth-related structures in directed weighted $m$-edge and $n$-node graphs require $\Omega(\min\{n^\omega, mn\})$ time (for $2 \le \omega < 2.373$). In this paper, we drastically improve these runtimes as follows: • Multiplicative Approximations in Nearly Linear Time: We give an algorithm that in $\tilde{O}(m)$ time computes an $\tilde{O}(1)$-multiplicative approximation of the girth as well as an $\tilde{O}(1)$-multiplicative roundtrip spanner with $\tilde{O}(n)$ edges with high probability (w.h.p.). • Nearly Tight Additive Approximations: For unweighted graphs and any $a \in (0, 1)$ we give an algorithm that in $\tilde{O}(mn^{1-a})$ time computes an $O(n^a)$-additive approximation of the girth, w.h.p. We show that the runtime of our algorithm cannot be significantly improved without a breakthrough in combinatorial boolean matrix multiplication. We also show that if the girth is $O(n^a)$, then the same guarantee can be achieved via a deterministic algorithm. Our main technical contribution to achieve these results is the first nearly linear time algorithm for computing roundtrip covers, a directed graph decomposition concept key to previous roundtrip spanner constructions. Previously it was not known how to compute these significantly faster than $\Omega(mn)$ time. Given the traditional difficulty in efficiently processing directed graphs, we hope our techniques may find further applications.

FOCS Conference 2018 Conference Paper

Coordinate Methods for Accelerating ℓ∞ Regression and Faster Approximate Maximum Flow

  • Aaron Sidford
  • Kevin Tian

In this paper we provide faster algorithms for approximately solving $\ell_\infty$ regression, a fundamental problem prevalent in both combinatorial and continuous optimization. In particular we provide an accelerated coordinate descent method which converges in $k$ iterations at a $O(1/k)$ rate independent of the dimension of the problem, and whose iterations can be implemented cheaply for many structured matrices. Our algorithm can be viewed as an alternative approach to the recent breakthrough result of Sherman [She17], which achieves a similar running time improvement over classic algorithmic approaches, i.e. smoothing and gradient descent, which either converge at a $O(1/\sqrt{k})$ rate or have running times with a worse dependence on problem parameters. Our running times match those of [She17] across a broad range of parameters and in certain cases improve upon them. We demonstrate the efficacy of our result by providing faster algorithms for the well-studied maximum flow problem. We show how to leverage our algorithm to achieve a runtime of $\tilde{O}(m + \sqrt{ns}/\epsilon)$ to compute an $\epsilon$-approximate maximum flow for an undirected graph with $m$ edges, $n$ vertices, and where $s$ is the squared $\ell_2$ norm of the congestion of any optimal flow. As $s = O(m)$, this yields a running time of $\tilde{O}(m + \sqrt{nm}/\epsilon)$, generically improving upon the previous best known runtime of $\tilde{O}(m/\epsilon)$ of [She17] whenever the graph is slightly dense. Moreover, we show how to leverage this result to achieve improved exact algorithms for maximum flow on a variety of unit-capacity graphs. We achieve these results by providing an accelerated coordinate descent method capable of provably exploiting dynamic measures of coordinate smoothness for smoothed versions of $\ell_\infty$ regression. Our analysis leverages the structure of the Hessian of the smoothed problem via a simple bound on its trace, as well as techniques for exploiting column sparsity of the constraint matrix for faster sampling and improved smoothness estimates. We hope that the work of this paper can serve as an important step towards achieving even faster maximum flow algorithms.
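
The smoothing at the heart of this approach is easy to make concrete. A minimal sketch follows (ours, with plain gradient descent standing in for the paper's accelerated coordinate method, and an illustrative step size):

```python
import numpy as np

def smoothed_linf_regression(A, b, mu, lr, n_iters):
    """Gradient descent on a softmax-smoothed l_inf regression objective.

    Replaces ||Ax - b||_inf by the smooth surrogate
      f_mu(x) = mu * log( sum_j exp(r_j / mu) + exp(-r_j / mu) ),  r = Ax - b,
    which overestimates the true max by at most mu * log(2m)."""
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        r = A @ x - b
        m = len(r)
        z = np.concatenate([r, -r]) / mu
        z -= z.max()                       # stabilize the softmax numerically
        w = np.exp(z)
        w /= w.sum()
        grad = A.T @ (w[:m] - w[m:])       # gradient of the smoothed max
        x -= lr * grad
    return x
```

The paper's contribution is, roughly, an accelerated coordinate method for objectives of this form whose sampling and smoothness estimates exploit the structure (trace of the Hessian, column sparsity) of $A$.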

SODA Conference 2018 Conference Paper

Efficient Õ(n/ε) Spectral Sketches for the Laplacian and its Pseudoinverse

  • Arun Jambulapati
  • Aaron Sidford

In this paper we consider the problem of efficiently computing $\epsilon$-sketches for the Laplacian and its pseudoinverse. Given a Laplacian and an error tolerance $\epsilon$, we seek to construct a function $f$ such that for any vector $x$ (chosen obliviously to $f$), with high probability $(1-\epsilon)\, x^\top A x \le f(x) \le (1+\epsilon)\, x^\top A x$ where $A$ is either the Laplacian or its pseudoinverse. Our goal is to construct such a sketch $f$ efficiently and to store it in the least space possible. We provide nearly-linear time algorithms that, when given a Laplacian matrix $\mathcal{L} \in \mathbb{R}^{n \times n}$ and an error tolerance $\epsilon$, produce $\tilde{O}(n/\epsilon)$-size sketches of both $\mathcal{L}$ and its pseudoinverse. Our algorithms improve upon the previous best sketch size of $\tilde{O}(n/\epsilon^{1.6})$ for sketching the Laplacian form by [1] and $O(n/\epsilon^2)$ for sketching the Laplacian pseudoinverse by [2]. Furthermore we show how to compute all-pairs effective resistances from our $\tilde{O}(n/\epsilon)$-size sketch in $\tilde{O}(n^2/\epsilon)$ time. This improves upon the previous best running time of $\tilde{O}(n^2/\epsilon^2)$ by [3].

NeurIPS Conference 2018 Conference Paper

Exploiting Numerical Sparsity for Efficient Learning: Faster Eigenvector Computation and Regression

  • Neha Gupta
  • Aaron Sidford

In this paper, we obtain improved running times for regression and top eigenvector computation for numerically sparse matrices. Given a data matrix $A \in \mathbb{R}^{n \times d}$ where every row $a \in \mathbb{R}^d$ has $\|a\|_2^2 \leq L$ and numerical sparsity $\leq s$, i.e. $\|a\|_1^2 / \|a\|_2^2 \leq s$, we provide faster algorithms for these problems for many parameter settings. For top eigenvector computation, when $\mathrm{gap} > 0$ is the relative gap between the top two eigenvectors of $A^\top A$ and $r$ is the stable rank of $A$, we obtain a running time of $\tilde{O}(nd + r(s + \sqrt{rs})/\mathrm{gap}^2)$, improving upon the previous best unaccelerated running time of $O(nd + rd/\mathrm{gap}^2)$. As $r \leq d$ and $s \leq d$, our algorithm everywhere improves or matches the previous bounds for all parameter settings. For regression, when $\mu > 0$ is the smallest eigenvalue of $A^\top A$, we obtain a running time of $\tilde{O}(nd + (nL/\mu)\sqrt{snL/\mu})$, improving upon the previous best unaccelerated running time of $\tilde{O}(nd + nLd/\mu)$. This result expands the regime in which regression can be solved in nearly linear time from $L/\mu = \tilde{O}(1)$ to $L/\mu = \tilde{O}(d^{2/3}/(sn)^{1/3})$. Furthermore, we obtain similar improvements even when row norms and numerical sparsities are non-uniform, and we show how to achieve even faster running times by accelerating using approximate proximal point [Frostig et al., 2015] / catalyst [Lin et al., 2015]. Our running times depend only on the size of the input and natural numerical measures of the matrix, i.e. eigenvalues and $\ell_p$ norms, making progress on a key open problem regarding optimal running times for efficient large-scale learning.
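
The role numerical sparsity plays can be seen in the standard $\ell_1$-sampling estimator that such routines build on: sampling coordinate $j$ with probability $|a_j|/\|a\|_1$ yields an unbiased one-coordinate estimate of $\langle a, x\rangle$ whose second moment scales with $\|a\|_1^2$, i.e. with $s$ once $\|a\|_2$ is fixed. A minimal sketch (our illustration, not the paper's full algorithm):

```python
import numpy as np

def l1_sample_dot(a, x, rng):
    """Unbiased one-coordinate estimator of <a, x>.

    Sample j with probability |a_j| / ||a||_1; return sign(a_j) * ||a||_1 * x_j.
    E[estimate] = <a, x>, and E[estimate^2] <= ||a||_1^2 * max_j x_j^2 --
    this is where numerical sparsity ||a||_1^2 / ||a||_2^2 <= s enters."""
    l1 = np.abs(a).sum()
    j = rng.choice(len(a), p=np.abs(a) / l1)
    return np.sign(a[j]) * l1 * x[j]

rng = np.random.default_rng(1)
a, x = rng.standard_normal(100), rng.standard_normal(100)
est = np.mean([l1_sample_dot(a, x, rng) for _ in range(20000)])
print(est, a @ x)  # the empirical mean approaches the true inner product
```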

NeurIPS Conference 2018 Conference Paper

Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model

  • Aaron Sidford
  • Mengdi Wang
  • Xian Wu
  • Lin Yang
  • Yinyu Ye

In this paper we consider the problem of computing an $\epsilon$-optimal policy of a discounted Markov Decision Process (DMDP) provided we can only access its transition function through a generative sampling model that, given any state-action pair, samples from the transition function in $O(1)$ time. Given such a DMDP with states $\mathcal{S}$, actions $\mathcal{A}$, discount factor $\gamma \in (0, 1)$, and rewards in range $[0, 1]$, we provide an algorithm which computes an $\epsilon$-optimal policy with probability $1 - \delta$ where both the run time spent and the number of samples taken are upper bounded by \[ O\left[\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^3 \epsilon^2} \log \left(\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)\delta \epsilon} \right) \log\left(\frac{1}{(1-\gamma)\epsilon}\right)\right]. \] For fixed values of $\epsilon$, this improves upon the previous best known bounds by a factor of $(1 - \gamma)^{-1}$ and matches the sample complexity lower bounds proved in [Azar et al., 2013] up to logarithmic factors. We also extend our method to computing $\epsilon$-optimal policies for finite-horizon MDPs with a generative model and provide a nearly matching sample complexity lower bound.
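
The generative-model access pattern is simple to sketch: every Bellman backup replaces the expectation over next states by an average of $m$ samples. Below is a simplified illustration of that pattern (hypothetical oracle names; the paper's algorithm adds variance reduction and further ideas to reach the near-optimal bound):

```python
import numpy as np

def sampled_value_iteration(sample_next, rewards, n_states, n_actions,
                            gamma, m, n_iters):
    """Approximate value iteration with a generative model (n_iters >= 1).

    sample_next(s, a) draws one next state from P(. | s, a) in O(1) time.
    Each backup averages m sampled next-state values in place of the exact
    expectation."""
    v = np.zeros(n_states)
    for _ in range(n_iters):
        q = np.empty((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                nxt = [sample_next(s, a) for _ in range(m)]
                q[s, a] = rewards[s, a] + gamma * np.mean(v[nxt])
        v = q.max(axis=1)
    return v, q.argmax(axis=1)  # value estimates and the greedy policy
```

Driving the total error below $\epsilon$ with this naive pattern costs more samples than the theorem's bound; the paper's variance-reduction and monotonicity techniques close that gap.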

JMLR Journal 2018 Journal Article

Parallelizing Stochastic Gradient Descent for Least Squares Regression: Mini-batching, Averaging, and Model Misspecification

  • Prateek Jain
  • Sham M. Kakade
  • Rahul Kidambi
  • Praneeth Netrapalli
  • Aaron Sidford

This work characterizes the benefits of averaging techniques widely used in conjunction with stochastic gradient descent (SGD). In particular, this work presents a sharp analysis of: (1) mini-batching, a method of averaging many samples of a stochastic gradient to both reduce the variance of a stochastic gradient estimate and parallelize SGD, and (2) tail-averaging, a method involving averaging the final few iterates of SGD in order to decrease the variance in SGD's final iterate. This work presents sharp finite sample generalization error bounds for these schemes for the stochastic approximation problem of least squares regression. Furthermore, this work establishes a precise problem-dependent extent to which mini-batching can be used to yield provable near-linear parallelization speedups over SGD with batch size one. This characterization is used to understand the relationship between learning rate and batch size when considering the excess risk of the final iterate of an SGD procedure. Next, this mini-batching characterization is utilized in providing a highly parallelizable SGD method that achieves the minimax risk with nearly the same number of serial updates as batch gradient descent, improving significantly over existing SGD-style methods. Following this, a non-asymptotic excess risk bound for model averaging (a communication-efficient parallelization scheme) is provided. Finally, this work sheds light on fundamental differences in SGD's behavior when dealing with mis-specified models in the non-realizable least squares problem. This paper shows that maximal stepsizes ensuring minimax risk for the mis-specified case must depend on the noise properties. The analysis tools used by this paper generalize the operator view of averaged SGD (Défossez and Bach, 2015) and develop a novel analysis in bounding these operators to characterize the generalization error. These techniques are of broader interest in analyzing various computational aspects of stochastic approximation.
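
For concreteness, the two averaging schemes analyzed here can be sketched for streaming least squares as follows (a minimal sketch with illustrative, untuned parameters, not the paper's step-size choices):

```python
import numpy as np

def minibatch_tail_averaged_sgd(stream, d, batch_size, lr, n_steps,
                                tail_frac=0.5):
    """SGD for least squares with mini-batching and tail-averaging.

    stream(b) yields a mini-batch (A, y) with A of shape (b, d). Mini-batching
    averages b stochastic gradients per step; tail-averaging returns the mean
    of the last tail_frac fraction of iterates, damping the variance of the
    final iterate."""
    x = np.zeros(d)
    tail_start = int(n_steps * (1.0 - tail_frac))
    tail_sum, tail_count = np.zeros(d), 0
    for t in range(n_steps):
        A, y = stream(batch_size)
        grad = A.T @ (A @ x - y) / batch_size   # averaged stochastic gradient
        x = x - lr * grad
        if t >= tail_start:                     # accumulate tail iterates
            tail_sum += x
            tail_count += 1
    return tail_sum / tail_count

rng = np.random.default_rng(2)
d = 20
x_star = np.ones(d) / np.sqrt(d)

def stream(b):
    A = rng.standard_normal((b, d))
    return A, A @ x_star + 0.1 * rng.standard_normal(b)

x_hat = minibatch_tail_averaged_sgd(stream, d, batch_size=8, lr=0.05,
                                    n_steps=2000)
print(np.linalg.norm(x_hat - x_star))
```

The paper's analysis makes precise how large `batch_size` can grow, as a function of the problem, before these speedups stop being near-linear.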

FOCS Conference 2018 Conference Paper

Solving Directed Laplacian Systems in Nearly-Linear Time through Sparse LU Factorizations

  • Michael B. Cohen
  • Jonathan A. Kelner
  • Rasmus Kyng
  • John Peebles
  • Richard Peng
  • Anup B. Rao
  • Aaron Sidford

In this paper, we show how to solve directed Laplacian systems in nearly-linear time. Given a linear system in an $n \times n$ Eulerian directed Laplacian with $m$ nonzero entries, we show how to compute an $\epsilon$-approximate solution in time $O(m\log^{O(1)}(n)\log(1/\epsilon))$. Through reductions from [Cohen et al. FOCS'16], this gives the first nearly-linear time algorithms for computing $\epsilon$-approximate solutions to row or column diagonally dominant linear systems (including arbitrary directed Laplacians) and computing $\epsilon$-approximations to various properties of random walks on directed graphs, including stationary distributions, personalized PageRank vectors, hitting times, and escape probabilities. These bounds improve upon the recent almost-linear algorithms of [Cohen et al. STOC'17], which gave an algorithm to solve Eulerian Laplacian systems in time $O((m + n2^{O(\sqrt{\log n \log\log n})})\log^{O(1)}(n\epsilon^{-1}))$. To achieve our results, we provide a structural result that we believe is of independent interest. We show that Eulerian Laplacians (and therefore the Laplacians of all strongly connected directed graphs) have sparse approximate LU-factorizations. That is, for every such directed Laplacian there are lower and upper triangular matrices, each with at most $\tilde{O}(n)$ nonzero entries, such that their product spectrally approximates the directed Laplacian in an appropriate norm. This claim can be viewed as an analog of recent work on sparse Cholesky factorizations of Laplacians of undirected graphs. We show how to construct such factorizations in nearly-linear time and prove that once constructed they yield nearly-linear time algorithms for solving directed Laplacian systems.

SODA Conference 2018 Conference Paper

Stability of the Lanczos Method for Matrix Function Approximation

  • Cameron Musco
  • Christopher Musco
  • Aaron Sidford

Theoretically elegant and ubiquitous in practice, the Lanczos method can approximate $f(A)x$ for any symmetric matrix $A \in \mathbb{R}^{n \times n}$, vector $x \in \mathbb{R}^n$, and function $f$. In exact arithmetic, the method's error after $k$ iterations is bounded by the error of the best degree-$k$ polynomial uniformly approximating the scalar function $f(x)$ on the range $[\lambda_{\min}(A), \lambda_{\max}(A)]$. However, despite decades of work, it has been unclear if this powerful guarantee holds in finite precision. We resolve this problem, proving that Lanczos essentially matches the exact arithmetic guarantee if computations use roughly $\log(nC\|A\|)$ bits of precision. Our proof extends work of Druskin and Knizhnerman [11], leveraging the stability of the classic Chebyshev recurrence to bound the stability of any polynomial approximating $f(x)$. We also study the special case of $f(A) = A^{-1}$ for positive definite $A$, where stronger guarantees hold for Lanczos. In exact arithmetic the algorithm performs as well as the best polynomial approximating $1/x$ at each of $A$'s eigenvalues, rather than on the full range $[\lambda_{\min}(A), \lambda_{\max}(A)]$. In seminal work, Greenbaum gives a natural approach to extending this bound to finite precision: she proves that finite precision Lanczos and the related conjugate gradient method match any polynomial approximating $1/x$ in a tiny range around each eigenvalue [17]. For $A^{-1}$, Greenbaum's bound appears stronger than our result. However, we exhibit matrices with condition number $\kappa$ for which exact arithmetic Lanczos converges in $\mathrm{polylog}(\kappa)$ iterations, but Greenbaum's bound predicts at best $\Omega(\kappa^{1/5})$ iterations in finite precision. It thus cannot offer more than a polynomial improvement over the $O(\kappa^{1/2})$ bound achievable via our result for general $f(A)$. Our analysis bounds the power of stable approximating polynomials and raises the question of whether they fully characterize the behavior of finite precision Lanczos in solving linear systems. If they do, convergence in fewer than $\mathrm{poly}(\kappa)$ iterations cannot be expected, even for matrices with clustered, skewed, or otherwise favorable eigenvalue distributions.
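
For reference, here is a compact Lanczos sketch for the object the paper studies, $f(A)x$. We use full reorthogonalization, a practical safeguard against exactly the finite-precision effects the paper analyzes for the plain three-term recurrence; the breakdown case $\beta_j = 0$ is ignored, as it has probability zero for random input.

```python
import numpy as np

def lanczos_fAx(A, x, k, f):
    """Approximate f(A) x for symmetric A with k Lanczos iterations.

    Builds an orthonormal Krylov basis Q and tridiagonal T with A Q ~ Q T,
    then returns ||x|| * Q f(T) e_1, computing f(T) by eigendecomposition.
    Full reorthogonalization is for robustness; it is not part of the classic
    recurrence whose finite-precision behavior the paper studies."""
    n = len(x)
    Q = np.zeros((n, k))
    alpha, beta = np.zeros(k), np.zeros(max(k - 1, 0))
    Q[:, 0] = x / np.linalg.norm(x)
    for j in range(k):
        w = A @ Q[:, j]
        alpha[j] = Q[:, j] @ w
        w -= alpha[j] * Q[:, j]
        if j > 0:
            w -= beta[j - 1] * Q[:, j - 1]
        w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)   # reorthogonalize
        if j < k - 1:
            beta[j] = np.linalg.norm(w)
            Q[:, j + 1] = w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    evals, evecs = np.linalg.eigh(T)               # f(T) via eigendecomposition
    fT_e1 = evecs @ (f(evals) * evecs[0])          # f(T) e_1
    return np.linalg.norm(x) * (Q @ fT_e1)

# Example: approximate the matrix exponential applied to a vector.
rng = np.random.default_rng(3)
M = rng.standard_normal((100, 100))
A = (M + M.T) / 20
x = rng.standard_normal(100)
print(np.linalg.norm(lanczos_fAx(A, x, 30, np.exp)))
```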

SODA Conference 2018 Conference Paper

Variance Reduced Value Iteration and Faster Algorithms for Solving Markov Decision Processes

  • Aaron Sidford
  • Mengdi Wang 0001
  • Xian Wu 0009
  • Yinyu Ye 0001

In this paper we provide faster algorithms for approximately solving discounted Markov Decision Processes in multiple parameter regimes. Given a discounted Markov Decision Process (DMDP) with $|S|$ states, $|A|$ actions, discount factor $\gamma \in (0, 1)$, and rewards in the range $[-M, M]$, we show how to compute an $\epsilon$-optimal policy with probability $1 - \delta$. This contribution reflects the first nearly linear time, nearly linearly convergent algorithm for solving DMDPs for intermediate values of $\gamma$. We also show how to obtain improved sublinear time algorithms: we provide an algorithm which computes an $\epsilon$-optimal policy with probability $1 - \delta$ in sublinear time, provided we can sample from the transition function in $O(1)$ time. Interestingly, we obtain our results by a careful modification of approximate value iteration. We show how to combine classic approximate value iteration analysis with new techniques in variance reduction. Our fastest algorithms leverage further insights to ensure that our algorithms make monotonic progress towards the optimal value. This paper is one of the few instances of using sampling to obtain a linearly convergent linear programming algorithm, and we hope that the analysis may be useful more broadly.
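
The variance-reduction idea is worth seeing in miniature: fix a reference vector $v_0$, estimate $Pv_0$ once with many samples, and in subsequent iterations estimate only the correction $P(v - v_0)$, whose variance shrinks as $v$ approaches $v_0$. A schematic sketch under hypothetical oracle names (omitting the monotonicity machinery):

```python
import numpy as np

def vr_value_iteration(sample_next, r, S, A, gamma,
                       m_ref, m_step, epochs, iters):
    """Variance-reduced approximate value iteration (schematic).

    Per epoch: estimate x ~ P v0 with m_ref samples per (s, a), then run
    cheap iterations estimating only the correction P(v - v0) with
    m_step << m_ref samples, exploiting that v - v0 has small range."""
    v = np.zeros(S)
    for _ in range(epochs):
        v0 = v.copy()
        # Expensive, once-per-epoch estimate of P v0.
        x = np.array([[np.mean(v0[[sample_next(s, a) for _ in range(m_ref)]])
                       for a in range(A)] for s in range(S)])
        for _ in range(iters):
            q = np.empty((S, A))
            for s in range(S):
                for a in range(A):
                    nxt = [sample_next(s, a) for _ in range(m_step)]
                    corr = np.mean(v[nxt] - v0[nxt])  # low-variance correction
                    q[s, a] = r[s, a] + gamma * (x[s, a] + corr)
            v = q.max(axis=1)
    return v
```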

ICML Conference 2017 Conference Paper

"Convex Until Proven Guilty": Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions

  • Yair Carmon
  • John C. Duchi
  • Oliver Hinder
  • Aaron Sidford

We develop and analyze a variant of Nesterov's accelerated gradient descent (AGD) for minimization of smooth non-convex functions. We prove that one of two cases occurs: either our AGD variant converges quickly, as if the function were convex, or we produce a certificate that the function is "guilty" of being non-convex. This non-convexity certificate allows us to exploit negative curvature and obtain deterministic, dimension-free acceleration of convergence for non-convex functions. For a function $f$ with Lipschitz continuous gradient and Hessian, we compute a point $x$ with $\|\nabla f(x)\| \le \epsilon$ in $O(\epsilon^{-7/4} \log(1/\epsilon))$ gradient and function evaluations. Assuming additionally that the third derivative is Lipschitz, we require only $O(\epsilon^{-5/3} \log(1/\epsilon))$ evaluations.

FOCS Conference 2017 Conference Paper

Derandomization Beyond Connectivity: Undirected Laplacian Systems in Nearly Logarithmic Space

  • Jack Murtagh
  • Omer Reingold
  • Aaron Sidford
  • Salil P. Vadhan

We give a deterministic $\tilde{O}(\log n)$-space algorithm for approximately solving linear systems given by Laplacians of undirected graphs, and consequently also approximating hitting times, commute times, and escape probabilities for undirected graphs. Previously, such systems were known to be solvable by randomized algorithms using $O(\log n)$ space (Doron, Le Gall, and Ta-Shma, 2017) and hence by deterministic algorithms using $O(\log^{3/2} n)$ space (Saks and Zhou, FOCS 1995 and JCSS 1999). Our algorithm combines ideas from time-efficient Laplacian solvers (Spielman and Teng, STOC '04; Peng and Spielman, STOC '14) with ideas used to show that UNDIRECTED S-T CONNECTIVITY is in deterministic logspace (Reingold, STOC '05 and JACM '08; Rozenman and Vadhan, RANDOM '05).

ICML Conference 2016 Conference Paper

Efficient Algorithms for Large-scale Generalized Eigenvector Computation and Canonical Correlation Analysis

  • Rong Ge 0001
  • Chi Jin 0001
  • Sham M. Kakade
  • Praneeth Netrapalli
  • Aaron Sidford

This paper considers the problem of canonical-correlation analysis (CCA) and, more broadly, the generalized eigenvector problem for a pair of symmetric matrices. These are two fundamental problems in data analysis and scientific computing with numerous applications in machine learning and statistics. We provide simple iterative algorithms, with improved runtimes, for solving these problems that are globally linearly convergent with moderate dependencies on the condition numbers and eigenvalue gaps of the matrices involved. We obtain our results by reducing CCA to the top-$k$ generalized eigenvector problem. We solve this problem through a general framework that simply requires black-box access to an approximate linear system solver. Instantiating this framework with accelerated gradient descent, we obtain a running time of $O\left(\frac{zk\sqrt{\kappa}}{\sqrt{\rho}}\log(1/\epsilon)\log(k\kappa/\rho)\right)$ where $z$ is the total number of nonzero entries, $\kappa$ is the condition number, and $\rho$ is the relative eigenvalue gap of the appropriate matrices. Our algorithm is linear in the input size and the number of components $k$ up to a $\log(k)$ factor. This is essential for handling large-scale matrices that appear in practice. To the best of our knowledge this is the first such algorithm with global linear convergence. We hope that our results prompt further research and ultimately improve the practical running time for performing these important data analysis procedures on large data sets.
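
The framework's core primitive is a power method on $B^{-1}A$ in which every application of $B^{-1}$ is delegated to a black-box linear system solver. A minimal sketch (conjugate gradient standing in for the paper's accelerated solver; names illustrative):

```python
import numpy as np
from scipy.sparse.linalg import cg

def top_generalized_eigvec(A, B, n_iters=100, seed=4):
    """Power method for the top generalized eigenvector of (A, B) with
    A PSD and B positive definite.

    Each iteration applies B^{-1} A through an approximate linear system
    solve; the framework only needs black-box solver access. Iterates are
    kept B-normalized."""
    x = np.random.default_rng(seed).standard_normal(A.shape[0])
    for _ in range(n_iters):
        x, _ = cg(B, A @ x)               # x <- (approx) B^{-1} A x
        x /= np.sqrt(x @ (B @ x))         # B-normalize
    return x, x @ (A @ x)                 # eigenvector, Rayleigh quotient

rng = np.random.default_rng(5)
M = rng.standard_normal((50, 50)); A = M @ M.T              # PSD
N = rng.standard_normal((50, 50)); B = N @ N.T + 50 * np.eye(50)
vec, val = top_generalized_eigvec(A, B)
```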

FOCS Conference 2016 Conference Paper

Faster Algorithms for Computing the Stationary Distribution, Simulating Random Walks, and More

  • Michael B. Cohen
  • Jonathan A. Kelner
  • John Peebles
  • Richard Peng
  • Aaron Sidford
  • Adrian Vladu

In this paper, we provide faster algorithms for computing various fundamental quantities associated with random walks on a directed graph, including the stationary distribution, personalized PageRank vectors, hitting times, and escape probabilities. In particular, on a directed graph with $n$ vertices and $m$ edges, we show how to compute each quantity in time $\tilde{O}(m^{3/4}n + mn^{2/3})$, where the $\tilde{O}$ notation suppresses polylog factors in $n$, the desired accuracy, and the appropriate condition number (i.e. the mixing time or restart probability). Our result improves upon the previous fastest running times for these problems; previous results either invoke a general purpose linear system solver on an $n \times n$ matrix with $m$ nonzero entries, or depend polynomially on the desired error or natural condition number associated with the problem (i.e. the mixing time or restart probability). For sparse graphs, we obtain a running time of $\tilde{O}(n^{7/4})$, breaking the $O(n^2)$ barrier of the best running time one could hope to achieve using fast matrix multiplication. We achieve our result by providing a similar running time improvement for solving directed Laplacian systems, a natural directed or asymmetric analog of the well studied symmetric or undirected Laplacian systems. We show how to solve such systems in time $\tilde{O}(m^{3/4}n + mn^{2/3})$, and efficiently reduce a broad range of problems to solving $\tilde{O}(1)$ directed Laplacian systems on Eulerian graphs. We hope these results and our analysis open the door for further study into directed spectral graph theory.
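
For contrast with the condition-number-free bound above, the classical fixed-point iteration for one of these quantities, a personalized PageRank vector, looks as follows; its iteration count degrades as the restart probability shrinks, which is exactly the dependence the Laplacian-solver approach avoids. (A minimal sketch; `P` is assumed to be a row-stochastic transition matrix.)

```python
import numpy as np

def personalized_pagerank(P, beta, s, n_iters=100):
    """Fixed-point iteration for pi = beta * e_s + (1 - beta) * pi P.

    Converges geometrically at rate (1 - beta), so the work needed grows as
    the restart probability beta shrinks -- a condition-number dependence
    that the directed-Laplacian-solver approach removes."""
    n = P.shape[0]
    pi = np.full(n, 1.0 / n)
    e_s = np.zeros(n)
    e_s[s] = 1.0
    for _ in range(n_iters):
        pi = beta * e_s + (1.0 - beta) * (pi @ P)
    return pi
```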

ICML Conference 2016 Conference Paper

Faster Eigenvector Computation via Shift-and-Invert Preconditioning

  • Dan Garber
  • Elad Hazan
  • Chi Jin 0001
  • Sham M. Kakade
  • Cameron Musco
  • Praneeth Netrapalli
  • Aaron Sidford

We give faster algorithms and improved sample complexities for the fundamental problem of estimating the top eigenvector. Given an explicit matrix $A \in \mathbb{R}^{n \times d}$, we show how to compute an $\epsilon$-approximate top eigenvector of $A^\top A$ in time $\tilde O\left( \left[\text{nnz}(A) + \frac{d\, \text{sr}(A)}{\text{gap}^2} \right] \cdot \log(1/\epsilon)\right)$. Here $\text{nnz}(A)$ is the number of nonzeros in $A$, $\text{sr}(A)$ is the stable rank, and $\text{gap}$ is the relative eigengap. We also consider an online setting in which, given a stream of i.i.d. samples from a distribution $\mathcal{D}$ with covariance matrix $\Sigma$ and a vector $x_0$ which is an $O(\text{gap})$-approximate top eigenvector for $\Sigma$, we show how to refine $x_0$ to an $\epsilon$-approximation using $O\left( \frac{\text{var}(\mathcal{D})}{\text{gap} \cdot \epsilon}\right)$ samples from $\mathcal{D}$. Here $\text{var}(\mathcal{D})$ is a natural notion of variance. Combining our algorithm with previous work to initialize $x_0$, we obtain improved sample complexities and runtimes under a variety of assumptions on $\mathcal{D}$. We achieve our results via a robust analysis of the classic shift-and-invert preconditioning method. This technique lets us reduce eigenvector computation to approximately solving a series of linear systems with fast stochastic gradient methods.
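
Shift-and-invert itself is brief to state: choose a shift $\lambda$ slightly above $\lambda_1(A^\top A)$ and run power iteration on $(\lambda I - A^\top A)^{-1}$, which has constant relative eigengap; all the work moves into linear system solves. A dense-algebra sketch (the dense solve is a stand-in for the paper's stochastic solvers):

```python
import numpy as np

def shift_invert_top_eigvec(AtA, shift, n_iters=30, seed=0):
    """Power iteration on (shift * I - AtA)^{-1}, i.e. shift-and-invert.

    With shift = (1 + c * gap) * lambda_1 the inverted matrix has relative
    eigengap Omega(1), so few power iterations suffice; each iteration is a
    linear solve (done densely here, stochastically in the paper)."""
    n = AtA.shape[0]
    x = np.random.default_rng(seed).standard_normal(n)
    M = shift * np.eye(n) - AtA
    for _ in range(n_iters):
        x = np.linalg.solve(M, x)        # stand-in for an approximate solver
        x /= np.linalg.norm(x)
    return x

rng = np.random.default_rng(6)
A = rng.standard_normal((200, 40))
AtA = A.T @ A
lam1 = np.linalg.eigvalsh(AtA)[-1]
v = shift_invert_top_eigvec(AtA, shift=1.01 * lam1)
```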

ICML Conference 2016 Conference Paper

Principal Component Projection Without Principal Component Analysis

  • Roy Frostig
  • Cameron Musco
  • Christopher Musco
  • Aaron Sidford

We show how to efficiently project a vector onto the top principal components of a matrix, *without explicitly computing these components*. Specifically, we introduce an iterative algorithm that provably computes the projection using few calls to any black-box routine for ridge regression. By avoiding explicit principal component analysis (PCA), our algorithm is the first with no runtime dependence on the number of top principal components. We show that it can be used to give a fast iterative method for the popular principal component regression problem, giving the first major runtime improvement over the naive method of combining PCA with regression. To achieve our results, we first observe that ridge regression can be used to obtain a "smooth projection" onto the top principal components. We then sharpen this approximation to true projection using a low-degree polynomial approximation to the matrix step function. Step function approximation is a topic of long-term interest in scientific computing. We extend prior theory by constructing polynomials with simple iterative structure and rigorously analyzing their behavior under limited precision.
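
The first observation is easy to express in code: a single ridge-regression application realizes the map $x \mapsto (A^\top A)(A^\top A + \lambda I)^{-1}x$, which scales each eigendirection with eigenvalue $\sigma^2$ by $\sigma^2/(\sigma^2+\lambda)$, a soft step centered at $\sigma^2 = \lambda$. A dense-algebra sketch of that soft step (the algorithm itself touches it only through a black-box ridge solver):

```python
import numpy as np

def smooth_projection(A, x, lam):
    """One ridge step: returns (A^T A)(A^T A + lam * I)^{-1} x.

    In the eigenbasis of A^T A each component is scaled by s / (s + lam):
    ~1 for eigenvalues far above lam, ~0 far below. Dense solve for clarity;
    only ridge-regression access is actually required."""
    d = A.shape[1]
    G = A.T @ A
    return G @ np.linalg.solve(G + lam * np.eye(d), x)

def crude_sharpening(A, x, lam, k=8):
    """Repeated application scales by (s / (s + lam))^k, which suppresses the
    tail but also attenuates the top; the paper instead applies a carefully
    constructed low-degree polynomial of the soft step to sharpen it into a
    true projection."""
    for _ in range(k):
        x = smooth_projection(A, x, lam)
    return x
```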

FOCS Conference 2015 Conference Paper

A Faster Cutting Plane Method and its Implications for Combinatorial and Convex Optimization

  • Yin Tat Lee
  • Aaron Sidford
  • Sam Chiu-wai Wong

In this paper we improve upon the running time for finding a point in a convex set given a separation oracle. In particular, given a separation oracle for a convex set $K \subset \mathbb{R}^n$ that is contained in a box of radius $R$, we show how to either compute a point in $K$ or prove that $K$ does not contain a ball of radius $\epsilon$ using an expected $O(n\log(nR/\epsilon))$ evaluations of the oracle and additional time $O(n^3\log^{O(1)}(nR/\epsilon))$. This matches the oracle complexity and improves upon the $O(n^{\omega+1}\log(nR/\epsilon))$ additional time of the previous fastest algorithm, achieved over 25 years ago by Vaidya [91], for the current value of the matrix multiplication constant $\omega$. Applying this cutting plane method, we obtain faster running times for several classic problems: · Submodular Function Minimization: $n$ is the size of the ground set, $M$ is the maximum absolute value of function values, and EO is the time per evaluation oracle query. We obtain running times of $O(n^2\log nM \cdot \mathrm{EO} + n^3\log^{O(1)} nM)$ and $O(n^3\log^2 n \cdot \mathrm{EO} + n^4\log^{O(1)} n)$, improving upon the previous best of $O((n^4 \cdot \mathrm{EO} + n^5)\log M)$ and $O(n^5 \cdot \mathrm{EO} + n^6)$ respectively. · Submodular Flow: $n = |V|$, $m = |E|$, $C$ is the maximum edge cost in absolute value, and $U$ is the maximum edge capacity in absolute value. We obtain a faster weakly polynomial running time of $O(n^2\log nCU \cdot \mathrm{EO} + n^3\log^{O(1)} nCU)$, improving upon the previous best of $O(mn^5\log nU \cdot \mathrm{EO})$ and $O(n^4 h \min\{\log C, \log U\})$ from 15 years ago by a factor of $\tilde{O}(n^4)$. We also achieve faster strongly polynomial time algorithms as a consequence of our result on submodular minimization. · Matroid Intersection: $n$ is the size of the ground set, $r$ is the maximum size of independent sets, $M$ is the maximum absolute value of element weight, and $\mathcal{T}_{\mathrm{rank}}$ and $\mathcal{T}_{\mathrm{ind}}$ are the times per rank and independence oracle query. We obtain running times of $O((nr\log^2 n \cdot \mathcal{T}_{\mathrm{rank}} + n^3\log^{O(1)} n)\log nM)$ and $O((n^2\log n \cdot \mathcal{T}_{\mathrm{ind}} + n^3\log^{O(1)} n)\log nM)$, achieving the first quadratic bound on the query complexity for the independence and rank oracles. In the unweighted case, this is the first improvement since 1986 for the independence oracle. · Semidefinite Programming: $n$ is the number of constraints, $m$ is the number of dimensions, and $S$ is the total number of non-zeros in the constraint matrices. We obtain a running time of $\tilde{O}(n(n^2 + m^\omega + S))$, improving upon the previous best of $\tilde{O}(n(n^\omega + m^\omega + S))$ for the regime where $S$ is small.

FOCS Conference 2015 Conference Paper

Efficient Inverse Maintenance and Faster Algorithms for Linear Programming

  • Yin Tat Lee
  • Aaron Sidford

In this paper, we consider the following inverse maintenance problem: given $A \in \mathbb{R}^{n \times d}$ and a number of rounds $r$, at round $k$ we receive an $n \times n$ diagonal matrix $D^{(k)}$ and we wish to maintain an efficient linear system solver for $A^\top D^{(k)} A$ under the assumption that $D^{(k)}$ does not change too rapidly. This inverse maintenance problem is the computational bottleneck in solving multiple optimization problems. We show how to solve this problem with $\tilde{O}(\mathrm{nnz}(A) + d^\omega)$ preprocessing time and amortized $\tilde{O}(\mathrm{nnz}(A) + d^2)$ time per round, improving upon previous running times. Consequently, we obtain the fastest known running times for solving multiple problems including linear programming and computing a rounding of a polytope. In particular, given a feasible point in a linear program with $n$ variables, $d$ constraints, and constraint matrix $A \in \mathbb{R}^{d \times n}$, we show how to solve the linear program in time $\tilde{O}((\mathrm{nnz}(A) + d^2)\sqrt{d}\log(\epsilon^{-1}))$. We achieve our results through a novel combination of classic numerical techniques of low rank update, preconditioning, and fast matrix multiplication as well as recent work on subspace embeddings and spectral sparsification that we hope will be of independent interest.
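
The low-rank update primitive behind inverse maintenance is Sherman-Morrison-Woodbury: if $D$ changes on only $k$ diagonal entries, $(A^\top D A)^{-1}$ can be updated in roughly $O(d^2k + k^3)$ time rather than re-inverted from scratch. A dense sketch under hypothetical names:

```python
import numpy as np

def smw_update_inverse(H_inv, A, idx, delta):
    """Update H^{-1} for H = A^T D A when D changes by delta on rows idx.

    H_new = H + A_S^T diag(delta) A_S is a rank-k change (A_S = A[idx]), so
    H_new^{-1} = H^{-1} - U (diag(1/delta) + A_S U)^{-1} U^T with
    U = H^{-1} A_S^T, costing O(d^2 k + k^3) instead of O(d^omega)."""
    A_S = A[idx]                           # (k, d) rows whose weight changed
    U = H_inv @ A_S.T                      # (d, k)
    C = np.diag(1.0 / delta) + A_S @ U     # (k, k) capacitance matrix
    return H_inv - U @ np.linalg.solve(C, U.T)

# Sanity check against direct inversion.
rng = np.random.default_rng(7)
n, d, k = 300, 30, 5
A = rng.standard_normal((n, d))
D = rng.uniform(0.5, 2.0, size=n)
H_inv = np.linalg.inv(A.T @ (D[:, None] * A))
idx = rng.choice(n, size=k, replace=False)
delta = rng.uniform(0.1, 0.5, size=k)
D2 = D.copy()
D2[idx] += delta
updated = smw_update_inverse(H_inv, A, idx, delta)
assert np.allclose(updated, np.linalg.inv(A.T @ (D2[:, None] * A)), atol=1e-6)
```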

ICML Conference 2015 Conference Paper

Un-regularizing: approximate proximal point and faster stochastic algorithms for empirical risk minimization

  • Roy Frostig
  • Rong Ge 0001
  • Sham M. Kakade
  • Aaron Sidford

We develop a family of accelerated stochastic algorithms that optimize sums of convex functions. Our algorithms improve upon the fastest running time for empirical risk minimization (ERM), and in particular linear least-squares regression, across a wide range of problem settings. To achieve this, we establish a framework, based on the classical proximal point algorithm, useful for accelerating recent fast stochastic algorithms in a black-box fashion. Empirically, we demonstrate that the resulting algorithms exhibit notions of stability that are advantageous in practice. Both in theory and in practice, the provided algorithms reap the computational benefits of adding a large strongly convex regularization term, without incurring a corresponding bias to the original ERM problem.
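
The outer loop of this framework is short: add a strongly convex proximal term centered at the current point, solve the regularized subproblem with any fast stochastic method, and recenter. A schematic sketch (plain SGD as the hypothetical inner solver; all parameter choices illustrative):

```python
import numpy as np

def approximate_proximal_point(grad_est, x0, lam, outer_iters, inner_steps, lr):
    """Approximate proximal point / "un-regularizing" outer loop (schematic).

    Each outer step approximately minimizes f(x) + (lam/2) * ||x - center||^2.
    The added term makes the subproblem strongly convex, so fast stochastic
    inner solvers apply, yet it vanishes at the fixed point, so the scheme
    does not bias the original problem. grad_est(x) returns a stochastic
    gradient estimate of f at x."""
    center = np.array(x0, dtype=float)
    for _ in range(outer_iters):
        x = center.copy()
        for _ in range(inner_steps):
            g = grad_est(x) + lam * (x - center)  # gradient of regularized f
            x -= lr * g
        center = x                                 # recenter the proximal term
    return center
```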

FOCS Conference 2014 Conference Paper

Path Finding Methods for Linear Programming: Solving Linear Programs in Õ(√rank) Iterations and Faster Algorithms for Maximum Flow

  • Yin Tat Lee
  • Aaron Sidford

In this paper, we present a new algorithm for solving linear programs that requires only $\tilde{O}(\sqrt{\mathrm{rank}(A)}\,L)$ iterations, where $A$ is the constraint matrix of a linear program with $m$ constraints, $n$ variables, and bit complexity $L$. Each iteration of our method consists of solving $\tilde{O}(1)$ linear systems and additional nearly linear time computation. Our method improves upon the previous best iteration bounds by a factor of $\tilde{\Omega}((m/\mathrm{rank}(A))^{1/4})$ for methods with polynomial time computable iterations and by $\tilde{\Omega}((m/\mathrm{rank}(A))^{1/2})$ for methods which solve at most $\tilde{O}(1)$ linear systems in each iteration, each achieved over 20 years ago. Applying our techniques to the linear program formulation of maximum flow yields an $\tilde{O}(|E|\sqrt{|V|}\log^2 U)$ time algorithm for solving the maximum flow problem on directed graphs with $|E|$ edges, $|V|$ vertices, and capacity ratio $U$. This improves upon the previous fastest running time of $O(|E|\min\{|E|^{1/2}, |V|^{2/3}\}\log(|V|^2/|E|)\log U)$ achieved over 15 years ago by Goldberg and Rao, upon the previous best running time for solving dense directed unit capacity graphs of $\tilde{O}(|E|\min\{|E|^{1/2}, |V|^{2/3}\})$ achieved by Even and Tarjan over 35 years ago, and upon the running time of $\tilde{O}(|E|^{10/7})$ achieved recently by Madry.

FOCS Conference 2014 Conference Paper

Single Pass Spectral Sparsification in Dynamic Streams

  • Michael Kapralov
  • Yin Tat Lee
  • Cameron Musco
  • Christopher Musco
  • Aaron Sidford

We present the first single pass algorithm for computing spectral sparsifiers of graphs in the dynamic semi-streaming model. Given a single pass over a stream containing insertions and deletions of edges to a graph $G$, our algorithm maintains a randomized linear sketch of the incidence matrix into dimension $O(\epsilon^{-2} n\,\mathrm{polylog}(n))$. Using this sketch, the algorithm can output a $(1 \pm \epsilon)$ spectral sparsifier for $G$ with high probability. While $O(\epsilon^{-2} n\,\mathrm{polylog}(n))$ space algorithms are known for computing cut sparsifiers in dynamic streams [1], [2] and spectral sparsifiers in insertion-only streams [3], prior to our work, the best known single pass algorithm for maintaining spectral sparsifiers in dynamic streams required sketches of dimension $\Omega(\epsilon^{-2} n^{5/3})$. To achieve our result, we show that, using a coarse sparsifier of $G$ and a linear sketch of $G$'s incidence matrix, it is possible to sample edges by effective resistance, obtaining a spectral sparsifier of arbitrary precision. Sampling from the sketch requires a novel application of $\ell_2/\ell_2$ sparse recovery, a natural extension of the $\ell_0$ methods used for cut sparsifiers in [1]. Recent work of [2] on row sampling for matrix approximation gives a recursive approach for obtaining the required coarse sparsifiers. Under certain restrictions, our approach also extends to the problem of maintaining a spectral approximation for a general matrix $A^\top A$ given a stream of updates to rows in $A$.
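
Offline, the key sampling step reads as follows: keep each edge independently with probability proportional to its weight times its effective resistance, reweighting the kept edges, and the result is a spectral sparsifier; the paper's contribution is extracting such samples from a single linear sketch. A dense, offline illustration (pseudoinverse computed directly, oversampling constant illustrative; not the streaming algorithm):

```python
import numpy as np

def spectral_sparsify_offline(edges, weights, n, eps, rng):
    """Sample edges by effective resistance (offline illustration).

    Forms the Laplacian pseudoinverse densely to obtain the resistances
    R_e = b_e^T L^+ b_e, then keeps each edge e independently with probability
    q_e = min(1, C * w_e * R_e * log(n) / eps^2), rescaled by 1 / q_e."""
    B = np.zeros((len(edges), n))
    for i, (u, v) in enumerate(edges):
        B[i, u], B[i, v] = 1.0, -1.0
    L = B.T @ (weights[:, None] * B)
    L_pinv = np.linalg.pinv(L)
    R = np.einsum('ij,jk,ik->i', B, L_pinv, B)   # effective resistances
    C = 8.0                                      # illustrative oversampling
    q = np.minimum(1.0, C * weights * R * np.log(n) / eps ** 2)
    keep = rng.random(len(edges)) < q
    return [(e, w / p) for e, w, p, kept in zip(edges, weights, q, keep) if kept]
```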

FOCS Conference 2013 Conference Paper

Efficient Accelerated Coordinate Descent Methods and Faster Algorithms for Solving Linear Systems

  • Yin Tat Lee
  • Aaron Sidford

In this paper we show how to accelerate randomized coordinate descent methods and achieve faster convergence rates without paying per-iteration costs in asymptotic running time. In particular, we show how to generalize and efficiently implement a method proposed by Nesterov, giving faster asymptotic running times for various algorithms that use standard coordinate descent as a black box. In addition to providing a proof of convergence for this new general method, we show that it is numerically stable, efficiently implementable, and in certain regimes, asymptotically optimal. To highlight the power of this algorithm, we show how it can be used to create faster linear system solvers in several regimes: - We show how this method achieves a faster asymptotic runtime than conjugate gradient for solving a broad class of symmetric positive definite systems of equations. - We improve the convergence guarantees for Kaczmarz methods, a popular technique for image reconstruction and solving overdetermined systems of equations, by accelerating an algorithm of Strohmer and Vershynin. - We achieve the best known running time for solving Symmetric Diagonally Dominant (SDD) systems of equations in the unit-cost RAM model, obtaining a running time of $O(m\log^{3/2} n\,(\log\log n)^{1/2}\log((\log n)/\epsilon))$ by accelerating a recent solver by Kelner et al. Beyond the independent interest of these solvers, we believe they highlight the versatility of the approach of this paper and we hope that they will open the door for further algorithmic improvements in the future.
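
The base method being accelerated is standard randomized coordinate descent with smoothness-proportional sampling, which for a quadratic fits in a few lines; maintaining the gradient explicitly keeps each iteration at one column access. A minimal sketch (the acceleration itself, the paper's contribution, is omitted):

```python
import numpy as np

def randomized_coordinate_descent(A, b, n_iters, rng):
    """Randomized coordinate descent on f(x) = 0.5 x^T A x - b^T x, A SPD.

    Samples coordinate i with probability proportional to its coordinate-wise
    smoothness L_i = A_ii and takes the exact 1 / L_i step. The paper shows
    how to accelerate this primitive without raising the asymptotic
    per-iteration cost."""
    n = A.shape[0]
    L = np.diag(A).copy()
    p = L / L.sum()                  # smoothness-proportional sampling
    x = np.zeros(n)
    g = -b.astype(float)             # maintain the gradient g = A x - b
    for _ in range(n_iters):
        i = rng.choice(n, p=p)
        step = g[i] / L[i]
        x[i] -= step
        g -= step * A[:, i]          # O(column nnz) gradient update
    return x

rng = np.random.default_rng(8)
M = rng.standard_normal((60, 60))
A = M @ M.T + np.eye(60)
b = rng.standard_normal(60)
x = randomized_coordinate_descent(A, b, n_iters=20000, rng=rng)
print(np.linalg.norm(A @ x - b))
```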