SODA Conference 2025 Conference Paper
Eulerian Graph Sparsification by Effective Resistance Decomposition
- Arun Jambulapati
- Sushant Sachdeva
- Aaron Sidford
- Kevin Tian
- Yibin Zhao
Author name cluster
Papers possibly associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.
SODA Conference 2025 Conference Paper
FOCS Conference 2025 Conference Paper
Placing a dataset $A = \{a_i\}_{i \in [n]} \subset \mathbb{R}^d$ in radial isotropic position, i.e., finding an invertible $R \in \mathbb{R}^{d \times d}$ such that the unit vectors $\{(Ra_i)\|Ra_i\|_2^{-1}\}_{i \in [n]}$ are in isotropic position, is a powerful tool with applications in functional analysis, communication complexity, coding theory, and the design of learning algorithms. When the transformed dataset has a second moment matrix within an $\exp(\pm\epsilon)$ factor of a multiple of $I_d$, we call $R$ an $\epsilon$-approximate Forster transform. We give a faster algorithm for computing approximate Forster transforms, based on optimizing an objective defined by Barthe [1]. When the transform has a polynomially-bounded aspect ratio, our algorithm uses $O(nd^{\omega-1}(\frac{n}{\epsilon})^{o(1)})$ time to output an $\epsilon$-approximate Forster transform with high probability, when one exists. This is almost the natural limit of this approach, as even evaluating Barthe's objective takes $O(nd^{\omega-1})$ time. Previously, the state-of-the-art runtime in this regime was based on cutting-plane methods, and scaled at least as $\approx n^3 + n^2 d^{\omega-1}$. We also provide explicit estimates on the aspect ratio in the smoothed analysis setting, and show that our algorithm similarly improves upon those in the literature. To obtain our results, we develop a subroutine of potential broader interest: a reduction from almost-linear time sparsification of graph Laplacians to the ability to support almost-linear time matrix-vector products. We combine this tool with new stability bounds on Barthe's objective to implicitly implement a box-constrained Newton's method [2], [3].
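Since the result is stated in terms of this acceptance criterion, a minimal numpy sketch of the check may help pin the definition down: normalize the transformed points and test whether the eigenvalues of their second moment matrix lie within an $\exp(\pm\epsilon)$ factor of $1/d$. This is only a verifier for the definition above, not the paper's algorithm; the function name is ours.

```python
import numpy as np

def is_approx_forster(A, R, eps):
    """Check whether R is an eps-approximate Forster transform for the rows
    of A: after applying R and normalizing each point to unit length, the
    second moment matrix (which has trace 1) should have all eigenvalues
    within an exp(+-eps) factor of 1/d."""
    B = A @ R.T                                        # transformed points R a_i
    U = B / np.linalg.norm(B, axis=1, keepdims=True)   # unit vectors
    n, d = U.shape
    eigs = np.linalg.eigvalsh(U.T @ U / n)
    return eigs.min() >= np.exp(-eps) / d and eigs.max() <= np.exp(eps) / d

# toy usage: Gaussian data is already nearly isotropic, so R = I passes
rng = np.random.default_rng(0)
A = rng.standard_normal((4000, 5))
print(is_approx_forster(A, np.eye(5), eps=0.1))
```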
SODA Conference 2024 Conference Paper
SODA Conference 2024 Conference Paper
STOC Conference 2024 Conference Paper
We consider the sparsification of sums $F : \mathbb{R}^n \to \mathbb{R}_+$ where $F(x) = f_1(\langle a_1, x\rangle) + \cdots + f_m(\langle a_m, x\rangle)$ for vectors $a_1,\ldots,a_m \in \mathbb{R}^n$ and functions $f_1,\ldots,f_m : \mathbb{R} \to \mathbb{R}_+$. We show that $(1+\varepsilon)$-approximate sparsifiers of $F$ with support size $\frac{n}{\varepsilon^2}(\log \frac{n}{\varepsilon})^{O(1)}$ exist whenever the functions $f_1,\ldots,f_m$ are symmetric, monotone, and satisfy natural growth bounds. Additionally, we give efficient algorithms to compute such a sparsifier assuming each $f_i$ can be evaluated efficiently. Our results generalize the classical case of $\ell_p$ sparsification, where $f_i(z) = |z|^p$, for $p \in (0, 2]$, and give the first near-linear size sparsifiers in the well-studied setting of the Huber loss function and its generalizations, e.g., $f_i(z) = \min\{|z|^p, |z|^2\}$ for $0 < p \le 2$. Our sparsification algorithm can be applied to give near-optimal reductions for optimizing a variety of generalized linear models including $\ell_p$ regression for $p \in (1, 2]$ to high accuracy, via solving $(\log n)^{O(1)}$ sparse regression instances with $m \le n(\log n)^{O(1)}$, plus runtime proportional to the number of nonzero entries in the vectors $a_1,\ldots,a_m$.
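To make the object concrete, here is a small sketch that evaluates $F$ and a candidate sparsifier for the Huber-type losses mentioned above, and empirically tests the $(1\pm\varepsilon)$ guarantee on random directions. The uniform subsampling used to build the toy sparsifier is a naive stand-in for the paper's importance-based sampling; all names are illustrative.

```python
import numpy as np

def F(A, x, f):
    """Evaluate F(x) = sum_i f(<a_i, x>) for the rows a_i of A."""
    return f(A @ x).sum()

def F_sparse(A, x, f, support, weights):
    """Evaluate a candidate sparsifier sum_{i in support} w_i f(<a_i, x>)."""
    return (weights * f(A[support] @ x)).sum()

huber = lambda z: np.minimum(np.abs(z), z ** 2)   # f(z) = min(|z|, z^2), p = 1

# empirical check of the (1 +/- eps) guarantee on random test directions;
# the "sparsifier" here is naive uniform subsampling, a crude stand-in for
# the importance-based sampling the paper actually uses
rng = np.random.default_rng(1)
m, n, s = 20000, 10, 2000
A = rng.standard_normal((m, n))
support = rng.choice(m, size=s, replace=False)
weights = np.full(s, m / s)
errs = []
for _ in range(20):
    x = rng.standard_normal(n)
    errs.append(abs(F_sparse(A, x, huber, support, weights) / F(A, x, huber) - 1))
print(f"max relative error: {max(errs):.3f}")
```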
NeurIPS Conference 2024 Conference Paper
In the recent literature on machine learning and decision making, calibration has emerged as a desirable and widely-studied statistical property of the outputs of binary prediction models. However, the algorithmic aspects of measuring model calibration have remained relatively less well-explored. Motivated by Blasiok et al '23, which proposed a rigorous framework for measuring distances to calibration, we initiate the algorithmic study of calibration through the lens of property testing. We define the problem of calibration testing from samples where given $n$ draws from a distribution $\mathcal{D}$ on $(\text{predictions}, \text{binary outcomes})$, our goal is to distinguish between the cases where $\mathcal{D}$ is perfectly calibrated or $\epsilon$-far from calibration. We make the simple observation that the empirical smooth calibration linear program can be reformulated as an instance of minimum-cost flow on a highly-structured graph, and design an exact dynamic programming-based solver for it which runs in time $O(n\log^2(n))$, and solves the calibration testing problem information-theoretically optimally in the same time. This improves upon state-of-the-art black-box linear program solvers requiring $\Omega(n^\omega)$ time, where $\omega > 2$ is the exponent of matrix multiplication. We also develop algorithms for tolerant variants of our testing problem improving upon black-box linear program solvers, and give sample complexity lower bounds for alternative calibration measures to the one considered in this work. Finally, we present experiments showing the testing problem we define faithfully captures standard notions of calibration, and that our algorithms scale efficiently to accommodate large sample sizes.
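For concreteness, the empirical smooth calibration objective mentioned above can be written as a small LP: maximize $\frac{1}{n}\sum_i f(v_i)(y_i - v_i)$ over the values of a $1$-Lipschitz, $[-1,1]$-bounded $f$ at the sorted predictions. The sketch below solves it with a generic LP solver, i.e., the naive baseline that the paper's $O(n\log^2(n))$ min-cost-flow solver improves upon; the flow reformulation itself is not shown, and the function names are ours.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.sparse import csr_matrix

def smooth_calibration(v, y):
    """Empirical smooth calibration: max of (1/n) sum_i f(v_i)(y_i - v_i)
    over 1-Lipschitz f: [0,1] -> [-1,1].  With v sorted, Lipschitz
    constraints between consecutive points imply all the others."""
    order = np.argsort(v)
    v, y = np.asarray(v, float)[order], np.asarray(y, float)[order]
    n = len(v)
    c = -(y - v) / n                  # linprog minimizes, so negate
    dv = np.diff(v)
    # two rows per k encoding |f_{k+1} - f_k| <= v_{k+1} - v_k
    rows = np.repeat(np.arange(2 * (n - 1)), 2)
    cols = np.tile([1, 0], 2 * (n - 1)) + np.repeat(np.arange(n - 1), 4)
    vals = np.tile([1.0, -1.0, -1.0, 1.0], n - 1)
    A = csr_matrix((vals, (rows, cols)), shape=(2 * (n - 1), n))
    res = linprog(c, A_ub=A, b_ub=np.repeat(dv, 2), bounds=(-1, 1))
    return -res.fun

rng = np.random.default_rng(2)
v = rng.uniform(size=500)
y = (rng.uniform(size=500) < v).astype(float)   # perfectly calibrated draws
print(smooth_calibration(v, y))                 # small, ~O(n^{-1/2})
```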
STOC Conference 2023 Conference Paper
We present an algorithm that given any $n$-vertex, $m$-edge, rank $r$ hypergraph constructs a spectral sparsifier with $O(n\varepsilon^{-2}\log n \log r)$ hyperedges in nearly-linear $O(mr)$ time. This improves in both size and efficiency over a line of work [Bansal-Svensson-Trevisan 2019, Kapralov-Krauthgamer-Tardos-Yoshida 2021] for which the previous best size was $O(\min\{n\varepsilon^{-4}\log^3 n,\, nr^3\varepsilon^{-2}\log n\})$ and runtime was $O(mr + n^{O(1)})$.
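As a reference point for what "spectral" means here: in the standard formulation, the quadratic form such a sparsifier must preserve to $(1\pm\varepsilon)$ for every $x$ is $Q(x) = \sum_e w_e \max_{u,v \in e}(x_u - x_v)^2$. A direct evaluator (our naming, for illustration):

```python
import numpy as np

def hypergraph_energy(hyperedges, weights, x):
    """Hypergraph quadratic form Q(x) = sum_e w_e * max_{u,v in e}(x_u - x_v)^2.
    A spectral sparsifier must preserve Q(x) to (1 +/- eps) for all x.
    The pairwise max over an edge equals (max - min) over its vertices."""
    return sum(w * (x[list(e)].max() - x[list(e)].min()) ** 2
               for e, w in zip(hyperedges, weights))

# toy usage on a rank-3 hypergraph with 4 vertices
x = np.array([0.0, 1.0, 3.0, -2.0])
edges = [{0, 1, 2}, {1, 3}, {0, 2, 3}]
print(hypergraph_energy(edges, [1.0, 2.0, 0.5], x))
```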
FOCS Conference 2023 Conference Paper
We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density. Combining ReSQue with recent advances in ball oracle acceleration [CJJ+20], [ACJ+21], we develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings. For an SCO objective constrained to the unit ball in $\mathbb{R}^{d}$, we obtain the following results (up to polylogarithmic factors). 1) We give a parallel algorithm obtaining optimization error $\epsilon_{\text{opt}}$ with $d^{1/3}\epsilon_{\text{opt}}^{-2/3}$ gradient oracle query depth and $d^{1/3}\epsilon_{\text{opt}}^{-2/3}+\epsilon_{\text{opt}}^{-2}$ gradient queries in total, assuming access to a bounded-variance stochastic gradient estimator. For $\epsilon_{\text{opt}} \in [d^{-1}, d^{-1/4}]$, our algorithm matches the state-of-the-art oracle depth of [BJL+19] while maintaining the optimal total work of stochastic gradient descent. 2) Given $n$ samples of Lipschitz loss functions, prior works [BFTT19], [BFGT20], [AFKT21], [KLL21] established that if $n \gtrsim d\epsilon_{\mathrm{dp}}^{-2}$, $(\epsilon_{\mathrm{dp}}, \delta)$-differential privacy is attained at no asymptotic cost to the SCO utility. However, these prior works all required a superlinear number of gradient queries. We close this gap for sufficiently large $n \gtrsim d^{2}\epsilon_{\mathrm{dp}}^{-3}$, by using ReSQue to design an algorithm with near-linear gradient query complexity in this regime.
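The core reweighting idea, as we read it from the abstract, is importance sampling: gradients of the Gaussian convolution $f_\rho = f * \mathcal{N}(0, \rho^2 I)$ at a point $x$ can be estimated from samples drawn at a nearby reference point $\bar{x}$ by weighting with the ratio of the two Gaussian densities. A minimal sketch under that interpretation (not the paper's full parallel or private algorithms; names are ours):

```python
import numpy as np

def resque_grad(grad_f, x, x_ref, rho, num_samples, rng):
    """Importance-sampling sketch of a ReSQue-style estimator: estimate the
    gradient of f convolved with N(0, rho^2 I) at x, reusing Gaussian
    samples centered at a nearby reference point x_ref.  Each sample is
    weighted by the density ratio N(xi; x, rho^2 I) / N(xi; x_ref, rho^2 I)."""
    xi = x_ref + rho * rng.standard_normal((num_samples, x.size))
    log_w = (((xi - x_ref) ** 2).sum(1) - ((xi - x) ** 2).sum(1)) / (2 * rho ** 2)
    grads = np.apply_along_axis(grad_f, 1, xi)
    return (np.exp(log_w)[:, None] * grads).mean(0)

# toy usage: f(x) = ||x||_1, whose Gaussian smoothing has a known gradient
rng = np.random.default_rng(3)
x = np.array([0.3, -0.2])
g = resque_grad(lambda z: np.sign(z), x, x_ref=np.zeros(2), rho=0.5,
                num_samples=50000, rng=rng)
print(g)   # approximates E[sign(x + rho*z)] = erf(x / (rho * sqrt(2)))
```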
NeurIPS Conference 2023 Conference Paper
We investigate area convexity [Sherman17], a mysterious tool introduced to tackle optimization problems under the challenging $\ell_\infty$ geometry. We develop a deeper understanding of its relationship with conventional analyses of extragradient methods [Nemirovski04, Nesterov07]. We also give improved solvers for the subproblems required by variants of the [Sherman17] algorithm, designed through the lens of relative smoothness [BBT17, LFN18]. Leveraging these new tools, we give a state-of-the-art first-order algorithm for solving box-simplex games (a primal-dual formulation of $\ell_\infty$ regression) in a $d \times n$ matrix with bounded rows, using $O(\log d \cdot \epsilon^{-1})$ matrix-vector queries. As a consequence, we obtain improved complexities for approximate maximum flow, optimal transport, min-mean-cycle, and other basic combinatorial optimization problems. We also develop a near-linear time algorithm for a matrix generalization of box-simplex games, capturing a family of problems closely related to semidefinite programs recently used as subroutines in robust statistics and numerical linear algebra.
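For orientation, here is the conventional extragradient (mirror-prox) baseline for the bilinear box-simplex game $\min_{x \in [-1,1]^n} \max_{y \in \Delta_d} y^\top(Ax - b)$, with Euclidean steps on the box and entropic steps on the simplex. This is the kind of method the paper contrasts area convexity against, not the paper's own algorithm; the step size and toy instance are arbitrary.

```python
import numpy as np

def mirror_prox_box_simplex(A, b, eta, iters):
    """Mirror-prox (extragradient) for the bilinear box-simplex game
        min_{x in [-1,1]^n} max_{y in simplex_d}  y^T (A x - b),
    using Euclidean steps + clipping on the box and entropic
    (multiplicative-weights) steps on the simplex."""
    d, n = A.shape
    x, y = np.zeros(n), np.full(d, 1.0 / d)
    x_sum, y_sum = np.zeros(n), np.zeros(d)
    for _ in range(iters):
        # half step to the midpoint
        xm = np.clip(x - eta * (A.T @ y), -1, 1)
        ym = y * np.exp(eta * (A @ x - b)); ym /= ym.sum()
        # full step using the midpoint's gradients
        x = np.clip(x - eta * (A.T @ ym), -1, 1)
        y = y * np.exp(eta * (A @ xm - b)); y /= y.sum()
        x_sum += xm; y_sum += ym
    return x_sum / iters, y_sum / iters     # averaged iterates

rng = np.random.default_rng(4)
A, b = rng.standard_normal((6, 8)), rng.standard_normal(6)
x, y = mirror_prox_box_simplex(A, b, eta=0.05, iters=5000)
# duality gap: max_i (Ax - b)_i  minus  min_{x' in box} y^T (A x' - b)
gap = (A @ x - b).max() + np.abs(A.T @ y).sum() + y @ b
print(f"duality gap: {gap:.4f}")
```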
FOCS Conference 2023 Conference Paper
For any norms $N_{1}, \ldots, N_{m}$ on $\mathbb{R}^{n}$ and $N(x) := N_{1}(x)+\cdots+N_{m}(x)$, we show there is a sparsified norm $\tilde{N}(x) = w_{1} N_{1}(x)+\cdots+w_{m} N_{m}(x)$ such that $|N(x)-\tilde{N}(x)| \leqslant \varepsilon N(x)$ for all $x \in \mathbb{R}^{n}$, where $w_{1}, \ldots, w_{m}$ are non-negative weights, of which only $O\left(\varepsilon^{-2} n \log (n / \varepsilon)(\log n)^{2.5}\right)$ are non-zero. Additionally, we show that such weights can be found with high probability in time $O\left(m(\log n)^{O(1)} + \mathrm{poly}(n)\right) T$, where $T$ is the time required to evaluate a norm $N_{i}(x)$, assuming that $N(x)$ is $\mathrm{poly}(n)$-equivalent to the Euclidean norm. This immediately yields analogous statements for sparsifying sums of symmetric submodular functions. More generally, we show how to sparsify sums of $p$-th powers of norms when the sum is $p$-uniformly smooth.
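Unlike the pointwise approximations above, this guarantee is for all $x$ at once, which random directions can refute but never certify. A small sketch of the empirical check, with toy seminorms $N_i(x) = |\langle a_i, x\rangle|$ (seminorms, for illustration) and naive uniform subsampling in place of the paper's construction:

```python
import numpy as np

def check_norm_sparsifier(norms, weights, eps, dim, trials, rng):
    """Empirically test |N(x) - N~(x)| <= eps * N(x) on random directions,
    where N(x) = sum_i N_i(x) and N~(x) = sum_i w_i N_i(x).  Random
    directions can only refute, never certify, the for-all-x guarantee."""
    for _ in range(trials):
        x = rng.standard_normal(dim)
        N = sum(Ni(x) for Ni in norms)
        Nt = sum(w * Ni(x) for w, Ni in zip(weights, norms) if w > 0)
        if abs(N - Nt) > eps * N:
            return False
    return True

# toy usage: m seminorms N_i(x) = |<a_i, x>|, sparsified by uniform subsampling
rng = np.random.default_rng(5)
m, dim, s = 5000, 8, 500
vecs = rng.standard_normal((m, dim))
norms = [lambda x, a=a: abs(a @ x) for a in vecs]
weights = np.zeros(m)
keep = rng.choice(m, size=s, replace=False)
weights[keep] = m / s
print(check_norm_sparsifier(norms, weights, eps=0.2, dim=dim, trials=50, rng=rng))
```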
NeurIPS Conference 2023 Conference Paper
We develop a general framework for finding approximately-optimal preconditioners for solving linear systems. Leveraging this framework we obtain improved runtimes for fundamental preconditioning and linear system solving problems, including the following. Diagonal preconditioning: we give an algorithm which, given positive definite $\mathbf{K} \in \mathbb{R}^{d \times d}$ with $\mathrm{nnz}(\mathbf{K})$ nonzero entries, computes an $\epsilon$-optimal diagonal preconditioner in time $\widetilde{O}(\mathrm{nnz}(\mathbf{K}) \cdot \mathrm{poly}(\kappa^\star, \epsilon^{-1}))$, where $\kappa^\star$ is the optimal condition number of the rescaled matrix. Structured linear systems: we give an algorithm which, given $\mathbf{M} \in \mathbb{R}^{d \times d}$ that is either the pseudoinverse of a graph Laplacian matrix or a constant spectral approximation of one, solves linear systems in $\mathbf{M}$ in $\widetilde{O}(d^2)$ time. Our diagonal preconditioning results improve state-of-the-art runtimes of $\Omega(d^{3.5})$ attained by general-purpose semidefinite programming, and our solvers improve state-of-the-art runtimes of $\Omega(d^{\omega})$ where $\omega > 2.3$ is the current matrix multiplication constant. We attain our results via new algorithms for a class of semidefinite programs (SDPs) we call matrix-dictionary approximation SDPs, which we leverage to solve an associated problem we call matrix-dictionary recovery.
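Since the optimized quantity may be unfamiliar, this sketch spells it out: a diagonal preconditioner $d$ rescales $\mathbf{K}$ to $\mathrm{diag}(d)^{1/2}\,\mathbf{K}\,\mathrm{diag}(d)^{1/2}$, and the goal is to minimize the resulting condition number, with $\kappa^\star$ the best achievable value. The Jacobi scaling used below is a classical heuristic baseline, not the paper's $\epsilon$-optimal construction.

```python
import numpy as np

def rescaled_condition_number(K, d):
    """Condition number of diag(d)^(1/2) K diag(d)^(1/2): the quantity a
    diagonal preconditioner tries to minimize over positive vectors d."""
    s = np.sqrt(d)
    M = K * np.outer(s, s)          # same as diag(s) @ K @ diag(s)
    eigs = np.linalg.eigvalsh(M)
    return eigs[-1] / eigs[0]

# toy usage: Jacobi scaling d_i = 1 / K_ii, a standard heuristic
rng = np.random.default_rng(6)
B = rng.standard_normal((50, 50))
K = B @ B.T + np.diag(rng.uniform(1, 100, 50))   # poorly scaled PD matrix
print("before:", rescaled_condition_number(K, np.ones(50)))
print("Jacobi:", rescaled_condition_number(K, 1.0 / np.diag(K)))
```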
STOC Conference 2022 Conference Paper
STOC Conference 2022 Conference Paper
NeurIPS Conference 2022 Conference Paper
We develop a variant of the Monteiro-Svaiter (MS) acceleration framework that removes the need to solve an expensive implicit equation at every iteration. Consequently, for any $p\ge 2$ we improve the complexity of convex optimization with Lipschitz $p$th derivative by a logarithmic factor, matching a lower bound. We also introduce an MS subproblem solver that requires no knowledge of problem parameters, and implement it as either a second- or first-order method by solving linear systems or applying MinRes, respectively. On logistic regression problems our method outperforms previous accelerated second-order methods, but under-performs Newton's method; simply iterating our first-order adaptive subproblem solver is competitive with L-BFGS.
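As a sketch of the first-order implementation mentioned above: each MS subproblem amounts to a $\lambda$-regularized Newton system, which can be solved with MinRes given only Hessian-vector products. The adaptivity in $\lambda$ (the paper's parameter-free solver) is omitted, and the names are ours.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, minres

def regularized_newton_step(hess_vec, grad, lam):
    """Solve (H + lam*I) d = -grad with MinRes, given only Hessian-vector
    products: the shape of the MS subproblem, without the adaptive lam."""
    n = grad.size
    op = LinearOperator((n, n), matvec=lambda v: hess_vec(v) + lam * v)
    d, info = minres(op, -grad)
    assert info == 0
    return d

# toy usage on a quadratic f(x) = 0.5 x^T H x, so grad f(x) = H x
rng = np.random.default_rng(7)
B = rng.standard_normal((30, 30))
H = B @ B.T
x = rng.standard_normal(30)
step = regularized_newton_step(lambda v: H @ v, H @ x, lam=1e-3)
print(np.linalg.norm(x + step))   # nearly the minimizer at 0
```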
ICML Conference 2022 Conference Paper
The accelerated proximal point method (APPA), also known as "Catalyst", is a well-established reduction from convex optimization to approximate proximal point computation (i.e., regularized minimization). This reduction is conceptually elegant and yields strong convergence rate guarantees. However, these rates feature an extraneous logarithmic term arising from the need to compute each proximal point to high accuracy. In this work, we propose a novel Relaxed Error Criterion for Accelerated Proximal Point (RECAPP) that eliminates the need for high accuracy subproblem solutions. We apply RECAPP to two canonical problems: finite-sum and max-structured minimization. For finite-sum problems, we match the best known complexity, previously obtained by carefully-designed problem-specific algorithms. For minimizing $\max_y f(x, y)$ where $f$ is convex in $x$ and strongly concave in $y$, we improve on the best known (Catalyst-based) bound by a logarithmic factor.
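For context, the APPA/Catalyst outer loop itself is short; what is expensive in the classical analysis is solving each regularized subproblem to high accuracy, which is the logarithmic overhead RECAPP removes. A generic sketch with a fixed, crude inner budget standing in for RECAPP's relaxed criterion; all parameters are illustrative.

```python
import numpy as np

def appa(grad_f, x0, lam, outer_iters, inner_iters, inner_lr):
    """Accelerated proximal point sketch (Guler/Catalyst-style): each outer
    step approximately solves min_x f(x) + (lam/2) ||x - y||^2 by plain
    gradient descent, then applies FISTA-style momentum.  RECAPP's point is
    a relaxed inner accuracy requirement; the fixed budget here is a stand-in."""
    x_prev, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(outer_iters):
        x = y.copy()
        for _ in range(inner_iters):          # inexact proximal point at y
            x -= inner_lr * (grad_f(x) + lam * (x - y))
        t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = x + ((t - 1) / t_next) * (x - x_prev)
        x_prev, t = x, t_next
    return x_prev

# toy usage: ill-conditioned quadratic f(x) = 0.5 x^T D x
rng = np.random.default_rng(8)
D = np.diag(np.geomspace(1, 1000, 20))
x = appa(lambda z: D @ z, rng.standard_normal(20), lam=10.0,
         outer_iters=200, inner_iters=20, inner_lr=1e-3)
print(np.linalg.norm(x))   # approaches the minimizer at 0
```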
SODA Conference 2022 Conference Paper
We provide $\tilde{O}(\epsilon^{-1})$-pass semi-streaming algorithms for computing $(1-\epsilon)$-approximate maximum cardinality matchings in bipartite graphs. Our most efficient methods are deterministic and use optimal, $O(n)$, space, improving upon the space complexity of the previous state-of-the-art $\tilde{O}(\epsilon^{-1})$-pass algorithm of [AG18]. To obtain our results we provide semi-streaming adaptations of more general continuous optimization tools. Further, we leverage these techniques to obtain improvements for streaming variants of approximate linear programming, optimal transport, exact matching, transshipment, and shortest path problems.
NeurIPS Conference 2021 Conference Paper
We study fast algorithms for statistical regression problems under the strong contamination model, where the goal is to approximately optimize a generalized linear model (GLM) given adversarially corrupted samples. Prior works in this line of research were based on the robust gradient descent framework of [PrasadSBR20], a first-order method using biased gradient queries, or the Sever framework of [DiakonikolasKK019], an iterative outlier-removal method calling a stationary point finder. We present nearly-linear time algorithms for robust regression problems with improved runtime or estimation guarantees compared to the state-of-the-art. For the general case of smooth GLMs (e.g., logistic regression), we show that the robust gradient descent framework of [PrasadSBR20] can be accelerated, and show our algorithm extends to optimizing the Moreau envelopes of Lipschitz GLMs (e.g., support vector machines), answering several open questions in the literature. For the well-studied case of robust linear regression, we present an alternative approach obtaining improved estimation rates over prior nearly-linear time algorithms. Interestingly, our algorithm starts with an identifiability proof introduced in the context of the sum-of-squares algorithm of [BakshiP21], which achieved optimal error rates while requiring large polynomial runtime and sample complexity. We reinterpret their proof within the Sever framework and obtain a dramatically faster and more sample-efficient algorithm under fewer distributional assumptions.
NeurIPS Conference 2021 Conference Paper
We develop a new primitive for stochastic optimization: a low-bias, low-cost estimator of the minimizer $x_\star$ of any Lipschitz strongly-convex function $f$. In particular, we use a multilevel Monte-Carlo approach due to Blanchet and Glynn to turn any optimal stochastic gradient method into an estimator of $x_\star$ with bias $\delta$, variance $O(\log(1/\delta))$, and an expected sampling cost of $O(\log(1/\delta))$ stochastic gradient evaluations. As an immediate consequence, we obtain cheap and nearly unbiased gradient estimators for the Moreau envelope of any Lipschitz convex function. We demonstrate the potential of our estimator through four applications. First, we develop a method for minimizing the maximum of $N$ functions, improving on recent results and matching a lower bound up to logarithmic factors. Second and third, we recover state-of-the-art rates for projection-efficient and gradient-efficient optimization using simple algorithms with a transparent analysis. Finally, we show that an improved version of our estimator would yield a nearly linear-time, optimal-utility, differentially-private non-smooth stochastic optimization method.
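The multilevel construction admits a short sketch: draw a level $J$ with $P(J = j) \propto 2^{-j}$, run SGD for $2^J$ steps, and correct a cheap base estimate by the reweighted difference between the full-run average and the half-run average, so the levels telescope in expectation. The averaging scheme and truncation below follow our reading of the abstract and may differ from the paper's exact construction; all names are ours.

```python
import numpy as np

def mlmc_minimizer_estimate(sgd_iterates, j_max, rng):
    """Multilevel Monte-Carlo sketch (Blanchet-Glynn style) of a low-bias
    estimator of x_star.  The half-run average at level J is distributed as
    the full-run average at level J-1, so the correction telescopes and the
    expectation equals the level-j_max average (truncation gives the bias)."""
    probs = np.array([2.0 ** -j for j in range(1, j_max + 1)])
    probs /= probs.sum()
    J = rng.choice(np.arange(1, j_max + 1), p=probs)
    iters = sgd_iterates(2 ** J)               # one SGD run of 2^J steps
    full = iters.mean(axis=0)                  # average of all iterates
    half = iters[: 2 ** (J - 1)].mean(axis=0)  # average of the first half
    base = sgd_iterates(1).mean(axis=0)        # cheap base estimate
    return base + (full - half) / probs[J - 1]

# toy usage: SGD on f(x) = 0.5 ||x||^2 with noisy gradients
def sgd_iterates(T, d=5, lr=0.05, noise_rng=np.random.default_rng(9)):
    x, out = np.ones(d), np.empty((T, d))
    for t in range(T):
        x -= lr * (x + 0.1 * noise_rng.standard_normal(d))
        out[t] = x
    return out

rng = np.random.default_rng(10)
est = np.mean([mlmc_minimizer_estimate(sgd_iterates, 10, rng)
               for _ in range(200)], axis=0)
print(np.linalg.norm(est))   # close to x_star = 0
```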
SODA Conference 2021 Conference Paper
In this paper we provide an $O(m\log\log^{O(1)} n \log(1/\epsilon))$-expected time algorithm for solving Laplacian systems on $n$-node $m$-edge graphs, improving upon the previous best expected runtime of $O(m\log^{1/2} n \log\log^{O(1)} n \log(1/\epsilon))$ achieved by (Cohen, Kyng, Miller, Pachocki, Peng, Rao, Xu 2014). To obtain this result we provide efficient constructions of $\ell_p$-stretch graph approximations with improved stretch and sparsity bounds. Additionally, as motivation for this work, we show that for every set of vectors in $\mathbb{R}^d$ (not just those induced by graphs) and all $k > 0$ there exists an ultra-sparsifier with $d - 1 + O(d/k)$ re-weighted vectors of relative condition number at most $k^2$. For small $k$, this improves upon the previous best known multiplicative factor of $k \cdot \tilde{O}(\log d)$, which is only known for the graph case.
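To fix terminology, the $\ell_p$-stretch of a graph with respect to a spanning tree replaces each edge's stretch (tree distance over edge length) by its $p$-th power before summing; the regime of interest for these constructions is $p < 1$. A naive evaluator of this quantity (our naming), just to pin down what is being bounded:

```python
import networkx as nx

def lp_stretch(G, T, p):
    """Total l_p-stretch of graph G (edge weights = lengths) w.r.t. a
    spanning tree T: sum over edges e = (u, v) of (dist_T(u, v) / w(e))^p.
    Naive O(mn) computation, for illustration only."""
    total = 0.0
    for u, v, w in G.edges(data="weight"):
        dT = nx.shortest_path_length(T, u, v, weight="weight")
        total += (dT / w) ** p
    return total

# toy usage: unit-weight grid graph with a minimum spanning tree
G = nx.grid_2d_graph(6, 6)
nx.set_edge_attributes(G, 1.0, "weight")
T = nx.minimum_spanning_tree(G)
print(lp_stretch(G, T, p=0.5), G.number_of_edges())
```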
NeurIPS Conference 2020 Conference Paper
Consider an oracle which takes a point $x$ and returns the minimizer of a convex function $f$ in an $\ell_2$ ball of radius $r$ around $x$. It is straightforward to show that roughly $r^{-1}\log(1/\epsilon)$ calls to the oracle suffice to find an $\epsilon$-approximate minimizer of $f$ in an $\ell_2$ unit ball. Perhaps surprisingly, this is not optimal: we design an accelerated algorithm which attains an $\epsilon$-approximate minimizer with roughly $r^{-2/3}\log(1/\epsilon)$ oracle queries, and give a matching lower bound. Further, we implement ball optimization oracles for functions with a locally stable Hessian using a variant of Newton's method and, in certain cases, stochastic first-order methods. The resulting algorithms apply to a number of problems of practical and theoretical import, improving upon previous results for logistic and $\ell_\infty$ regression and achieving guarantees comparable to the state-of-the-art for $\ell_p$ regression.
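A sketch of the baseline from the first sentence: iterate the ball oracle, moving at most $r$ per call, so roughly $r^{-1}\log(1/\epsilon)$ calls suffice; the paper's accelerated method reduces this to roughly $r^{-2/3}\log(1/\epsilon)$. Here the oracle itself is implemented naively with projected gradient descent, and all parameters are illustrative.

```python
import numpy as np

def ball_oracle_descent(grad_f, x0, r, iters, inner_steps=300, lr=0.01):
    """Iterate a ball optimization oracle: each outer step (approximately)
    minimizes f over the l2 ball of radius r around the current point,
    here via naive projected gradient descent."""
    x = x0.copy()
    for _ in range(iters):
        c, y = x.copy(), x.copy()
        for _ in range(inner_steps):    # naive ball oracle centered at c
            y = y - lr * grad_f(y)
            d = y - c
            dist = np.linalg.norm(d)
            if dist > r:                # project back into the ball
                y = c + d * (r / dist)
        x = y
    return x

# toy usage: ill-conditioned quadratic f(z) = 0.5 z^T D z
D = np.diag(np.geomspace(1, 100, 10))
x = ball_oracle_descent(lambda z: D @ z, np.ones(10), r=0.3, iters=30)
print(np.linalg.norm(x))   # approaches the minimizer at 0
```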
STOC Conference 2020 Conference Paper
NeurIPS Conference 2020 Conference Paper
We develop two methods for the following fundamental statistical task: given an $\epsilon$-corrupted set of $n$ samples from a $d$-dimensional sub-Gaussian distribution, return an approximate top eigenvector of the covariance matrix. Our first robust PCA algorithm runs in polynomial time, returns a $1 - O(\epsilon\log\epsilon^{-1})$-approximate top eigenvector, and is based on a simple iterative filtering approach. Our second, which attains a slightly worse approximation factor, runs in nearly-linear time and sample complexity under a mild spectral gap assumption. These are the first polynomial-time algorithms yielding non-trivial information about the covariance of a corrupted sub-Gaussian distribution without requiring additional algebraic structure of moments. As a key technical tool, we develop the first width-independent solvers for Schatten-$p$ norm packing semidefinite programs, giving a $(1 + \epsilon)$-approximate solution in $O(p\log(\tfrac{nd}{\epsilon})\epsilon^{-1})$ input-sparsity time iterations (where $n$, $d$ are problem dimensions).
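A caricature of the iterative filtering idea, to convey its shape only: score each sample by its energy along the current top eigenvector and randomly drop high-scoring samples, so corrupted points are removed with higher probability. The paper's filter and its guarantees are more careful than this sketch, and all names are ours.

```python
import numpy as np

def filtered_top_eigenvector(X, eps, rounds, rng):
    """Filtering caricature for robust PCA: repeatedly compute the empirical
    top eigenvector, score samples by their squared projection, and drop
    samples with probability proportional to their (normalized) score."""
    X = X.copy()
    n0 = len(X)
    for _ in range(rounds):
        v = np.linalg.eigh(X.T @ X / len(X))[1][:, -1]   # top eigenvector
        tau = (X @ v) ** 2                               # outlier scores
        keep = rng.uniform(size=len(X)) > tau / tau.max()
        if keep.sum() < (1 - 2 * eps) * n0:              # don't over-filter
            break
        X = X[keep]
    return np.linalg.eigh(X.T @ X / len(X))[1][:, -1]

# toy usage: an eps-fraction of corruptions planted along a wrong direction
rng = np.random.default_rng(11)
d, n, eps = 20, 2000, 0.05
X = rng.standard_normal((n, d))
X[:, 0] *= 3                                  # true spike along e_0
X[: int(eps * n)] = 15.0 * np.eye(d)[1]       # corruptions along e_1
v = filtered_top_eigenvector(X, eps, rounds=30, rng=rng)
print(abs(v[0]))   # close to 1: recovers e_0 despite the corruptions
```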
NeurIPS Conference 2019 Conference Paper
Optimal transportation, or computing the Wasserstein or "earth mover's" distance between two $n$-dimensional distributions, is a fundamental primitive which arises in many learning and statistical settings. We give an algorithm which solves the problem to additive $\epsilon$ accuracy with $\tilde{O}(1/\epsilon)$ parallel depth and $\tilde{O}(n^2/\epsilon)$ work. [BlanchetJKS18, Quanrud19] obtained this runtime through reductions to positive linear programming and matrix scaling. However, these reduction-based algorithms use subroutines which may be impractical due to requiring solvers for second-order iterations (matrix scaling) or non-parallelizability (positive LP). Our methods match the previous-best work bounds of [BlanchetJKS18, Quanrud19] while either improving parallelization or removing the need for linear system solves, and improve upon the previous best first-order methods running in time $\tilde{O}(\min(n^2/\epsilon^2, n^{2.5}/\epsilon))$ [DvurechenskyGK18, LinHJ19]. We obtain our results by a primal-dual extragradient method, motivated by recent theoretical improvements to maximum flow [Sherman17].
FOCS Conference 2019 Conference Paper
In this paper we provide a parallel algorithm that given any $n$-node $m$-edge directed graph and source vertex $s$ computes all vertices reachable from $s$ with $\tilde{O}(m)$ work and $n^{1/2+o(1)}$ depth with high probability in $n$. This algorithm also computes a set of $\tilde{O}(n)$ edges which when added to the graph preserves reachability and ensures that the diameter of the resulting graph is at most $n^{1/2+o(1)}$. Our result improves upon the previous best known almost linear work reachability algorithm due to Fineman [1], which had depth $\tilde{O}(n^{2/3})$. Further, we show how to leverage this algorithm to achieve improved distributed algorithms for single source reachability in the CONGEST model. In particular, we provide a distributed algorithm that given an $n$-node digraph of undirected hop-diameter $D$ solves the single source reachability problem with $\tilde{O}(n^{1/2} + n^{1/3+o(1)}D^{2/3})$ rounds of communication in the CONGEST model with high probability in $n$. Our algorithm is nearly optimal whenever $D = O(n^{1/4-\epsilon})$ for any constant $\epsilon > 0$ and is the first nearly optimal algorithm for general graphs whose diameter is $\Omega(n^{\delta})$ for any constant $\delta > 0$.
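The two properties claimed for the shortcut edge set are easy to state operationally: adding the edges must not change which vertices are reachable from $s$, while shrinking the directed depth from $s$. A small networkx check of both properties on a toy instance; constructing good shortcut sets is the paper's actual content and is not shown here.

```python
import networkx as nx

def preserves_reachability(G, shortcuts, s):
    """Check that adding the shortcut edges does not change the set of
    vertices reachable from s (true iff each shortcut (u, v) has v already
    reachable from u in G)."""
    before = nx.descendants(G, s) | {s}
    H = G.copy()
    H.add_edges_from(shortcuts)
    return before == nx.descendants(H, s) | {s}

# toy usage: a directed path 0 -> 1 -> ... -> 99 with two shortcuts
G = nx.path_graph(100, create_using=nx.DiGraph)
shortcuts = [(0, 50), (50, 99)]
print(preserves_reachability(G, shortcuts, 0))
H = G.copy(); H.add_edges_from(shortcuts)
depth = lambda g: max(nx.single_source_shortest_path_length(g, 0).values())
print(depth(G), depth(H))   # depth from s drops from 99 to 49
```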
SODA Conference 2019 Conference Paper
SODA Conference 2018 Conference Paper
In this paper we consider the problem of efficiently computing $\epsilon$-sketches for the Laplacian and its pseudoinverse. Given a Laplacian and an error tolerance $\epsilon$, we seek to construct a function $f$ such that for any vector $x$ (chosen obliviously from $f$), with high probability $(1-\epsilon)\,x^\top A x \le f(x) \le (1+\epsilon)\,x^\top A x$ where $A$ is either the Laplacian or its pseudoinverse. Our goal is to construct such a sketch $f$ efficiently and to store it in the least space possible. We provide nearly-linear time algorithms that, when given a Laplacian matrix $\mathcal{L} \in \mathbb{R}^{n \times n}$ and an error tolerance $\epsilon$, produce $\tilde{O}(n/\epsilon)$-size sketches of both $\mathcal{L}$ and its pseudoinverse. Our algorithms improve upon the previous best sketch size of $\tilde{O}(n/\epsilon^{1.6})$ for sketching the Laplacian form by [1] and $O(n/\epsilon^2)$ for sketching the Laplacian pseudoinverse by [2]. Furthermore we show how to compute all-pairs effective resistances from our $\tilde{O}(n/\epsilon)$-size sketch in $\tilde{O}(n^2/\epsilon)$ time. This improves upon the previous best running time of $\tilde{O}(n^2/\epsilon^2)$ by [3].
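For reference, here is the dense baseline for the quantity the sketch approximates: the effective resistance $R(u,v) = (e_u - e_v)^\top \mathcal{L}^+ (e_u - e_v)$, computed for all pairs from the Laplacian pseudoinverse in $O(n^3)$ time. The paper's $\tilde{O}(n/\epsilon)$-size sketch recovers all pairs to $(1\pm\epsilon)$ in $\tilde{O}(n^2/\epsilon)$ time instead; the function name below is ours.

```python
import numpy as np
import networkx as nx

def all_pairs_effective_resistance(G):
    """Dense O(n^3) baseline: R(u, v) = Lp[u,u] + Lp[v,v] - 2 Lp[u,v],
    where Lp is the pseudoinverse of the graph Laplacian."""
    L = nx.laplacian_matrix(G).toarray().astype(float)
    Lp = np.linalg.pinv(L)
    diag = np.diag(Lp)
    return diag[:, None] + diag[None, :] - 2 * Lp   # R[u, v]

# toy usage: on a path graph (unit resistors in series), R(u, v) is the
# hop distance between u and v
G = nx.path_graph(5)
print(np.round(all_pairs_effective_resistance(G), 6))
```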