Arrow Research

Author name cluster

Robert Krauthgamer

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

70 papers
2 author rows

Possible papers (70)

FOCS Conference 2025 Conference Paper

The Power of Recursive Embeddings for ℓp Metrics

  • Robert Krauthgamer
  • Nir Petruschka
  • Shay Sapir

Metric embedding is a powerful tool used extensively in mathematics and computer science. We devise a new method of using metric embeddings recursively, which turns out to be particularly effective in $\ell_{p}$ spaces, $p < 2$, yielding state-of-the-art results for Lipschitz decomposition, for Nearest Neighbor Search, and for embedding into $\ell_{2}$. In a nutshell, our method composes metric embeddings by viewing them as reductions between problems, and thereby obtains a new reduction that is substantially more effective than the known reduction that employs a single embedding. We in fact apply this method recursively, oftentimes using double recursion, which further amplifies the gap from a single embedding. Index Terms: Metric Embedding, Lipschitz Decomposition, Nearest Neighbor Search, $\ell_{p}$ norm.

FOCS Conference 2022 Conference Paper

Breaking the Cubic Barrier for All-Pairs Max-Flow: Gomory-Hu Tree in Nearly Quadratic Time

  • Amir Abboud
  • Robert Krauthgamer
  • Jason Li 0006
  • Debmalya Panigrahi
  • Thatchaphol Saranurak
  • Ohad Trabelsi

In 1961, Gomory and Hu showed that the All-Pairs Max-Flow problem of computing the max-flow between all $\binom{n}{2}$ pairs of vertices in an undirected graph can be solved using only $n-1$ calls to any (single-pair) max-flow algorithm. Even assuming a linear-time max-flow algorithm, this yields a running time of $O(mn)$, which is $O(n^{3})$ when $m=\Theta(n^{2})$. While subsequent work has improved this bound for various special graph classes, no subcubic-time algorithm has been obtained in the last 60 years for general graphs. We break this longstanding barrier by giving an $\tilde{O}(n^{2})$-time algorithm on general, integer-weighted graphs. Combined with a popular complexity assumption, we establish a counter-intuitive separation: all-pairs max-flows are strictly easier to compute than all-pairs shortest-paths. Our algorithm produces a cut-equivalent tree, known as the Gomory-Hu tree, from which the max-flow value for any pair can be retrieved in near-constant time. For unweighted graphs, we refine our techniques further to produce a Gomory-Hu tree in the time of a poly-logarithmic number of calls to any max-flow algorithm. This shows an equivalence between the all-pairs and single-pair max-flow problems, and is optimal up to polylogarithmic factors. Using the recently announced $m^{1+o(1)}$-time max-flow algorithm (Chen et al., March 2022), our Gomory-Hu tree algorithm for unweighted graphs also runs in $m^{1+o(1)}$ time.
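The 1961 reduction mentioned above, building a cut-equivalent tree from $n-1$ single-pair max-flow calls, can be sketched in plain Python. This is a minimal illustration of my own (using Gusfield's simplification of the Gomory-Hu construction, which yields a flow-equivalent tree, plus a textbook Edmonds-Karp max-flow), not the paper's $\tilde{O}(n^2)$ algorithm:

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp on an undirected graph given as a symmetric capacity
    matrix. Returns (flow value, source side of a minimum s-t cut)."""
    n = len(cap)
    res = [row[:] for row in cap]
    flow = 0
    while True:
        prev = [-1] * n          # BFS for a shortest augmenting path
        prev[s] = s
        q = deque([s])
        while q:
            u = q.popleft()
            for v in range(n):
                if prev[v] == -1 and res[u][v] > 0:
                    prev[v] = u
                    q.append(v)
        if prev[t] == -1:        # no augmenting path: flow is maximum
            return flow, {v for v in range(n) if prev[v] != -1}
        b, v = float('inf'), t   # bottleneck capacity along the path
        while v != s:
            b = min(b, res[prev[v]][v])
            v = prev[v]
        v = t
        while v != s:            # augment along the path
            res[prev[v]][v] -= b
            res[v][prev[v]] += b
            v = prev[v]
        flow += b

def gusfield_tree(cap):
    """Flow-equivalent tree from n-1 max-flow calls (Gusfield's variant
    of the Gomory-Hu construction): node i > 0 hangs off parent[i] by a
    tree edge of weight weight[i]."""
    n = len(cap)
    parent, weight = [0] * n, [0] * n
    for i in range(1, n):
        f, side = max_flow(cap, i, parent[i])
        weight[i] = f
        for j in range(i + 1, n):
            if j in side and parent[j] == parent[i]:
                parent[j] = i
    return parent, weight

def query(parent, weight, s, t):
    """Max-flow value between s and t = min edge weight on the tree path."""
    def to_root(v):
        path = []
        while v != 0:
            path.append(v)
            v = parent[v]
        return path + [0]
    ps, pt = to_root(s), to_root(t)
    lca = next(v for v in pt if v in set(ps))
    edges = ps[:ps.index(lca)] + pt[:pt.index(lca)]
    return min(weight[v] for v in edges)

# Example: a weighted 4-cycle 0-1-2-3-0
cap = [[0, 3, 0, 1],
       [3, 0, 2, 0],
       [0, 2, 0, 4],
       [1, 0, 4, 0]]
parent, weight = gusfield_tree(cap)
```

Once the tree is built, any pair query is a walk up the tree rather than a fresh max-flow computation, which is exactly the "near-constant time retrieval" the abstract refers to.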

FOCS Conference 2022 Conference Paper

Gap Edit Distance via Non-Adaptive Queries: Simple and Optimal

  • Elazar Goldenberg
  • Tomasz Kociumaka
  • Robert Krauthgamer
  • Barna Saha

We study the problem of approximating edit distance in sublinear time. This is formalized as the $(k, k^{c})$-Gap Edit Distance problem, where the input is a pair of strings $X, Y$ and parameters $k$ and $c > 1$, and the goal is to return YES if $\mathrm{ED}(X, Y) \leq k$, NO if $\mathrm{ED}(X, Y) > k^{c}$, and an arbitrary answer when $k < \mathrm{ED}(X, Y) \leq k^{c}$. Recent years have witnessed significant interest in designing sublinear-time algorithms for Gap Edit Distance. In this work, we resolve the non-adaptive query complexity of Gap Edit Distance for the entire range of parameters, improving over a sequence of previous results. Specifically, we design a non-adaptive algorithm with query complexity $\tilde{O}(n/k^{c-0.5})$, and we further prove that this bound is optimal up to polylogarithmic factors. Our algorithm also achieves optimal time complexity $\tilde{O}(n/k^{c-0.5})$ whenever $c \geq 1.5$. For $1 < c < 1.5$, the running time of our algorithm is $\tilde{O}(n/k^{2c-2})$. In the restricted case of $k^{c}=\Omega(n)$, this matches a known result [Batu, Ergün, Kilian, Magen, Raskhodnikova, Rubinfeld, and Sami; STOC 2003], and in all other (nontrivial) cases, our running time is strictly better than all previous algorithms, including the adaptive ones. However, independent work of Bringmann, Cassis, Fischer, and Nakos [STOC 2022] provides an adaptive algorithm that bypasses the non-adaptive lower bound, but only for small enough $k$ and $c$.
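As a concrete reading of the promise problem defined above, here is a brute-force Python sketch (names mine). It decides the gap exactly in quadratic time; the point of the paper is to answer the same YES/NO question with far fewer queries than this reference computation makes:

```python
def edit_distance(x, y):
    """Classic O(|x|*|y|) dynamic program over insertions, deletions,
    and substitutions, kept only as a reference implementation."""
    prev = list(range(len(y) + 1))
    for i, xi in enumerate(x, 1):
        cur = [i]
        for j, yj in enumerate(y, 1):
            cur.append(min(prev[j] + 1,                  # delete xi
                           cur[j - 1] + 1,               # insert yj
                           prev[j - 1] + (xi != yj)))    # substitute
        prev = cur
    return prev[-1]

def gap_edit_distance(x, y, k, c):
    """(k, k^c)-gap version: any answer is allowed inside the gap."""
    d = edit_distance(x, y)
    if d <= k:
        return "YES"
    if d > k ** c:
        return "NO"
    return "EITHER"   # inside the promise gap
```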

FOCS Conference 2022 Conference Paper

Streaming Facility Location in High Dimension via Geometric Hashing

  • Artur Czumaj
  • Shaofeng H.-C. Jiang
  • Robert Krauthgamer
  • Pavel Veselý 0001
  • Mingwei Yang 0002

In Euclidean Uniform Facility Location, the input is a set of clients in $\mathbb{R}^{d}$ and the goal is to place facilities to serve them, so as to minimize the total cost of opening facilities plus connecting the clients. We study the classical setting of dynamic geometric streams, where the clients are presented as a sequence of insertions and deletions of points in the grid $\{1, \ldots, \Delta\}^{d}$, and we focus on the high-dimensional regime, where the algorithm's space complexity must be polynomial (and certainly not exponential) in $d \cdot \log \Delta$. We present a new algorithmic framework, based on importance sampling from the stream, for $O(1)$-approximation of the optimal cost using only $\mathrm{poly}(d \cdot \log \Delta)$ space. This framework is easy to implement in two passes, one for sampling points and the other for estimating their contribution. Over random-order streams, we can extend this to a one-pass algorithm by using the two halves of the stream separately. Our main result, for arbitrary-order streams, computes an $O(d^{1.5})$-approximation in one pass by using the new framework but combining the two passes differently. This improves upon previous algorithms that either need space exponential in $d$ or only guarantee $O(d \cdot \log^{2} \Delta)$-approximation, and therefore our algorithms for high-dimensional streams are the first to avoid the $O(\log \Delta)$ factor in approximation that is inherent to the widely-used quadtree decomposition. Our improvement is achieved by employing a geometric hashing scheme that maps points in $\mathbb{R}^{d}$ into buckets of bounded diameter, with the key property that every point set of small-enough diameter is hashed into at most $\mathrm{poly}(d)$ distinct buckets. Finally, we complement our results with a proof that every streaming $1.085$-approximation algorithm requires space exponential in $\mathrm{poly}(d \cdot \log \Delta)$, even for insertion-only streams.
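For intuition only, here is the naive grid hashing that the paper's scheme improves upon (a sketch with names of my own choosing). A cell side of $\ell$ gives buckets of diameter $\ell\sqrt{d}$, but a small-diameter point set can still straddle up to $2^d$ grid cells; removing that weakness is exactly what the paper's poly$(d)$-bucket guarantee achieves:

```python
import math

def grid_hash(point, cell):
    """Map a point in R^d to its axis-aligned grid cell of side `cell`.
    Each bucket has diameter at most cell * sqrt(d), but a tiny cluster
    sitting on a grid corner can be split across 2^d buckets."""
    return tuple(int(math.floor(c / cell)) for c in point)
```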

FOCS Conference 2022 Conference Paper

The Power of Uniform Sampling for Coresets

  • Vladimir Braverman
  • Vincent Cohen-Addad
  • Shaofeng H.-C. Jiang
  • Robert Krauthgamer
  • Chris Schwiegelshohn
  • Mads Bech Toftrup
  • Xuan Wu 0002

Motivated by practical generalizations of the classic k-median and k-means objectives, such as clustering with size constraints, fair clustering, and Wasserstein barycenter, we introduce a meta-theorem for designing coresets for constrained-clustering problems. The meta-theorem reduces the task of coreset construction to one on a bounded number of ring instances with a much-relaxed additive error. This reduction enables us to construct coresets using uniform sampling, in contrast to the widely-used importance sampling, and consequently we can easily handle constrained objectives. Notably and perhaps surprisingly, this simpler sampling scheme can yield coresets whose size is independent of n, the number of input points. Our technique yields smaller coresets, and sometimes the first coresets, for a large number of constrained clustering problems, including capacitated clustering, fair clustering, Euclidean Wasserstein barycenter, clustering in minor-excluded graphs, and polygon clustering under Fréchet and Hausdorff distance. Finally, our technique also yields smaller coresets for 1-median in low-dimensional Euclidean spaces, specifically of size $\tilde{O}(\varepsilon^{-1.5})$ in $\mathbb{R}^{2}$ and $\tilde{O}(\varepsilon^{-1.6})$ in $\mathbb{R}^{3}$.
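To see why uniform sampling is a natural starting point, here is a toy non-streaming sketch (names and data mine). A uniform sample reweighted by $n/m$ gives an unbiased cost estimate for any one fixed center set; the papers' contribution is a reduction (to ring instances with additive error) that upgrades this to a guarantee holding simultaneously for all center sets, which is what makes it a coreset:

```python
import random

def kmeans_cost(points, centers):
    """Sum of squared distances to the nearest center (2-D points)."""
    return sum(min((px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centers)
               for px, py in points)

def uniform_sample_estimate(points, centers, m, seed=0):
    """Uniformly sample m points with replacement and reweight by n/m.
    For a FIXED center set this weighted cost is an unbiased estimate
    of the true cost; it is NOT by itself a coreset guarantee."""
    rng = random.Random(seed)
    sample = [rng.choice(points) for _ in range(m)]
    return (len(points) / m) * kmeans_cost(sample, centers)

# Example: 1000 points on a repeated 10x10 grid, two fixed centers
points = [(i % 10, (i // 10) % 10) for i in range(1000)]
centers = [(2.0, 2.0), (7.0, 7.0)]
true_cost = kmeans_cost(points, centers)
estimate = uniform_sample_estimate(points, centers, m=200)
```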

FOCS Conference 2021 Conference Paper

APMF < APSP? Gomory-Hu Tree for Unweighted Graphs in Almost-Quadratic Time

  • Amir Abboud
  • Robert Krauthgamer
  • Ohad Trabelsi

We design an $n^{2+o(1)}$-time algorithm that constructs a cut-equivalent (Gomory-Hu) tree of a simple graph on $n$ nodes. This bound is almost-optimal in terms of $n$, and it improves on the recent $\tilde{O}(n^{2.5})$ bound by the authors (STOC 2021), which was the first to break the cubic barrier. Consequently, the All-Pairs Maximum-Flow (APMF) problem has time complexity $n^{2+o(1)}$, and for the first time in history, this problem can be solved faster than All-Pairs Shortest Paths (APSP). We further observe that an almost-linear time algorithm (in terms of the number of edges $m$) is not possible without first obtaining a subcubic algorithm for multigraphs. Finally, we derandomize our algorithm, obtaining the first subcubic deterministic algorithm for Gomory-Hu Tree in simple graphs, showing that randomness is not necessary for beating the 1961 bound of $n-1$ max-flow computations. The upper bound is $\tilde{O}(n^{2\frac{2}{3}})$ and it would improve to $n^{2+o(1)}$ if there is a deterministic single-pair maximum-flow algorithm that is almost-linear. The key novelty is in using a “dynamic pivot” technique instead of the randomized pivot selection that was central in recent works.

NeurIPS Conference 2021 Conference Paper

Coresets for Clustering with Missing Values

  • Vladimir Braverman
  • Shaofeng Jiang
  • Robert Krauthgamer
  • Xuan Wu

We provide the first coreset for clustering points in $\mathbb{R}^d$ that have multiple missing values (coordinates). Previous coreset constructions only allow one missing coordinate. The challenge in this setting is that objective functions, like $k$-Means, are evaluated only on the set of available (non-missing) coordinates, which varies across points. Recall that an $\epsilon$-coreset of a large dataset is a small proxy, usually a reweighted subset of points, that $(1+\epsilon)$-approximates the clustering objective for every possible center set. Our coresets for $k$-Means and $k$-Median clustering have size $(jk)^{O(\min(j, k))} (\epsilon^{-1} d \log n)^2$, where $n$ is the number of data points, $d$ is the dimension and $j$ is the maximum number of missing coordinates for each data point. We further design an algorithm to construct these coresets in near-linear time, and consequently improve a recent quadratic-time PTAS for $k$-Means with missing values [Eiben et al., SODA 2021] to near-linear time. We validate our coreset construction, which is based on importance sampling and is easy to implement, on various real data sets. Our coreset exhibits a flexible tradeoff between coreset size and accuracy, and generally outperforms the uniform-sampling baseline. Furthermore, it significantly speeds up a Lloyd's-style heuristic for $k$-Means with missing values.
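To make the varying-coordinates objective concrete, a minimal sketch (names mine; `None` marks a missing coordinate):

```python
def masked_sq_dist(point, center):
    """Squared distance evaluated only over the available (non-missing)
    coordinates of the data point, as in the missing-values objective."""
    return sum((p - c) ** 2 for p, c in zip(point, center) if p is not None)

def kmeans_cost_missing(points, centers):
    """k-Means cost where each point contributes via its own available
    coordinates only, so the effective subspace varies across points."""
    return sum(min(masked_sq_dist(p, c) for c in centers) for p in points)
```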

FOCS Conference 2021 Conference Paper

Spectral Hypergraph Sparsifiers of Nearly Linear Size

  • Michael Kapralov
  • Robert Krauthgamer
  • Jakab Tardos
  • Yuichi Yoshida

Graph sparsification has been studied extensively over the past two decades, culminating in spectral sparsifiers of optimal size (up to constant factors). Spectral hypergraph sparsification is a natural analogue of this problem, for which optimal bounds on the sparsifier size are not known, mainly because the hypergraph Laplacian is non-linear, and thus lacks the linear-algebraic structure and tools that have been so effective for graphs. Our main contribution is the first algorithm for constructing $\epsilon$-spectral sparsifiers for hypergraphs with $O^{\ast}(n)$ hyperedges, where $O^{\ast}$ suppresses $(\epsilon^{-1}\log n)^{O(1)}$ factors. This bound is independent of the rank $r$ (maximum cardinality of a hyperedge), and is essentially best possible due to a recent bit complexity lower bound of $\Omega(nr)$ for hypergraph sparsification. This result is obtained by introducing two new tools. First, we give a new proof of spectral concentration bounds for sparsifiers of graphs; it avoids linear-algebraic methods, replacing, e.g., the usual application of the matrix Bernstein inequality, and therefore applies to the (non-linear) hypergraph setting. To achieve the result, we design a new sequence of hypergraph-dependent $\epsilon$-nets on the unit sphere in $\mathbb{R}^{n}$. Second, we extend the weight-assignment technique of Chen, Khanna and Nagda [FOCS'20] to the spectral sparsification setting. Surprisingly, the number of spanning trees after the weight assignment can serve as a potential function guiding the reweighting process in the spectral setting.
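The non-linear hypergraph quadratic form that defeats the matrix tools can be written down in a few lines (a plain-Python sketch with names of my choosing):

```python
def hypergraph_energy(hyperedges, weights, x):
    """Non-linear analogue of the Laplacian quadratic form:
    Q(x) = sum_e w_e * max_{u,v in e} (x_u - x_v)^2,
    i.e. each hyperedge contributes via its widest spread of x-values.
    For an ordinary edge {u, v} this reduces to the familiar graph
    Laplacian term w * (x_u - x_v)^2."""
    total = 0.0
    for e, w in zip(hyperedges, weights):
        vals = [x[v] for v in e]
        total += w * (max(vals) - min(vals)) ** 2
    return total
```

An $\epsilon$-spectral sparsifier must preserve $Q(x)$ up to a $1\pm\epsilon$ factor for every vector $x$, which is why the linear-algebraic machinery for graph Laplacians does not apply directly.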

STOC Conference 2021 Conference Paper

Subcubic algorithms for Gomory-Hu tree in unweighted graphs

  • Amir Abboud
  • Robert Krauthgamer
  • Ohad Trabelsi

Every undirected graph $G$ has a (weighted) cut-equivalent tree $T$, commonly named after Gomory and Hu who discovered it in 1961. Both $T$ and $G$ have the same node set, and for every node pair $s, t$, the minimum $(s, t)$-cut in $T$ is also an exact minimum $(s, t)$-cut in $G$. We give the first subcubic-time algorithm that constructs such a tree for a simple graph $G$ (unweighted with no parallel edges). Its time complexity is $\tilde{O}(n^{2.5})$, for $n = |V(G)|$; previously, only $\tilde{O}(n^{3})$ was known, except for restricted cases like sparse graphs. Consequently, we obtain the first algorithm for All-Pairs Max-Flow in simple graphs that breaks the cubic-time barrier. Gomory and Hu compute this tree using $n-1$ queries to (single-pair) Max-Flow; the new algorithm can be viewed as a fine-grained reduction to $\tilde{O}(\sqrt{n})$ Max-Flow computations on $n$-node graphs.

STOC Conference 2021 Conference Paper

Towards tight bounds for spectral sparsification of hypergraphs

  • Michael Kapralov
  • Robert Krauthgamer
  • Jakab Tardos
  • Yuichi Yoshida

Cut and spectral sparsification of graphs have numerous applications, including e.g. speeding up algorithms for cuts and Laplacian solvers. These powerful notions have recently been extended to hypergraphs, which are much richer and may offer new applications. However, the current bounds on the size of hypergraph sparsifiers are not as tight as the corresponding bounds for graphs. Our first result is a polynomial-time algorithm that, given a hypergraph on $n$ vertices with maximum hyperedge size $r$, outputs an $\epsilon$-spectral sparsifier with $O^{*}(nr)$ hyperedges, where $O^{*}$ suppresses $(\epsilon^{-1} \log n)^{O(1)}$ factors. This size bound improves the two previous bounds: $O^{*}(n^{3})$ [Soma and Yoshida, SODA'19] and $O^{*}(nr^{3})$ [Bansal, Svensson and Trevisan, FOCS'19]. Our main technical tool is a new method for proving concentration of the nonlinear analogue of the quadratic form of the Laplacians for hypergraph expanders. We complement this with lower bounds on the bit complexity of any compression scheme that $(1+\epsilon)$-approximates all the cuts in a given hypergraph, and hence also on the bit complexity of every $\epsilon$-cut/spectral sparsifier. These lower bounds are based on Ruzsa-Szemerédi graphs, and a particular instantiation yields an $\Omega(nr)$ lower bound on the bit complexity even for fixed constant $\epsilon$. In the case of hypergraph cut sparsifiers, this is tight up to polylogarithmic factors in $n$, due to a recent result of [Chen, Khanna and Nagda, FOCS'20]. For spectral sparsifiers it narrows the gap to $O^{*}(r)$. Finally, for directed hypergraphs, we present an algorithm that computes an $\epsilon$-spectral sparsifier with $O^{*}(n^{2}r^{3})$ hyperarcs, where $r$ is the maximum size of a hyperarc. For small $r$, this improves over $O^{*}(n^{3})$ known from [Soma and Yoshida, SODA'19], and is getting close to the trivial lower bound of $\Omega(n^{2})$ hyperarcs.

ICML Conference 2020 Conference Paper

Coresets for Clustering in Graphs of Bounded Treewidth

  • Daniel N. Baker
  • Vladimir Braverman
  • Lingxiao Huang
  • Shaofeng H.-C. Jiang
  • Robert Krauthgamer
  • Xuan Wu 0002

We initiate the study of coresets for clustering in graph metrics, i.e., the shortest-path metric of edge-weighted graphs. Such clustering problems are essential to data analysis and used for example in road networks and data visualization. A coreset is a compact summary of the data that approximately preserves the clustering objective for every possible center set, and it offers significant efficiency improvements in terms of running time, storage, and communication, including in streaming and distributed settings. Our main result is a near-linear time construction of a coreset for k-Median in a general graph $G$, with size $O_{\epsilon, k}(\mathrm{tw}(G))$ where $\mathrm{tw}(G)$ is the treewidth of $G$, and we complement the construction with a nearly-tight size lower bound. The construction is based on the framework of Feldman and Langberg [STOC 2011], and our main technical contribution, as required by this framework, is a uniform bound of $O(\mathrm{tw}(G))$ on the shattering dimension under any point weights. We validate our coreset on real-world road networks, and our scalable algorithm constructs tiny coresets with high accuracy, which translates to a massive speedup of existing approximation algorithms such as local search for graph k-Median.

FOCS Conference 2020 Conference Paper

Cut-Equivalent Trees are Optimal for Min-Cut Queries

  • Amir Abboud
  • Robert Krauthgamer
  • Ohad Trabelsi

Min-Cut queries are fundamental: Preprocess an undirected edge-weighted graph, to quickly report a minimum-weight cut that separates a query pair of nodes $s, t$. The best data structure known for this problem simply builds a cut-equivalent tree, discovered 60 years ago by Gomory and Hu, who also showed how to construct it using $n-1$ minimum st-cut computations. Using state-of-the-art algorithms for minimum st-cut (Lee and Sidford, FOCS 2014), one can construct the tree in time $\tilde{O}(mn^{3/2})$, which is also the preprocessing time of the data structure. (Throughout, we focus on polynomially-bounded edge weights, noting that faster algorithms are known for small/unit edge weights, and use $n$ and $m$ for the number of nodes and edges in the graph.) Our main result shows the following equivalence: Cut-equivalent trees can be constructed in near-linear time if and only if there is a data structure for Min-Cut queries with near-linear preprocessing time and polylogarithmic (amortized) query time, and even if the queries are restricted to a fixed source. That is, cut-equivalent trees are an essentially optimal solution for Min-Cut queries. This equivalence holds even for every minor-closed family of graphs, such as bounded-treewidth graphs, for which a two-decade old data structure (Arikati, Chaudhuri, and Zaroliagis, J. Algorithms 1998) implies the first near-linear time construction of cut-equivalent trees. Moreover, unlike all previous techniques for constructing cut-equivalent trees, ours is robust to relying on approximation algorithms. In particular, using the almost-linear time algorithm for $(1+\varepsilon)$-approximate minimum st-cut (Kelner, Lee, Orecchia, and Sidford, SODA 2014), we can construct a $(1+\varepsilon)$-approximate flow-equivalent tree (which is a slightly weaker notion) in time $n^{2+o(1)}$. This leads to the first $(1+\varepsilon)$-approximation for All-Pairs Max-Flow that runs in time $n^{2+o(1)}$, and matches the output size almost-optimally.

ICML Conference 2020 Conference Paper

Schatten Norms in Matrix Streams: Hello Sparsity, Goodbye Dimension

  • Vladimir Braverman
  • Robert Krauthgamer
  • Aditya Krishnan 0001
  • Roi Sinoff

Spectral functions of large matrices contain important structural information about the underlying data, and are thus becoming increasingly important. Many times, large matrices representing real-world data are sparse or doubly sparse (i.e., sparse in both rows and columns), and are accessed as a stream of updates, typically organized in row-order. In this setting, where space (memory) is the limiting resource, all known algorithms require space that is polynomial in the dimension of the matrix, even for sparse matrices. We address this challenge by providing the first algorithms whose space requirement is independent of the matrix dimension, assuming the matrix is doubly-sparse and presented in row-order. Our algorithms approximate the Schatten p-norms, which we use in turn to approximate other spectral functions, such as the logarithm of the determinant, trace of matrix inverse, and Estrada index. We validate these theoretical performance bounds by numerical experiments on real-world matrices representing social networks. We further prove that multiple passes are unavoidable in this setting, and show extensions of our primary technique, including a trade-off between space requirements and number of passes.

ICML Conference 2019 Conference Paper

Coresets for Ordered Weighted Clustering

  • Vladimir Braverman
  • Shaofeng H.-C. Jiang
  • Robert Krauthgamer
  • Xuan Wu 0002

We design coresets for Ordered k-Median, a generalization of classical clustering problems such as k-Median and k-Center. Its objective function is defined via the Ordered Weighted Averaging (OWA) paradigm of Yager (1988), where data points are weighted according to a predefined weight vector, but in order of their contribution to the objective (distance from the centers). A powerful data-reduction technique, called a coreset, is to summarize a point set $X$ in $\mathbb{R}^d$ into a small (weighted) point set $X'$, such that for every set of $k$ potential centers, the objective value of the coreset $X'$ approximates that of $X$ within factor $1\pm \epsilon$. When there are multiple objectives (weights), the above standard coreset might have limited usefulness, whereas in a simultaneous coreset, the above approximation holds for all weights (in addition to all centers). Our main result is a construction of a simultaneous coreset of size $O_{\epsilon, d}(k^2 \log^2 |X|)$ for Ordered k-Median. We validate our algorithm on a real geographical data set, and we find our coreset leads to a massive speedup of clustering computations, while maintaining high accuracy for a range of weights.
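The OWA objective described above can be evaluated in a few lines (a 1-D sketch with names of my choosing):

```python
def ordered_k_median_cost(points, centers, weights):
    """Ordered k-Median (OWA) objective: distances to the nearest center,
    sorted in non-increasing order, then dotted with the predefined
    (non-increasing) weight vector. weights = (1, 0, ..., 0) recovers
    k-Center; an all-ones vector recovers k-Median."""
    dists = sorted((min(abs(p - c) for c in centers) for p in points),
                   reverse=True)
    return sum(w * d for w, d in zip(weights, dists))
```

A simultaneous coreset must approximate this value for every admissible weight vector at once, not just one.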

FOCS Conference 2019 Conference Paper

Sublinear Algorithms for Gap Edit Distance

  • Elazar Goldenberg
  • Robert Krauthgamer
  • Barna Saha

The edit distance is a way of quantifying how similar two strings are to one another by counting the minimum number of character insertions, deletions, and substitutions required to transform one string into the other. A simple dynamic program computes the edit distance between two strings of length n in O(n^2) time, and a more sophisticated algorithm runs in time O(n + t^2) when the edit distance is t [Landau, Myers and Schmidt, SICOMP 1998]. In pursuit of obtaining faster running time, the last couple of decades have seen a flurry of research on approximating edit distance, including polylogarithmic approximation in near-linear time [Andoni, Krauthgamer and Onak, FOCS 2010], and a constant-factor approximation in subquadratic time [Chakrabarty, Das, Goldenberg, Koucký and Saks, FOCS 2018]. We study sublinear-time algorithms for small edit distance, which was investigated extensively because of its numerous applications. Our main result is an algorithm for distinguishing whether the edit distance is at most t or at least t^2 (the quadratic gap problem) in time Õ(n/t + t^3). This time bound is sublinear roughly for all t in [ω(1), o(n^{1/3})], which was not known before. The best previous algorithms solve this problem in sublinear time only for t = ω(n^{1/3}) [Andoni and Onak, STOC 2009]. Our algorithm is based on a new approach that adaptively switches between uniform sampling and reading contiguous blocks of the input strings. In contrast, all previous algorithms choose which coordinates to query non-adaptively. Moreover, it can be extended to solve the t vs t^{2-ε} gap problem in time Õ(n/t^{1-ε} + t^3).

ICML Conference 2018 Conference Paper

Matrix Norms in Data Streams: Faster, Multi-Pass and Row-Order

  • Vladimir Braverman
  • Stephen R. Chestnut
  • Robert Krauthgamer
  • Yi Li 0002
  • David P. Woodruff
  • Lin F. Yang

A central problem in mining massive data streams is characterizing which functions of an underlying frequency vector can be approximated efficiently. Given the prevalence of large scale linear algebra problems in machine learning, recently there has been considerable effort in extending this data stream problem to that of estimating functions of a matrix. This setting generalizes classical problems to the analogous ones for matrices. For example, instead of estimating frequent-item counts, we now wish to estimate “frequent-direction” counts. A related example is to estimate norms, which now correspond to estimating a vector norm on the singular values of the matrix. Despite recent efforts, the current understanding for such matrix problems is considerably weaker than that for vector problems. We study a number of aspects of estimating matrix norms in a stream that have not previously been considered: (1) multi-pass algorithms, (2) algorithms that see the underlying matrix one row at a time, and (3) time-efficient algorithms. Our multi-pass and row-order algorithms use less memory than what is provably required in the single-pass and entrywise-update models, and thus give separations between these models (in terms of memory). Moreover, all of our algorithms are considerably faster than previous ones. We also prove a number of lower bounds, and obtain for instance, a near-complete characterization of the memory required of row-order algorithms for estimating Schatten $p$-norms of sparse matrices. We complement our results with numerical experiments.

STOC Conference 2017 Conference Paper

Streaming symmetric norms via measure concentration

  • Jaroslaw Blasiok
  • Vladimir Braverman
  • Stephen R. Chestnut
  • Robert Krauthgamer
  • Lin F. Yang

We characterize the streaming space complexity of every symmetric norm $l$ (a norm on $\mathbb{R}^n$ invariant under sign-flips and coordinate-permutations), by relating this space complexity to the measure-concentration characteristics of $l$. Specifically, we provide nearly matching upper and lower bounds on the space complexity of calculating a $(1 \pm \epsilon)$-approximation to the norm of the stream, for every $0 < \epsilon \leq 1/2$. In addition, we apply our general results to easily derive bounds for several norms that were not studied before in the streaming model, including the top-$k$ norm and the $k$-support norm, which was recently employed for machine learning tasks. Overall, these results make progress on two outstanding problems in the area of sublinear algorithms (Problems 5 and 30 in http://sublinear.info).
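One of the norms named above is easy to state concretely; a minimal sketch (function name mine):

```python
def top_k_norm(x, k):
    """Top-k norm: sum of the k largest coordinates in absolute value.
    It is invariant under sign flips and coordinate permutations, hence
    symmetric and covered by the characterization; k = 1 gives the
    l_infinity norm and k = len(x) gives the l_1 norm."""
    return sum(sorted((abs(v) for v in x), reverse=True)[:k])
```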

FOCS Conference 2014 Conference Paper

Spectral Approaches to Nearest Neighbor Search

  • Amirali Abdullah
  • Alexandr Andoni
  • Ravindran Kannan
  • Robert Krauthgamer

We study spectral algorithms for the high-dimensional Nearest Neighbor Search problem (NNS). In particular, we consider a semi-random setting where a dataset is chosen arbitrarily from an unknown subspace of low dimension, and then perturbed by full-dimensional Gaussian noise. We design spectral NNS algorithms whose query time depends polynomially on the dimension and logarithmically on the size of the point set. These spectral algorithms use a repeated computation of the top PCA vector/subspace, and are effective even when the random-noise magnitude is much larger than the interpoint distances. Our motivation is that in practice, a number of spectral NNS algorithms outperform the random-projection methods that seem otherwise theoretically optimal on worst-case datasets. In this paper we aim to provide theoretical justification for this disparity. The full version of this extended abstract is available on arXiv.
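The primitive the abstract mentions, repeated computation of the top PCA direction, can be sketched for 2-D points with plain power iteration (an illustrative implementation of my own; a real one would call an SVD routine):

```python
def top_pca_direction(points, iters=200):
    """Power iteration on the (centered) 2x2 covariance matrix to find
    the top principal direction, the building block the spectral NNS
    algorithms apply repeatedly."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    pts = [(p[0] - mx, p[1] - my) for p in points]
    # Covariance matrix entries
    cxx = sum(x * x for x, _ in pts) / n
    cxy = sum(x * y for x, y in pts) / n
    cyy = sum(y * y for _, y in pts) / n
    vx, vy = 1.0, 1.0
    for _ in range(iters):
        wx = cxx * vx + cxy * vy   # multiply by the covariance matrix
        wy = cxy * vx + cyy * vy
        norm = (wx * wx + wy * wy) ** 0.5
        vx, vy = wx / norm, wy / norm
    return vx, vy

# Example: points spread along the x-axis with tiny alternating y-noise,
# so the top direction should be essentially (±1, 0)
points = [(float(i), 0.01 * (-1) ** i) for i in range(10)]
vx, vy = top_pca_direction(points)
```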

FOCS Conference 2012 Conference Paper

Everywhere-Sparse Spanners via Dense Subgraphs

  • Eden Chlamtac
  • Michael Dinitz
  • Robert Krauthgamer

The significant progress in constructing graph spanners that are sparse (small number of edges) or light (low total weight) has skipped spanners that are everywhere-sparse (small maximum degree). This disparity is in line with other network design problems, where the maximum-degree objective has been a notorious technical challenge. Our main result is for the Lowest Degree 2-Spanner (LD2S) problem, where the goal is to compute a 2-spanner of an input graph so as to minimize the maximum degree. We design a polynomial-time algorithm achieving approximation factor $O(\Delta^{3-2\sqrt{2}}) \approx O(\Delta^{0.172})$, where $\Delta$ is the maximum degree of the input graph. The previous $O(\Delta^{1/4})$-approximation was proved nearly two decades ago by Kortsarz and Peleg [SODA 1994, SICOMP 1998]. Our main conceptual contribution is to establish a formal connection between LD2S and a variant of the Densest k-Subgraph (DkS) problem. Specifically, we design for both problems strong relaxations based on the Sherali-Adams linear programming (LP) hierarchy, and show that "faithful" randomized rounding of the DkS-variant can be used to round LD2S solutions. Our notion of faithfulness intuitively means that all vertices and edges are chosen with probability proportional to their LP value, but the precise formulation is more subtle. Unfortunately, the best algorithms known for DkS use the Lovász-Schrijver LP hierarchy in a non-faithful way [Bhaskara, Charikar, Chlamtac, Feige, and Vijayaraghavan, STOC 2010]. Our main technical contribution is to overcome this shortcoming, while still matching the gap that arises in random graphs by planting a subgraph with the same log-density.

STOC Conference 2012 Conference Paper

The traveling salesman problem: low-dimensionality implies a polynomial time approximation scheme

  • Yair Bartal
  • Lee-Ad Gottlieb
  • Robert Krauthgamer

The Traveling Salesman Problem (TSP) is among the most famous NP-hard optimization problems. We design for this problem a randomized polynomial-time algorithm that computes a $(1+\varepsilon)$-approximation to the optimal tour, for any fixed $\varepsilon > 0$, in TSP instances that form an arbitrary metric space with bounded intrinsic dimension. The celebrated results of Arora [Aro98] and Mitchell [Mit99] prove that the above result holds in the special case of TSP in a fixed-dimensional Euclidean space. Thus, our algorithm demonstrates that the algorithmic tractability of metric TSP depends on the dimensionality of the space and not on its specific geometry. This result resolves a problem that has been open since the quasi-polynomial time algorithm of Talwar [Tal04].

STOC Conference 2011 Conference Paper

Directed spanners via flow-based linear programs

  • Michael Dinitz
  • Robert Krauthgamer

We examine directed spanners through flow-based linear programming relaxations. We design an $\tilde{O}(n^{2/3})$-approximation algorithm for the directed k-spanner problem that works for all $k \geq 1$, which is the first sublinear approximation for arbitrary edge-lengths. Even in the more restricted setting of unit edge-lengths, our algorithm improves over the previous $\tilde{O}(n^{1-1/k})$-approximation [BGJRW09] when $k \geq 4$. For the special case of $k=3$ we design a different algorithm achieving an $\tilde{O}(\sqrt{n})$-approximation, improving the previous $\tilde{O}(n^{2/3})$ [EP05, BGJRW09] (independently of our work, an $\tilde{O}(n^{1-1/\lceil k/2 \rceil})$-approximation was recently devised [BRR10]). Both of our algorithms easily extend to the fault-tolerant setting, which has recently attracted attention but not from an approximation viewpoint. We also prove a nearly matching integrality gap of $\tilde{\Omega}(n^{1/3-\varepsilon})$ for every constant $\varepsilon > 0$. A virtue of all our algorithms is that they are relatively simple. Technically, we introduce a new yet natural flow-based relaxation, and show how to approximately solve it even when its size is not polynomial. The main challenge is to design a rounding scheme that "coordinates" the choices of flow-paths between the many demand pairs while using few edges overall. We achieve this, roughly speaking, by randomization at the level of vertices.
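For reference, the k-spanner property the algorithms must certify is easy to check by brute force on small instances (a sketch with names mine, unit edge lengths, Floyd-Warshall distances):

```python
def apsp(n, edges):
    """Floyd-Warshall all-pairs distances, directed unit-length edges."""
    INF = float('inf')
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v in edges:
        d[u][v] = 1
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def is_k_spanner(n, edges, sub_edges, k):
    """H is a (directed) k-spanner of G if every distance stretches by
    at most a factor of k: d_H(u, v) <= k * d_G(u, v) for all u, v."""
    dg, dh = apsp(n, edges), apsp(n, sub_edges)
    return all(dh[u][v] <= k * dg[u][v]
               for u in range(n) for v in range(n)
               if dg[u][v] < float('inf'))

# Example: directed triangle with a shortcut edge (0, 2), and a
# candidate spanner that drops the shortcut
G = [(0, 1), (1, 2), (2, 0), (0, 2)]
H = [(0, 1), (1, 2), (2, 0)]
```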

FOCS Conference 2011 Conference Paper

Min-max Graph Partitioning and Small Set Expansion

  • Nikhil Bansal 0001
  • Uriel Feige
  • Robert Krauthgamer
  • Konstantin Makarychev
  • Viswanath Nagarajan
  • Joseph Naor
  • Roy Schwartz 0002

We study graph partitioning problems from a min-max perspective, in which an input graph on n vertices should be partitioned into k parts, and the objective is to minimize the maximum number of edges leaving a single part. The two main versions we consider are: (i) the k parts need to be of equal size, and (ii) the parts must separate a set of k given terminals. We consider a common generalization of these two problems, and design for it an O(√(log n log k))-approximation algorithm. This improves over an O(log² n) approximation for the second version due to Svitkina and Tardos, and a roughly O(k log n) approximation for the first version that follows from other previous work. We also give an improved O(1)-approximation algorithm for graphs that exclude any fixed minor. Our algorithm uses a new procedure for solving the Small Set Expansion problem. In this problem, we are given a graph G and the goal is to find a non-empty subset S ⊆ V of size at most ρn with minimum edge-expansion. We give an O(√(log n log(1/ρ))) bicriteria approximation algorithm for the general case of Small Set Expansion, and an O(1) approximation algorithm for graphs that exclude any fixed minor.

FOCS Conference 2011 Conference Paper

Streaming Algorithms via Precision Sampling

  • Alexandr Andoni
  • Robert Krauthgamer
  • Krzysztof Onak

A technique introduced by Indyk and Woodruff (STOC 2005) has inspired several recent advances in data-stream algorithms. We show that a number of these results follow easily from the application of a single probabilistic method called Precision Sampling. Using this method, we obtain simple data-stream algorithms that maintain a randomized sketch of an input vector x = (x_1, x_2, …, x_n), which is useful for the following applications: 1) Estimating the F_k-moment of x, for k > 2. 2) Estimating the ℓ_p-norm of x, for p ∈ [1, 2], with small update time. 3) Estimating cascaded norms ℓ_p(ℓ_q) for all p, q > 0. 4) ℓ_1 sampling, where the goal is to produce an element i with probability (approximately) |x_i|/‖x‖_1; it extends to similarly defined ℓ_p-sampling, for p ∈ [1, 2]. For all these applications the algorithm is essentially the same: scale the vector x entry-wise by a well-chosen random vector, and run a heavy-hitter estimation algorithm on the resulting vector. Our sketch is a linear function of x, thereby allowing general updates to the vector x. Precision Sampling itself addresses the problem of estimating a sum Σ_{i=1}^n a_i from weak estimates of each real a_i ∈ [0, 1]. More precisely, the estimator first chooses a desired precision u_i ∈ (0, 1] for each i ∈ [n], and then it receives an estimate of every a_i within additive u_i. Its goal is to provide a good approximation to Σ a_i while keeping a tab on the "approximation cost" Σ_i (1/u_i). Here we refine previous work (Andoni, Krauthgamer, and Onak, FOCS 2010) which shows that as long as Σ a_i = Ω(1), a good multiplicative approximation can be achieved using total precision of only O(n log n).
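The "scale entry-wise, then report the heavy hitter" recipe can be illustrated offline. The sketch below is a toy stand-in, not the paper's streaming algorithm: it uses an exponential race (Exp(1) scaling) in place of the precision-based sketch, and the name `l1_sample` is illustrative. With this scaling, the index of the largest scaled magnitude is distributed exactly as |x_i|/‖x‖_1.

```python
import random
from collections import Counter

def l1_sample(x):
    """Return an index i with probability |x_i| / ||x||_1.

    Toy stand-in for the sampling primitive described in the abstract:
    scale each entry by an independent random factor, then report the
    "heavy hitter" (largest scaled magnitude). Dividing |x_i| by an
    Exp(1) variable makes the argmax land on i with probability
    exactly |x_i| / ||x||_1 (a race of exponentials).
    """
    return max(range(len(x)), key=lambda i: abs(x[i]) / random.expovariate(1.0))

x = [4.0, -2.0, 1.0, 1.0]
counts = Counter(l1_sample(x) for _ in range(100_000))
# Empirical frequencies approach |x_i| / ||x||_1 = [0.5, 0.25, 0.125, 0.125].
```

The streaming version replaces the explicit argmax with a heavy-hitter sketch, which is what makes the whole construction a linear function of x.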

FOCS Conference 2010 Conference Paper

Polylogarithmic Approximation for Edit Distance and the Asymmetric Query Complexity

  • Alexandr Andoni
  • Robert Krauthgamer
  • Krzysztof Onak

We present a near-linear time algorithm that approximates the edit distance between two strings within a polylogarithmic factor. For strings of length n and every fixed ε > 0, the algorithm computes a (log n)^{O(1/ε)} approximation in n^{1+ε} time. This is an exponential improvement over the previously known approximation factor, 2^{Õ(√log n)}, with a comparable running time [Ostrovsky and Rabani, J. ACM 2007; Andoni and Onak, STOC 2009]. This result arises naturally in the study of a new asymmetric query model. In this model, the input consists of two strings x and y, and an algorithm can access y in an unrestricted manner, while being charged for querying every symbol of x. Indeed, we obtain our main result by designing an algorithm that makes a small number of queries in this model. We then provide a nearly-matching lower bound on the number of queries. Our lower bound is the first to expose hardness of edit distance stemming from the input strings being "repetitive", which means that many of their substrings are approximately identical. Consequently, our lower bound provides the first rigorous separation between edit distance and Ulam distance.

FOCS Conference 2007 Conference Paper

The Computational Hardness of Estimating Edit Distance [Extended Abstract]

  • Alexandr Andoni
  • Robert Krauthgamer

We prove the first non-trivial communication complexity lower bound for the problem of estimating the edit distance (aka Levenshtein distance) between two strings. A major feature of our result is that it provides the first setting in which the complexity of computing the edit distance is provably larger than that of Hamming distance. Our lower bound exhibits a trade-off between approximation and communication, asserting, for example, that protocols with O(1) bits of communication can only obtain approximation α ≥ Ω(log d / log log d), where d is the length of the input strings. This case of O(1) communication is of particular importance, since it captures constant-size sketches as well as embeddings into spaces like L_1 and squared-L_2, two prevailing algorithmic approaches for dealing with edit distance. Furthermore, the bound holds not only for strings over the alphabet Σ = {0, 1}, but also for strings that are permutations (called the Ulam metric). Besides being applicable to a much richer class of algorithms than all previous results, our bounds are near-tight in at least one case, namely of embedding permutations into L_1. The proof uses a new technique that relies on Fourier analysis in a rather elementary way.

FOCS Conference 2006 Conference Paper

Algorithms on negatively curved spaces

  • Robert Krauthgamer
  • James R. Lee

We initiate the study of approximation algorithms on negatively curved spaces. These spaces have recently become of interest in various domains of computer science including networking and vision. The classical example of such a space is the real hyperbolic space ℍ^d for d ≥ 2, but our approach applies to a more general family of spaces characterized by Gromov's (combinatorial) hyperbolicity condition. We give efficient algorithms and data structures for problems like approximate nearest-neighbor search and compact, low-stretch routing on subsets of negatively curved spaces of fixed dimension (including ℍ^d as a special case). In a different direction, we show that there is a PTAS for the Traveling Salesman Problem when the set of cities lies, for example, in ℍ^d. This generalizes Arora's results for ℝ^d. Most of our algorithms use the intrinsic distance geometry of the data set, and only need the existence of an embedding into some negatively curved space in order to function properly. In other words, our algorithms regard the interpoint distance function as a black box, and are independent of the representation of the input points.

FOCS Conference 2004 Conference Paper

Approximating Edit Distance Efficiently

  • Ziv Bar-Yossef
  • T. S. Jayram
  • Robert Krauthgamer
  • Ravi Kumar 0001

Edit distance has been extensively studied for the past several years. Nevertheless, no linear-time algorithm is known to compute the edit distance between two strings, or even to approximate it to within a modest factor. Furthermore, for various natural algorithmic problems such as low-distortion embeddings into normed spaces, approximate nearest-neighbor schemes, and sketching algorithms, known results for the edit distance are rather weak. We develop algorithms that solve gap versions of the edit distance problem: given two strings of length n with the promise that their edit distance is either at most k or greater than ℓ, decide which of the two holds. We present two sketching algorithms for gap versions of edit distance. Our first algorithm solves the k vs. (kn)^{2/3} gap problem, using a constant-size sketch. A more involved algorithm solves the stronger k vs. ℓ gap problem, where ℓ can be as small as O(k²), still with a constant-size sketch, but works only for strings that are mildly "non-repetitive". Finally, we develop an n^{3/7}-approximation quasi-linear time algorithm for edit distance, improving the previous best factor of n^{3/4} (Cole and Hariharan, 2002); if the input strings are assumed to be non-repetitive, then the approximation factor can be strengthened to n^{1/3}.
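To make the gap formulation concrete, here is the classic exact dynamic program alongside a gap decider. This is only a reference sketch: `gap_decide` answers by exact computation, whereas the paper's contribution is to decide the gap from constant-size sketches without computing the distance.

```python
def edit_distance(s, t):
    """Classic O(|s| * |t|) dynamic program (Wagner-Fischer), two rows.
    The paper's motivation: no substantially faster exact algorithm is known."""
    n = len(t)
    prev = list(range(n + 1))  # distances from s[:0] to every prefix of t
    for i, a in enumerate(s, start=1):
        cur = [i] + [0] * n
        for j, b in enumerate(t, start=1):
            cur[j] = min(prev[j] + 1,              # delete a
                         cur[j - 1] + 1,           # insert b
                         prev[j - 1] + (a != b))   # match or substitute
        prev = cur
    return prev[n]

def gap_decide(s, t, k, l):
    """Gap version from the abstract: promised that ed(s, t) <= k or
    ed(s, t) > l, report which case holds."""
    return "at most k" if edit_distance(s, t) <= k else "greater than l"
```

For example, `edit_distance("kitten", "sitting")` is 3, so under the promise (k, ℓ) = (3, 10) the decider reports the first case.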

FOCS Conference 2004 Conference Paper

Measured Descent: A New Embedding Method for Finite Metrics

  • Robert Krauthgamer
  • James R. Lee
  • Manor Mendel
  • Assaf Naor

We devise a new embedding technique, which we call measured descent, based on decomposing a metric space locally, at varying speeds, according to the density of some probability measure. This provides a refined and unified framework for the two primary methods of constructing Fréchet embeddings for finite metrics, due to J. Bourgain and S. Rao. We prove that any n-point metric space (X, d) embeds in Hilbert space with distortion O(√(α_X · log n)), where α_X is a geometric estimate on the decomposability of X. As an immediate corollary, we obtain an O(√(log λ_X · log n)) distortion embedding, where λ_X is the doubling constant of X. Since λ_X ≤ n, this result recovers Bourgain's theorem, but when the metric X is, in a sense, "low-dimensional", improved bounds are achieved. Our embeddings are volume-respecting for subsets of arbitrary size. One consequence is the existence of (k, O(log n)) volume-respecting embeddings for all 1 ≤ k ≤ n, which is the best possible, and answers positively a question posed by U. Feige. Our techniques are also used to answer positively a question of Y. Rabinovich, showing that any weighted n-point planar graph embeds in ℓ_∞^{O(log n)} with O(1) distortion. The O(log n) bound on the dimension is optimal, and improves upon the previously known bound of O(log² n).

FOCS Conference 2003 Conference Paper

Bounded Geometries, Fractals, and Low-Distortion Embeddings

  • Anupam Gupta 0001
  • Robert Krauthgamer
  • James R. Lee

The doubling constant of a metric space (X, d) is the smallest value λ such that every ball in X can be covered by λ balls of half the radius. The doubling dimension of X is then defined as dim(X) = log₂ λ. A metric (or sequence of metrics) is called doubling precisely when its doubling dimension is bounded. This is a robust class of metric spaces which contains many families of metrics that occur in applied settings. We give tight bounds for embedding doubling metrics into (low-dimensional) normed spaces. We consider both general doubling metrics, as well as more restricted families such as those arising from trees, from graphs excluding a fixed minor, and from snowflaked metrics. Our techniques include decomposition theorems for doubling metrics, and an analysis of a fractal in the plane due to T. J. Laakso (2002). Finally, we discuss some applications and point out a central open question regarding dimensionality reduction in L_2.
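The definition of the doubling constant can be checked by brute force on a small finite metric. The sketch below is illustrative (the greedy half-radius cover gives an upper bound on λ, and the function name is ours, not from the paper): for points on a line, the bound stays a small constant no matter how many points there are.

```python
import itertools
import math

def doubling_constant(points, dist):
    """Brute-force upper bound on the doubling constant of a finite metric:
    for every ball B(p, r), greedily cover it with balls of radius r/2
    centered at data points, and return the largest cover size found."""
    lam = 1
    radii = {dist(p, q) for p, q in itertools.combinations(points, 2)}
    for p in points:
        for r in radii:
            uncovered = [q for q in points if dist(p, q) <= r]  # the ball B(p, r)
            centers = 0
            while uncovered:
                c = uncovered[0]  # greedy: any uncovered point becomes a center
                uncovered = [q for q in uncovered if dist(c, q) > r / 2]
                centers += 1
            lam = max(lam, centers)
    return lam

# Points on a line (an intrinsically 1-dimensional metric): the doubling
# dimension log2(lambda) stays a small constant, independent of point count.
pts = [(i,) for i in range(16)]
lam = doubling_constant(pts, lambda a, b: abs(a[0] - b[0]))
dim = math.log2(lam)
```

By contrast, a uniform metric on n points (all pairwise distances 1) would drive λ up to n, so dim(X) would grow as log₂ n.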

STOC Conference 2003 Conference Paper

The intrinsic dimensionality of graphs

  • Robert Krauthgamer
  • James R. Lee

We resolve the following conjecture raised by Levin together with Linial, London, and Rabinovich [16]. Let Z_∞^d be the infinite graph whose vertex set is Z^d and which has an edge (u, v) whenever ‖u − v‖_∞ = 1. Let dim(G) be the smallest d such that G occurs as a (not necessarily induced) subgraph of Z_∞^d. The growth rate of G, denoted ρ_G, is the minimum ρ such that every ball of radius r > 1 in G contains at most r^ρ vertices. By simple volume arguments, dim(G) = Ω(ρ_G). Levin conjectured that this lower bound is tight, i.e., that dim(G) = O(ρ_G) for every graph G. Previously, it was not known whether dim(G) could be upper bounded by any function of ρ_G, even in the special case of trees. We show that a weaker form of Levin's conjecture holds by proving that, for every graph G, dim(G) = O(ρ_G log ρ_G). We disprove, however, the specific bound of the conjecture and show that our upper bound is tight by exhibiting graphs for which dim(G) = Ω(ρ_G log ρ_G). For families of graphs which exclude a fixed minor, we salvage the strong form, showing that dim(G) = O(ρ_G). This holds also for graphs without long induced simple cycles. Our results extend to a variant of the conjecture for finite-dimensional Euclidean spaces due to Linial [15].

STOC Conference 2001 Conference Paper

Online server allocation in a server farm via benefit task systems

  • T. S. Jayram
  • Tracy Kimbrel
  • Robert Krauthgamer
  • Baruch Schieber
  • Maxim Sviridenko

A web content hosting service provider needs to dynamically allocate servers in a server farm to its customers' web sites. Ideally, the allocation to a site should always suffice to handle its load. However, due to a limited number of servers and the overhead incurred in changing the allocation of a server from one site to another, the system may become overloaded. The problem faced by the web hosting service provider is how to allocate the available servers in the most profitable way. Adding to the complexity of this problem is the fact that future loads of the sites are either unknown or known only for the very near future. In this paper we model this server allocation problem, and consider both its offline and online versions. We give a polynomial time algorithm for computing the optimal offline allocation. In the online setting, we show almost optimal algorithms (both deterministic and randomized) for any positive lookahead. The quality of the solution improves as the lookahead increases. We also consider several special cases of practical interest. Finally, we present some experimental results using actual trace data that show that one of our online algorithms performs very close to optimal. Interestingly, the online server allocation problem can be cast as a more general benefit task system that we define. Our results extend to this task system, which also captures the benefit-maximization variants of the k-server problem and the metrical task system problem. It follows that the benefit-maximization variants of these problems are more tractable than their cost-minimization variants.

STOC Conference 2001 Conference Paper

Private approximation of NP-hard functions

  • Shai Halevi
  • Robert Krauthgamer
  • Eyal Kushilevitz
  • Kobbi Nissim

The notion of private approximation was introduced recently by Feigenbaum, Fong, Strauss and Wright. Informally, a private approximation of a function f is another function F that approximates f in the usual sense, but does not yield any information on x other than what can be deduced from f(x). As such, F(x) is useful for private computation of f(x) (assuming that F can be computed more efficiently than f). In this work we examine the properties and limitations of this new notion. Specifically, we show that for many NP-hard problems, the privacy requirement precludes non-trivial approximation. This is the case even for problems that otherwise admit very good approximation (e.g., problems with a PTAS). On the other hand, we show that slightly relaxing the privacy requirement, by means of leaking "just a few bits of information" about x, again permits good approximation.

FOCS Conference 2000 Conference Paper

A polylogarithmic approximation of the minimum bisection

  • Uriel Feige
  • Robert Krauthgamer

A bisection of a graph with n vertices is a partition of its vertices into two sets, each of size n/2. The bisection cost is the number of edges connecting the two sets. Finding the bisection of minimum cost is NP-hard. We present an algorithm that finds a bisection whose cost is within a ratio of O(log² n) from the optimal. For graphs excluding any fixed graph as a minor (e.g., planar graphs) we obtain an improved approximation ratio of O(log n). The previously known approximation ratio for bisection was roughly √n.
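The definitions above are easy to check by brute force on a toy instance (illustrative code only; the paper's algorithm must of course avoid this exponential search over all n/2-subsets):

```python
from itertools import combinations

def bisection_cost(edges, side):
    """Bisection cost: number of edges with exactly one endpoint in `side`."""
    return sum(1 for u, v in edges if (u in side) != (v in side))

def min_bisection(n, edges):
    """Exact minimum bisection by trying every n/2-subset of vertices.
    Exponential in n, so usable only on tiny graphs; the problem is NP-hard."""
    best_cost, best_side = None, None
    for side in combinations(range(n), n // 2):
        c = bisection_cost(edges, set(side))
        if best_cost is None or c < best_cost:
            best_cost, best_side = c, set(side)
    return best_cost, best_side

# Two triangles joined by a single edge: the optimal bisection cuts only it.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
cost, side = min_bisection(6, edges)
```

Here the optimum separates the two triangles, cutting exactly one edge; an approximation algorithm with ratio O(log² n) would be allowed to cut polylogarithmically more.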