Arrow Research

Author name cluster

Aaron Putterman

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers

Possible papers

STOC 2025 Conference Paper

Correlation Clustering and (De)Sparsification: Graph Sketches Can Match Classical Algorithms

  • Sepehr Assadi
  • Sanjeev Khanna
  • Aaron Putterman

Correlation clustering is a widely-used approach for clustering large data sets based only on pairwise similarity information. In recent years, there has been a steady stream of better and better classical algorithms for approximating this problem. Meanwhile, another line of research has focused on porting the classical advances to various sublinear algorithm models, including semi-streaming, Massively Parallel Computation (MPC), and distributed computing. Yet, these latter works typically rely on ad-hoc approaches that do not necessarily keep up with advances in approximation ratios achieved by classical algorithms. Hence, the motivating question for our work is this: can the gains made by classical algorithms for correlation clustering be ported over to sublinear algorithms in a black-box manner? We answer this question in the affirmative by introducing the paradigm of graph de-sparsification. A versatile approach for designing sublinear algorithms across various models is graph (linear) sketching. It is known that one can find a cut sparsifier of a given graph (which approximately preserves cut structures) via graph sketching, and that this is information-theoretically sufficient for recovering a near-optimal correlation clustering solution. However, no efficient algorithms are known for this task, as the resulting cut sparsifier is necessarily a weighted graph, and correlation clustering is known to be a distinctly harder problem on weighted graphs. Our main result is a randomized linear sketch of O(n) size for n-vertex graphs, from which one can recover with high probability an (α + o(1))-approximate correlation clustering in polynomial time, where α is the best approximation ratio of any polynomial-time classical algorithm for (unweighted) correlation clustering.
This is proved via our new de-sparsification result: we recover in polynomial time, from an O(n)-size linear sketch of a graph G, an unweighted, simple graph that approximately preserves the cut structure of G. In fact, we show that under some mild conditions, any spectral sparsifier of a graph G can be de-sparsified into an unweighted simple graph with nearly the same spectrum. We believe the de-sparsification paradigm is interesting in its own right as a way of reducing graph complexity when the weighted version of a problem is harder than its unweighted version. Finally, we use our techniques to obtain efficient algorithms for correlation clustering that match the performance of the best classical algorithms in a variety of different models, including dynamic streaming, MPC, and distributed communication models.
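To make the objective concrete: a minimal sketch (not from the paper) of the classical Pivot algorithm for unweighted correlation clustering, which is one example of the kind of classical algorithm whose approximation guarantee the paper's sketches aim to match in a black-box way. The function names and graph representation here are illustrative choices, not the paper's.

```python
import random

def pivot_correlation_clustering(n, plus_edges, seed=0):
    """Classical Pivot algorithm: repeatedly pick a random unclustered
    vertex as a pivot and cluster it with its remaining '+' neighbours.
    Known to give a 3-approximation in expectation on unweighted instances."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in plus_edges:
        adj[u].add(v)
        adj[v].add(u)
    unclustered = set(range(n))
    clusters = []
    while unclustered:
        pivot = rng.choice(sorted(unclustered))
        cluster = {pivot} | (adj[pivot] & unclustered)
        clusters.append(cluster)
        unclustered -= cluster
    return clusters

def disagreements(n, plus_edges, clusters):
    """Correlation-clustering cost: '+' edges cut between clusters,
    plus '-' pairs (non-edges) kept inside a cluster."""
    label = {}
    for i, c in enumerate(clusters):
        for v in c:
            label[v] = i
    plus = {frozenset(e) for e in plus_edges}
    cost = 0
    for u in range(n):
        for v in range(u + 1, n):
            same = label[u] == label[v]
            is_plus = frozenset((u, v)) in plus
            if is_plus != same:
                cost += 1
    return cost
```

On an instance made of two disjoint '+' triangles, any pivot choice recovers the two triangles as clusters, for zero disagreements.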

STOC 2025 Conference Paper

Efficient Algorithms and New Characterizations for CSP Sparsification

  • Sanjeev Khanna
  • Aaron Putterman
  • Madhu Sudan 0001

CSP sparsification, introduced by Kogan and Krauthgamer (ITCS 2015), considers the following question: how much can an instance of a constraint satisfaction problem be sparsified (by retaining a reweighted subset of the constraints) while still roughly capturing the weight of constraints satisfied by every assignment? CSP sparsification captures as special cases several well-studied problems, including graph cut-sparsification, hypergraph cut-sparsification, and hypergraph XOR-sparsification, and corresponds to a general class of hypergraph sparsification problems where an arbitrary 0/1-valued splitting function is used to define the notion of cutting a hyperedge (see, for instance, Veldt-Benson-Kleinberg, SIAM Review 2022). The main question here is to understand, for a given constraint predicate P: Σ^r → {0,1} (where variables are assigned values in Σ), the smallest constant c such that O(n^c)-size sparsifiers exist for every instance of a constraint satisfaction problem over P. A recent work of Khanna, Putterman and Sudan (SODA 2024) [KPS24] showed the existence of near-linear-size sparsifiers for new classes of CSPs. In this work, (1) we significantly extend the class of CSPs for which nearly-linear-size sparsifications can be shown to exist, while also extending the scope to settings with non-linear-size sparsifications; (2) we give a polynomial-time algorithm to extract such sparsifications for all the problems we study, including the first efficient sparsification algorithms for the problems studied in [KPS24]. Our results captured in item (1) lead to two new classifications: First, we get a complete classification of all symmetric Boolean predicates P (i.e., on the Boolean domain Σ = {0,1}) that allow nearly-linear-size sparsifications. This classification reveals an inherent, and previously unsuspected, number-theoretic phenomenon that determines near-linear-size sparsifiability.
Second, we also completely classify the set of Boolean predicates P that allow non-trivial (o(n^r)-size) sparsifications, thus answering an open question from the work of Kogan and Krauthgamer. The constructive aspect of our result is an arguably unexpected strengthening of [KPS24]. Their work roughly seemed to suggest that sparsifications can be found by solving problems related to finding the minimum distance of linear codes. These problems remain unsolved (in fact, are NP-hard) to date, and our work finds a different path to achieve polynomial-time sparsification, resolving an open problem from their work. As a consequence, we also get the first efficient algorithms to spectrally sparsify Cayley graphs over F_2^n in time polynomial in the number of generators. Our techniques build on [KPS24], which proves the existence of nearly-linear-size sparsifiers for CSPs where the unsatisfying assignments of the underlying predicate P are given by a linear equation over a finite field. Our main contributions are to extend this framework to higher-degree equations over general Abelian groups (both elements are crucial for our classification results) as well as to design polynomial-time sparsification algorithms for all problems in our framework.
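The defining property above can be stated as a small brute-force check. The sketch below (illustrative only, not from the paper, with hypothetical function names) encodes a CSP instance as weighted variable-tuples under a predicate P, and verifies the (1±ε) sparsifier condition over all assignments; with the cut predicate P(x, y) = (x ≠ y) this specializes to graph cut sparsification.

```python
from itertools import product

def csp_value(P, weighted_constraints, assignment):
    """Total weight of constraints satisfied by `assignment`; each
    constraint is (tuple of variable indices, weight), fed to predicate P."""
    return sum(w for vars_, w in weighted_constraints
               if P(*(assignment[v] for v in vars_)))

def is_sparsifier(P, instance, sparsifier, n, eps):
    """Defining property of a (1±eps) CSP sparsifier: every 0/1
    assignment sees approximately the same satisfied weight.
    Exponential in n, so only for tiny illustrative instances."""
    for a in product((0, 1), repeat=n):
        full = csp_value(P, instance, a)
        small = csp_value(P, sparsifier, a)
        if not ((1 - eps) * full <= small <= (1 + eps) * full):
            return False
    return True
```

For example, a 4-cycle instance under the cut predicate is trivially a (1±0) sparsifier of itself, while a single reweighted edge fails the check on assignments whose cut avoids that edge.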

FOCS 2025 Conference Paper

On the Parallel Complexity of Finding a Matroid Basis

  • Sanjeev Khanna
  • Aaron Putterman
  • Junkai Song

A fundamental question in parallel computation, posed by Karp, Upfal, and Wigderson (FOCS 1985, JCSS 1988), asks: given only independence-oracle access to a matroid on n elements, how many adaptive rounds are required to find a basis using only polynomially many queries? This question generalizes, among others, the complexity of finding bases of linear spaces, partition matroids, and spanning forests in graphs. In their work, they established an upper bound of $O(\sqrt{n})$ rounds and a lower bound of $\widetilde{\Omega}\left(n^{1 / 3}\right)$ rounds for this problem, and these bounds have remained unimproved since then. In this work, we make the first progress in narrowing this gap by designing a parallel algorithm that finds a basis of an arbitrary matroid in $\tilde{O}\left(n^{7 / 15}\right)$ rounds (using polynomially many independence queries per round) with high probability, surpassing the long-standing $O(\sqrt{n})$ barrier. Our approach introduces a novel matroid decomposition technique and other structural insights that not only yield this general result but also lead to a much-improved algorithm for the class of partition matroids (which underlies the $\widetilde{\Omega}\left(n^{1 / 3}\right)$ lower bound of Karp, Upfal, and Wigderson). Specifically, we develop an $\tilde{O}\left(n^{1 / 3}\right)$-round algorithm, thereby settling the round complexity of finding a basis in partition matroids. As a further application, we also improve the parallel complexity of the classic matroid intersection problem. By plugging our basis-finding algorithm into a known algorithmic framework for matroid intersection, we obtain an $\tilde{O}\left(n^{37 / 45}\right)$-round algorithm for matroid intersection, improving upon the prior $O\left(n^{5 / 6}\right)$ bound.
Collectively, these results represent the first progress on the parallel complexity of finding matroid bases in 40 years, and we believe that techniques developed here may prove useful for other problems on matroids.
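As a baseline for the round-complexity question, here is a toy sketch (not the paper's algorithm) of the fully sequential greedy procedure, which uses n adaptive rounds with one independence query each, together with an independence oracle for a partition matroid, the class for which the abstract settles the round complexity. All names here are illustrative.

```python
def greedy_basis(elements, independent):
    """Sequential greedy basis-finding: n adaptive rounds, one oracle
    query per round. The parallel question above asks how far this
    adaptivity can be compressed when queries are batched per round."""
    basis = []
    for e in elements:
        if independent(basis + [e]):
            basis.append(e)
    return basis

def partition_matroid_oracle(part_of, capacity):
    """Independence oracle for a partition matroid: a set is independent
    iff it contains at most capacity[p] elements from each part p."""
    def independent(S):
        counts = {}
        for e in S:
            p = part_of[e]
            counts[p] = counts.get(p, 0) + 1
            if counts[p] > capacity[p]:
                return False
        return True
    return independent
```

With parts {0,1,2} (capacity 1) and {3,4,5} (capacity 2), greedy returns the basis [0, 3, 4]; the papers above study how many of these oracle rounds can be collapsed into parallel batches.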

FOCS 2024 Conference Paper

Near-Optimal Size Linear Sketches for Hypergraph Cut Sparsifiers

  • Sanjeev Khanna
  • Aaron Putterman
  • Madhu Sudan 0001

A $(1\pm\epsilon)$-sparsifier of a hypergraph $G(V, E)$ is a (weighted) subgraph that preserves the value of every cut to within a $(1\pm\epsilon)$-factor. It is known that every hypergraph with $n$ vertices admits a $(1\pm\epsilon)$-sparsifier with $\tilde{O}(n/\epsilon^{2})$ hyperedges. In this work, we explore the task of building such a sparsifier by using only linear measurements (a linear sketch) over the hyperedges of $G$, and provide nearly-matching upper and lower bounds for this task. Specifically, we show that there is a randomized linear sketch of size $\tilde{O}(nr\log(m)/\epsilon^{2})$ bits which with high probability contains sufficient information to recover a $(1\pm\epsilon)$ cut-sparsifier with $\tilde{O}(n/\epsilon^{2})$ hyperedges for any hypergraph with at most $m$ edges, each of which has arity bounded by $r$. This immediately gives a dynamic streaming algorithm for hypergraph cut sparsification with an identical space complexity, improving on the previous best known bound of $\tilde{O}(nr^{2}\log^{4}(m)/\epsilon^{2})$ bits of space (Guha, McGregor, and Tench, PODS 2015). We complement our algorithmic result with a nearly-matching lower bound: we show that for every $\epsilon\in(0, 1)$, one needs $\Omega(nr\log(m/n)/\log(n))$ bits to construct a $(1\pm\epsilon)$-sparsifier via linear sketching, thus showing that our linear sketch achieves an optimal dependence on both $r$ and $\log(m)$. The starting point for our improved algorithm is importance sampling of hyperedges based on the new notion of $k$-cut strength introduced in the recent work of Quanrud (SODA 2024). The natural algorithm based on this concept leads to $\log m$ levels of sampling where errors can potentially accumulate, and this accounts for the $\mathrm{polylog}(m)$ losses in the sketch size of the natural algorithm. We develop a more intricate analysis of the accumulation of error to show that most levels do not contribute to the error and that the actual loss is only $\mathrm{polylog}(n)$.
Combined with careful preprocessing (and analysis), this enables us to get rid of all extraneous $\log m$ factors in the sketch size, but the quadratic dependence on $r$ remains. This dependence originates from the use of correlated $\ell_{0}$-samplers to recover a large number of low-strength edges in a hypergraph simultaneously by looking at neighborhoods of individual vertices. In graphs, this leads to the discovery of $\Omega(n)$ edges in a single shot, whereas in hypergraphs, this may potentially reveal only $O(n/r)$ new edges, thus requiring $\Omega(r)$ rounds of recovery. To remedy this, we introduce a new technique of random fingerprinting of hyperedges, which effectively eliminates the correlations created by large-arity hyperedges and leads to a scheme for recovering hyperedges of low strength with an optimal dependence on $r$. Putting all these ingredients together yields our linear sketching algorithm. Our lower bound is established by a reduction from the universal relation problem in the one-way communication setting.
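To fix intuition for the object being sketched, here is a tiny illustrative check (not from the paper; names are hypothetical) of the hypergraph cut function and the $(1\pm\epsilon)$-sparsifier property: a hyperedge is cut by a bipartition exactly when it has vertices on both sides, and a sparsifier must preserve every such cut value up to $(1\pm\epsilon)$.

```python
from itertools import product

def hypergraph_cut(hyperedges, weights, side):
    """Cut value of a bipartition `side` (0/1 label per vertex): total
    weight of hyperedges containing vertices on both sides."""
    total = 0.0
    for e, w in zip(hyperedges, weights):
        if len({side[v] for v in e}) == 2:
            total += w
    return total

def is_cut_sparsifier(n, H, wH, S, wS, eps):
    """Brute-force check of the (1±eps)-sparsifier property over all
    2^n bipartitions; only feasible for tiny n, purely illustrative."""
    for bits in product((0, 1), repeat=n):
        full = hypergraph_cut(H, wH, bits)
        small = hypergraph_cut(S, wS, bits)
        if not ((1 - eps) * full <= small <= (1 + eps) * full):
            return False
    return True
```

For instance, two parallel unit-weight hyperedges can be replaced exactly by one copy of weight 2; graph cut sparsification is the special case where every hyperedge has arity $r = 2$.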