Arrow Research

Author name cluster

Michael Kapralov

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

49 papers
2 author rows

Possible papers

FOCS Conference 2025 Conference Paper

Generalized Flow in Nearly-linear Time on Moderately Dense Graphs

  • Shunhua Jiang
  • Michael Kapralov
  • Lawrence Li
  • Aaron Sidford

In this paper we consider generalized flow problems where there is an m-edge n-node directed graph $G=(V, E)$ and each edge $e \in E$ has a loss factor $\gamma_{e} > 0$ governing whether the flow is increased or decreased as it crosses edge $e$. We provide a randomized $\widetilde{O}\left(\left(m+n^{1.5}\right) \cdot \operatorname{polylog}\left(\frac{W}{\delta}\right)\right)$ time algorithm for solving the generalized maximum flow and generalized minimum cost flow problems in this setting, where $\delta$ is the target accuracy and $W$ is the maximum of all costs, capacities, and loss factors and their inverses. This improves upon the previous state-of-the-art $\widetilde{O}\left(m \sqrt{n} \cdot \log^{2}\left(\frac{W}{\delta}\right)\right)$ time algorithm, obtained by combining the algorithm of [17] with techniques from [29]. To obtain this result we provide new dynamic data structures and spectral results regarding the matrices associated to generalized flows, and apply them through the interior point method framework of [39]. The full version of this paper is available at https://arxiv.org/abs/2510.17740.
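
As a toy illustration of the loss-factor semantics above (not the paper's algorithm; all names and numbers are made up), a feasible generalized flow must balance at every node after scaling by $\gamma_e$:

```python
# Toy check of generalized flow feasibility: flow f_e entering edge
# e = (u, v) leaves u as f_e but arrives at v as gamma_e * f_e.
def node_balance(edges, flow, demand):
    """edges: list of (u, v, gamma); flow: f_e per edge; demand: net demand per node."""
    balance = {v: -d for v, d in demand.items()}
    for (u, v, gamma), f in zip(edges, flow):
        balance[u] -= f           # f_e units leave u
        balance[v] += gamma * f   # gamma_e * f_e units arrive at v
    return balance

edges = [("a", "b", 0.5), ("b", "c", 1.0)]   # half the flow on (a, b) is lost
demand = {"a": -2.0, "b": 0.0, "c": 1.0}     # a supplies 2 units, c wants 1
flow = [2.0, 1.0]
assert all(abs(x) < 1e-9 for x in node_balance(edges, flow, demand).values())
print("feasible generalized flow")
```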

ICLR Conference 2025 Conference Paper

Improved Algorithms for Kernel Matrix-Vector Multiplication Under Sparsity Assumptions

  • Piotr Indyk
  • Michael Kapralov
  • Kshiteej Sheth
  • Tal Wagner

Motivated by the problem of fast processing of attention matrices, we study fast algorithms for computing matrix-vector products for asymmetric Gaussian Kernel matrices $K\in \mathbb{R}^{n\times n}$. $K$'s columns are indexed by a set of $n$ keys $k_1, k_2, \ldots, k_n\in \mathbb{R}^d$, its rows by a set of $n$ queries $q_1, q_2, \ldots, q_n\in \mathbb{R}^d$, and its $(i,j)$ entry is $K_{ij} = e^{-\|q_i-k_j\|_2^2/2\sigma^2}$ for some bandwidth parameter $\sigma>0$. Given a vector $x\in \mathbb{R}^n$ and error parameter $\epsilon>0$, our task is to output a $y\in \mathbb{R}^n$ such that $\|Kx-y\|_2\leq \epsilon \|x\|_2$ in time subquadratic in $n$ and linear in $d$. Our algorithms rely on the following modelling assumption about the matrices $K$: the sum of the entries of $K$ scales linearly in $n$, as opposed to worst case quadratic growth. We validate this assumption experimentally, for Gaussian kernel matrices encountered in various settings such as fast attention computation in LLMs. Under this assumption, we obtain the first subquadratic time algorithm for kernel matrix-vector multiplication for unrestricted vectors.
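
For reference, a minimal numpy baseline for the object being accelerated: the naive $O(n^2 d)$ matvec, together with the total-mass statistic behind the paper's linear-growth assumption. The sizes and bandwidth here are arbitrary, and the subquadratic algorithm itself is not reproduced.

```python
import numpy as np

# Naive Gaussian-kernel matvec K x, plus sum(K)/n, the quantity the paper
# assumes stays bounded as n grows. Purely an illustrative baseline.
rng = np.random.default_rng(0)
n, d, sigma = 512, 16, 1.0
Q = rng.normal(size=(n, d))     # queries q_i
Kk = rng.normal(size=(n, d))    # keys k_j

sq_dists = ((Q[:, None, :] - Kk[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq_dists / (2.0 * sigma ** 2))
x = rng.normal(size=n)
y = K @ x                       # exact; the goal is ||Kx - y'|| <= eps ||x|| faster

print("average row mass sum(K)/n:", K.sum() / n)
```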

NeurIPS Conference 2025 Conference Paper

Streaming Attention Approximation via Discrepancy Theory

  • Ekaterina Kochetkova
  • Kshiteej Jitesh Sheth
  • Insu Han
  • Amir Zandieh
  • Michael Kapralov

Large language models (LLMs) have achieved impressive success, but their high memory requirements present challenges for long-context token generation. In this paper we study the streaming complexity of attention approximation, a key computational primitive underlying token generation. Our main contribution is BalanceKV, a streaming algorithm for $\epsilon$-approximating attention computations based on a geometric process for selecting a balanced collection of Key and Value tokens as per Banaszczyk's vector balancing theory. We complement our algorithm with space lower bounds for streaming attention computation. Besides strong theoretical guarantees, BalanceKV exhibits empirically validated performance improvements over existing methods, both for attention approximation and end-to-end performance on various long context benchmarks.
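
A minimal sketch of the flavor of idea involved, assuming nothing beyond the abstract: a greedy signed-balancing pass that retains half of the key vectors while keeping the signed running sum short. This is a generic discrepancy-style heuristic for intuition only, not the BalanceKV algorithm.

```python
import numpy as np

# Greedy signed balancing: give each arriving vector the sign that keeps
# the running signed sum small, then keep the +1 half as the retained set.
def balanced_half(stream):
    s = np.zeros(stream.shape[1])
    kept = []
    for i, v in enumerate(stream):
        sign = -1.0 if np.dot(s, v) > 0 else 1.0  # greedy discrepancy choice
        s += sign * v
        if sign > 0:
            kept.append(i)
    return kept, float(np.linalg.norm(s))

rng = np.random.default_rng(1)
keys = rng.normal(size=(1000, 64))
kept, disc = balanced_half(keys)
print(len(kept), "of 1000 kept; signed-sum norm:", round(disc, 2))
```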

NeurIPS Conference 2024 Conference Paper

On the Robustness of Spectral Algorithms for Semirandom Stochastic Block Models

  • Aditya Bhaskara
  • Agastya V. Jha
  • Michael Kapralov
  • Naren S. Manoj
  • Davide Mazzali
  • Weronika Wrzos-Kaminska

In a graph bisection problem, we are given a graph $G$ with two equally-sized unlabeled communities, and the goal is to recover the vertices in these communities. A popular heuristic, known as spectral clustering, is to output an estimated community assignment based on the eigenvector corresponding to the second-smallest eigenvalue of the Laplacian of $G$. Spectral algorithms can be shown to provably recover the cluster structure for graphs generated from probabilistic models, such as the Stochastic Block Model (SBM). However, spectral clustering is known to be non-robust to model mis-specification. Techniques based on semidefinite programming have been shown to be more robust, but they incur significant computational overheads. In this work, we study the robustness of spectral algorithms against semirandom adversaries. Informally, a semirandom adversary is allowed to ``helpfully'' change the specification of the model in a way that is consistent with the ground-truth solution. Our semirandom adversaries in particular are allowed to add edges inside clusters or increase the probability that an edge appears inside a cluster. Semirandom adversaries are a useful tool to determine the extent to which an algorithm has overfit to statistical assumptions on the input. On the positive side, we identify a wide range of semirandom adversaries under which spectral bisection using the _unnormalized_ Laplacian is strongly consistent, i.e., it exactly recovers the planted partitioning. On the negative side, we show that in many of these settings, _normalized_ spectral bisection outputs a partitioning that makes a classification mistake on a constant fraction of the vertices. Finally, we present numerical experiments that complement our theoretical findings.
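
The algorithm under study is easy to state concretely. Below is a minimal sketch of unnormalized spectral bisection on a toy two-block random graph; the block probabilities are illustrative, not taken from the paper.

```python
import numpy as np

# Unnormalized spectral bisection: split vertices by the eigenvector of the
# second-smallest eigenvalue of L = D - A (the Fiedler vector).
rng = np.random.default_rng(2)
n = 200                       # vertices per community
p_in, p_out = 0.10, 0.01      # within- / across-community edge probabilities
A = np.zeros((2 * n, 2 * n))
for i in range(2 * n):
    for j in range(i + 1, 2 * n):
        p = p_in if (i < n) == (j < n) else p_out
        A[i, j] = A[j, i] = float(rng.random() < p)

L = np.diag(A.sum(axis=1)) - A
eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
fiedler = eigvecs[:, 1]
labels = (fiedler > 0).astype(int)
truth = np.repeat([0, 1], n)
accuracy = max((labels == truth).mean(), ((1 - labels) == truth).mean())
print("agreement with planted bisection:", accuracy)
```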

FOCS Conference 2022 Conference Paper

Factorial Lower Bounds for (Almost) Random Order Streams

  • Ashish Chiplunkar
  • John Kallaugher
  • Michael Kapralov
  • Eric Price 0001

In this paper we introduce and study the STREAMINGCYCLES problem, a random order streaming version of the Boolean Hidden Hypermatching problem that has been instrumental in streaming lower bounds over the past decade. In this problem the edges of a graph G, comprising n/$\ell$ disjoint length-$\ell$ cycles on n vertices, are partitioned randomly among n players. Every edge is annotated with an independent uniformly random bit, and the players’ task is to output, for some cycle in G, the sum (modulo 2) of the bits on its edges, after one round of sequential communication. Our main result is an $\ell^{\Omega(\ell)}$ lower bound on the communication complexity of STREAMINGCYCLES, which is tight up to constant factors in the exponent. Applications of our lower bound for STREAMINGCYCLES include an essentially tight lower bound for component collection in (almost) random order graph streams, making progress towards a conjecture of Peng and Sohler [SODA’18] and the first exponential space lower bounds for random walk generation.
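
The communication problem is simple to instantiate. Here is a generator for a STREAMINGCYCLES-style input, simplified in that we merely shuffle the edge order rather than partitioning edges among players:

```python
import random

# n vertices split into n/l disjoint l-cycles, each edge carrying a uniform
# random bit; the task is to output the XOR of the bits along some cycle.
def streaming_cycles_instance(n, l, rng):
    assert n % l == 0
    verts = list(range(n))
    rng.shuffle(verts)
    edges = []
    for c in range(n // l):
        cyc = verts[c * l:(c + 1) * l]
        edges += [(cyc[i], cyc[(i + 1) % l], rng.randint(0, 1))
                  for i in range(l)]
    rng.shuffle(edges)  # (almost) random order arrival
    return edges

print(streaming_cycles_instance(n=12, l=4, rng=random.Random(8))[:3])
```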

FOCS Conference 2022 Conference Paper

Motif Cut Sparsifiers

  • Michael Kapralov
  • Mikhail Makarov
  • Sandeep Silwal
  • Christian Sohler
  • Jakab Tardos

A motif is a frequently occurring subgraph of a given directed or undirected graph G (Milo et al.). Motifs capture higher order organizational structure of G beyond edge relationships, and, therefore, have found wide applications in graph clustering, community detection, and the analysis of biological and physical networks, to name a few (Benson et al., Tsourakakis et al.). In these applications, the cut structure of motifs plays a crucial role as vertices are partitioned into clusters by cuts whose conductance is based on the number of instances of a particular motif, as opposed to just the number of edges, crossing the cuts. In this paper, we introduce the concept of a motif cut sparsifier. We show that one can compute in polynomial time a sparse weighted subgraph $G^{\prime}$ with only $\widetilde{O}\left(n / \epsilon^{2}\right)$ edges such that for every cut, the weighted number of copies of M crossing the cut in $G^{\prime}$ is within a $1+\epsilon$ factor of the number of copies of M crossing the cut in G, for every constant size motif M. Our work carefully combines the viewpoints of both graph sparsification and hypergraph sparsification. We sample edges, which requires us to extend and strengthen the concept of cut sparsifiers introduced in the seminal works of Karger and Benczúr et al. to the motif setting. The task of adapting the importance sampling framework common to efficient graph sparsification algorithms to the motif setting turns out to be nontrivial due to the fact that cut sizes in a random subgraph of G depend non-linearly on the sampled edges. To overcome this, we adopt the viewpoint of hypergraph sparsification to define edge sampling probabilities, which are derived from the strong connectivity values of a hypergraph whose hyperedges represent motif instances. Finally, an iterative sparsification primitive inspired by both viewpoints is used to reduce the number of edges in G to nearly linear. In addition, we present a strong lower bound ruling out a similar result for sparsification with respect to induced occurrences of motifs. The full version of the paper can be found at https://arxiv.org/abs/2204.09951.
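
To make the preserved quantity concrete, here is a brute-force evaluation of the motif cut weight for the triangle motif; a motif cut sparsifier must approximate this value for every cut. Purely illustrative.

```python
from itertools import combinations

# Number of triangles with endpoints on both sides of the cut (S, V \ S).
# A motif cut sparsifier preserves this for every S up to a 1 + eps factor.
def triangle_cut_weight(vertices, edges, S):
    E = set(map(frozenset, edges))
    crossing = 0
    for a, b, c in combinations(vertices, 3):
        if {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))} <= E:
            if 0 < len({a, b, c} & S) < 3:   # the triangle is cut by S
                crossing += 1
    return crossing

V = range(5)
E = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]  # two triangles
print(triangle_cut_weight(V, E, S={0, 1}))  # only triangle {0,1,2} is cut -> 1
```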

NeurIPS Conference 2021 Conference Paper

Efficient and Local Parallel Random Walks

  • Michael Kapralov
  • Silvio Lattanzi
  • Navid Nouri
  • Jakab Tardos

Random walks are a fundamental primitive used in many machine learning algorithms with several applications in clustering and semi-supervised learning. Despite their relevance, the first efficient parallel algorithm to compute random walks has been introduced very recently (Łącki et al.). Unfortunately, their method has a fundamental shortcoming: their algorithm is non-local in that it heavily relies on computing random walks out of all nodes in the input graph, even though in many practical applications one is interested in computing random walks only from a small subset of nodes in the graph. In this paper, we present a new algorithm that overcomes this limitation by building random walks efficiently and locally at the same time. We show that our technique is both memory and round efficient, and in particular yields an efficient parallel local clustering algorithm. Finally, we complement our theoretical analysis with experimental results showing that our algorithm is significantly more scalable than previous approaches.

FOCS Conference 2021 Conference Paper

Spectral Hypergraph Sparsifiers of Nearly Linear Size

  • Michael Kapralov
  • Robert Krauthgamer
  • Jakab Tardos
  • Yuichi Yoshida

Graph sparsification has been studied extensively over the past two decades, culminating in spectral sparsifiers of optimal size (up to constant factors). Spectral hypergraph sparsification is a natural analogue of this problem, for which optimal bounds on the sparsifier size are not known, mainly because the hypergraph Laplacian is non-linear, and thus lacks the linear-algebraic structure and tools that have been so effective for graphs. Our main contribution is the first algorithm for constructing $\epsilon$-spectral sparsifiers for hypergraphs with $O^{\ast}(n)$ hyperedges, where $O^{\ast}$ suppresses $(\epsilon^{-1}\log n)^{O(1)}$ factors. This bound is independent of the rank $r$ (maximum cardinality of a hyperedge), and is essentially best possible due to a recent bit complexity lower bound of $\Omega(nr)$ for hypergraph sparsification. This result is obtained by introducing two new tools. First, we give a new proof of spectral concentration bounds for sparsifiers of graphs; it avoids linear-algebraic methods, replacing e.g. the usual application of the matrix Bernstein inequality, and therefore applies to the (non-linear) hypergraph setting. To achieve the result, we design a new sequence of hypergraph-dependent $\epsilon$-nets on the unit sphere in $\mathbb{R}^{n}$. Second, we extend the weight-assignment technique of Chen, Khanna and Nagda [FOCS'20] to the spectral sparsification setting. Surprisingly, the number of spanning trees after the weight assignment can serve as a potential function guiding the reweighting process in the spectral setting.
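
The non-linear object at the heart of the result can be stated in a few lines: the hypergraph quadratic form that an $\epsilon$-spectral sparsifier must preserve for all $x$. A direct evaluation, with made-up weights:

```python
# Hypergraph "quadratic form": Q(x) = sum_e w_e * max_{u,v in e} (x_u - x_v)^2.
# An eps-spectral sparsifier satisfies Q'(x) in (1 +- eps) Q(x) for every x.
def hypergraph_energy(hyperedges, weights, x):
    return sum(w * max((x[u] - x[v]) ** 2 for u in e for v in e)
               for e, w in zip(hyperedges, weights))

hyperedges = [(0, 1, 2), (1, 2, 3), (0, 3)]
weights = [1.0, 2.0, 0.5]
x = [0.0, 1.0, 3.0, -1.0]
print(hypergraph_energy(hyperedges, weights, x))  # 9 + 32 + 0.5 = 41.5
```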

STOC Conference 2021 Conference Paper

Towards tight bounds for spectral sparsification of hypergraphs

  • Michael Kapralov
  • Robert Krauthgamer
  • Jakab Tardos
  • Yuichi Yoshida

Cut and spectral sparsification of graphs have numerous applications, including e.g. speeding up algorithms for cuts and Laplacian solvers. These powerful notions have recently been extended to hypergraphs, which are much richer and may offer new applications. However, the current bounds on the size of hypergraph sparsifiers are not as tight as the corresponding bounds for graphs. Our first result is a polynomial-time algorithm that, given a hypergraph on $n$ vertices with maximum hyperedge size $r$, outputs an $\epsilon$-spectral sparsifier with $O^*(nr)$ hyperedges, where $O^*$ suppresses $(\epsilon^{-1}\log n)^{O(1)}$ factors. This size bound improves the two previous bounds: $O^*(n^3)$ [Soma and Yoshida, SODA'19] and $O^*(nr^3)$ [Bansal, Svensson and Trevisan, FOCS'19]. Our main technical tool is a new method for proving concentration of the nonlinear analogue of the quadratic form of the Laplacians for hypergraph expanders. We complement this with lower bounds on the bit complexity of any compression scheme that $(1+\epsilon)$-approximates all the cuts in a given hypergraph, and hence also on the bit complexity of every $\epsilon$-cut/spectral sparsifier. These lower bounds are based on Ruzsa-Szemerédi graphs, and a particular instantiation yields an $\Omega(nr)$ lower bound on the bit complexity even for fixed constant $\epsilon$. In the case of hypergraph cut sparsifiers, this is tight up to polylogarithmic factors in $n$, due to a recent result of [Chen, Khanna and Nagda, FOCS'20]. For spectral sparsifiers it narrows the gap to $O^*(r)$. Finally, for directed hypergraphs, we present an algorithm that computes an $\epsilon$-spectral sparsifier with $O^*(n^2 r^3)$ hyperarcs, where $r$ is the maximum size of a hyperarc. For small $r$, this improves over the $O^*(n^3)$ bound known from [Soma and Yoshida, SODA'19], and gets close to the trivial lower bound of $\Omega(n^2)$ hyperarcs.

SODA Conference 2020 Conference Paper

Differentially Private Release of Synthetic Graphs

  • Marek Eliás 0001
  • Michael Kapralov
  • Janardhan Kulkarni
  • Yin Tat Lee

We propose an $(\epsilon, \delta)$-differentially private mechanism that, given an input graph $G$ with $n$ vertices and $m$ edges, in polynomial time generates a synthetic graph $G'$ approximating all cuts of the input graph up to a purely additive error. This is the first construction of a differentially private cut approximator that allows additive error $o(m)$ for all $m > n \log^C n$. The best known previous results gave additive error $O(n^{3/2})$ and hence only retained information about the cut structure of very dense graphs. Thus, we make notable progress on a prominent problem in differential privacy. We also present lower bounds showing that our utility/privacy trade-off is essentially the best possible if one seeks purely additive cut approximations.
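
For contrast with the paper's mechanism (which is not reproduced here), the classical Laplace baseline below perturbs each potential edge independently; it is edge-level $\epsilon$-differentially private, but its additive error on large cuts is exactly the regime this paper improves.

```python
import numpy as np

# Laplace baseline: neighboring graphs differ in one edge, i.e. one
# upper-triangular entry by 1, so Laplace(1/eps) noise per pair gives
# eps-DP; every cut estimate is unbiased but noisy in proportion to
# the number of vertex pairs the cut spans.
def laplace_synthetic(A, eps, rng):
    n = A.shape[0]
    noise = np.triu(rng.laplace(scale=1.0 / eps, size=(n, n)), k=1)
    return A + noise + noise.T

rng = np.random.default_rng(3)
n = 100
A = np.triu((rng.random((n, n)) < 0.1).astype(float), k=1)
A = A + A.T
G_priv = laplace_synthetic(A, eps=1.0, rng=rng)

S, T = np.arange(n // 2), np.arange(n // 2, n)
print("true cut:", A[np.ix_(S, T)].sum(),
      " private cut:", round(G_priv[np.ix_(S, T)].sum(), 1))
```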

FOCS Conference 2020 Conference Paper

Kernel Density Estimation through Density Constrained Near Neighbor Search

  • Moses Charikar
  • Michael Kapralov
  • Navid Nouri
  • Paris Siminelakis

In this paper we revisit the kernel density estimation problem: given a kernel $K(x, y)$ and a dataset $P$ of $n$ points in high dimensional Euclidean space, prepare a data structure that can quickly output, given a query $q$, a $(1+\epsilon)$-approximation to $\mu := \frac{1}{|P|}\sum_{p\in P} K(p, q)$. First, we give a single data structure based on classical near neighbor search techniques that improves upon or essentially matches the query time and space complexity for all radial kernels considered in the literature so far. We then show how to improve both the query complexity and runtime by using recent advances in data-dependent near neighbor search. We achieve our results by giving a new implementation of the natural importance sampling scheme. Unlike previous approaches, our algorithm first samples the dataset uniformly (considering a geometric sequence of sampling rates), and then uses existing approximate near neighbor search techniques on the resulting smaller dataset to retrieve the sampled points that lie at an appropriate distance from the query. We show that the resulting sampled dataset has strong geometric structure, making approximate near neighbor search return the required samples much more efficiently than for worst case datasets of the same size. As an example application, we show that this approach yields a data structure that achieves query time $\mu^{-(1+o(1))/4}$ and space complexity $\mu^{-(1+o(1))}$ for the Gaussian kernel. Our data dependent approach achieves query time $\mu^{-0.173-o(1)}$ and space $\mu^{-(1+o(1))}$ for the Gaussian kernel. The data dependent analysis relies on new techniques for tracking the geometric structure of the input datasets in a recursive hashing process that we hope will be of interest in other applications in near neighbor search.
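
The estimation target, plus the trivial uniform-sampling baseline that the importance-sampling scheme refines. The paper retrieves, per geometric sampling rate, only the points at the right distance via near neighbor search; that machinery is omitted here and the sizes are arbitrary.

```python
import numpy as np

# Exact Gaussian KDE value at a query vs. the plain uniform-sample estimate.
rng = np.random.default_rng(4)
n, d = 20000, 10
P = rng.normal(size=(n, d))
q = rng.normal(size=d)

kernel = lambda pts: np.exp(-0.5 * ((pts - q) ** 2).sum(axis=1))
mu_exact = kernel(P).mean()
sample = P[rng.choice(n, size=2000, replace=False)]
mu_hat = kernel(sample).mean()
print(mu_exact, mu_hat)  # uniform sampling degrades when mu is tiny
```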

SODA Conference 2020 Conference Paper

Oblivious Sketching of High-Degree Polynomial Kernels

  • Thomas D. Ahle
  • Michael Kapralov
  • Jakob Bæk Tejs Knudsen
  • Rasmus Pagh
  • Ameya Velingker
  • David P. Woodruff
  • Amir Zandieh

Kernel methods are fundamental tools in machine learning that allow detection of non-linear dependencies between data without explicitly constructing feature vectors in high dimensional spaces. A major disadvantage of kernel methods is their poor scalability: primitives such as kernel PCA or kernel ridge regression generally take prohibitively large quadratic space and (at least) quadratic time, as kernel matrices are usually dense. Some methods for speeding up kernel linear algebra are known, but they all invariably take time exponential in either the dimension of the input point set (e.g., fast multipole methods suffer from the curse of dimensionality) or in the degree of the kernel function. Oblivious sketching has emerged as a powerful approach to speeding up numerical linear algebra over the past decade, but our understanding of oblivious sketching solutions for kernel matrices has remained quite limited, suffering from the aforementioned exponential dependence on input parameters. Our main contribution is a general method for applying sketching solutions developed in numerical linear algebra over the past decade to a tensoring of data points without forming the tensoring explicitly. This leads to the first oblivious sketch for the polynomial kernel with a target dimension that is only polynomially dependent on the degree of the kernel function, as well as the first oblivious sketch for the Gaussian kernel on bounded datasets that does not suffer from an exponential dependence on the dimensionality of input data points.
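
A hedged sketch of the kind of primitive this line of work generalizes: degree-2 TensorSketch (Pagh; Pham and Pagh), which sketches $x \otimes x$ without materializing it by convolving two CountSketches via FFT, so that sketch inner products approximate the polynomial kernel $\langle x, y\rangle^2$. The paper's construction extends such ideas to high degree and to the Gaussian kernel; none of that is reproduced here.

```python
import numpy as np

# Degree-2 TensorSketch: CountSketch each factor, multiply in the Fourier
# domain (convolution of sketches = sketch of the tensor product).
def tensorsketch2(x, hashes, signs, m):
    fts = []
    for h, s in zip(hashes, signs):
        c = np.zeros(m)
        np.add.at(c, h, s * x)        # CountSketch of x
        fts.append(np.fft.fft(c))
    return np.fft.ifft(fts[0] * fts[1]).real

rng = np.random.default_rng(5)
d, m = 50, 8192
hashes = [rng.integers(0, m, size=d) for _ in range(2)]
signs = [rng.choice([-1.0, 1.0], size=d) for _ in range(2)]
x = rng.normal(size=d)
y = x + 0.1 * rng.normal(size=d)

sx = tensorsketch2(x, hashes, signs, m)
sy = tensorsketch2(y, hashes, signs, m)
print(np.dot(sx, sy), np.dot(x, y) ** 2)  # sketch estimate vs exact kernel
```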

SODA Conference 2020 Conference Paper

Space Efficient Approximation to Maximum Matching Size from Uniform Edge Samples

  • Michael Kapralov
  • Slobodan Mitrovic
  • Ashkan Norouzi-Fard
  • Jakab Tardos

Given a source of i.i.d. samples of edges of an input graph G with n vertices and m edges, how many samples does one need to compute a constant factor approximation to the maximum matching size in G? Moreover, is it possible to obtain such an estimate in a small amount of space? We show that this problem cannot be solved using a nontrivially sublinear (in m) number of samples: $m^{1-o(1)}$ samples are needed. On the other hand, $O(\log^2 n)$ bits of space suffice to compute an estimate. Our main technical tool is a new peeling type algorithm for matching and its local simulation. We show that a delicate balance between exploration depth and sampling rate allows our simulation to not lose precision over a logarithmic number of levels of recursion and achieve a constant factor approximation. Our algorithm also yields a constant factor approximate local computation algorithm (LCA) for matching with $O(d \log n)$ exploration starting from any vertex. Previous approaches were based on local simulations of randomized greedy, which take $O(d)$ time in expectation over the starting vertex or edge (Yoshida et al'09, Onak et al'12), and could not achieve a better than $d^2$ run-time. Interestingly, we also show that unlike our algorithm, the local simulation of randomized greedy that is the basis of the most efficient prior results does take $\Omega(d^2) \gg O(d \log n)$ time for a worst case edge.

STOC Conference 2019 Conference Paper

An optimal space lower bound for approximating MAX-CUT

  • Michael Kapralov
  • Dmitry Krachun

We consider the problem of estimating the value of MAX-CUT in a graph in the streaming model of computation. At one extreme, there is a trivial 2-approximation for this problem that uses only $O(\log n)$ space, namely, count the number of edges and output half of this value as the estimate for the size of the MAX-CUT. On the other extreme, for any fixed $\epsilon > 0$, if one allows $\tilde{O}(n)$ space, a $(1+\epsilon)$-approximate solution to the MAX-CUT value can be obtained by storing an $\tilde{O}(n)$-size sparsifier that essentially preserves MAX-CUT value. Our main result is that any (randomized) single pass streaming algorithm that breaks the 2-approximation barrier requires $\Omega(n)$ space, thus resolving the space complexity of any non-trivial approximations of the MAX-CUT value to within polylogarithmic factors in the single pass streaming model. We achieve the result by presenting a tight analysis of the Implicit Hidden Partition Problem introduced by Kapralov et al. [SODA'17] for an arbitrarily large number of players. In this problem a number of players receive random matchings of $\Omega(n)$ size together with random bits on the edges, and their task is to determine whether the bits correspond to parities of some hidden bipartition, or are just uniformly random. Unlike all previous Fourier analytic communication lower bounds, our analysis does not directly use bounds on the $\ell_2$ norm of Fourier coefficients of a typical message at any given weight level that follow from hypercontractivity. Instead, we use the fact that graphs received by players are sparse (matchings) to obtain strong upper bounds on the $\ell_1$ norm of the Fourier coefficients of the messages of individual players using their special structure, and then argue, using the convolution theorem, that similar strong bounds on the $\ell_1$ norm are essentially preserved (up to an exponential loss in the number of players) once messages of different players are combined. We feel that our main technique is likely of independent interest.
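
The baseline that the lower bound shows is essentially optimal to beat, in code: one pass, $O(\log n)$ bits of state, and a guaranteed factor-2 approximation since every graph has a cut containing at least half its edges.

```python
# Trivial streaming 2-approximation to the MAX-CUT value: report m/2.
# Some cut always contains at least m/2 edges, and no cut exceeds m.
def maxcut_estimate(edge_stream):
    m = 0
    for _ in edge_stream:   # one pass, only a counter as state
        m += 1
    return m / 2

edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
print(maxcut_estimate(iter(edges)))  # 2.0; the true MAX-CUT value is 3
```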

NeurIPS Conference 2019 Conference Paper

Efficiently Learning Fourier Sparse Set Functions

  • Andisheh Amrollahi
  • Amir Zandieh
  • Michael Kapralov
  • Andreas Krause

Learning set functions is a key challenge arising in many domains, ranging from sketching graphs to black-box optimization with discrete parameters. In this paper we consider the problem of efficiently learning set functions that are defined over a ground set of size $n$ and that are sparse (say $k$-sparse) in the Fourier domain. This is a wide class that includes graph and hypergraph cut functions, decision trees and more. Our central contribution is the first algorithm that allows learning functions whose Fourier support only contains low degree (say degree $d=o(n)$) polynomials using $O(k d \log n)$ sample complexity and runtime $O( kn \log^2 k \log n \log d)$. This implies that sparse graphs with $k$ edges can, for the first time, be learned from $O(k \log n)$ observations of cut values and in time linear in the number of vertices. Our algorithm can also efficiently learn (sums of) decision trees of small depth. The algorithm exploits techniques from the sparse Fourier transform literature and is easily implementable. Lastly, we also develop an efficient robust version of our algorithm and prove $\ell_2/\ell_2$ approximation guarantees without any statistical assumptions on the noise.
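
Why cut functions fall into this class, in executable form: each edge contributes $\frac{1}{2} - \frac{1}{2}(-1)^{x_u + x_v}$, so a $k$-edge cut function has at most $k+1$ nonzero Walsh-Hadamard coefficients, all of degree at most 2. A brute-force check on a path graph (the paper's learner, of course, never enumerates all $2^n$ inputs):

```python
import numpy as np
from itertools import product

# Fast Walsh-Hadamard transform; coefficients are E[f(x) * chi_S(x)].
def wht(f):
    a, h = f.copy(), 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            x, y = a[i:i + h].copy(), a[i + h:i + 2 * h].copy()
            a[i:i + h], a[i + h:i + 2 * h] = x + y, x - y
        h *= 2
    return a / len(a)

n, graph_edges = 4, [(0, 1), (1, 2), (2, 3)]
f = np.array([sum(x[u] != x[v] for u, v in graph_edges)
              for x in product([0, 1], repeat=n)], dtype=float)
print(np.count_nonzero(np.abs(wht(f)) > 1e-9), "nonzero of", 2 ** n)  # 4 of 16
```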

FOCS Conference 2019 Conference Paper

Online Matching with General Arrivals

  • Buddhima Gamlath
  • Michael Kapralov
  • Andreas Maggiori
  • Ola Svensson
  • David Wajc

The online matching problem was introduced by Karp, Vazirani and Vazirani nearly three decades ago. In that seminal work, they studied this problem in bipartite graphs with vertices arriving only on one side, and presented optimal deterministic and randomized algorithms for this setting. In comparison, more general arrival models, such as edge arrivals and general vertex arrivals, have proven more challenging, and positive results are known only for various relaxations of the problem. In particular, even the basic question of whether randomization allows one to beat the trivially-optimal deterministic competitive ratio of 1/2 for either of these models was open. In this paper, we resolve this question for both these natural arrival models, and show the following. 1) For edge arrivals, randomization does not help: no randomized algorithm is better than 1/2 competitive. 2) For general vertex arrivals, randomization helps: there exists a randomized $(1/2+\Omega(1))$-competitive online matching algorithm.
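
The deterministic baseline whose optimality for edge arrivals the paper settles: greedily match each arriving edge when both endpoints are free. The resulting matching is maximal, hence at least half of optimum.

```python
# Greedy under edge arrivals: match an edge iff both endpoints are free.
# Maximal matchings are >= OPT / 2; the paper shows no randomized
# algorithm beats this 1/2 under edge arrivals.
def greedy_edge_arrival(edge_stream):
    matched, matching = set(), []
    for u, v in edge_stream:
        if u not in matched and v not in matched:
            matched.update((u, v))
            matching.append((u, v))
    return matching

print(greedy_edge_arrival([(1, 2), (2, 3), (3, 4)]))  # [(1, 2), (3, 4)]
```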

FOCS Conference 2018 Conference Paper

Testing Graph Clusterability: Algorithms and Lower Bounds

  • Ashish Chiplunkar
  • Michael Kapralov
  • Sanjeev Khanna
  • Aida Mousavifar
  • Yuval Peres

We consider the problem of testing graph cluster structure: given access to a graph $G = (V, E)$, can we quickly determine whether the graph can be partitioned into a few clusters with good inner conductance, or is far from any such graph? This is a generalization of the well-studied problem of testing graph expansion, where one wants to distinguish between the graph having good expansion (i.e., being a good single cluster) and the graph having a sparse cut (i.e., being a union of at least two clusters). A recent work of Czumaj, Peng, and Sohler (STOC'15) gave an ingenious sublinear time algorithm for testing k-clusterability in time $\tilde{O}(n^{1/2} \mathrm{poly}(k))$. Their algorithm implicitly embeds a random sample of vertices of the graph into Euclidean space, and then clusters the samples based on estimates of Euclidean distances between the points. This yields a very efficient testing algorithm, but only works if the cluster structure is very strong: it is necessary to assume that the gap between conductances of accepted and rejected graphs is at least logarithmic in the size of the graph $G$. In this paper we show how one can leverage more refined geometric information, namely angles as opposed to distances, to obtain a sublinear time tester that works even when the gap is a sufficiently large constant. Our tester is based on the singular value decomposition of a natural matrix derived from random walk transition probabilities from a small sample of seed nodes. We complement our algorithm with a matching lower bound on the query complexity of testing clusterability. Our lower bound is based on a novel property testing problem, which we analyze using Fourier analytic tools. As a byproduct of our techniques, we also achieve new lower bounds for the problem of approximating MAX-CUT value in sublinear time.

FOCS Conference 2018 Conference Paper

The Sketching Complexity of Graph and Hypergraph Counting

  • John Kallaugher
  • Michael Kapralov
  • Eric Price 0001

Subgraph counting is a fundamental primitive in graph processing, with applications in social network analysis (e.g., estimating the clustering coefficient of a graph), database processing and other areas. The space complexity of subgraph counting has been studied extensively in the literature, but many natural settings are still not well understood. In this paper we revisit the subgraph (and hypergraph) counting problem in the sketching model, where the algorithm's state as it processes a stream of updates to the graph is a linear function of the stream. This model has recently received a lot of attention in the literature, and has become a standard model for solving dynamic graph streaming problems. In this paper we give a tight bound on the sketching complexity of counting the number of occurrences of a small subgraph H in a bounded degree graph G presented as a stream of edge updates. Specifically, we show that the space complexity of the problem is governed by the fractional vertex cover number of the graph H. Our subgraph counting algorithm implements a natural vertex sampling approach, with sampling probabilities governed by the vertex cover of H. Our main technical contribution lies in a new set of Fourier analytic tools that we develop to analyze multiplayer communication protocols in the simultaneous communication model, allowing us to prove a tight lower bound. We believe that our techniques are likely to find applications in other settings. Besides giving tight bounds for all graphs H, both our algorithm and lower bounds extend to the hypergraph setting, albeit with some loss in space complexity.

SODA Conference 2017 Conference Paper

(1 + Ω(1))-Approximation to MAX-CUT Requires Linear Space

  • Michael Kapralov
  • Sanjeev Khanna
  • Madhu Sudan 0001
  • Ameya Velingker

We consider the problem of estimating the value of MAX-CUT in a graph in the streaming model of computation. We show that there exists a constant $\epsilon^* > 0$ such that any randomized streaming algorithm that computes a $(1 + \epsilon^*)$-approximation to MAX-CUT requires $\Omega(n)$ space on an $n$ vertex graph. By contrast, there are algorithms that produce a $(1 + \epsilon)$-approximation in space $O(n/\epsilon^2)$ for every $\epsilon > 0$. Our result is the first linear space lower bound for the task of approximating the max cut value and partially answers an open question from the literature [2]. The prior state of the art ruled out $(2 - \epsilon)$-approximation in $o(\sqrt{n})$ space or $(1 + \epsilon)$-approximation in $n^{1-O(\epsilon)}$ space, for any $\epsilon > 0$. Previous lower bounds for the MAX-CUT problem relied, in essence, on a lower bound on the communication complexity of the following task: several players are each given some edges of a graph and they wish to determine if the union of these edges is $\epsilon$-close to forming a bipartite graph, using one-way communication. The previous works proved a lower bound of $\Omega(\sqrt{n})$ for this task when $\epsilon = 1/2$, and of $n^{1-O(\epsilon)}$ for every $\epsilon > 0$, even when one of the players is given a candidate bipartition of the graph and the graph is promised to be bipartite with respect to this partition or $\epsilon$-far from bipartite. This added information was essential in enabling the previous analyses but also yields a weak bound since, with this extra information, there is an $n^{1-O(\epsilon)}$ communication protocol for this problem. In this work, we give an $\Omega(n)$ lower bound on the communication complexity of the original problem (without the extra information) for $\epsilon = \Omega(1)$ in the three-player setting. Obtaining this $\Omega(n)$ lower bound on the communication complexity is the main technical result in this paper. We achieve it by a delicate choice of distributions on instances as well as a novel use of the convolution theorem from Fourier analysis combined with graph-theoretic considerations to analyze the communication complexity.

FOCS Conference 2017 Conference Paper

Optimal Lower Bounds for Universal Relation, and for Samplers and Finding Duplicates in Streams

  • Michael Kapralov
  • Jelani Nelson
  • Jakub Pachocki
  • Zhengyu Wang
  • David P. Woodruff
  • Mobin Yahyazadeh

In the communication problem UR (universal relation) [25], Alice and Bob respectively receive $x, y \in \{0, 1\}^n$ with the promise that $x \neq y$. The last player to receive a message must output an index $i$ such that $x_i \neq y_i$. We prove that the randomized one-way communication complexity of this problem in the public coin model is exactly $\Theta(\min\{n, \log(1/\delta)\log^2(n/\log(1/\delta))\})$ for failure probability $\delta$. Our lower bound holds even if promised $\mathrm{support}(y) \subset \mathrm{support}(x)$. As a corollary, we obtain optimal lower bounds for $\ell_p$-sampling in strict turnstile streams for $0 \le p < 2$, as well as for the problem of finding duplicates in streams (in a strict turnstile stream the underlying vector remains entrywise nonnegative at all points in the stream). We give two different proofs of our main result. The first proof demonstrates that any algorithm A solving sampling problems in turnstile streams in low memory can be used to encode subsets of $[n]$ of certain sizes into a number of bits below the information theoretic minimum. Our encoder makes adaptive queries to A throughout its execution, but done carefully so as to not violate correctness. This is accomplished by injecting random noise into the encoder's interactions with A, which is loosely motivated by techniques in differential privacy. Our correctness analysis involves understanding the ability of A to correctly answer adaptive queries which have positive but bounded mutual information with A's internal randomness, and may be of independent interest in the newly emerging area of adaptive data analysis with a theoretical computer science lens. Our second proof is via a novel randomized reduction from Augmented Indexing [30] which needs to interact with A adaptively. To handle the adaptivity we identify certain likely interaction patterns and union bound over them to guarantee correct interaction on all of them. To guarantee correctness, it is important that the interaction hides some of its randomness from A in the reduction.

ICML Conference 2017 Conference Paper

Random Fourier Features for Kernel Ridge Regression: Approximation Bounds and Statistical Guarantees

  • Haim Avron
  • Michael Kapralov
  • Cameron Musco
  • Christopher Musco
  • Ameya Velingker
  • Amir Zandieh

Random Fourier features is one of the most popular techniques for scaling up kernel methods, such as kernel ridge regression. However, despite impressive empirical results, the statistical properties of random Fourier features are still not well understood. In this paper we take steps toward filling this gap. Specifically, we approach random Fourier features from a spectral matrix approximation point of view, give tight bounds on the number of Fourier features required to achieve a spectral approximation, and show how spectral matrix approximation bounds imply statistical guarantees for kernel ridge regression.
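
For concreteness, a minimal implementation of the primitive being analyzed: classical random Fourier features for the Gaussian kernel (Rahimi and Recht). The feature count and bandwidth below are arbitrary; the paper's contribution is the spectral approximation analysis, not this construction.

```python
import numpy as np

# Random Fourier features: z(x) = sqrt(2/s) * cos(W^T x + b), with
# W ~ N(0, sigma^{-2} I), gives E[z(x)^T z(y)] = exp(-||x-y||^2 / (2 sigma^2)).
def rff(X, s, sigma, rng):
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, s))
    b = rng.uniform(0.0, 2.0 * np.pi, size=s)
    return np.sqrt(2.0 / s) * np.cos(X @ W + b)

rng = np.random.default_rng(6)
X = rng.normal(size=(300, 5))
K = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1) / 2.0)  # sigma = 1
Z = rff(X, s=4096, sigma=1.0, rng=rng)
print("max entrywise error:", np.abs(Z @ Z.T - K).max())
```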

FOCS Conference 2017 Conference Paper

Sample Efficient Estimation and Recovery in Sparse FFT via Isolation on Average

  • Michael Kapralov

The problem of computing the Fourier Transform of a signal whose spectrum is dominated by a small number $k$ of frequencies quickly and using a small number of samples of the signal in time domain (the Sparse FFT problem) has received significant attention recently. It is known how to approximately compute the $k$-sparse Fourier transform in $\approx k \log^2 n$ time [Hassanieh et al'STOC'12], or using the optimal number $O(k \log n)$ of samples [Indyk et al'FOCS'14] in time domain, or come within $(\log \log n)^{O(1)}$ factors of both these bounds simultaneously, but no algorithm achieving the optimal $O(k \log n)$ bound in sublinear time is known. At a high level, sublinear time Sparse FFT algorithms operate by 'hashing' the spectrum of the input signal into $\approx k$ 'buckets', identifying frequencies that are 'isolated' in their buckets, subtracting them from the signal and repeating until the entire signal is recovered. The notion of 'isolation' in a 'bucket', inspired by applications of hashing in sparse recovery with arbitrary linear measurements, has been the main tool in the analysis of Fourier hashing schemes in the literature. However, Fourier hashing schemes, which are implemented via filtering, tend to be 'noisy' in the sense that a frequency that hashes into a bucket contributes a non-negligible amount to neighboring buckets. This leakage to neighboring buckets makes identification and estimation challenging, and the standard analysis based on isolation becomes difficult to use without losing $\omega(1)$ factors in sample complexity. In this paper we propose a new technique for analysing noisy hashing schemes that arise in Sparse FFT, which we refer to as isolation on average. We apply this technique to two problems in Sparse FFT: estimating the values of a list of frequencies using few samples and computing Sparse FFT itself, achieving sample-optimal results in $k \log^{O(1)} n$ time for both. We feel that our approach will likely be of interest in designing Fourier sampling schemes for more general settings (e.g. model-based Sparse FFT).

ICML Conference 2016 Conference Paper

How to Fake Multiply by a Gaussian Matrix

  • Michael Kapralov
  • Vamsi K. Potluru
  • David P. Woodruff

Have you ever wanted to multiply an $n \times d$ matrix $X$, with $n \gg d$, on the left by an $m \times n$ matrix $\tilde{G}$ of i.i.d. Gaussian random variables, but could not afford to do it because it was too slow? In this work we propose a new randomized $m \times n$ matrix $T$, for which one can compute $T \cdot X$ in only $O(\mathrm{nnz}(X)) + \tilde{O}(m^{1.5} \cdot d^3)$ time, for which the total variation distance between the distributions $T \cdot X$ and $\tilde{G} \cdot X$ is as small as desired, i.e., less than any positive constant. Here $\mathrm{nnz}(X)$ denotes the number of non-zero entries of $X$. Assuming $\mathrm{nnz}(X) \gg m^{1.5} \cdot d^3$, this is a significant savings over the naïve $O(\mathrm{nnz}(X)\, m)$ time to compute $\tilde{G} \cdot X$. Moreover, since the total variation distance is small, we can provably use $T \cdot X$ in place of $\tilde{G} \cdot X$ in any application and have the same guarantees as if we were using $\tilde{G} \cdot X$, up to a small positive constant in error probability. We apply this transform to nonnegative matrix factorization (NMF) and support vector machines (SVM).

FOCS Conference 2014 Conference Paper

Sample-Optimal Fourier Sampling in Any Constant Dimension

  • Piotr Indyk
  • Michael Kapralov

We give an algorithm for $\ell_2/\ell_2$ sparse recovery from Fourier measurements using $O(k \log N)$ samples, matching the lower bound of Do Ba-Indyk-Price-Woodruff'10 for non-adaptive algorithms up to constant factors for any $k \le N^{1-\delta}$. The algorithm runs in $\tilde{O}(N)$ time. Our algorithm extends to higher dimensions, leading to sample complexity of $\tilde{O}_d(k \log N)$, which is optimal up to constant factors for any $d = O(1)$. These are the first sample optimal algorithms for these problems. A preliminary experimental evaluation indicates that our algorithm has empirical sampling complexity comparable to that of other recovery methods known in the literature, while providing strong provable guarantees on the recovery quality.

FOCS Conference 2014 Conference Paper

Single Pass Spectral Sparsification in Dynamic Streams

  • Michael Kapralov
  • Yin Tat Lee
  • Cameron Musco
  • Christopher Musco
  • Aaron Sidford

We present the first single pass algorithm for computing spectral sparsifiers of graphs in the dynamic semi-streaming model. Given a single pass over a stream containing insertions and deletions of edges to a graph $G$, our algorithm maintains a randomized linear sketch of the incidence matrix of dimension $O(\frac{1}{\epsilon^2} n\, \mathrm{polylog}(n))$. Using this sketch, the algorithm can output a $(1 \pm \epsilon)$ spectral sparsifier for $G$ with high probability. While $O(\frac{1}{\epsilon^2} n\, \mathrm{polylog}(n))$ space algorithms are known for computing cut sparsifiers in dynamic streams [1], [2] and spectral sparsifiers in insertion-only streams [3], prior to our work, the best known single pass algorithm for maintaining spectral sparsifiers in dynamic streams required sketches of dimension $\Omega(\frac{1}{\epsilon^2} n^{5/3})$. To achieve our result, we show that, using a coarse sparsifier of $G$ and a linear sketch of $G$'s incidence matrix, it is possible to sample edges by effective resistance, obtaining a spectral sparsifier of arbitrary precision. Sampling from the sketch requires a novel application of $\ell_2/\ell_2$ sparse recovery, a natural extension of the $\ell_0$ methods used for cut sparsifiers in [1]. Recent work of [2] on row sampling for matrix approximation gives a recursive approach for obtaining the required coarse sparsifiers. Under certain restrictions, our approach also extends to the problem of maintaining a spectral approximation for a general matrix $A^T A$ given a stream of updates to rows in $A$.
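
The offline primitive that the streaming algorithm emulates from its sketch, in a dense, illustrative form: sample each edge with probability proportional to its effective resistance and reweight kept edges by $1/p_e$. The oversampling constant is arbitrary; the paper's contribution is performing this sampling from a linear sketch under deletions.

```python
import numpy as np

# Effective resistance sampling (offline, dense pseudoinverse for clarity):
# R_e = b_e^T L^+ b_e; keep edge e with probability p_e ~ C * R_e * log n.
def effective_resistances(n, edges):
    B = np.zeros((len(edges), n))
    for k, (u, v) in enumerate(edges):
        B[k, u], B[k, v] = 1.0, -1.0
    L_pinv = np.linalg.pinv(B.T @ B)      # pseudoinverse of the Laplacian
    return np.einsum("eu,uv,ev->e", B, L_pinv, B)

rng = np.random.default_rng(7)
n = 60
edges = [(i, j) for i in range(n) for j in range(i + 1, n)
         if rng.random() < 0.5]
R = effective_resistances(n, edges)
p = np.minimum(1.0, R * np.log(n))        # oversampling constant C = 1
keep = rng.random(len(edges)) < p         # kept edges get weight 1 / p_e
print(f"kept {int(keep.sum())} of {len(edges)} edges")
```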

NeurIPS Conference 2011 Conference Paper

Prediction strategies without loss

  • Michael Kapralov
  • Rina Panigrahy

Consider a sequence of bits where we are trying to predict the next bit from the previous bits. Assume we are allowed to say 'predict 0' or 'predict 1', and our payoff is $+1$ if the prediction is correct and $-1$ otherwise. We will say that at each point in time the loss of an algorithm is the number of wrong predictions minus the number of right predictions so far. In this paper we are interested in algorithms that have essentially zero (expected) loss over any string at any point in time and yet have small regret with respect to always predicting $0$ or always predicting $1$. For a sequence of length $T$ our algorithm has regret $14\epsilon T$ and loss $2\sqrt{T}e^{-\epsilon^2 T}$ in expectation for all strings. We show that the tradeoff between loss and regret is optimal up to constant factors. Our techniques extend to the general setting of $N$ experts, where the related problem of trading off regret to the best expert for regret to the 'special' expert has been studied by Even-Dar et al. (COLT'07). We obtain essentially zero loss with respect to the special expert and optimal loss/regret tradeoff, improving upon the results of Even-Dar et al. (COLT'07) and settling the main question left open in their paper. The strong loss bounds of the algorithm have some surprising consequences. First, we obtain a parameter free algorithm for the experts problem that has optimal regret bounds with respect to $k$-shifting optima, i.e., bounds with respect to the optimum that is allowed to change arms multiple times. Moreover, for any window of size $n$ the regret of our algorithm to any expert never exceeds $O(\sqrt{n(\log N+\log T)})$, where $N$ is the number of experts and $T$ is the time horizon, while maintaining the essentially zero loss property.

STOC Conference 2010 Conference Paper

Perfect matchings in o(n log n) time in regular bipartite graphs

  • Ashish Goel
  • Michael Kapralov
  • Sanjeev Khanna

In this paper we consider the well-studied problem of finding a perfect matching in a $d$-regular bipartite graph on $2n$ nodes with $m = nd$ edges. The best-known algorithm for general bipartite graphs (due to Hopcroft and Karp) takes time $O(m\sqrt{n})$. In regular bipartite graphs, however, a matching is known to be computable in $O(m)$ time (due to Cole, Ost, and Schirra). In a recent line of work by Goel, Kapralov, and Khanna the $O(m)$ time bound was improved first to $\tilde{O}(\min\{m, n^{2.5}/d\})$ and then to $\tilde{O}(\min\{m, n^2/d\})$. In this paper, we give a randomized algorithm that finds a perfect matching in a $d$-regular graph and runs in $O(n \log n)$ time (both in expectation and with high probability). The algorithm performs an appropriately truncated alternating random walk to successively find augmenting paths. Our algorithm may be viewed as using adaptive uniform sampling, and is thus able to bypass the limitations of (non-adaptive) uniform sampling established in earlier work. Our techniques also give an algorithm that successively finds a matching in the support of a doubly stochastic matrix in expected time $O(n \log^2 n)$, with $O(m)$ pre-processing time; this gives a simple $O(m + mn \log^2 n)$ time algorithm for finding the Birkhoff-von Neumann decomposition of a doubly stochastic matrix. We show that randomization is crucial for obtaining $o(nd)$ time algorithms by establishing an $\Omega(nd)$ lower bound for deterministic algorithms. We also show that there does not exist a randomized algorithm that finds a matching in a regular bipartite multigraph and takes $o(n \log n)$ time with high probability.

NeurIPS Conference 2009 Conference Paper

Factor Modeling for Advertisement Targeting

  • Ye Chen
  • Michael Kapralov
  • John Canny
  • Dmitry Pavlov

We adapt a probabilistic latent variable model, namely GaP (Gamma-Poisson), to ad targeting in the contexts of sponsored search (SS) and behaviorally targeted (BT) display advertising. We also approach the important problem of ad positional bias by formulating a one-latent-dimension GaP factorization. Learning from click-through data is intrinsically large scale, even more so for ads. We scale up the algorithm to terabytes of real-world SS and BT data that contains hundreds of millions of users and hundreds of thousands of features, by leveraging the scalability characteristics of the algorithm and the inherent structure of the problem, including data sparsity and locality. Specifically, we demonstrate two somewhat orthogonal philosophies of scaling algorithms to large-scale problems, through the SS and BT implementations, respectively. Finally, we report the experimental results using Yahoo's vast datasets, and show that our approach substantially outperforms the state-of-the-art methods in prediction accuracy. For BT in particular, the ROC area achieved by GaP exceeds 0.95, while one prior approach using Poisson regression yielded 0.83. For computational performance, we compare a single-node sparse implementation with a parallel implementation using Hadoop MapReduce; the results are counterintuitive yet quite interesting. We therefore provide insights into the underlying principles of large-scale learning.