Arrow Research search

Author name cluster

Raghu Meka

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

44 papers
2 author rows

Possible papers

44

NeurIPS Conference 2025 Conference Paper

Smoothed Agnostic Learning of Halfspaces over the Hypercube

  • Yiwen Kou
  • Raghu Meka

Agnostic learning of Boolean halfspaces is a fundamental problem in computational learning theory, but it is known to be computationally hard even for weak learning. Recent work \citep{chandrasekaran2024smoothed} proposed smoothed analysis as a way to bypass such hardness, but existing frameworks rely on additive Gaussian perturbations, making them unsuitable for discrete domains. We introduce a new smoothed agnostic learning framework for Boolean inputs, where perturbations are modeled via random bit flips. This defines a natural discrete analogue of smoothed optimality generalizing the Gaussian case. Under strictly subexponential assumptions on the input distribution, we give an efficient algorithm for learning halfspaces in this model, with runtime and sample complexity $\tilde{O}(n^{\mathrm{poly}(\frac{1}{\sigma\epsilon})})$. Previously, such algorithms were known only with strong structural assumptions for the discrete hypercube—for example, independent coordinates or symmetric distributions. Our result provides the first computationally efficient guarantee for smoothed agnostic learning of halfspaces over the Boolean hypercube, bridging the gap between worst-case intractability and practical learnability in discrete settings.

STOC Conference 2024 Conference Paper

Explicit Separations between Randomized and Deterministic Number-on-Forehead Communication

  • Zander Kelley
  • Shachar Lovett
  • Raghu Meka

We study the power of randomness in the Number-on-Forehead (NOF) model in communication complexity. We construct an explicit 3-player function f : [N]^3 → {0,1} such that: (i) there exists a randomized NOF protocol computing it that sends a constant number of bits; but (ii) any deterministic or nondeterministic NOF protocol computing it requires sending about (log N)^{1/3} many bits. This exponentially improves upon the previously best-known such separation. At the core of our proof is an extension of a recent result on sets of integers without 3-term arithmetic progressions into a non-arithmetic setting.

NeurIPS Conference 2023 Conference Paper

Feature Adaptation for Sparse Linear Regression

  • Jonathan Kelner
  • Frederic Koehler
  • Raghu Meka
  • Dhruv Rohatgi

Sparse linear regression is a central problem in high-dimensional statistics. We study the correlated random design setting, where the covariates are drawn from a multivariate Gaussian $N(0, \Sigma)$, and we seek an estimator with small excess risk. If the true signal is $t$-sparse, information-theoretically, it is possible to achieve strong recovery guarantees with only $O(t\log n)$ samples. However, computationally efficient algorithms have sample complexity linear in (some variant of) the *condition number* of $\Sigma$. Classical algorithms such as the Lasso can require significantly more samples than necessary even if there is only a single sparse approximate dependency among the covariates. We provide a polynomial-time algorithm that, given $\Sigma$, automatically adapts the Lasso to tolerate a small number of approximate dependencies. In particular, we achieve near-optimal sample complexity for constant sparsity and if $\Sigma$ has few ``outlier'' eigenvalues. Our algorithm fits into a broader framework of *feature adaptation* for sparse linear regression with ill-conditioned covariates. With this framework, we additionally provide the first polynomial-factor improvement over brute-force search for constant sparsity $t$ and arbitrary covariance $\Sigma$.
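
The correlated random design setup and the plain Lasso baseline discussed above are easy to make concrete. Below is a minimal sketch, assuming NumPy and scikit-learn; the covariance, sparsity level, noise scale, and regularization strength are illustrative choices, not the paper's construction, and the snippet shows only the baseline that the feature-adaptation framework improves on.

```python
# Correlated random-design sparse regression with a plain Lasso baseline.
# All problem parameters below are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_features, n_samples, t = 200, 120, 5

# Ill-conditioned covariance: a single near-dependency between two covariates.
Sigma = np.eye(n_features)
Sigma[0, 1] = Sigma[1, 0] = 0.99

X = rng.multivariate_normal(np.zeros(n_features), Sigma, size=n_samples)
w_star = np.zeros(n_features)
w_star[rng.choice(n_features, size=t, replace=False)] = 1.0
y = X @ w_star + 0.1 * rng.standard_normal(n_samples)

lasso = Lasso(alpha=0.05).fit(X, y)
excess_risk = (lasso.coef_ - w_star) @ Sigma @ (lasso.coef_ - w_star)
print(f"Lasso excess risk under Sigma: {excess_risk:.3f}")
```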

ICML Conference 2023 Conference Paper

On User-Level Private Convex Optimization

  • Badih Ghazi
  • Pritish Kamath
  • Ravi Kumar 0001
  • Pasin Manurangsi
  • Raghu Meka
  • Chiyuan Zhang

We introduce a new mechanism for stochastic convex optimization (SCO) with user-level differential privacy guarantees. The convergence rates of this mechanism are similar to those in the prior work of Levy et al. 2021 and Narayanan et al. 2022, but with two important improvements. Our mechanism does not require any smoothness assumptions on the loss. Furthermore, our bounds are also the first where the minimum number of users needed for user-level privacy has no dependence on the dimension and only a logarithmic dependence on the desired excess error. The main idea underlying the new mechanism is to show that the optimizers of strongly convex losses have low local deletion sensitivity, along with a new output perturbation method for functions with low local deletion sensitivity, which could be of independent interest.

STOC Conference 2023 Conference Paper

Resolving Matrix Spencer Conjecture Up to Poly-logarithmic Rank

  • Nikhil Bansal 0001
  • Haotian Jiang
  • Raghu Meka

We give a simple proof of the matrix Spencer conjecture up to poly-logarithmic rank: given symmetric d × d matrices A_1, …, A_n each with ||A_i||_op ≤ 1 and rank at most n/log^3 n, one can efficiently find ±1 signs x_1, …, x_n such that their signed sum has spectral norm ||∑_{i=1}^n x_i A_i||_op = O(√n). This result also implies a log n − Ω(log log n) qubit lower bound for quantum random access codes encoding n classical bits with advantage ≫ 1/√n. Our proof uses the recent refinement of the non-commutative Khintchine inequality in [Bandeira, Boedihardjo, van Handel, 2021] for random matrices with correlated Gaussian entries.

FOCS Conference 2023 Conference Paper

Strong Bounds for 3-Progressions

  • Zander Kelley
  • Raghu Meka

We show that for some constant $\beta > 0$, any subset A of integers $\{1, \ldots, N\}$ of size at least $2^{-O\left((\log N)^{\beta}\right)} \cdot N$ contains a non-trivial three-term arithmetic progression. Previously, three-term arithmetic progressions were known to exist only for sets of size at least $N/(\log N)^{1+c}$ for a constant $c > 0$. Our approach is first to develop new analytic techniques for addressing some related questions in the finite-field setting and then to apply some analogous variants of these same techniques, suitably adapted for the more complicated setting of integers.

NeurIPS Conference 2023 Conference Paper

User-Level Differential Privacy With Few Examples Per User

  • Badih Ghazi
  • Pritish Kamath
  • Ravi Kumar
  • Pasin Manurangsi
  • Raghu Meka
  • Chiyuan Zhang

Previous work on user-level differential privacy (DP) [Ghazi et al. NeurIPS 2021, Bun et al. STOC 2023] obtained generic algorithms that work for various learning tasks. However, their focus was on the *example-rich* regime, where the users have so many examples that each user could themselves solve the problem. In this work we consider the *example-scarce* regime, where each user has only a few examples, and obtain the following results: * For approximate-DP, we give a generic transformation of any item-level DP algorithm to a user-level DP algorithm. Roughly speaking, the latter gives a (multiplicative) savings of $O_{\varepsilon, \delta}(\sqrt{m})$ in terms of the number of users required for achieving the same utility, where $m$ is the number of examples per user. This algorithm, while recovering most known bounds for specific problems, also gives new bounds, e.g., for PAC learning. * For pure-DP, we present a simple technique for adapting the exponential mechanism [McSherry & Talwar, FOCS 2007] to the user-level setting. This gives new bounds for a variety of tasks, such as private PAC learning, hypothesis selection, and distribution learning. For some of these problems, we show that our bounds are near-optimal.

NeurIPS Conference 2022 Conference Paper

Hardness of Noise-Free Learning for Two-Hidden-Layer Neural Networks

  • Sitan Chen
  • Aravind Gollakota
  • Adam Klivans
  • Raghu Meka

We give superpolynomial statistical query (SQ) lower bounds for learning two-hidden-layer ReLU networks with respect to Gaussian inputs in the standard (noise-free) model. No general SQ lower bounds were known for learning ReLU networks of any depth in this setting: previous SQ lower bounds held only for adversarial noise models (agnostic learning) (Kothari and Klivans 2014, Goel et al. 2020a, Diakonikolas et al. 2020a) or restricted models such as correlational SQ (Goel et al. 2020b, Diakonikolas et al. 2020b). Prior work hinted at the impossibility of our result: Vempala and Wilmes (2019) showed that general SQ lower bounds cannot apply to any real-valued family of functions that satisfies a simple non-degeneracy condition. To circumvent their result, we refine a lifting procedure due to Daniely and Vardi (2021) that reduces Boolean PAC learning problems to Gaussian ones. We show how to extend their technique to other learning models and, in many well-studied cases, obtain a more efficient reduction. As such, we also prove new cryptographic hardness results for PAC learning two-hidden-layer ReLU networks, as well as new lower bounds for learning constant-depth ReLU networks from membership queries.

NeurIPS Conference 2022 Conference Paper

Lower Bounds on Randomly Preconditioned Lasso via Robust Sparse Designs

  • Jonathan Kelner
  • Frederic Koehler
  • Raghu Meka
  • Dhruv Rohatgi

Sparse linear regression with ill-conditioned Gaussian random covariates is widely believed to exhibit a statistical/computational gap, but there is surprisingly little formal evidence for this belief. Recent work has shown that, for certain covariance matrices, the broad class of Preconditioned Lasso programs provably cannot succeed on polylogarithmically sparse signals with a sublinear number of samples. However, this lower bound only holds against deterministic preconditioners, and in many contexts randomization is crucial to the success of preconditioners. We prove a stronger lower bound that rules out randomized preconditioners. For an appropriate covariance matrix, we construct a single signal distribution on which any invertibly-preconditioned Lasso program fails with high probability, unless it receives a linear number of samples. Surprisingly, at the heart of our lower bound is a new robustness result in compressed sensing. In particular, we study recovering a sparse signal when a few measurements can be erased adversarially. To our knowledge, this natural question has not been studied before for sparse measurements. We surprisingly show that standard sparse Bernoulli measurements are almost-optimally robust to adversarial erasures: if $b$ measurements are erased, then all but $O(b)$ of the coordinates of the signal are identifiable.

ICLR Conference 2022 Conference Paper

Minimax Optimality (Probably) Doesn't Imply Distribution Learning for GANs

  • Sitan Chen
  • Jerry Li 0001
  • Yuanzhi Li
  • Raghu Meka

Arguably the most fundamental question in the theory of generative adversarial networks (GANs) is to understand when GANs can actually learn the underlying distribution. Theoretical and empirical evidence (see e.g. Arora-Risteski-Zhang '18) suggest local optimality of the empirical training objective is insufficient, yet it does not rule out the possibility that achieving a true population minimax optimal solution might imply distribution learning. In this paper, we show that standard cryptographic assumptions imply that this stronger condition is still insufficient. Namely, we show that if local pseudorandom generators (PRGs) exist, then for a large family of natural target distributions, there are ReLU network generators of constant depth and poly size which take Gaussian random seeds so that (i) the output is far in Wasserstein distance from the target distribution, but (ii) no polynomially large Lipschitz discriminator ReLU network can detect this. This implies that even achieving a population minimax optimal solution to the Wasserstein GAN objective is likely insufficient for distribution learning. Our techniques reveal a deep connection between GANs and PRGs, which we believe will lead to further insights into the computational landscape of GANs.

NeurIPS Conference 2022 Conference Paper

Sketching based Representations for Robust Image Classification with Provable Guarantees

  • Nishanth Dikkala
  • Sankeerth Rao Karingula
  • Raghu Meka
  • Jelani Nelson
  • Rina Panigrahy
  • Xin Wang

How do we provably represent images succinctly so that their essential latent attributes are correctly captured by the representation to as high a level of detail as possible? While today's deep networks (such as CNNs) produce image embeddings, they do not have any provable properties and seem to work in mysterious, non-interpretable ways. In this work we theoretically study synthetic images that are composed of a union or intersection of several mathematically specified shapes using thresholded polynomial functions (e.g., ellipses, rectangles). We show how to produce a succinct sketch of such an image so that the sketch “smoothly” maps to the latent coefficients producing the different shapes in the image. We prove several important properties, such as: easy reconstruction of the image from the sketch; similarity preservation (similar shapes produce similar sketches); the ability to index sketches so that other similar images and parts of other images can be retrieved; and the ability to store the sketches in a dictionary of concepts and shapes, so that parts of the same or different images that refer to the same shape can point to the same entry in this dictionary of common shape attributes.

NeurIPS Conference 2021 Conference Paper

Efficiently Learning One Hidden Layer ReLU Networks From Queries

  • Sitan Chen
  • Adam Klivans
  • Raghu Meka

While the problem of PAC learning neural networks from samples has received considerable attention in recent years, in certain settings like model extraction attacks, it is reasonable to imagine having more than just the ability to observe random labeled examples. Motivated by this, we consider the following problem: given \emph{black-box query access} to a neural network $F$, recover $F$ up to some error. Formally, we show that if $F$ is an arbitrary one hidden layer neural network with ReLU activations, there is an algorithm with query complexity and runtime polynomial in all parameters which outputs a network $F'$ achieving low square loss relative to $F$ with respect to the Gaussian measure. While a number of works in the security literature have proposed and empirically demonstrated the effectiveness of certain algorithms for this problem, ours is to the best of our knowledge the first provable guarantee in this vein.

FOCS Conference 2021 Conference Paper

Learning Deep ReLU Networks Is Fixed-Parameter Tractable

  • Sitan Chen
  • Adam R. Klivans
  • Raghu Meka

We consider the problem of learning an unknown ReLU network with respect to Gaussian inputs and obtain the first nontrivial results for networks of depth more than two. We give an algorithm whose running time is a fixed polynomial in the ambient dimension and some (exponentially large) function of only the network's parameters. Our results provably cannot be obtained using gradient-based methods and give the first example of a class of efficiently learnable neural networks that gradient descent will fail to learn. Our bounds depend on the number of hidden units, depth, spectral norm of the weight matrices, and Lipschitz constant of the overall network (we show that some dependence on the Lipschitz constant is necessary). We also give a bound that is doubly exponential in the size of the network but is independent of spectral norm. In contrast, prior work for learning networks of depth three or higher requires exponential time in the ambient dimension, even when the above parameters are bounded by a constant. Additionally, all prior work for the depth-two case requires well-conditioned weights and/or positive coefficients to obtain efficient run-times. Our algorithm does not require these assumptions. Our main technical tool is a type of filtered PCA that can be used to iteratively recover an approximate basis for the subspace spanned by the hidden units in the first layer. Our analysis leverages new structural results on lattice polynomials from tropical geometry.

FOCS Conference 2021 Conference Paper

On the Power of Preconditioning in Sparse Linear Regression

  • Jonathan A. Kelner
  • Frederic Koehler
  • Raghu Meka
  • Dhruv Rohatgi

Sparse linear regression is a fundamental problem in high-dimensional statistics, but strikingly little is known about how to efficiently solve it without restrictive conditions on the design matrix. We consider the (correlated) random design setting, where the covariates are independently drawn from a multivariate Gaussian $N(0, \Sigma)$, for some $n\times n$ positive semi-definite matrix $\Sigma$, and seek estimators $\hat{w}$ minimizing $(\hat{w}-w^{\ast})^{T}\Sigma(\hat{w}-w^{\ast})$, where $w^{\ast}$ is the $k$-sparse ground truth. Information-theoretically, one can achieve strong error bounds with only $O(k\log n)$ samples for arbitrary $\Sigma$ and $w^{\ast}$; however, no efficient algorithms are known to match these guarantees even with $o(n)$ samples, without further assumptions on $\Sigma$ or $w^{\ast}$. Yet there is little evidence for this gap in the random design setting: computational lower bounds are only known for worst-case design matrices. To date, random-design instances (i.e., specific covariance matrices $\Sigma$) have only been proven hard against the Lasso program and variants. More precisely, these “hard” instances can often be solved by Lasso after a simple change-of-basis (i.e., preconditioning). In this work, we give both upper and lower bounds clarifying the power of preconditioning as a tool for solving sparse linear regression problems. On the one hand, we show that the preconditioned Lasso can solve a large class of sparse linear regression problems nearly optimally: it succeeds whenever the dependency structure of the covariates, in the sense of the Markov property, has low treewidth - even if $\Sigma$ is highly ill-conditioned. This upper bound builds on ideas from the wavelet and signal processing literature. As a special case of this result, we give an algorithm for sparse linear regression with covariates from an autoregressive time series model, where we also show that the (usual) Lasso provably fails. On the other hand, we construct (for the first time) random-design instances which are provably hard even for an optimally preconditioned Lasso. In fact, we complete our treewidth classification by proving that for any treewidth-$t$ graph, there exists a Gaussian Markov Random Field on this graph such that the preconditioned Lasso, with any choice of preconditioner, requires $\Omega(t^{1/20})$ samples to recover $O(\log n)$-sparse signals when covariates are drawn from this model.
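
To make "preconditioning" concrete, here is a hedged sketch of one standard way to formalize a preconditioned Lasso: change basis with an invertible matrix S, run the usual Lasso, and map the coefficients back. The whitening preconditioner and all parameters are illustrative assumptions; the paper's treewidth-based preconditioners and hard instances are not reproduced here.

```python
# Sketch of a preconditioned Lasso: fit in the basis X @ S, map coefficients back.
# The whitening choice S = Sigma^{-1/2} assumes Sigma is known and that the
# signal stays (approximately) sparse in the new basis; both are illustrative.
import numpy as np
from sklearn.linear_model import Lasso

def preconditioned_lasso(X, y, S, alpha=0.05):
    """Fit Lasso on the preconditioned design X @ S and return S @ coefficients."""
    model = Lasso(alpha=alpha).fit(X @ S, y)
    return S @ model.coef_

rng = np.random.default_rng(1)
n, d = 150, 100
Sigma = np.eye(d)
Sigma[0, 1] = Sigma[1, 0] = 0.9
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
w_star = np.zeros(d)
w_star[:3] = 1.0
y = X @ w_star + 0.1 * rng.standard_normal(n)

evals, evecs = np.linalg.eigh(Sigma)
S = evecs @ np.diag(evals ** -0.5) @ evecs.T       # whitening preconditioner
w_hat = preconditioned_lasso(X, y, S)
print("recovered support:", np.flatnonzero(np.abs(w_hat) > 0.1))
```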

SODA Conference 2021 Conference Paper

Online Discrepancy Minimization for Stochastic Arrivals

  • Nikhil Bansal 0001
  • Haotian Jiang
  • Raghu Meka
  • Sahil Singla 0001
  • Makrand Sinha

In the stochastic online vector balancing problem, vectors v_1, v_2, …, v_T chosen independently from an arbitrary distribution in ℝ^n arrive one-by-one and must be immediately given a ± sign. The goal is to keep the norm of the discrepancy vector, i.e., the signed prefix-sum, as small as possible for a given target norm. We consider some of the most well-known problems in discrepancy theory in the above online stochastic setting, and give algorithms that match the known offline bounds up to polylog(nT) factors. This substantially generalizes and improves upon the previous results of Bansal, Jiang, Singla, and Sinha (STOC '20). In particular, for the Komlós problem where ‖v_t‖_2 ≤ 1 for each t, our algorithm achieves Õ(1) discrepancy with high probability, improving upon the previous Õ(n^{3/2}) bound. For Tusnády's problem of minimizing the discrepancy of axis-aligned boxes, we obtain an O(log^{d+4} T) bound for an arbitrary distribution over points. Previous techniques only worked for product distributions and gave a weaker O(log^{2d+1} T) bound. We also consider the Banaszczyk setting, where given a symmetric convex body K with Gaussian measure at least 1/2, our algorithm achieves Õ(1) discrepancy with respect to the norm given by K for input distributions with sub-exponential tails. Our results are based on a new potential function approach. Previous techniques consider a potential that penalizes large discrepancy, and greedily choose the next color to minimize the increase in potential. Our key idea is to introduce a potential that also enforces constraints on how the discrepancy vector evolves, allowing us to maintain certain anti-concentration properties. We believe that our techniques to control the evolution of states could find other applications in stochastic processes and online algorithms. For the Banaszczyk setting, we further enhance this potential by combining it with ideas from generic chaining. Finally, we also extend these results to the setting of online multicolor discrepancy.
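
The baseline template described in the abstract (a potential that penalizes large discrepancy, with each sign chosen greedily to limit the increase in potential) can be sketched directly. The soft-max potential and parameters below are illustrative assumptions; the paper's refined potential, which also constrains how the discrepancy vector evolves, is not reproduced here.

```python
# Greedy potential-based online signing (the baseline template, not the paper's
# refined potential). The soft-max potential and scaling are illustrative.
import numpy as np

def greedy_potential_signs(vectors, lam=1.0):
    d = np.zeros(vectors.shape[1])        # running discrepancy vector
    signs = []
    # soft-max over +- coordinates approximates the infinity norm of d
    potential = lambda w: np.logaddexp.reduce(np.concatenate([lam * w, -lam * w]))
    for v in vectors:
        s = 1 if potential(d + v) <= potential(d - v) else -1
        signs.append(s)
        d = d + s * v
    return np.array(signs), d

rng = np.random.default_rng(0)
T, n = 2000, 20
vectors = rng.standard_normal((T, n)) / np.sqrt(n)   # Komlos-style: ||v_t||_2 ~ 1
signs, d = greedy_potential_signs(vectors)
print("final infinity-norm discrepancy:", np.abs(d).max())
```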

FOCS Conference 2020 Conference Paper

Extractors and Secret Sharing Against Bounded Collusion Protocols

  • Eshan Chattopadhyay
  • Jesse Goodman
  • Vipul Goyal
  • Ashutosh Kumar 0002
  • Xin Li 0006
  • Raghu Meka
  • David Zuckerman

In a recent work, Kumar, Meka, and Sahai (FOCS 2019) introduced the notion of bounded collusion protocols (BCPs). BCPs are multiparty communication protocols in which N parties, holding n bits each, attempt to compute some joint function of their inputs, f : ({0,1}^n)^N → {0,1}. In each round, p parties (the collusion bound) work together to write a single bit on a public blackboard, and the protocol continues until every party knows the value of f. BCPs are a natural generalization of the well-studied number-in-hand (NIH) and number-on-forehead (NOF) models, which are just endpoints on this rich spectrum of protocols (corresponding to p=1 and p=N-1, respectively). In this work, we investigate BCPs more thoroughly, and answer questions about them in the context of communication complexity, randomness extractors, and secret sharing. 1. First, we provide explicit lower bounds against BCPs. Our lower bounds offer a tradeoff between collusion and complexity, and are of the form n^{Ω(1)} when p = 0.99N parties collude. This bound is independent of the relationship between N and n, whereas all previous bounds became trivial when. 2. Second, we provide explicit leakage-resilient extractors against BCPs. Also known as cylinder-intersection extractors, these objects are multi-source extractors of the form Ext : ({0,1}^n)^N → {0,1}, whose output looks uniform even conditioned on the bits produced (“leaked”) by a BCP executed over the inputs of the extractor. Our extractors work for sources with min-entropy k ≥ polylog(n) against BCPs with collusion p ≤ N-2. Previously, all such extractors required min-entropy k ≥ 0.99n even when p ≤ O(1). 3. Third, we provide efficient leakage-resilient secret sharing schemes against BCPs. These cryptographic primitives are standard t-out-of-N secret sharing schemes, equipped with an additional guarantee that the secret remains hidden even if the individuals participate in a BCP using their shares. Our schemes can handle collusion up to p ≤ O(t/log t), whereas the previous best scheme required p ≤ O(log N). Along the way, we also construct objects that are more general than those listed above (i.e., compilers), objects that are more specialized (and stronger) than those listed above, and resolve open questions posed by Goyal and Kumar (STOC 2018) and Kumar, Meka, and Sahai (FOCS 2019).

NeurIPS Conference 2020 Conference Paper

Learning Some Popular Gaussian Graphical Models without Condition Number Bounds

  • Jonathan Kelner
  • Frederic Koehler
  • Raghu Meka
  • Ankur Moitra

Gaussian Graphical Models (GGMs) have wide-ranging applications in machine learning and the natural and social sciences. In most of the settings in which they are applied, the number of observed samples is much smaller than the dimension and they are assumed to be sparse. While there are a variety of algorithms (e.g., Graphical Lasso, CLIME) that provably recover the graph structure with a logarithmic number of samples, to do so they require various assumptions on the well-conditioning of the precision matrix that are not information-theoretically necessary. Here we give the first fixed polynomial-time algorithms for learning attractive GGMs and walk-summable GGMs with a logarithmic number of samples without any such assumptions. In particular, our algorithms can tolerate strong dependencies among the variables. Our result for structure recovery in walk-summable GGMs is derived from a more general result for efficient sparse linear regression in walk-summable models without any norm dependencies. We complement our results with experiments showing that many existing algorithms fail even in some simple settings where there are long dependency chains. Our algorithms do not.
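
For contrast with the abstract's discussion, here is how the Graphical Lasso baseline it mentions is typically run, using scikit-learn. This is the comparator, not the paper's algorithm; the chain-structured precision matrix, sample size, and thresholds are illustrative assumptions.

```python
# Graphical Lasso baseline for Gaussian graphical model structure recovery.
# This is the comparator named in the abstract, not the paper's algorithm;
# the chain precision matrix and parameters are illustrative assumptions.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
d = 20
# Chain-structured precision matrix: a long dependency chain among the variables.
Theta = np.eye(d) + np.diag(0.45 * np.ones(d - 1), 1) + np.diag(0.45 * np.ones(d - 1), -1)
Sigma = np.linalg.inv(Theta)
X = rng.multivariate_normal(np.zeros(d), Sigma, size=500)

model = GraphicalLasso(alpha=0.05).fit(X)
est_edges = np.abs(model.precision_) > 0.1
true_edges = np.abs(Theta) > 1e-9
print("fraction of entries classified correctly:", (est_edges == true_edges).mean())
```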

FOCS Conference 2019 Conference Paper

Leakage-Resilient Secret Sharing Against Colluding Parties

  • Ashutosh Kumar 0002
  • Raghu Meka
  • Amit Sahai

In this work, we consider the natural goal of designing secret sharing schemes that ensure security against an adversary who may learn some “leaked” information about all the shares. We say that a secret sharing scheme is p-party leakage-resilient if the secret remains statistically hidden even after a computationally unbounded adversary learns a bounded amount of leakage, where each bit of leakage adaptively and jointly depends on the shares of an adaptively chosen subset of p parties. Existing multi-party secret sharing schemes (Dziembowski and Pietrzak FOCS 07), (Goyal and Kumar STOC 18) and (Benhamouda, Degwekar, Ishai and Rabin CRYPTO 18) have focused on handling non-adaptive and individual leakage for (limited special cases of) threshold secret sharing schemes. (1) We give an unconditional compiler that transforms any secret sharing scheme on n parties into a p-party leakage-resilient one for p up to O(log n). This yields the first multi-party secret sharing schemes that are secure against adaptive or joint leakage. (2) As a natural extension, we initiate the study of leakage-resilient non-malleable secret sharing. We empower the adversary to adaptively leak from each of the shares and then use the leakage to tamper with all of them arbitrarily and independently. Leveraging our p-party leakage-resilient schemes, we compile any secret sharing scheme into a non-malleable one, ensuring that any such tampering either preserves the secret or completely 'destroys' it. This improves upon the non-malleable secret sharing scheme of (Goyal and Kumar CRYPTO 18), where no leakage was permitted. Leakage-resilient non-malleable codes can be seen as 2-out-of-2 schemes satisfying our guarantee and have already found many applications in cryptography. (3) Our constructions rely on a clean connection we draw to communication complexity in the well-studied number-on-forehead (NOF) model and rely on functions that have strong communication-complexity lower bounds in the NOF model (in a black-box way). We get efficient p-party leakage-resilient schemes for p up to O(log n), as our share sizes have exponential dependence on p. We observe that improving this exponential dependence, even for simultaneous, non-adaptive leakage, will lead to progress on longstanding open problems in complexity theory.

SODA Conference 2019 Conference Paper

On the discrepancy of random low degree set systems

  • Nikhil Bansal 0001
  • Raghu Meka

Motivated by the celebrated Beck-Fiala conjecture, we consider the random setting where there are n elements and m sets and each element lies in t randomly chosen sets. In this setting, Ezra and Lovett showed an O((t log t)^{1/2}) discrepancy bound in the regime when n ≤ m and an O(1) bound when n ≫ m^t. In this paper, we give a tight bound for the entire range of n and m, under a mild assumption that t = Ω((log log m)^2). The result is based on two steps. First, applying the partial coloring method to the case when n = m log^{O(1)} m and using the properties of the random set system, we show that the overall discrepancy incurred is at most. Second, we reduce the general case to that of n ≤ m log^{O(1)} m using LP duality and a careful counting argument.

ICML Conference 2018 Conference Paper

Learning One Convolutional Layer with Overlapping Patches

  • Surbhi Goel
  • Adam R. Klivans
  • Raghu Meka

We give the first provably efficient algorithm for learning a one hidden layer convolutional network with respect to a general class of (potentially overlapping) patches under mild conditions on the underlying distribution. We prove that our framework captures commonly used schemes from computer vision, including one-dimensional and two-dimensional “patch and stride” convolutions. Our algorithm, Convotron, is inspired by recent work applying isotonic regression to learning neural networks. Convotron uses a simple, iterative update rule that is stochastic in nature and tolerant to noise (it requires only that the conditional mean function is a one layer convolutional network, as opposed to the realizable setting). In contrast to gradient descent, Convotron requires no special initialization or learning-rate tuning to converge to the global optimum. We also point out that learning one hidden convolutional layer with respect to a Gaussian distribution and just one disjoint patch $P$ (the other patches may be arbitrary) is easy in the following sense: Convotron can efficiently recover the hidden weight vector by updating only in the direction of $P$.
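
A simplified Convotron-style update is sketched below for a single one-dimensional filter. The patch layout, averaging convention, step size, and stopping rule are illustrative assumptions rather than the paper's exact update rule or guarantees; the point is only the flavor of a residual-driven stochastic update with no special initialization.

```python
# Simplified sketch of a Convotron-style update for one convolutional filter.
# Patch layout, averaging convention, and step size are illustrative assumptions.
import numpy as np

def patches(x, width, stride):
    """Slice a 1-D signal into (possibly overlapping) patches."""
    return np.stack([x[i:i + width] for i in range(0, len(x) - width + 1, stride)])

def predict(w, P):
    return np.mean(np.maximum(P @ w, 0.0))      # average of ReLU patch responses

def convotron_step(w, x, y, width=5, stride=2, eta=0.05):
    P = patches(x, width, stride)
    # Residual-driven stochastic update along the averaged patch direction;
    # no special initialization or learning-rate schedule is used.
    return w + eta * (y - predict(w, P)) * P.mean(axis=0)

rng = np.random.default_rng(0)
w_true, w = rng.standard_normal(5), np.zeros(5)
for _ in range(5000):
    x = rng.standard_normal(32)
    y = predict(w_true, patches(x, 5, 2))       # realizable labels for the demo
    w = convotron_step(w, x, y)
cos = w @ w_true / (np.linalg.norm(w) * np.linalg.norm(w_true) + 1e-12)
print("cosine similarity to the true filter:", cos)
```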

FOCS Conference 2017 Conference Paper

Learning Graphical Models Using Multiplicative Weights

  • Adam R. Klivans
  • Raghu Meka

We give a simple, multiplicative-weight update algorithm for learning undirected graphical models or Markov random fields (MRFs). The approach is new, and for the well-studied case of Ising models or Boltzmann machines we obtain an algorithm that uses a nearly optimal number of samples and has running time Õ(n^2) (where n is the dimension), subsuming and improving on all prior work. Additionally, we give the first efficient algorithm for learning Ising models over non-binary alphabets. Our main application is an algorithm for learning the structure of t-wise MRFs with nearly-optimal sample complexity (up to polynomial losses in necessary terms that depend on the weights) and running time that is n^{O(t)}. In addition, given n^{O(t)} samples, we can also learn the parameters of the model and generate a hypothesis that is close in statistical distance to the true MRF. All prior work runs in time n^{Ω(d)} for graphs of bounded degree d and does not generate a hypothesis close in statistical distance even for t = 3. We observe that our runtime has the correct dependence on n and t assuming the hardness of learning sparse parities with noise. Our algorithm, the Sparsitron, is easy to implement (it has only one parameter) and works in the online setting. Its analysis applies a regret bound from Freund and Schapire's classic Hedge algorithm. It also gives the first solution to the problem of learning sparse Generalized Linear Models (GLMs).
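
In the same spirit as the Sparsitron's single-parameter, Hedge-style update, here is a schematic multiplicative-weights learner for a sparse generalized linear model. The exact loss vector, feature duplication, and parameter settings used by the Sparsitron differ; everything below is a hedged illustration, not the paper's algorithm or its guarantees.

```python
# Schematic Hedge-style multiplicative-weights learner for a sparse GLM,
# illustrating the single-parameter update style; not the exact Sparsitron.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sparse_glm_mw(samples, lam, beta=0.9):
    """samples: list of (x, y) with x in {-1, +1}^n and y in {0, 1}."""
    n = len(samples[0][0])
    w = np.ones(2 * n)                            # duplicated features allow signs
    for x, y in samples:
        xx = np.concatenate([x, -x])
        p = w / w.sum()
        pred = sigmoid(lam * (p @ xx))
        loss = 0.5 * (1.0 + xx) * (pred - y)      # entrywise losses in [-1, 1]
        w *= beta ** loss                         # multiplicative (Hedge) update
    p = w / w.sum()
    return lam * (p[:n] - p[n:])                  # signed weight estimate

rng = np.random.default_rng(0)
n = 50
w_true = np.zeros(n)
w_true[:3] = [1.0, -0.7, 0.5]
samples = []
for _ in range(20000):
    x = rng.choice([-1.0, 1.0], size=n)
    y = float(rng.random() < sigmoid(x @ w_true))  # noisy GLM labels
    samples.append((x, y))
w_hat = sparse_glm_mw(samples, lam=np.abs(w_true).sum())
print("coordinates with largest recovered weight:", np.argsort(-np.abs(w_hat))[:3])
```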

FOCS Conference 2015 Conference Paper

Pseudorandomness via the Discrete Fourier Transform

  • Parikshit Gopalan
  • Daniel M. Kane
  • Raghu Meka

We present a new approach to constructing unconditional pseudorandom generators against classes of functions that involve computing a linear function of the inputs. We give an explicit construction of a pseudorandom generator that fools the discrete Fourier transforms of linear functions with seed-length that is nearly logarithmic (up to polyloglog factors) in the input size and the desired error parameter. Our result gives a single pseudorandom generator that fools several important classes of tests computable in log space that have been considered in the literature, including halfspaces (over general domains), modular tests and combinatorial shapes. For all these classes, our generator is the first that achieves near logarithmic seed-length in both the input length and the error parameter. Getting such a seed-length is a natural challenge in its own right, which needs to be overcome in order to derandomize RL, a central question in complexity theory. Our construction combines ideas from a large body of prior work, ranging from a classical construction of [1] to the recent gradually increasing independence paradigm of [2]-[4], while also introducing some novel analytic machinery which might find other applications.

STOC Conference 2015 Conference Paper

Rectangles Are Nonnegative Juntas

  • Mika Göös
  • Shachar Lovett
  • Raghu Meka
  • Thomas Watson 0001
  • David Zuckerman

We develop a new method to prove communication lower bounds for composed functions of the form f ∘ g^n, where f is any boolean function on n inputs and g is a sufficiently "hard" two-party gadget. Our main structure theorem states that each rectangle in the communication matrix of f ∘ g^n can be simulated by a nonnegative combination of juntas. This is the strongest yet formalization for the intuition that each low-communication randomized protocol can only "query" few inputs of f as encoded by the gadget g. Consequently, we characterize the communication complexity of f ∘ g^n in all known one-sided zero-communication models by a corresponding query complexity measure of f. These models in turn capture important lower bound techniques such as corruption, smooth rectangle bound, relaxed partition bound, and extended discrepancy. As applications, we resolve several open problems from prior work: We show that SBP^cc (a class characterized by corruption) is not closed under intersection. An immediate corollary is that MA^cc ≠ SBP^cc. These results answer questions of Klauck (CCC 2003) and Bohler et al. (JCSS 2006). We also show that approximate nonnegative rank of partial boolean matrices does not admit efficient error reduction. This answers a question of Kol et al. (ICALP) for partial matrices.

STOC Conference 2015 Conference Paper

Sum-of-squares Lower Bounds for Planted Clique

  • Raghu Meka
  • Aaron Potechin
  • Avi Wigderson

Finding cliques in random graphs and the closely related "planted" clique variant, where a clique of size k is planted in a random G(n,1/2) graph, have been the focus of substantial study in algorithm design. Despite much effort, the best known polynomial-time algorithms only solve the problem for k = Θ(√n). In this paper we study the complexity of the planted clique problem under algorithms from the Sum-of-Squares hierarchy. We prove the first average case lower bound for this model: for almost all graphs in G(n,1/2), r rounds of the SOS hierarchy cannot find a planted k-clique unless k ≥ (√n/log n)^{1/r} C^r. Thus, for any constant number of rounds, planted cliques of size n^{o(1)} cannot be found by this powerful class of algorithms. This is shown via an integrality gap for the natural formulation of the maximum clique problem on random graphs for SOS and Lasserre hierarchies, which in turn follows from degree lower bounds for the Positivstellensatz proof system. We follow the usual recipe for such proofs. First, we introduce a natural "dual certificate" (also known as a "vector-solution" or "pseudo-expectation") for the given system of polynomial equations representing the problem for every fixed input graph. Then we show that the matrix associated with this dual certificate is PSD (positive semi-definite) with high probability over the choice of the input graph. This requires the use of certain tools. One is the theory of association schemes, and in particular the eigenspaces and eigenvalues of the Johnson scheme. Another is a combinatorial method we develop to compute (via traces) norm bounds for certain random matrices whose entries are highly dependent; we hope this method will be useful elsewhere.

STOC Conference 2013 Conference Paper

A PRG for lipschitz functions of polynomials with applications to sparsest cut

  • Daniel M. Kane
  • Raghu Meka

We give improved pseudorandom generators (PRGs) for Lipschitz functions of low-degree polynomials over the hypercube. These are functions of the form ψ(P(x)), where P : {1,-1}^n → ℝ is a low-degree polynomial and ψ : ℝ → ℝ is a function with small Lipschitz constant. PRGs for smooth functions of low-degree polynomials have received a lot of attention recently and play an important role in constructing PRGs for the natural class of polynomial threshold functions [12,13,24,16,15]. In spite of the recent progress, no nontrivial PRGs were known for fooling Lipschitz functions of degree O(log n) polynomials even for constant error rate. In this work, we give the first such generator, obtaining a seed-length of (log n)·Õ(l^2/ε^2) for fooling degree-l polynomials with error ε. Previous generators had an exponential dependence on the degree l. We use our PRG to get better integrality gap instances for sparsest cut, a fundamental problem in graph theory with many applications in graph optimization. We give an instance of uniform sparsest cut for which a powerful semi-definite relaxation (SDP), first introduced by Goemans and Linial and studied in the seminal work of Arora, Rao and Vazirani [3], has an integrality gap of exp(Ω((log log n)^{1/2})). Understanding the performance of the Goemans-Linial SDP for uniform sparsest cut is an important open problem in approximation algorithms and metric embeddings. Our work gives a near-exponential improvement over previous lower bounds, which achieved a gap of Ω(log log n) [11,21]. Our gap instance builds on the recent short code gadgets of Barak et al. [5].

FOCS Conference 2012 Conference Paper

A PTAS for Computing the Supremum of Gaussian Processes

  • Raghu Meka

We give a polynomial time approximation scheme (PTAS) for computing the supremum of a Gaussian process. That is, given a finite set of vectors V ⊆ ℝ^d, we compute a (1+ε)-factor approximation to E_{X←N^d}[sup_{v∈V} |⟨v, X⟩|] deterministically in time poly(d) · |V|^{O_ε(1)}. Previously, only a constant factor deterministic polynomial time approximation algorithm was known, due to the work of Ding, Lee and Peres [1]. This answers an open question of Lee [2] and Ding [3]. The study of the supremum of Gaussian processes is of considerable importance in probability, with applications in functional analysis, convex geometry, and, in light of the recent breakthrough work of Ding, Lee and Peres [1], random walks on finite graphs. As such our result could be of use elsewhere. In particular, combining with the recent work of Ding [3], our result yields a PTAS for computing the cover time of bounded degree graphs. Previously, such algorithms were known only for trees. Along the way, we also give an explicit oblivious estimator for semi-norms in Gaussian space with optimal query complexity. Our algorithm and its analysis are elementary in nature, using two classical comparison inequalities in convex geometry: Slepian's lemma and Kanter's lemma.
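
The quantity the paper approximates deterministically can be made concrete with a simple randomized baseline: a Monte Carlo estimate of E_{X←N^d}[sup_{v∈V} |⟨v, X⟩|]. This sketch is not the PTAS or the oblivious estimator from the paper; the sample count and the example set V are illustrative assumptions.

```python
# Monte Carlo baseline for E_{X ~ N(0, I_d)}[ sup_{v in V} |<v, X>| ];
# not the deterministic PTAS from the paper, just the target quantity.
import numpy as np

def mc_gaussian_supremum(V, n_samples=20000, seed=0):
    """V: array of shape (|V|, d); returns a Monte Carlo estimate of the supremum."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, V.shape[1]))
    return np.abs(X @ V.T).max(axis=1).mean()

V = np.eye(5)                    # for the standard basis this is E[max_i |X_i|]
print(mc_gaussian_supremum(V))   # roughly 1.6 for d = 5
```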

FOCS Conference 2012 Conference Paper

Better Pseudorandom Generators from Milder Pseudorandom Restrictions

  • Parikshit Gopalan
  • Raghu Meka
  • Omer Reingold
  • Luca Trevisan 0001
  • Salil P. Vadhan

We present an iterative approach to constructing pseudorandom generators, based on the repeated application of mild pseudorandom restrictions. We use this template to construct pseudorandom generators for combinatorial rectangles and read-once CNFs and a hitting set generator for width-3 branching programs, all of which achieve near-optimal seed-length even in the low-error regime: we get seed-length Õ(log(n/ε)) for error ε. Previously, only constructions with seed-length O(log^{3/2} n) or O(log^2 n) were known for these classes with error ε = 1/poly(n). The (pseudo)random restrictions we use are milder than those typically used for proving circuit lower bounds, in that we only set a constant fraction of the bits at a time. While such restrictions do not simplify the functions drastically, we show that they can be derandomized using small-bias spaces.

FOCS Conference 2012 Conference Paper

Constructive Discrepancy Minimization by Walking on the Edges

  • Shachar Lovett
  • Raghu Meka

Minimizing the discrepancy of a set system is a fundamental problem in combinatorics. One of the cornerstones in this area is the celebrated six standard deviations result of Spencer (AMS 1985): In any system of n sets in a universe of size n, there always exists a coloring which achieves discrepancy 6√n. The original proof of Spencer was existential in nature, and did not give an efficient algorithm to find such a coloring. Recently, a breakthrough work of Bansal (FOCS 2010) gave an efficient algorithm which finds such a coloring. His algorithm was based on an SDP relaxation of the discrepancy problem and a clever rounding procedure. In this work we give a new randomized algorithm to find a coloring as in Spencer's result based on a restricted random walk we call Edge-Walk. Our algorithm and its analysis use only basic linear algebra and is “truly” constructive in that it does not appeal to the existential arguments, giving a new proof of Spencer's theorem and the partial coloring lemma.
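
A heavily simplified Edge-Walk-style partial coloring can be sketched as a small-step Gaussian walk inside the cube [-1, 1]^n that freezes variables once they hit ±1 and walks orthogonally to sets whose discrepancy nears a cap. The step size, cap, and stopping rule below are illustrative assumptions; the paper's analysis fixes these to obtain the O(√n) guarantee.

```python
# Simplified Edge-Walk-style partial coloring for a set system (illustrative
# parameters only; not the paper's exact algorithm or analysis).
import numpy as np

def edge_walk_partial_coloring(A, n_steps=20000, gamma=0.01, disc_cap=None):
    """A: m x n incidence matrix. Returns a fractional coloring x in [-1, 1]^n."""
    m, n = A.shape
    disc_cap = disc_cap if disc_cap is not None else 8.0 * np.sqrt(n)
    x = np.zeros(n)
    rng = np.random.default_rng(0)
    for _ in range(n_steps):
        frozen_vars = np.abs(x) >= 1.0 - 1e-9      # variables stuck at +-1
        tight_sets = np.abs(A @ x) >= disc_cap     # sets whose discrepancy is capped
        constraints = np.vstack([np.eye(n)[frozen_vars], A[tight_sets]])
        g = rng.standard_normal(n)
        if constraints.size:
            # Project the Gaussian step orthogonally to frozen variables and tight sets.
            Q, _ = np.linalg.qr(constraints.T)
            g = g - Q @ (Q.T @ g)
        x = np.clip(x + gamma * g, -1.0, 1.0)
        if np.mean(np.abs(x) >= 1.0 - 1e-9) > 0.5: # half the variables are colored
            break
    return x

A = (np.random.default_rng(1).random((64, 64)) < 0.5).astype(float)
x = edge_walk_partial_coloring(A)
print("max |set discrepancy| of the partial coloring:", np.abs(A @ x).max())
```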

FOCS Conference 2012 Conference Paper

Making the Long Code Shorter

  • Boaz Barak
  • Parikshit Gopalan
  • Johan Håstad
  • Raghu Meka
  • Prasad Raghavendra
  • David Steurer

The long code is a central tool in hardness of approximation, especially in questions related to the unique games conjecture. We construct a new code that is exponentially more efficient, but can still be used in many of these applications. Using the new code we obtain exponential improvements over several known results, including the following: 1) For any ε > 0, we show the existence of an n vertex graph G where every set of o(n) vertices has expansion 1 - ε, but G's adjacency matrix has more than exp(log^δ n) eigenvalues larger than 1 - ε, where δ depends only on ε. This answers an open question of Arora, Barak and Steurer (FOCS 2010), who asked whether one can improve over the noise graph on the Boolean hypercube that has poly(log n) such eigenvalues. 2) A gadget that reduces unique games instances with linear constraints modulo K into instances with alphabet k with a blowup of K^{polylog(K)}, improving over the previously known gadget with blowup of 2^{Ω(K)}. 3) An n variable integrality gap for Unique Games that survives exp(poly(log log n)) rounds of the SDP + Sherali-Adams hierarchy, improving on the previously known bound of poly(log log n). We show a connection between the local testability of linear codes and small set expansion in certain related Cayley graphs, and use this connection to derandomize the noise graph on the Boolean hypercube.

FOCS Conference 2012 Conference Paper

Pseudorandomness from Shrinkage

  • Russell Impagliazzo
  • Raghu Meka
  • David Zuckerman

One powerful theme in complexity theory and pseudorandomness in the past few decades has been the use of lower bounds to give pseudorandom generators (PRGs). However, the general results using this hardness vs. randomness paradigm suffer a quantitative loss in parameters, and hence do not give nontrivial implications for models where we don't know superpolynomial lower bounds but do know lower bounds of a fixed polynomial. We show that when such lower bounds are proved using random restrictions, we can construct PRGs which are essentially best possible without in turn improving the lower bounds. More specifically, say that a circuit family has shrinkage exponent Γ if a random restriction leaving a p fraction of variables unset shrinks the size of any circuit in the family by a factor of p^{Γ+o(1)}. Our PRG uses a seed of length s^{1/(Γ+1)+o(1)} to fool circuits in the family of size s. By using this generic construction, we get PRGs with polynomially small error for the following classes of circuits of size s and with the following seed lengths: 1) For de Morgan formulas, seed length s^{1/3+o(1)}; 2) For formulas over an arbitrary basis, seed length s^{1/2+o(1)}; 3) For read-once de Morgan formulas, seed length s^{0.234...}; 4) For branching programs of size s, seed length s^{1/2+o(1)}. The previous best PRGs known for these classes used seeds of length bigger than n/2 to output n bits, and worked only when the size s = O(n) [1].

FOCS Conference 2011 Conference Paper

An FPTAS for #Knapsack and Related Counting Problems

  • Parikshit Gopalan
  • Adam R. Klivans
  • Raghu Meka
  • Daniel Stefankovic
  • Santosh S. Vempala
  • Eric Vigoda

Given $n$ elements with non-negative integer weights $w_1, \ldots, w_n$ and an integer capacity $C$, we consider the counting version of the classic knapsack problem: find the number of distinct subsets whose weights add up to at most $C$. We give the first deterministic, fully polynomial-time approximation scheme (FPTAS) for estimating the number of solutions to any knapsack constraint (our estimate has relative error $1 \pm \epsilon$). Our algorithm is based on dynamic programming. Previously, fully polynomial randomized approximation schemes (FPRAS) were known first by Morris and Sinclair via Markov chain Monte Carlo techniques, and subsequently by Dyer via dynamic programming and rejection sampling. In addition, we present a new method for deterministic approximate counting using {\em read-once branching programs}. Our approach yields an FPTAS for several other counting problems, including counting solutions for the multidimensional knapsack problem with a constant number of constraints, the general integer knapsack problem, and the contingency tables problem with a constant number of rows.
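
To make the counting problem concrete, here is an exact pseudo-polynomial dynamic program that counts subsets of total weight at most C. This is only a baseline whose running time depends on C; the paper's FPTAS is a different, approximate dynamic program with runtime polynomial in n and 1/ε.

```python
# Exact pseudo-polynomial DP for counting knapsack solutions (a baseline,
# not the paper's FPTAS).
def count_knapsack_solutions(weights, capacity):
    # dp[c] = number of subsets of the items seen so far with total weight exactly c
    dp = [0] * (capacity + 1)
    dp[0] = 1
    for w in weights:
        for c in range(capacity, w - 1, -1):   # descend so each item is used once
            dp[c] += dp[c - w]
    return sum(dp)

print(count_knapsack_solutions([1, 2, 3, 4], 5))   # 9 subsets, counting the empty set
```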

NeurIPS Conference 2010 Conference Paper

Guaranteed Rank Minimization via Singular Value Projection

  • Prateek Jain
  • Raghu Meka
  • Inderjit Dhillon

Minimizing the rank of a matrix subject to affine constraints is a fundamental problem with many important applications in machine learning and statistics. In this paper we propose a simple and fast algorithm, SVP (Singular Value Projection), for rank minimization under affine constraints (ARMP) and show that SVP recovers the minimum rank solution for affine constraints that satisfy a Restricted Isometry Property (RIP). Our method guarantees a geometric convergence rate even in the presence of noise and requires strictly weaker assumptions on the RIP constants than the existing methods. We also introduce a Newton-step for our SVP framework to speed up the convergence with substantial empirical gains. Next, we address a practically important application of ARMP - the problem of low-rank matrix completion, for which the defining affine constraints do not directly obey RIP, hence the guarantees of SVP do not hold. However, we provide partial progress towards a proof of exact recovery for our algorithm by showing a more restricted isometry property, and observe empirically that our algorithm recovers low-rank incoherent matrices from an almost optimal number of uniformly sampled entries. We also demonstrate empirically that our algorithms outperform existing methods, such as those of \cite{CaiCS2008, LeeB2009b, KeshavanOM2009}, for ARMP and the matrix completion problem by an order of magnitude and are also more robust to noise and sampling schemes. In particular, results show that our SVP-Newton method is significantly robust to noise and performs impressively on a more realistic power-law sampling scheme for the matrix completion problem.
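
The SVP iteration itself is short: a gradient step on the observed residual followed by projection onto rank-k matrices via a truncated SVD. The sketch below applies it to matrix completion; the step size, iteration count, and sampling model are illustrative assumptions, and the Newton-step variant is not shown.

```python
# Sketch of the SVP iteration for low-rank matrix completion: gradient step on
# observed entries, then projection onto rank-k matrices via truncated SVD.
import numpy as np

def svp_complete(M_obs, mask, rank, eta=1.0, n_iters=200):
    """M_obs: observed matrix (zeros where unobserved); mask: boolean array."""
    X = np.zeros_like(M_obs)
    for _ in range(n_iters):
        grad = mask * (X - M_obs)                 # gradient of 0.5 * ||P_Omega(X - M)||_F^2
        U, s, Vt = np.linalg.svd(X - eta * grad, full_matrices=False)
        X = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # keep only the top-k singular values
    return X

rng = np.random.default_rng(0)
A, B = rng.standard_normal((60, 3)), rng.standard_normal((3, 60))
M = A @ B                                          # ground-truth rank-3 matrix
mask = rng.random(M.shape) < 0.4                   # observe ~40% of entries uniformly
X_hat = svp_complete(M * mask, mask, rank=3)
print("relative recovery error:", np.linalg.norm(X_hat - M) / np.linalg.norm(M))
```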

NeurIPS Conference 2009 Conference Paper

Matrix Completion from Power-Law Distributed Samples

  • Raghu Meka
  • Prateek Jain
  • Inderjit Dhillon

The low-rank matrix completion problem is a fundamental problem with many important applications. Recently, Candes & Recht, Keshavan et al. and Candes & Tao obtained the first non-trivial theoretical results for the problem assuming that the observed entries are sampled uniformly at random. Unfortunately, most real-world datasets do not satisfy this assumption, but instead exhibit power-law distributed samples. In this paper, we propose a graph theoretic approach to matrix completion that solves the problem for more realistic sampling models. Our method is easier to analyze than previous methods with the analysis reducing to computing the threshold for complete cascades in random graphs, a problem of independent interest. By analyzing the graph theoretic problem, we show that our method achieves exact recovery when the observed entries are sampled from the Chung-Lu-Vu model, which can generate power-law distributed graphs. We also hypothesize that our algorithm solves the matrix completion problem from an optimal number of entries for the popular preferential attachment model and provide strong empirical evidence for the claim. Furthermore, our method is easier to implement and is substantially faster than existing methods. We demonstrate the effectiveness of our method on examples when the low-rank matrix is sampled according to the prevalent random graph models for complex networks and also on the Netflix challenge dataset.

ICML Conference 2008 Conference Paper

Rank minimization via online learning

  • Raghu Meka
  • Prateek Jain 0002
  • Constantine Caramanis
  • Inderjit S. Dhillon

Minimum rank problems arise frequently in machine learning applications and are notoriously difficult to solve due to the non-convex nature of the rank objective. In this paper, we present the first online learning approach for the problem of rank minimization of matrices over polyhedral sets. In particular, we present two online learning algorithms for rank minimization - our first algorithm is a multiplicative update method based on a generalized experts framework, while our second algorithm is a novel application of the online convex programming framework (Zinkevich, 2003). In the latter, we flip the role of the decision maker by making the decision maker search over the constraint space instead of feasible points, as is usually the case in online convex programming. A salient feature of our online learning approach is that it allows us to give provable approximation guarantees for the rank minimization problem over polyhedral sets. We demonstrate the effectiveness of our methods on synthetic examples, and on the real-life application of low-rank kernel learning.