Arrow Research search

Author name cluster

Daniel M. Kane

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

66 papers
2 author rows

Possible papers

66

ICML Conference 2025 Conference Paper

Batch List-Decodable Linear Regression via Higher Moments

  • Ilias Diakonikolas
  • Daniel M. Kane
  • Sushrut Karmalkar
  • Sihan Liu
  • Thanasis Pittas

We study the task of list-decodable linear regression using batches, recently introduced by Das et al. 2023. . In this setting, we are given $m$ batches with each batch containing $n$ points in $\mathbb R^d$. A batch is called clean if the points it contains are i. i. d. samples from an unknown linear regression distribution. For a parameter $\alpha \in (0, 1/2)$, an unknown $\alpha$-fraction of the batches are clean and no assumptions are made on the remaining batches. The goal is to output a small list of vectors at least one of which is close to the true regressor vector in $\ell_2$-norm. Das et al. 2023 gave an efficient algorithm for this task, under natural distributional assumptions, with the following guarantee. Under the assumption that the batch size satisfies $n \geq \tilde{\Omega}(\alpha^{-1})$ and the total number of batches is $m = \text{poly}(d, n, 1/\alpha)$, their algorithm runs in polynomial time and outputs a list of $O(1/\alpha^2)$ vectors at least one of which is $\tilde{O}(\alpha^{-1/2}/\sqrt{n})$ close to the target regressor. Here we design a new polynomial-time algorithm for this task with significantly stronger guarantees under the assumption that the low-degree moments of the covariates distribution are Sum-of-Squares (SoS) certifiably bounded. Specifically, for any constant $\delta>0$, as long as the batch size is $n \geq \Omega_{\delta}(\alpha^{-\delta})$ and the degree-$\Theta(1/\delta)$ moments of the covariates are SoS certifiably bounded, our algorithm uses $m = \text{poly}((dn)^{1/\delta}, 1/\alpha)$ batches, runs in polynomial-time, and outputs an $O(1/\alpha)$-sized list of vectors one of which is $O(\alpha^{-\delta/2}/\sqrt{n})$ close to the target. That is, our algorithm substantially improves both the minimum batch size and the final error guarantee, while achieving the optimal list size. Our approach leverages higher-order moment information by carefully combining the SoS paradigm interleaved with an iterative method and a novel list pruning procedure for this setting. In the process, we give an SoS proof of the Marcinkiewicz-Zygmund inequality that may be of broader applicability.

ICML Conference 2025 Conference Paper

Efficient Multivariate Robust Mean Estimation Under Mean-Shift Contamination

  • Ilias Diakonikolas
  • Giannis Iakovidis
  • Daniel M. Kane
  • Thanasis Pittas

We study the algorithmic problem of robust mean estimation of an identity covariance Gaussian in the presence of mean-shift contamination. In this contamination model, we are given a set of points in $\mathbb{R}^d$ generated i. i. d. via the following process. For a parameter $\alpha<1/2$, the $i$-th sample $x_i$ is obtained as follows: with probability $1-\alpha$, $x_i$ is drawn from $\mathcal{N}(\mu, I)$, where $\mu \in \mathbb{R}^d$ is the target mean; and with probability $\alpha$, $x_i$ is drawn from $\mathcal{N}(z_i, I)$, where $z_i$ is unknown and potentially arbitrary. Prior work characterized the information-theoretic limits of this task. Specifically, it was shown that— in contrast to Huber contamination— in the presence of mean-shift contamination consistent estimation is possible. On the other hand, all known robust estimators in the mean-shift model have running times exponential in the dimension. Here we give the first computationally efficient algorithm for high-dimensional robust mean estimation with mean-shift contamination that can tolerate a constant fraction of outliers. In particular, our algorithm has near-optimal sample complexity, runs in sample-polynomial time, and approximates the target mean to any desired accuracy. Conceptually, our result contributes to a growing body of work that studies inference with respect to natural noise models lying in between fully adversarial and random settings.

FOCS Conference 2025 Conference Paper

Implicit High-Order Moment Tensor Estimation and Learning Latent Variable Models

  • Ilias Diakonikolas
  • Daniel M. Kane

We study the general task of learning latent-variable models on ℝ d with k hidden parameters. A common technique to address this task algorithmically is (some version of) the method of moments. Unfortunately, moment-based approaches are often hampered by the fact that the moment tensors of super-constant degree cannot even be written down in polynomial time. Motivated by such learning applications, we develop a general efficient algorithm for implicit moment tensor computation. Roughly speaking, our algorithm computes in poly(d, k) time a succinct approximate description of tensors of the form ${M_m} = \sum\nolimits_{i = 1}^k {{w_i}} v_i^{ \otimes m}$, for w i ∈ ℝ + —even for m = ω(1)—assuming that there exists an unbiased estimator for M m with small variance that takes an appropriately nice form. Our framework broadly generalizes, both conceptually and technically, the work of [1] which developed an efficient algorithm for the specific moment tensors that arise in the task of clustering mixtures of spherical Gaussians. By leveraging our implicit moment estimation algorithm, we obtain the first poly(d, k)-time learning algorithms for the following classical latent-variable models—thereby resolving or making significant progress towards a number of important open problems in the literature. • Mixtures of Linear Regressions Given i. i. d. samples (x, y) with x ∼ N(0, I) and such that the joint distribution on (x, y) is an unknown k-mixture of linear regressions on ℝ d+1 corrupted with Gaussian noise, the goal is to learn the underlying distribution in total variation distance. We give a poly(d, k, 1/ϵ)-time algorithm for this task, where ϵ is the desired error. The previously best algorithm has super-polynomial complexity in k. • Mixtures of Spherical Gaussians Given i. i. d. samples from a k-mixture of identity covariance Gaussians on ℝ d, the goal is to learn the target mixture. For density estimation, we give a poly(d, k, 1/ ϵ)-time learning algorithm, where ϵ is the desired total variation error, under the condition that the means lie in a ball of radius $O(\sqrt {\log k} )$. Prior algorithms incur super-polynomial complexity in k. For parameter estimation, we give a poly(d, k, 1/ ϵ)-time algorithm where ϵ is the target accuracy, under the optimal mean separation of Ω(log 1/2 (k/ϵ)) and the condition that the largest distance is comparable to the smallest. Prior polynomial-time parameter estimation algorithms require separation Ω(log 1/2+c (k/ϵ)), for c > 0. • Positive Linear Combinations of Non-Linear Activations Given i. i. d. samples (x, y) with x ∼ N(0, I) and y = F(x), where F is a positive linear combination of k reasonable non-linear activations on ℝ d, the goal is to learn the target function in L 2 -norm. Our main result is a general algorithm for this task with complexity poly(d, k)g(ϵ), where ϵ is the desired error and the function g depends on the Hermite concentration of the target class of functions. Specifically, for positive linear combinations of ReLU activations, our algorithm has complexity poly(d, k)2 poly(1/ϵ). This is the first algorithm for this class that runs in poly(d, k) time for sub-constant values of ϵ = o k, d (1). Finally, for positive linear combinations of cosine activations with bounded frequency, our algorithm runs in poly(d, k, 1/ ϵ) time.

ICML Conference 2025 Conference Paper

On Fine-Grained Distinct Element Estimation

  • Ilias Diakonikolas
  • Daniel M. Kane
  • Jasper C. H. Lee
  • Thanasis Pittas
  • David P. Woodruff
  • Samson Zhou

We study the problem of distributed distinct element estimation, where $\alpha$ servers each receive a subset of a universe $[n]$ and aim to compute a $(1+\varepsilon)$-approximation to the number of distinct elements using minimal communication. While prior work establishes a worst-case bound of $\Theta\left(\alpha\log n+\frac{\alpha}{\varepsilon^2}\right)$ bits, these results rely on assumptions that may not hold in practice. We introduce a new parameterization based on the number $C = \frac{\beta}{\varepsilon^2}$ of pairwise collisions, i. e. , instances where the same element appears on multiple servers, and design a protocol that uses only $O\left(\alpha\log n\log\log n+\frac{\sqrt{\beta}}{\varepsilon^2} \log n\right)$ bits, breaking previous lower bounds when $C$ is small. We further improve our algorithm under assumptions on the number of distinct elements or collisions and provide matching lower bounds in all regimes, establishing $C$ as a tight complexity measure for the problem. Finally, we consider streaming algorithms for distinct element estimation parameterized by the number of items with frequency larger than $1$. Overall, our results offer insight into why statistical problems with known hardness results can be efficiently solved in practice.

ICML Conference 2025 Conference Paper

On Learning Parallel Pancakes with Mostly Uniform Weights

  • Ilias Diakonikolas
  • Daniel M. Kane
  • Sushrut Karmalkar
  • Jasper C. H. Lee
  • Thanasis Pittas

We study the complexity of learning $k$-mixtures of Gaussians ($k$-GMMs) on $\mathbb R^d$. This task is known to have complexity $d^{\Omega(k)}$ in full generality. To circumvent this exponential lower bound on the number of components, research has focused on learning families of GMMs satisfying additional structural properties. A natural assumption posits that the component weights are not exponentially small and that the components have the same unknown covariance. Recent work gave a $d^{O(\log(1/w_{\min}))}$-time algorithm for this class of GMMs, where $w_{\min}$ is the minimum weight. Our first main result is a Statistical Query (SQ) lower bound showing that this quasi-polynomial upper bound is essentially best possible, even for the special case of uniform weights. Specifically, we show that it is SQ-hard to distinguish between such a mixture and the standard Gaussian. We further explore how the distribution of weights affects the complexity of this task. Our second main result is a quasi-polynomial upper bound for the aforementioned testing task when most of the weights are uniform while a small fraction of the weights are potentially arbitrary.

FOCS Conference 2025 Conference Paper

PTF Testing Lower Bounds for Non-Gaussian Component Analysis

  • Ilias Diakonikolas
  • Daniel M. Kane
  • Sihan Liu
  • Thanasis Pittas

This work studies information-computation gaps for statistical problems. A common approach for providing evidence of such gaps is to show sample complexity lower bounds (that are stronger than the information-theoretic optimum) against natural models of computation. A popular such model in the literature is the family of low-degree polynomial tests. While these tests are defined in such a way that make them easy to analyze, the class of algorithms that they rule out is somewhat restricted. An important goal in this context has been to obtain lower bounds against the stronger and more natural class of low-degree Polynomial Threshold Function (PTF) tests, i. e. , any test that can be expressed as comparing some low-degree polynomial of the data to a threshold. Proving lower bounds against PTF tests has turned out to be challenging. Indeed, we are not aware of any non-trivial PTF testing lower bounds in the literature. In this paper, we establish the first non-trivial PTF testing lower bounds for a range of statistical tasks. Specifically, we prove a near-optimal PTF testing lower bound for Non-Gaussian Component Analysis (NGCA). Our NGCA lower bound implies similar lower bounds for a number of other statistical problems. Our proof leverages a connection to recent work on pseudorandom generators for PTFs and recent techniques developed in that context. At the technical level, we develop several tools of independent interest, including novel structural results for analyzing the behavior of low-degree polynomials restricted to random directions.

FOCS Conference 2025 Conference Paper

Robust Learning of Multi-index Models via Iterative Subspace Approximation

  • Ilias Diakonikolas
  • Giannis Iakovidis
  • Daniel M. Kane
  • Nikos Zarifis

We study the task of learning Multi-Index Models (MIMs) in the presence of label noise under the Gaussian distribution. A K-MIM on ℝ d is any function f that only depends on a K-dimensional subspace, i. e. , f(x) = g(Wx) for a link function g on ℝ K and a K × d matrix W. We consider a class of well-behaved MIMs with finite ranges that satisfy certain regularity properties. Our main contribution is a general noise-tolerant learning algorithm for this class whose complexity is qualitatively optimal in the Statistical Query (SQ) model. At a high-level, our algorithm attempts to iteratively construct better approximations to the defining subspace by computing low-degree moments of our function conditional on its projection to the subspace computed thus far, and adding directions with relatively large empirical moments. For well-behaved MIMs, we show that this procedure efficiently finds a subspace V so that f(x) is close to a function of the projection of x onto V, which can then be found by brute-force. Conversely, for functions for which these conditional moments do not necessarily help in finding better subspaces, we prove an SQ lower bound providing evidence that no efficient algorithm exists. As concrete applications of our general algorithm, we provide significantly faster noise-tolerant learners for two well-studied concept classes: •Multiclass Linear Classifiers A multiclass linear classifier is any function f: ℝ d → [K] of the form f(x) = argmax i∈[K] (w (i) •x+t i ), where w (i) ∈ ℝ d and t i ∈ ℝ. We give a constant-factor approximate agnostic learner for this class, i. e. , an algorithm that achieves 0-1 error O(OPT)+ϵ. Our algorithm has sample complexity N = O(d)2 poly(K/ϵ) and computational complexity poly(N). This is the first constant-factor agnostic learner for this class whose complexity is a fixed-degree polynomial in d. In the agnostic model, it was previously known that achieving error OPT+ϵ requires time d poly(1/ϵ), even for K = 2. Perhaps surprisingly, we prove an SQ lower bound showing that achieving error OPT+ϵ, for ϵ = 1/poly(K), incurs complexity d Ω(K) even for the simpler case of Random Classification Noise. •Intersections of Halfspaces An intersection of K halfspaces is any function f: ℝ d → {±1} such that there exist K halfspaces h i (x) with f(x) = 1 if and only if h i (x) = 1 for all i ∈ [K]. We give an approximate agnostic learner for this class achieving 0-1 error $K\tilde O({\text{OPT}}) + \varepsilon $. Our algorithm has sample complexity N = O(d 2 )2 poly(K/ϵ) and computational complexity poly(N). This is the first agnostic learner for this class with near-optimal dependence on OPT in its error, whose complexity is a fixed-degree polynomial in d. Previous algorithms either achieved significantly worse error guarantees, or incurred d poly(1/ϵ) time (even for K = 2). Furthermore, we show that in the presence of random classification noise, the complexity of our algorithm is significantly better, scaling polynomially with 1/ϵ.

NeurIPS Conference 2024 Conference Paper

Active Learning of General Halfspaces: Label Queries vs Membership Queries

  • Ilias Diakonikolas
  • Daniel M. Kane
  • Mingchen Ma

We study the problem of learning general (i. e. , not necessarily homogeneous) halfspaces under the Gaussian distribution on $\mathbb{R}^d$ in the presence of some form of query access. In the classical pool-based active learning model, where the algorithm isallowed to make adaptive label queries to previously sampled points, we establish a strong information-theoretic lower bound ruling out non-trivialimprovements over the passive setting. Specifically, we show thatany active learner requires label complexity of $\tilde{\Omega}(d/(\log(m)\epsilon))$, where $m$ is the number of unlabeled examples. Specifically, to beat the passive label complexity of $\tilde{O}(d/\epsilon)$, an active learner requires a pool of $2^{\mathrm{poly}(d)}$ unlabeled samples. On the positive side, we show that this lower bound can be circumvented with membership query access, even in the agnostic model. Specifically, we give a computationally efficient learner with query complexity of $\tilde{O}(\min(1/p, 1/\epsilon) + d\mathrm{polylog}(1/\epsilon))$achieving error guarantee of $O(\mathrm{opt}+\epsilon)$. Here $p \in [0, 1/2]$ is the bias and $\mathrm{opt}$ is the 0-1 loss of the optimal halfspace. As a corollary, we obtain a strong separation between the active and membership query models. Taken together, our results characterize the complexity of learning general halfspaces under Gaussian marginals in these models.

FOCS Conference 2024 Conference Paper

Agnostically Learning Multi-Index Models with Queries

  • Ilias Diakonikolas
  • Daniel M. Kane
  • Vasilis Kontonis
  • Christos Tzamos
  • Nikos Zarifis

We study the power of query access for the fundamental task of agnostic learning under the Gaussian distribution. In the agnostic model, no assumptions are made on the labels of the examples and the goal is to compute a hypothesis that is competitive with the best-fit function in a known class, i. e. , it achieves error opt $+\epsilon$, where opt is the error of the best function in the class. We focus on a general family of Multi-Index Models (MIMs), which are d-variate functions that depend only on few relevant directions, i. e. , have the form $g$ (Wx) for an unknown link function $g$ and a $k\times d$ matrix W. Multi-index models cover a wide range of commonly studied function classes, including real-valued function classes such as constant-depth neural networks with ReLU activations, and Boolean concept classes such as intersections of halfspaces. Our main result shows that query access gives significant runtime improvements over random examples for agnostically learning both real-valued and Boolean-valued MIMs. Under standard regularity assumptions for the link function (namely, bounded variation or surface area), we give an agnostic query learner for MIMs with running time $O(k)^{\text{poly}(1/\epsilon}$ ) poly $(d)$. In contrast, algorithms that rely only on random labeled examples inherently require $d^{\text{poly}(1/\epsilon}$ samples and runtime, even for the basic problem of agnostically learning a single ReLU or a halfspace. As special cases of our general approach, we obtain the following results: •For the class of depth-ℓ, width-S ReLU networks on $\mathbb{R}^{d}$, our agnostic query learner runs in time poly $(d)2^{\text{poly}(\ell S/\epsilon)}$. This bound qualitatively matches the runtime of an algorithm by [1] for the realizable PAC setting with random examples. •For the class of arbitrary intersections of $k$ halfspaces on $\mathbb{R}^{d}$, our agnostic query learner runs in time poly $(d)2^{\text{poly}(\log(k)/\epsilon)}$. Prior to our work, no improvement over the agnostic PAC model complexity (without queries) was known, even for the case of a single halfspace. In both these settings, we provide evidence that the $2^{\text{poly}(1/\epsilon)}$ runtime dependence is required for proper query learners, even for agnosticallylearning a single ReL U or halfspace. Our algorithmic result establishes a strong computational separation between the agnostic PAC and the agnostic PAC+Query models under the Gaussian distribution for a range of natural function classes. Prior to our work, no such separation was known for any natural concept class - even for the case of a single halfspace, for which it was an open problem posed by Feldman [2]. Our results are enabled by a general dimension-reduction technique that leverages query access to estimate gradients of (a smoothed version of) the underlying label function.

FOCS Conference 2024 Conference Paper

Replicability in High Dimensional Statistics

  • Max Hopkins
  • Russell Impagliazzo
  • Daniel M. Kane
  • Sihan Liu
  • Christopher Ye 0001

The replicability crisis is a major issue across nearly all areas of empirical science, calling for the formal study of replicability in statistics. Motivated in this context, [Impagliazzo, Lei, Pitassi, and Sorrell STOC 2022] introduced the notion of replicable learning algorithms, and gave basic procedures for 1-dimensional tasks including statistical queries. In this work, we study the computational and statistical cost of replicability for several fundamental high dimensional statistical tasks, including multi-hypothesis testing and mean estimation. Our main contribution establishes a computational and statistical equivalence between optimal replicable algorithms and high dimensional isoperimetric tilings. As a consequence, we obtain matching sample complexity upper and lower bounds for replicable mean estimation of distributions with bounded covariance, resolving an open problem of [Bun, Gaboardi, Hopkins, Impagliazzo, Lei, Pitassi, Sivakumar, and Sorrell, STOC 2023] and for the $N$ -Coin Problem, resolving a problem of [Karbasi, Velegkas, Yang, and Zhou, NeurIPS 2023] up to log factors. While our equivalence is computational, allowing us to shave $\log$ factors in sample complexity from the best known efficient algorithms, efficient isoperimetric tilings are not known. To circumvent this, we introduce several relaxed paradigms that do allow for sample and computationally efficient algorithms, including allowing pre-processing, adaptivity, and approximate replicability. In these cases we give efficient algorithms matching or beating the best known sample complexity for mean estimation and the coin problem, including a generic procedure that reduces the standard quadratic overhead of replicability to linear in expectation.

ICML Conference 2024 Conference Paper

Robust Sparse Estimation for Gaussians with Optimal Error under Huber Contamination

  • Ilias Diakonikolas
  • Daniel M. Kane
  • Sushrut Karmalkar
  • Ankit Pensia
  • Thanasis Pittas

We study Gaussian sparse estimation tasks in Huber’s contamination model with a focus on mean estimation, PCA, and linear regression. For each of these tasks, we give the first sample and computationally efficient robust estimators with optimal error guarantees, within constant factors. All prior efficient algorithms for these tasks incur quantitatively suboptimal error. Concretely, for Gaussian robust $k$-sparse mean estimation on $\mathbb{R}^d$ with corruption rate $\epsilon>0$, our algorithm has sample complexity $(k^2/\epsilon ^2)\mathrm{polylog}(d/\epsilon)$, runs in sample polynomial time, and approximates the target mean within $\ell_2$-error $O(\epsilon)$. Previous efficient algorithms inherently incur error $\Omega(\epsilon \sqrt{\log(1/\epsilon)})$. At the technical level, we develop a novel multidimensional filtering method in the sparse regime that may find other applications.

ICML Conference 2023 Conference Paper

Near-Optimal Cryptographic Hardness of Agnostically Learning Halfspaces and ReLU Regression under Gaussian Marginals

  • Ilias Diakonikolas
  • Daniel M. Kane
  • Lisheng Ren

We study the task of agnostically learning halfspaces under the Gaussian distribution. Specifically, given labeled examples $(\\mathbf{x}, y)$ from an unknown distribution on $\\mathbb{R}^n \\times \\{\pm 1 \\}$, whose marginal distribution on $\\mathbf{x}$ is the standard Gaussian and the labels $y$ can be arbitrary, the goal is to output a hypothesis with 0-1 loss $\\mathrm{OPT}+\\epsilon$, where $\\mathrm{OPT}$ is the 0-1 loss of the best-fitting halfspace. We prove a near-optimal computational hardness result for this task, under the widely believed sub-exponential time hardness of the Learning with Errors (LWE) problem. Prior hardness results are either qualitatively suboptimal or apply to restricted families of algorithms. Our techniques extend to yield near-optimal lower bounds for related problems, including ReLU regression.

ICML Conference 2023 Conference Paper

Nearly-Linear Time and Streaming Algorithms for Outlier-Robust PCA

  • Ilias Diakonikolas
  • Daniel M. Kane
  • Ankit Pensia
  • Thanasis Pittas

We study principal component analysis (PCA), where given a dataset in $\mathbb R^d$ from a distribution, the task is to find a unit vector $v$ that approximately maximizes the variance of the distribution after being projected along $v$. Despite being a classical task, standard estimators fail drastically if the data contains even a small fraction of outliers, motivating the problem of robust PCA. Recent work has developed computationally-efficient algorithms for robust PCA that either take super-linear time or have sub-optimal error guarantees. Our main contribution is to develop a nearly linear time algorithm for robust PCA with near-optimal error guarantees. We also develop a single-pass streaming algorithm for robust PCA with memory usage nearly-linear in the dimension.

STOC Conference 2022 Conference Paper

Learning general halfspaces with general Massart noise under the Gaussian distribution

  • Ilias Diakonikolas
  • Daniel M. Kane
  • Vasilis Kontonis
  • Christos Tzamos
  • Nikos Zarifis

We study the problem of PAC learning halfspaces on ℝ d with Massart noise under the Gaussian distribution. In the Massart model, an adversary is allowed to flip the label of each point x with unknown probability η( x ) ≤ η, for some parameter η ∈ [0,1/2]. The goal is to find a hypothesis with misclassification error of OPT + є, where OPT is the error of the target halfspace. This problem had been previously studied under two assumptions: (i) the target halfspace is homogeneous (i.e., the separating hyperplane goes through the origin), and (ii) the parameter η is strictly smaller than 1/2. Prior to this work, no nontrivial bounds were known when either of these assumptions is removed. We study the general problem and establish the following: [leftmargin = *] For η <1/2, we give a learning algorithm for general halfspaces with sample and computational complexity d O η (log(1/γ)) poly (1/є), where γ max{є, min{ Pr [ f ( x ) = 1], Pr [ f ( x ) = −1]} } is the “bias” of the target halfspace f . Prior efficient algorithms could only handle the special case of γ = 1/2. Interestingly, we establish a qualitatively matching lower bound of d Ω(log(1/γ)) on the complexity of any Statistical Query (SQ) algorithm. For η = 1/2, we give a learning algorithm for general halfspaces with sample and computational complexity O є (1) d O (log(1/є)) . This result is new even for the subclass of homogeneous halfspaces; prior algorithms for homogeneous Massart halfspaces provide vacuous guarantees for η=1/2. We complement our upper bound with a nearly-matching SQ lower bound of d Ω(log(1/є) ) , which holds even for the special case of homogeneous halfspaces. Taken together, our results qualitatively characterize the complexity of learning general halfspaces with general Massart noise under Gaussian marginals. Our techniques rely on determining the existence (or non-existence) of low-degree polynomials whose expectations distinguish Massart halfspaces from random noise.

ICML Conference 2022 Conference Paper

Streaming Algorithms for High-Dimensional Robust Statistics

  • Ilias Diakonikolas
  • Daniel M. Kane
  • Ankit Pensia
  • Thanasis Pittas

We study high-dimensional robust statistics tasks in the streaming model. A recent line of work obtained computationally efficient algorithms for a range of high-dimensional robust statistics tasks. Unfortunately, all previous algorithms require storing the entire dataset, incurring memory at least quadratic in the dimension. In this work, we develop the first efficient streaming algorithms for high-dimensional robust statistics with near-optimal memory requirements (up to logarithmic factors). Our main result is for the task of high-dimensional robust mean estimation in (a strengthening of) Huber’s contamination model. We give an efficient single-pass streaming algorithm for this task with near-optimal error guarantees and space complexity nearly-linear in the dimension. As a corollary, we obtain streaming algorithms with near-optimal space complexity for several more complex tasks, including robust covariance estimation, robust regression, and more generally robust stochastic optimization.

SODA Conference 2021 Conference Paper

Robust Learning of Mixtures of Gaussians

  • Daniel M. Kane

We resolve one of the major outstanding problems in robust statistics. In particular, if X is an evenly weighted mixture of two arbitrary d -dimensional Gaussians, we devise a polynomial time algorithm that given access to samples from X an ∊ -fraction of which have been adversarially corrupted, learns X to error poly( ∊ ) in total variation distance.

NeurIPS Conference 2020 Conference Paper

Outlier Robust Mean Estimation with Subgaussian Rates via Stability

  • Ilias Diakonikolas
  • Daniel M. Kane
  • Ankit Pensia

We study the problem of outlier robust high-dimensional mean estimation under a bounded covariance assumption, and more broadly under bounded low-degree moment assumptions. We consider a standard stability condition from the recent robust statistics literature and prove that, except with exponentially small failure probability, there exists a large fraction of the inliers satisfying this condition. As a corollary, it follows that a number of recently developed algorithms for robust mean estimation, including iterative filtering and non-convex gradient descent, give optimal error estimators with (near-)subgaussian rates. Previous analyses of these algorithms gave significantly suboptimal rates. As a corollary of our approach, we obtain the first computationally efficient algorithm for outlier robust mean estimation with subgaussian rates under a bounded covariance assumption.

FOCS Conference 2020 Conference Paper

Outlier-Robust Clustering of Gaussians and Other Non-Spherical Mixtures

  • Ainesh Bakshi
  • Ilias Diakonikolas
  • Samuel B. Hopkins 0001
  • Daniel M. Kane
  • Sushrut Karmalkar
  • Pravesh K. Kothari

We give the first outlier-robust efficient algorithm for clustering a mixture of k statistically separated d - dimensional Gaussians ( k-GMMs). Concretely, our algorithm takes input an ε-corrupted sample from a k-GMM and outputs an approximate clustering that misclassifies at most k O(k) (ε+η) fraction of the points whenever every pair of mixture components are separated by 1-exp(-poly(k/η)) in total variation distance. This is the statistically weakest possible notion of separation and allows, for e. g. , clustering of mixtures with components with the same mean with covariances differing in a single unknown direction or separated in Frobenius distance. The running time of our algorithm is d poly(k/η). Such results were not known prior to our work, even for k=2. More generally, our algorithms succeed for mixtures of any distribution that satisfies two well-studied analytic assumptions - sum-of-squares certifiable hypercontractivity and anti-concentration. As an immediate corollary, they extend to clustering mixtures of arbitrary affine transforms of the uniform distribution on the d-dimensional unit sphere. Even the information theoretic clusterability of separated distributions satisfying our analytic assumptions was not known and is likely to be of independent interest. Our algorithms build on the recent flurry of work relying on certifiable anti-concentration first introduced in [1], [2]. Our techniques expand the sum-of-squares toolkit to show robust certifiability of TV-separated Gaussian clusters in data. This involves giving a low-degree sum-of-squares proof of statements that relate parameter (i. e. mean and covariances) distance to total variation distance by relying only on hypercontractivity and anti-concentration.

FOCS Conference 2020 Conference Paper

Point Location and Active Learning: Learning Halfspaces Almost Optimally

  • Max Hopkins
  • Daniel M. Kane
  • Shachar Lovett
  • Gaurav Mahajan

Given a finite set X ⊂ R d and a binary linear classifier c: R d → {0, 1}, how many queries of the form c(x) are required to learn the label of every point in X? Known as point location, this problem has inspired over 35 years of research in the pursuit of an optimal algorithm. Building on the prior work of Kane, Lovett, and Moran (ICALP 2018), we provide the first nearly optimal solution, a randomized linear decision tree of depth Õ(dlog(|X|)), improving on the previous best of Õ(d 2 log(|X|)) from Ezra and Sharir (Discrete and Computational Geometry, 2019). As a corollary, we also provide the first nearly optimal algorithm for actively learning halfspaces in the membership query model. En route to these results, building on the work of Carlen, Lieb, and Loss (J. Geometric Analysis 2004), as well as Dvir, Saraf, and Wigderson (STOC 2014), we prove a novel characterization of Barthe's Theorem (Inventiones Mathematicae, 1998) of independent interest. In particular, we show that X may be transformed into approximate isotropic position if and only if there exists no k-dimensional subspace with more than a k/d-fraction of X, and provide a similar characterization for exact isotropic position. The below is an extended abstract. The full work can be found at https: //arxiv. org/abs/2004. 11380.

FOCS Conference 2020 Conference Paper

Small Covers for Near-Zero Sets of Polynomials and Learning Latent Variable Models

  • Ilias Diakonikolas
  • Daniel M. Kane

Let $V$ be any vector space of multivariate degree- $d$ homogeneous polynomials with co-dimension at most $k$, and $S$ be the set of points where all polynomials in $V$ nearly vanish. We establish a qualitatively optimal upper bound on the size of $\epsilon$ -covers for $S$, in the $\ell_{2}$ -norm. Roughly speaking, we show that there exists an $\epsilon$ -cover for $S$ of cardinality $M=(k/\epsilon)^{O_{d}(k^{1/d})}$. Our result is constructive yielding an algorithm to compute such an $\epsilon$ -cover that runs in time $\text{poly}(M)$. Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models with hidden variables. These include density and parameter estimation for $k$ -mixtures of spherical Gaussians (with known common covariance), PAC learning one-hidden-layer ReLU networks with $k$ hidden units (under the Gaussian distribution), density and parameter estimation for $k$ -mixtures of linear regressions (with Gaussian covariates), and parameter estimation for $k$ -mixtures of hyperplanes. Our algorithms run in time quasi-polynomial in the parameter $k$. Previous algorithms for these problems had running times exponential in $k^{\Omega(1)}$. At a high-level our algorithms for all these learning problems work as follows: By computing the low-degree moments of the hidden parameters, we are able to find a vector space of polynomials that nearly vanish on the unknown parameters. Our structural result allows us to compute a quasi-polynomial sized cover for the set of hidden parameters, which we exploit in our learning algorithms.

NeurIPS Conference 2020 Conference Paper

The Complexity of Adversarially Robust Proper Learning of Halfspaces with Agnostic Noise

  • Ilias Diakonikolas
  • Daniel M. Kane
  • Pasin Manurangsi

We study the computational complexity of adversarially robust proper learning of halfspaces in the distribution-independent agnostic PAC model, with a focus on $L_p$ perturbations. We give a computationally efficient learning algorithm and a nearly matching computational hardness result for this problem. An interesting implication of our findings is that the $L_{\infty}$ perturbations case is provably computationally harder than the case $2 \leq p < \infty$.

ICML Conference 2019 Conference Paper

Sever: A Robust Meta-Algorithm for Stochastic Optimization

  • Ilias Diakonikolas
  • Gautam Kamath 0001
  • Daniel M. Kane
  • Jerry Li 0001
  • Jacob Steinhardt
  • Alistair Stewart

In high dimensions, most machine learning methods are brittle to even a small fraction of structured outliers. To address this, we introduce a new meta-algorithm that can take in a base learner such as least squares or stochastic gradient descent, and harden the learner to be resistant to outliers. Our method, Sever, possesses strong theoretical guarantees yet is also highly scalable – beyond running the base learner itself, it only requires computing the top singular vector of a certain n{\texttimes}d matrix. We apply Sever on a drug design dataset and a spam classification dataset, and find that in both cases it has substantially greater robustness than several baselines. On the spam dataset, with 1% corruptions, we achieved 7. 4% test error, compared to 13. 4%-20. 5% for the baselines, and 3% error on the uncorrupted dataset. Similarly, on the drug design dataset, with 10% corruptions, we achieved 1. 42 mean-squared error test error, compared to 1. 51-2. 33 for the baselines, and 1. 23 error on the uncorrupted dataset.

STOC Conference 2018 Conference Paper

Learning geometric concepts with nasty noise

  • Ilias Diakonikolas
  • Daniel M. Kane
  • Alistair Stewart

We study the efficient learnability of geometric concept classes — specifically, low-degree polynomial threshold functions (PTFs) and intersections of halfspaces — when a fraction of the training data is adversarially corrupted. We give the first polynomial-time PAC learning algorithms for these concept classes with dimension-independent error guarantees in the presence of nasty noise under the Gaussian distribution. In the nasty noise model, an omniscient adversary can arbitrarily corrupt a small fraction of both the unlabeled data points and their labels. This model generalizes well-studied noise models, including the malicious noise model and the agnostic (adversarial label noise) model. Prior to our work, the only concept class for which efficient malicious learning algorithms were known was the class of origin-centered halfspaces. At the core of our results is an efficient algorithm to approximate the low-degree Chow-parameters of any bounded function in the presence of nasty noise. Our robust approximation algorithm for the Chow parameters provides near-optimal error guarantees for a range of distribution families satisfying mild concentration bounds and moment conditions. At the technical level, this algorithm employs an iterative “spectral” technique for outlier detection and removal inspired by recent work in robust unsupervised learning, which makes essential use of low-degree multivariate polynomials. Our robust learning algorithm for low-degree PTFs provides dimension-independent error guarantees for a class of tame distributions, including Gaussians and, more generally, any logconcave distribution with (approximately) known low-degree moments. For LTFs under the Gaussian distribution, using a refinement of the localization technique, we give a polynomial-time algorithm that achieves a near-optimal error of O (є), where є is the noise rate. Our robust learning algorithm for intersections of halfspaces proceeds by projecting down to an appropriate low-dimensional subspace. Its correctness makes essential use of a novel robust inverse independence lemma that is of independent interest.

STOC Conference 2018 Conference Paper

Near-optimal linear decision trees for k-SUM and related problems

  • Daniel M. Kane
  • Shachar Lovett
  • Shay Moran

We construct near optimal linear decision trees for a variety of decision problems in combinatorics and discrete geometry. For example, for any constant k , we construct linear decision trees that solve the k -SUM problem on n elements using O ( n log 2 n ) linear queries. Moreover, the queries we use are comparison queries, which compare the sums of two k -subsets; when viewed as linear queries, comparison queries are 2 k -sparse and have only {−1,0,1} coefficients. We give similar constructions for sorting sumsets A + B and for solving the SUBSET-SUM problem, both with optimal number of queries, up to poly-logarithmic terms. Our constructions are based on the notion of “inference dimension”, recently introduced by the authors in the context of active classification with comparison queries. This can be viewed as another contribution to the fruitful link between machine learning and discrete geometry, which goes back to the discovery of the VC dimension.

NeurIPS Conference 2018 Conference Paper

Sharp Bounds for Generalized Uniformity Testing

  • Ilias Diakonikolas
  • Daniel M. Kane
  • Alistair Stewart

We study the problem of generalized uniformity testing of a discrete probability distribution: Given samples from a probability distribution p over an unknown size discrete domain Ω, we want to distinguish, with probability at least 2/3, between the case that p is uniform on some subset of Ω versus ε-far, in total variation distance, from any such uniform distribution. We establish tight bounds on the sample complexity of generalized uniformity testing. In more detail, we present a computationally efficient tester whose sample complexity is optimal, within constant factors, and a matching worst-case information-theoretic lower bound. Specifically, we show that the sample complexity of generalized uniformity testing is Θ(1/(ε^(4/3) ||p||_3) + 1/(ε^2 ||p||_2 )).

STOC Conference 2018 Conference Paper

Testing conditional independence of discrete distributions

  • Clément L. Canonne
  • Ilias Diakonikolas
  • Daniel M. Kane
  • Alistair Stewart

We study the problem of testing *conditional independence* for discrete distributions. Specifically, given samples from a discrete random variable ( X , Y , Z ) on domain [ℓ 1 ]×[ℓ 2 ] × [ n ], we want to distinguish, with probability at least 2/3, between the case that X and Y are conditionally independent given Z from the case that ( X , Y , Z ) is є-far, in ℓ 1 -distance, from every distribution that has this property. Conditional independence is a concept of central importance in probability and statistics with important applications in various scientific domains. As such, the statistical task of testing conditional independence has been extensively studied in various forms within the statistics and econometrics community for nearly a century. Perhaps surprisingly, this problem has not been previously considered in the framework of distribution property testing and in particular no tester with *sublinear* sample complexity is known, even for the important special case that the domains of X and Y are binary. The main algorithmic result of this work is the first conditional independence tester with sublinear sample complexity. To complement our upper bound, we prove information-theoretic lower bounds establishing that the sample complexity of our algorithm is optimal, up to constant factors, for a number of settings. Specifically, for the prototypical setting when ℓ 1 , ℓ 2 = O (1), we show that the sample complexity of testing conditional independence (upper bound and matching lower bound) is [complex formula not displayed] We also achieve sublinear sample complexity for the general case of arbitrary ℓ 1 , ℓ 2 , and n . To obtain our tester, we employ a variety of tools, including (1) a subtle weighted adaptation of the ”flattening” technique, and (2) the design and analysis of an optimal (unbiased) estimator for the following statistical problem of independent interest: Given a degree- d polynomial Q ∶ℝ n → ℝ and sample access to a distribution p over [ n ], estimate Q ( p 1 , …, p n ) up to small additive error. Obtaining tight variance analyses for specific estimators of this form has been a major technical hurdle in distribution testing. As an important contribution of this work, we develop a general theory providing tight variance bounds for *all* such estimators.

STOC Conference 2017 Conference Paper

A polynomial restriction lemma with applications

  • Valentine Kabanets
  • Daniel M. Kane
  • Zhenjian Lu

A polynomial threshold function (PTF) of degree d is a boolean function of the form f = sgn ( p ), where p is a degree- d polynomial, and sgn is the sign function. The main result of the paper is an almost optimal bound on the probability that a random restriction of a PTF is not close to a constant function, where a boolean function g is called δ-close to constant if, for some v ε{1,-1}, we have g ( x )= v for all but at most δ fraction of inputs. We show for every PTF f of degree d ≥ 1, and parameters 0<δ, r ≤ 1/16, that Pr ∾ R r [ f ρ is not δ-close to constant] ≤ √ #183;(log r -1 · log δ-1 ) O ( d 2 ) , where ρ ∾ R r is a random restriction leaving each variable, independently, free with probability r , and otherwise assigning it 1 or -1 uniformly at random. In fact, we show a more general result for random block restrictions: given an arbitrary partitioning of input variables into m blocks, a random block restriction picks a uniformly random block ℓΕ [ m ] and assigns 1 or -1, uniformly at random, to all variable outside the chosen block ℓ. We prove the Block Restriction Lemma saying that a PTF f of degree d becomes δ-close to constant when hit with a random block restriction, except with probability at most m -1/2 #183; (log m #183; logδ -1 ) O ( d 2 ) . As an application of our Restriction Lemma, we prove lower bounds against constant-depth circuits with PTF gates of any degree 1≤ d ≪ √log n /loglog n , generalizing the recent bounds against constant-depth circuits with linear threshold gates (LTF gates) proved by Kane and Williams ( STOC , 2016) and Chen, Santhanam, and Srinivasan ( CCC , 2016). In particular, we show that there is an n -variate boolean function F n Ε P such that every depth-2 circuit with PTF gates of degree d ≥ 1 that computes F n must have at least ( n 3/2+1/ d )#183; (log n ) - O ( d 2 ) wires. For constant depths greater than 2, we also show average-case lower bounds for such circuits with super-linear number of wires. These are the first super-linear bounds on the number of wires for circuits with PTF gates. We also give short proofs of the optimal-exponent average sensitivity bound for degree- d PTFs due to Kane ( Computational Complexity , 2014), and the Littlewood-Offord type anticoncentration bound for degree- d multilinear polynomials due to Meka, Nguyen, and Vu ( Theory of Computing , 2016). Finally, we give derandomized versions of our Block Restriction Lemma and Littlewood-Offord type anticoncentration bounds, using a pseudorandom generator for PTFs due to Meka and Zuckerman ( SICOMP , 2013).

FOCS Conference 2017 Conference Paper

Active Classification with Comparison Queries

  • Daniel M. Kane
  • Shachar Lovett
  • Shay Moran
  • Jiapeng Zhang

We study an extension of active learning in which the learning algorithm may ask the annotator to compare the distances of two examples from the boundary of their label-class. For example, in a recommendation system application (say for restaurants), the annotator may be asked whether she liked or disliked a specific restaurant (a label query); or which one of two restaurants did she like more (a comparison query). We focus on the class of half spaces, and show that under natural assumptions, such as large margin or bounded bit-description of the input examples, it is possible to reveal all the labels of a sample of size n using approximately O(log n) queries. This implies an exponential improvement over classical active learning, where only label queries are allowed. We complement these results by showing that if any of these assumptions is removed then, in the worst case, Ω(n) queries are required. Our results follow from a new general framework of active learning with additional queries. We identify a combinatorial dimension, called the inference dimension, that captures the query complexity when each additional query is determined by O(1) examples (such as comparison queries, each of which is determined by the two compared examples). Our results for half spaces follow by bounding the inference dimension in the cases discussed above.

ICML Conference 2017 Conference Paper

Being Robust (in High Dimensions) Can Be Practical

  • Ilias Diakonikolas
  • Gautam Kamath 0001
  • Daniel M. Kane
  • Jerry Li 0001
  • Ankur Moitra
  • Alistair Stewart

Robust estimation is much more challenging in high-dimensions than it is in one-dimension: Most techniques either lead to intractable optimization problems or estimators that can tolerate only a tiny fraction of errors. Recent work in theoretical computer science has shown that, in appropriate distributional models, it is possible to robustly estimate the mean and covariance with polynomial time algorithms that can tolerate a constant fraction of corruptions, independent of the dimension. However, the sample and time complexity of these algorithms is prohibitively large for high-dimensional applications. In this work, we address both of these issues by establishing sample complexity bounds that are optimal, up to logarithmic factors, as well as giving various refinements that allow the algorithms to tolerate a much larger fraction of corruptions. Finally, we show on both synthetic and real data that our algorithms have state-of-the-art performance and suddenly make high-dimensional robust estimation a realistic possibility.

FOCS Conference 2017 Conference Paper

Robust Polynomial Regression up to the Information Theoretic Limit

  • Daniel M. Kane
  • Sushrut Karmalkar
  • Eric Price 0001

We consider the problem of robust polynomial regression, where one receives samples that are usually within a small additive error of a target polynomial, but have a chance of being arbitrary adversarial outliers. Previously, it was known how to efficiently estimate the target polynomial only when the outlier probability was subconstant in the degree of the target polynomial. We give an algorithm that works for the entire feasible range of outlier probabilities, while simultaneously improving other parameters of the problem. We complement our algorithm, which gives a factor 2 approximation, with impossibility results that show, for example, that a 1. 09 approximation is impossible even with infinitely many samples.

FOCS Conference 2017 Conference Paper

Statistical Query Lower Bounds for Robust Estimation of High-Dimensional Gaussians and Gaussian Mixtures

  • Ilias Diakonikolas
  • Daniel M. Kane
  • Alistair Stewart

We describe a general technique that yields the first Statistical Query lower bounds for a range of fundamental high-dimensional learning problems involving Gaussian distributions. Our main results are for the problems of (1) learning Gaussian mixture models (GMMs), and (2) robust (agnostic) learning of a single unknown Gaussian distribution. For each of these problems, we show a super-polynomial gap between the (information-theoretic) sample complexity and the computational complexity of any Statistical Query algorithm for the problem. Statistical Query (SQ) algorithms are a class of algorithms that are only allowed to query expectations of functions of the distribution rather than directly access samples. This class of algorithms is quite broad: a wide range of known algorithmic techniques in machine learning are known to be implementable using SQs. Moreover, for the unsupervised learning problems studied in this paper, all known algorithms with non-trivial performance guarantees are SQ or are easily implementable using SQs. Our SQ lower bound for Problem (1) is qualitatively matched by known learning algorithms for GMMs. At a conceptual level, this result implies that - as far as SQ algorithms are concerned - the computational complexity of learning GMMs is inherently exponential in the dimension of the latent space - even though there is no such information-theoretic barrier. Our lower bound for Problem (2) implies that the accuracy of the robust learning algorithm in [29] is essentially best possible among all polynomial-time SQ algorithms. On the positive side, we also give a new (SQ) learning algorithm for Problem (2) achieving the information-theoretically optimal accuracy, up to a constant factor, whose running time essentially matches our lower bound. Our algorithm relies on a filtering technique generalizing [29] that removes outliers based on higher-order tensors. Our SQ lower bounds are attained via a unified moment-matching technique that is useful in other contexts and may be of broader interest. Our technique yields nearly-tight lower bounds for a number of related unsupervised estimation problems. Specifically, for the problems of (3) robust covariance estimation in spectral norm, and (4) robust sparse mean estimation, we establish a quadratic statistical- computational tradeoff for SQ algorithms, matching known upper bounds. Finally, our technique can be used to obtain tight sample complexity lower bounds for high-dimensional testing problems. Specifically, for the classical problem of robustly testing an unknown mean (known covariance) Gaussian, our technique implies an information-theoretic sample lower bound that scales linearly in the dimension. Our sample lower bound matches the sample complexity of the corresponding robust learning problem and separates the sample complexity of robust testing from standard (non-robust) testing. This separation is surprising because such a gap does not exist for the corresponding learning problem. problem.

FOCS Conference 2017 Conference Paper

The Independence Number of the Birkhoff Polytope Graph, and Applications to Maximally Recoverable Codes

  • Daniel M. Kane
  • Shachar Lovett
  • Sankeerth Rao

Maximally recoverable codes are codes designed for distributed storage which combine quick recovery from single node failure and optimal recovery from catastrophic failure. Gopalan et al [SODA 2017] studied the alphabet size needed for such codes in grid topologies and gave a combinatorial characterization for it. Consider a labeling of the edges of the complete bipartite graph K n, n with labels coming from F 2 d, that satisfies the following condition: for any simple cycle, the sum of the labels over its edges is nonzero. The minimal d where this is possible controls the alphabet size needed for maximally recoverable codes in n × n grid topologies. Prior to the current work, it was known that d is between log(n) 2 and n log n. We improve both bounds and show that d is linear in n. The upper bound is a recursive construction which beats the random construction. The lower bound follows by first relating the problem to the independence number of the Birkhoff polytope graph, and then providing tight bounds for it using the representation theory of the symmetric group.

FOCS Conference 2016 Conference Paper

A New Approach for Testing Properties of Discrete Distributions

  • Ilias Diakonikolas
  • Daniel M. Kane

We study problems in distribution property testing: Given sample access to one or more unknown discrete distributions, we want to determine whether they have some global property or are epsilon-far from having the property in L1 distance (equivalently, total variation distance, or "statistical distance"). In this work, we give a novel general approach for distribution testing. We describe two techniques: our first technique gives sample-optimal testers, while our second technique gives matching sample lower bounds. As a consequence, we resolve the sample complexity of a wide variety of testing problems. Our upper bounds are obtained via a modular reduction-based approach. Our approach yields optimal testers for numerous problemsby using a standard L2-identity tester as a black-box. Using this recipe, we obtain simple estimators for a wide range of problems, encompassing many problems previously studied in the TCS literature, namely: (1) identity testing to a fixed distribution, (2) closeness testing between two unknown distributions (with equal/unequal sample sizes), (3) independence testing (in any number of dimensions), (4) closeness testing for collections of distributions, and(5) testing histograms. For all of these problems, our testers are sample-optimal, up to constant factors. With the exception of (1), ours are the first sample-optimal testers for the corresponding problems. Moreover, our estimators are significantly simpler to state and analyze compared to previous results. As an important application of our reduction-based technique, we obtain the first adaptive algorithm for testing equivalence betweentwo unknown distributions. The sample complexity of our algorithm depends on the structure of the unknown distributions - as opposed to merely their domain size -and is significantly better compared to the worst-case optimal L1-tester in many natural instances. Moreover, our technique naturally generalizes to other metrics beyond the L1-distance. As an illustration of its flexibility, we use it to obtain the first near-optimal equivalence testerunder the Hellinger distance. Our lower bounds are obtained via a direct information-theoretic approach: Given a candidate hard instance, our proof proceeds by boundingthe mutual information between appropriate random variables. While this is a classical method in information theory, prior to our work, it had not been used in this context. Previous lower bounds relied either on the birthday paradox, oron moment-matching and were thus restricted to symmetric properties. Our lower bound approach does not suffer from any such restrictions and gives tight sample lower bounds for the aforementioned problems.

FOCS Conference 2016 Conference Paper

Fourier-Sparse Interpolation without a Frequency Gap

  • Xue Chen 0001
  • Daniel M. Kane
  • Eric Price 0001
  • Zhao Song 0002

We consider the problem of estimating a Fourier-sparse signal from noisy samples, where the sampling is done over some interval [0, T] and the frequencies can be "off-grid". Previous methods for this problem required the gap between frequencies to be above 1/T, the threshold required to robustly identify individual frequencies. We show the frequency gap is not necessary to estimate the signal as a whole: for arbitrary k-Fourier-sparse signals under l2 bounded noise, we show how to estimate the signal with a constant factor growth of the noise and sample complexity polynomial in k and logarithmic in the bandwidth and signal-to-noise ratio. As a special case, we get an algorithm to interpolate degree d polynomials from noisy measurements, using O(d) samples and increasing the noise by a constant factor in l2.

FOCS Conference 2016 Conference Paper

Robust Estimators in High Dimensions without the Computational Intractability

  • Ilias Diakonikolas
  • Gautam Kamath 0001
  • Daniel M. Kane
  • Jerry Li 0001
  • Ankur Moitra
  • Alistair Stewart

We study high-dimensional distribution learning in an agnostic setting where an adversary is allowed to arbitrarily corrupt an epsilon fraction of the samples. Such questions have a rich history spanning statistics, machine learning and theoretical computer science. Even in the most basic settings, the only known approaches are either computationally inefficient or lose dimension dependent factors in their error guarantees. This raises the following question: Is high-dimensional agnostic distribution learning even possible, algorithmically? In this work, we obtain the first computationally efficient algorithms for agnostically learning several fundamental classes of high-dimensional distributions: (1) a single Gaussian, (2) a product distribution on the hypercube, (3) mixtures of two product distributions (under a natural balancedness condition), and (4) mixtures of k Gaussians with identical spherical covariances. All our algorithms achieve error that is independent of the dimension, and in many cases depends nearly-linearly on the fraction of adversarially corrupted samples. Moreover, we develop a general recipe for detecting and correcting corruptions in high-dimensions, that may be applicable to many other problems.

STOC Conference 2016 Conference Paper

Super-linear gate and super-quadratic wire lower bounds for depth-two and depth-three threshold circuits

  • Daniel M. Kane
  • R. Ryan Williams

In order to formally understand the power of neural computing, we first need to crack the frontier of threshold circuits with two and three layers, a regime that has been surprisingly intractable to analyze. We prove the first super-linear gate lower bounds and the first super-quadratic wire lower bounds for depth-two linear threshold circuits with arbitrary weights, and depth-three majority circuits computing an explicit function. (1) We prove that for all ε ≪ √log( n )/ n , the linear-time computable Andreev’s function cannot be computed on a (1/2+ε)-fraction of n -bit inputs by depth-two circuits of o (ε 3 n 3/2 /log 3 n ) gates, nor can it be computed with o (ε 3 n 5/2 /log 7/2 n ) wires. This establishes an average-case “size hierarchy” for threshold circuits, as Andreev’s function is computable by uniform depth-two circuits of o ( n 3 ) linear threshold gates, and by uniform depth-three circuits of O ( n ) majority gates. (2) We present a new function in P based on small-biased sets, which we prove cannot be computed by a majority vote of depth-two threshold circuits of o ( n 3/2 /log 3 n ) gates, nor with o ( n 5/2 /log 7/2 n ) wires. (3) We give tight average-case (gate and wire) complexity results for computing PARITY with depth-two threshold circuits; the answer turns out to be the same as for depth-two majority circuits. The key is a new method for analyzing random restrictions to linear threshold functions. Our main analytical tool is the Littlewood-Offord Lemma from additive combinatorics.

FOCS Conference 2015 Conference Paper

Optimal Algorithms and Lower Bounds for Testing Closeness of Structured Distributions

  • Ilias Diakonikolas
  • Daniel M. Kane
  • Vladimir Nikishkin

We give a general unified method that can be used for L 1 closeness testing of a wide range of univariate structured distribution families. More specifically, we design a sample optimal and computationally efficient algorithm for testing the equivalence of two unknown (potentially arbitrary) univariate distributions under the Ak-distance metric: Given sample access to distributions with density functions p, q: I → R, we want to distinguish between the cases that p = q and ∥p - q∥ Ak ≥ ∈ with probability at least 2/3. We show that for any k ≥ 2, ∈ > 0, the optimal sample complexity of the Ak-closeness testing problem is Θ(max{k 4/5 /∈ 6/5, k 1/2 /∈ 2 }). This is the first o(k) sample algorithm for this problem, and yields new, simple L1 closeness testers, in most cases with optimal sample complexity, for broad classes of structured distributions.

FOCS Conference 2015 Conference Paper

Pseudorandomness via the Discrete Fourier Transform

  • Parikshit Gopalan
  • Daniel M. Kane
  • Raghu Meka

We present a new approach to constructing unconditional pseudorandom generators against classes of functions that involve computing a linear function of the inputs. We give an explicit construction of a pseudorandom generator that fools the discrete Fourier transforms of linear functions with seed-length that is nearly logarithmic (up to polyloglog factors) in the input size and the desired error parameter. Our result gives a single pseudorandom generator that fools several important classes of tests computable in log space that have been considered in the literature, including half spaces (over general domains), modular tests and combinatorial shapes. For all these classes, our generator is the first that achieves near logarithmic seed-length in both the input length and the error parameter. Getting such a seed-length is a natural challenge in its own right, which needs to be overcome in order to derandomize RL -- a central question in complexity theory. Our construction combines ideas from a large body of prior work, ranging from a classical construction of [1] to the recent gradually increasing independence paradigm of [2] -- [4], while also introducing some novel analytic machinery which might find other applications.

SODA Conference 2015 Conference Paper

Testing Identity of Structured Distributions

  • Ilias Diakonikolas
  • Daniel M. Kane
  • Vladimir Nikishkin

We study the question of identity testing for structured distributions. More precisely, given samples from a structured distribution q over [ n ] and an explicit distribution p over [ n ], we wish to distinguish whether q = p versus q is at least ε -far from p, in L 1 distance. In this work, we present a unified approach that yields new, simple testers, with sample complexity that is information-theoretically optimal, for broad classes of structured distributions, including t -flat distributions, t -modal distributions, log-concave distributions, monotone hazard rate (MHR) distributions, and mixtures thereof.

STOC Conference 2013 Conference Paper

A PRG for lipschitz functions of polynomials with applications to sparsest cut

  • Daniel M. Kane
  • Raghu Meka

We give improved pseudorandom generators (PRGs) for Lipschitz functions of low-degree polynomials over the hypercube. These are functions of the form ψ(P(x)), where P:{1,-1} n -> R is a low-degree polynomial and ψ:R -> R is a function with small Lipschitz constant. PRGs for smooth functions of low-degree polynomials have received a lot of attention recently and play an important role in constructing PRGs for the natural class of polynomial threshold functions [12,13,24,16,15]. In spite of the recent progress, no nontrivial PRGs were known for fooling Lipschitz functions of degree O(log n) polynomials even for constant error rate. In this work, we give the first such generator obtaining a seed-length of (log n)~O(l 2 /ε 2 ) for fooling degree l polynomials with error ε. Previous generators had an exponential dependence on the degree l. We use our PRG to get better integrality gap instances for sparsest cut, a fundamental problem in graph theory with many applications in graph optimization. We give an instance of uniform sparsest cut for which a powerful semi-definite relaxation (SDP) first introduced by Goemans and Linial and studied in the seminal work of Arora, Rao and Vazirani [3] has an integrality gap of exp(Ω((log log n) 1/2 )). Understanding the performance of the Goemans-Linial SDP for uniform sparsest cut is an important open problem in approximation algorithms and metric embeddings. Our work gives a near-exponential improvement over previous lower bounds which achieved a gap of Ω(log log n) [11,21]. Our gap instance builds on the recent short code gadgets of Barak et al. [5].

FOCS Conference 2012 Conference Paper

A Structure Theorem for Poorly Anticoncentrated Gaussian Chaoses and Applications to the Study of Polynomial Threshold Functions

  • Daniel M. Kane

We prove a structural result for degree-d polynomials. In particular, we show that any degree-d polynomial, p can be approximated by another polynomial, p 0, which can be decomposed as some function of polynomials q1, · · ·, q m with q i normalized and m=O d (1), so that if X is a Gaussian random variable, the probability distribution on (q 1 (X), · · ·, q m (X)) does not have too much mass in any small box. Using this result, we prove improved versions of a number of results about polynomial threshold functions, including producing better pseudorandom generators, obtaining a better invariance principle, and proving improved bounds on noise sensitivity.

FOCS Conference 2010 Conference Paper

Bounded Independence Fools Degree-2 Threshold Functions

  • Ilias Diakonikolas
  • Daniel M. Kane
  • Jelani Nelson

For an n-variate degree-2 real polynomial p, we prove that E x~D [sig(p(x))] Is determined up to an additive ε as long as D is a k-wise Independent distribution over {-1, 1} n for k = poly(1/ε). This gives a broad class of explicit pseudorandom generators against degree-2 boolean threshold functions, and answers an open question of Diakonikolas et al. (FOCS 2009).

SODA Conference 2010 Conference Paper

On the Exact Space Complexity of Sketching and Streaming Small Norms

  • Daniel M. Kane
  • Jelani Nelson
  • David P. Woodruff

We settle the 1-pass space complexity of (1 ± ε)-approximating the L p norm, for real p with 1 ≤ p ≤ 2, of a length- n vector updated in a length- m stream with updates to its coordinates. We assume the updates are integers in the range [ –M, M ]. In particular, we show the space required is Θ(ε −2 log( mM ) + log log( n )) bits. Our result also holds for 0 < p < 1; although L p is not a norm in this case, it remains a well-defined function. Our upper bound improves upon previous algorithms of [Indyk, JACM ‘06] and [Li, SODA ‘08]. This improvement comes from showing an improved derandomization of the L p sketch of Indyk by using k -wise independence for small k, as opposed to using the heavy hammer of a generic pseudorandom generator against space-bounded computation such as Nisan's PRG. Our lower bound improves upon previous work of [Alon-Matias-Szegedy, JCSS ‘99] and [Woodruff, SODA ‘04], and is based on showing a direct sum property for the 1-way communication of the gap-Hamming problem.

SODA Conference 2009 Conference Paper

The geometry of binary search trees

  • Erik D. Demaine
  • Dion Harmon
  • John Iacono
  • Daniel M. Kane
  • Mihai Patrascu

We present a novel connection between binary search trees (BSTs) and points in the plane satisfying a simple property. Using this correspondence, we achieve the following results: 1. 1. A surprisingly clean restatement in geometric terms of many results and conjectures relating to BSTs and dynamic optimality. 2. 2. A new lower bound for searching in the BST model, which subsumes the previous two known bounds of Wilber [FOCS'86]. 3. 3. The first proposal for dynamic optimality not based on splay trees. A natural greedy but offline algorithm was presented by Lucas [1988], and independently by Munro [2000], and was conjectured to be an (additive) approximation of the best binary search tree. We show that there exists an equal-cost online algorithm, transforming the conjecture of Lucas and Munro into the conjecture that the greedy algorithm is dynamically optimal.

FOCS Conference 2005 Conference Paper

On the Complexity of Two-PlayerWin-Lose Games

  • Timothy G. Abbott
  • Daniel M. Kane
  • Paul Valiant

The efficient computation of Nash equilibria is one of the most formidable challenges in computational complexity today. The problem remains open for two-player games. We show that the complexity of two-player Nash equilibria is unchanged when all outcomes are restricted to be 0 or 1. That is, win-or-lose games are as complex as the general case for two-player games.