Arrow Research search

Author name cluster

Constantinos Daskalakis

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

76 papers
2 author rows

Possible papers

76

NeurIPS Conference 2025 Conference Paper

Ambient Diffusion Omni: Training Good Models with Bad Data

  • Giannis Daras
  • Adrian Rodriguez-Munoz
  • Adam Klivans
  • Antonio Torralba
  • Constantinos Daskalakis

We show how to use low-quality, synthetic, and out-of-distribution images to improve the quality of a diffusion model. Typically, diffusion models are trained on curated datasets that emerge from highly filtered data pools from the Web and other sources. We show that there is immense value in the lower-quality images that are often discarded. We present Ambient Diffusion Omni, a simple, principled framework to train diffusion models that can extract signal from arbitrarily images during training. Our framework exploits two properties of natural images -- spectral power law decay and locality. We first validate our framework by successfully training diffusion models with images synthetically corrupted by Gaussian blur, JPEG compression, and motion blur. We use our framework to achieve state-of-the-art ImageNet FID and we show significant improvements in both image quality and diversity for text-to-image generative modeling. The core insight is that noise dampens the initial skew between the desired high-quality distribution and the mixed distribution we actually observe. We provide rigorous theoretical justification for our approach by analyzing the trade-off between learning from biased data versus limited unbiased data across diffusion times.

NeurIPS Conference 2025 Conference Paper

Ambient Proteins - Training Diffusion Models on Noisy Structures

  • Giannis Daras
  • Jeffrey Ouyang-Zhang
  • Krithika Ravishankar
  • Constantinos Daskalakis
  • Adam Klivans
  • Daniel Diaz

We present Ambient Protein Diffusion, a framework for training protein diffusion models that generates structures with unprecedented diversity and quality. State-of-the-art generative models are trained on computationally derived structures from AlphaFold2 (AF), as experimentally determined structures are relatively scarce. The resulting models are therefore limited by the quality of synthetic datasets. Since the accuracy of AF predictions degrades with increasing protein length and complexity, de novo generation of long, complex proteins remains challenging. Ambient Protein Diffusion overcomes this problem by treating low-confidence AF structures as corrupted data. Rather than simply filtering out low-quality AF structures, our method adjusts the diffusion objective for each structure based on its corruption level, allowing the model to learn from both high and low quality structures. Empirically, ambient protein diffusion yields major improvements: on proteins with 700 residues, diversity increases from 45% to 85% from the previous state-of-the-art, and designability improves from 70% to 88%.

ICLR Conference 2025 Conference Paper

How Much is a Noisy Image Worth? Data Scaling Laws for Ambient Diffusion

  • Giannis Daras
  • Yeshwanth Cherapanamjeri
  • Constantinos Daskalakis

The quality of generative models depends on the quality of the data they are trained on. Creating large-scale, high-quality datasets is often expensive and sometimes impossible, e.g.~in certain scientific applications where there is no access to clean data due to physical or instrumentation constraints. Ambient Diffusion and related frameworks train diffusion models with solely corrupted data (which are usually cheaper to acquire) but ambient models significantly underperform models trained on clean data. We study this phenomenon at scale by training more than $80$ models on data with different corruption levels across three datasets ranging from $30,000$ to $\approx 1.3$M samples. We show that it is impossible, at these sample sizes, to match the performance of models trained on clean data when only training on noisy data. Yet, a combination of a small set of clean data (e.g.~$10\%$ of the total dataset) and a large set of highly noisy data suffices to reach the performance of models trained solely on similar-size datasets of clean data, and in particular to achieve near state-of-the-art performance. We provide theoretical evidence for our findings by developing novel sample complexity bounds for learning from Gaussian Mixtures with heterogeneous variances. Our theoretical model suggests that, for large enough datasets, the effective marginal utility of a noisy sample is exponentially worse that of a clean sample. Providing a small set of clean samples can significantly reduce the sample size requirements for noisy data, as we also observe in our experiments.

ICML Conference 2025 Conference Paper

Learning Gaussian DAG Models without Condition Number Bounds

  • Constantinos Daskalakis
  • Anthimos Vardis Kandiros
  • Rui Yao

We study the problem of learning the topology of a directed Gaussian Graphical Model under the equal-variance assumption, where the graph has $n$ nodes and maximum in-degree $d$. Prior work has established that $O(d \log n)$ samples are sufficient for this task. However, an important factor that is often overlooked in these analyses is the dependence on the condition number of the covariance matrix of the model. Indeed, all algorithms from prior work require a number of samples that grows polynomially with this condition number. In many cases this is unsatisfactory, since the condition number could grow polynomially with $n$, rendering these prior approaches impractical in high-dimensional settings. In this work, we provide an algorithm that recovers the underlying graph and prove that the number of samples required is independent of the condition number. Furthermore, we establish lower bounds that nearly match the upper bound up to a $d$-factor, thus providing an almost tight characterization of the true sample complexity of the problem. Moreover, under a further assumption that all the variances of the variables are bounded, we design a polynomial-time algorithm that recovers the underlying graph, at the cost of an additional polynomial dependence of the sample complexity on $d$. We complement our theoretical findings with simulations on synthetic datasets that confirm our predictions.

ICML Conference 2024 Conference Paper

Consistent Diffusion Meets Tweedie: Training Exact Ambient Diffusion Models with Noisy Data

  • Giannis Daras
  • Alexandros G. Dimakis
  • Constantinos Daskalakis

Ambient diffusion is a recently proposed framework for training diffusion models using corrupted data. Both Ambient Diffusion and alternative SURE-based approaches for learning diffusion models from corrupted data resort to approximations which deteriorate performance. We present the first framework for training diffusion models that provably sample from the uncorrupted distribution given only noisy training data, solving an open problem in Ambient diffusion. Our key technical contribution is a method that uses a double application of Tweedie’s formula and a consistency loss function that allows us to extend sampling at noise levels below the observed data noise. We also provide further evidence that diffusion models memorize from their training sets by identifying extremely corrupted images that are almost perfectly reconstructed, raising copyright and privacy concerns. Our method for training using corrupted samples can be used to mitigate this problem. We demonstrate this by fine-tuning Stable Diffusion XL to generate samples from a distribution using only noisy samples. Our framework reduces the amount of memorization of the fine-tuning dataset, while maintaining competitive performance.

NeurIPS Conference 2024 Conference Paper

Maximizing utility in multi-agent environments by anticipating the behavior of other learners

  • Angelos Assos
  • Yuval Dagan
  • Constantinos Daskalakis

Learning algorithms are often used to make decisions in sequential decision-making environments. In multi-agent settings, the decisions of each agent can affect the utilities/losses of the other agents. Therefore, if an agent is good at anticipating the behavior of the other agents, in particular how they will make decisions in each round as a function of their experience that far, it could try to judiciously make its own decisions over the rounds of the interaction so as to influence the other agents to behave in a way that ultimately benefits its own utility. In this paper, we study repeated two-player games involving two types of agents: a learner, which employs an online learning algorithm to choose its strategy in each round; and an optimizer, which knows the learner's utility function and the learner's online learning algorithm. The optimizer wants to plan ahead to maximize its own utility, while taking into account the learner's behavior. We provide two results: a positive result for repeated zero-sum games and a negative result for repeated general-sum games. Our positive result is an algorithm for the optimizer, which exactly maximizes its utility against a learner that plays the Replicator Dynamics --- the continuous-time analogue of Multiplicative Weights Update (MWU). Additionally, we use this result to provide an algorithm for the optimizer against MWU, i. e. ~for the discrete-time setting, which guarantees an average utility for the optimizer that is higher than the value of the one-shot game. Our negative result shows that, unless P=NP, there is no Fully Polynomial Time Approximation Scheme (FPTAS) for maximizing the utility of an optimizer against a learner that best-responds to the history in each round. Yet, this still leaves open the question of whether there exists a polynomial-time algorithm that optimizes the utility up to $o(T)$.

NeurIPS Conference 2024 Conference Paper

On Tractable $\Phi$-Equilibria in Non-Concave Games

  • Yang Cai
  • Constantinos Daskalakis
  • Haipeng Luo
  • Chen-Yu Wei
  • Weiqiang Zheng

While Online Gradient Descent and other no-regret learning procedures are known to efficiently converge to a coarse correlated equilibrium in games where each agent's utility is concave in their own strategy, this is not the case when utilities are non-concave -- a common scenario in machine learning applications involving strategies parameterized by deep neural networks, or when agents' utilities are computed by neural networks, or both. Non-concave games introduce significant game-theoretic and optimization challenges: (i) Nash equilibria may not exist; (ii) local Nash equilibria, though they exist, are intractable; and (iii) mixed Nash, correlated, and coarse correlated equilibria generally have infinite support and are intractable. To sidestep these challenges, we revisit the classical solution concept of $\Phi$-equilibria introduced by Greenwald and Jafari [GJ03], which is guaranteed to exist for an arbitrary set of strategy modifications $\Phi$ even in non-concave games [SL07]. However, the tractability of $\Phi$-equilibria in such games remains elusive. In this paper, we initiate the study of tractable $\Phi$-equilibria in non-concave games and examine several natural families of strategy modifications. We show that when $\Phi$ is finite, there exists an efficient uncoupled learning algorithm that approximates the corresponding $\Phi$-equilibria. Additionally, we explore cases where $\Phi$ is infinite but consists of local modifications, showing that Online Gradient Descent can efficiently approximate $\Phi$-equilibria in non-trivial regimes.

NeurIPS Conference 2023 Conference Paper

Consistent Diffusion Models: Mitigating Sampling Drift by Learning to be Consistent

  • Giannis Daras
  • Yuval Dagan
  • Alex Dimakis
  • Constantinos Daskalakis

Imperfect score-matching leads to a shift between the training and the sampling distribution of diffusion models. Due to the recursive nature of the generation process, errors in previous steps yield sampling iterates that drift away from the training distribution. However, the standard training objective via Denoising Score Matching (DSM) is only designed to optimize over non-drifted data. To train on drifted data, we propose to enforce a \emph{Consistency} property (CP) which states that predictions of the model on its owngenerated data are consistent across time. Theoretically, we show that the differential equation that describes CP together with the one that describes a conservative vector field, have a unique solution given some initial condition. Consequently, if the score is learned well on non-drifted points via DSM (enforcing the true initial condition) then enforcing CP on drifted points propagates true score values. Empirically, we show that enforcing CP improves the generation quality for conditional and unconditional generation on CIFAR-10, and in AFHQ and FFHQ. We open-source our code and models: https: //github. com/giannisdaras/cdm.

AAAI Conference 2022 Conference Paper

How Good Are Low-Rank Approximations in Gaussian Process Regression?

  • Constantinos Daskalakis
  • Petros Dellaportas
  • Aristeidis Panos

We provide guarantees for approximate Gaussian Process (GP) regression resulting from two common low-rank kernel approximations: based on random Fourier features, and based on truncating the kernel’s Mercer expansion. In particular, we bound the Kullback–Leibler divergence between an exact GP and one resulting from one of the afore-described low-rank approximations to its kernel, as well as between their corresponding predictive densities, and we also bound the error between predictive mean vectors and between predictive covariance matrices computed using the exact versus using the approximate GP. We provide experiments on both simulated data and standard benchmarks to evaluate the effectiveness of our theoretical bounds.

ICML Conference 2022 Conference Paper

Score-Guided Intermediate Level Optimization: Fast Langevin Mixing for Inverse Problems

  • Giannis Daras
  • Yuval Dagan
  • Alexandros G. Dimakis
  • Constantinos Daskalakis

We prove fast mixing and characterize the stationary distribution of the Langevin Algorithm for inverting random weighted DNN generators. This result extends the work of Hand and Voroninski from efficient inversion to efficient posterior sampling. In practice, to allow for increased expressivity, we propose to do posterior sampling in the latent space of a pre-trained generative model. To achieve that, we train a score-based model in the latent space of a StyleGAN-2 and we use it to solve inverse problems. Our framework, Score-Guided Intermediate Layer Optimization (SGILO), extends prior work by replacing the sparsity regularization with a generative prior in the intermediate layer. Experimentally, we obtain significant improvements over the previous state-of-the-art, especially in the low measurement regime.

NeurIPS Conference 2021 Conference Paper

Efficient Truncated Linear Regression with Unknown Noise Variance

  • Constantinos Daskalakis
  • Patroklos Stefanou
  • Rui Yao
  • Emmanouil Zampetakis

Truncated linear regression is a classical challenge in Statistics, wherein a label, $y = w^T x + \varepsilon$, and its corresponding feature vector, $x \in \mathbb{R}^k$, are only observed if the label falls in some subset $S \subseteq \mathbb{R}$; otherwise the existence of the pair $(x, y)$ is hidden from observation. Linear regression with truncated observations has remained a challenge, in its general form, since the early works of [Tobin'58, Amemiya '73]. When the distribution of the error is normal with known variance, recent work of [Daskalakis et al. '19] provides computationally and statistically efficient estimators of the linear model, $w$. In this paper, we provide the first computationally and statistically efficient estimators for truncated linear regression when the noise variance is unknown, estimating both the linear model and the variance of the noise. Our estimator is based on an efficient implementation of Projected Stochastic Gradient Descent on the negative log-likelihood of the truncated sample. Importantly, we show that the error of our estimates is asymptotically normal, and we use this to provide explicit confidence regions for our estimates.

STOC Conference 2021 Conference Paper

Learning Ising models from one or multiple samples

  • Yuval Dagan
  • Constantinos Daskalakis
  • Nishanth Dikkala
  • Anthimos Vardis Kandiros

There have been two main lines of work on estimating Ising models: (1) estimating them from multiple independent samples under minimal assumptions about the model's interaction matrix ; and (2) estimating them from one sample in restrictive settings. We propose a unified framework that smoothly interpolates between these two settings, enabling significantly richer estimation guarantees from one, a few, or many samples. Our main theorem provides guarantees for one-sample estimation, quantifying the estimation error in terms of the metric entropy of a family of interaction matrices. As corollaries of our main theorem, we derive bounds when the model's interaction matrix is a (sparse) linear combination of known matrices, or it belongs to a finite set, or to a high-dimensional manifold. In fact, our main result handles multiple independent samples by viewing them as one sample from a larger model, and can be used to derive estimation bounds that are qualitatively similar to those obtained in the afore-described multiple-sample literature. Our technical approach benefits from sparsifying a model's interaction network, conditioning on subsets of variables that make the dependencies in the resulting conditional distribution sufficiently weak. We use this sparsification technique to prove strong concentration and anti-concentration results for the Ising model, which we believe have applications beyond the scope of this paper.

NeurIPS Conference 2021 Conference Paper

Near-Optimal No-Regret Learning in General Games

  • Constantinos Daskalakis
  • Maxwell Fishelson
  • Noah Golowich

We show that Optimistic Hedge -- a common variant of multiplicative-weights-updates with recency bias -- attains ${\rm poly}(\log T)$ regret in multi-player general-sum games. In particular, when every player of the game uses Optimistic Hedge to iteratively update her action in response to the history of play so far, then after $T$ rounds of interaction, each player experiences total regret that is ${\rm poly}(\log T)$. Our bound improves, exponentially, the $O(T^{1/2})$ regret attainable by standard no-regret learners in games, the $O(T^{1/4})$ regret attainable by no-regret learners with recency bias (Syrgkanis et al. , NeurIPS 2015), and the $O(T^{1/6})$ bound that was recently shown for Optimistic Hedge in the special case of two-player games (Chen & Peng, NeurIPS 2020). A direct corollary of our bound is that Optimistic Hedge converges to coarse correlated equilibrium in general games at a rate of $\tilde{O}(1/T)$.

STOC Conference 2021 Conference Paper

Sample-optimal and efficient learning of tree Ising models

  • Constantinos Daskalakis
  • Qinxuan Pan

We show that n -variable tree-structured Ising models can be learned computationally-efficiently to within total variation distance є from an optimal O ( n ln n /є 2 ) samples, where O (·) hides an absolute constant which, importantly, does not depend on the model being learned—neither its tree nor the magnitude of its edge strengths, on which we place no assumptions. Our guarantees hold, in fact, for the celebrated Chow-Liu algorithm [1968], using the plug-in estimator for estimating mutual information. While this (or any other) algorithm may fail to identify the structure of the underlying model correctly from a finite sample, we show that it will still learn a tree-structured model that is є-close to the true one in total variation distance, a guarantee called “proper learning.” Our guarantees do not follow from known results for the Chow-Liu algorithm and the ensuing literature on learning graphical models, including the very recent renaissance of algorithms on this learning challenge, which only yield asymptotic consistency results, or sample-suboptimal and/or time-inefficient algorithms, unless further assumptions are placed on the model, such as bounds on the “strengths” of the model’s edges. While we establish guarantees for a widely known and simple algorithm, the analysis that this algorithm succeeds and is sample-optimal is quite complex, requiring a hierarchical classification of the edges into layers with different reconstruction guarantees, depending on their strength, combined with delicate uses of the subadditivity of the squared Hellinger distance over graphical models to control the error accumulation.

AAAI Conference 2021 Conference Paper

Scalable Equilibrium Computation in Multi-agent Influence Games on Networks

  • Fotini Christia
  • Michael Curry
  • Constantinos Daskalakis
  • Erik Demaine
  • John P. Dickerson
  • MohammadTaghi Hajiaghayi
  • Adam Hesterberg
  • Marina Knittel

We provide a polynomial-time, scalable algorithm for equilibrium computation in multi-agent influence games on networks, extending work of Bindel, Kleinberg, and Oren (2015) from the single-agent to the multi-agent setting. In games of influence, agents have limited advertising budget to influence the initial predisposition of nodes in some network towards their products, but the eventual decisions of the nodes are determined by the stationary state of DeGroot opinion dynamics on the network, which takes over after the seeding (Ahmadinejad et al. 2014, 2015). In multi-agent systems, how should agents spend their budgets to seed the network to maximize their utility in anticipation of other advertising agents and the network dynamics? We show that Nash equilibria of this game are pure and (under weak assumptions) unique, and can be computed in polynomial time; we test our model by computing equilibria using mirror descent for the two-agent case on random graphs.

ICML Conference 2021 Conference Paper

Statistical Estimation from Dependent Data

  • Anthimos Vardis Kandiros
  • Yuval Dagan
  • Nishanth Dikkala
  • Surbhi Goel
  • Constantinos Daskalakis

We consider a general statistical estimation problem wherein binary labels across different observations are not independent conditioning on their feature vectors, but dependent, capturing settings where e. g. these observations are collected on a spatial domain, a temporal domain, or a social network, which induce dependencies. We model these dependencies in the language of Markov Random Fields and, importantly, allow these dependencies to be substantial, i. e. do not assume that the Markov Random Field capturing these dependencies is in high temperature. As our main contribution we provide algorithms and statistically efficient estimation rates for this model, giving several instantiations of our bounds in logistic regression, sparse logistic regression, and neural network regression settings with dependent data. Our estimation guarantees follow from novel results for estimating the parameters (i. e. external fields and interaction strengths) of Ising models from a single sample.

STOC Conference 2021 Conference Paper

The complexity of constrained min-max optimization

  • Constantinos Daskalakis
  • Stratis Skoulakis
  • Manolis Zampetakis

Despite its important applications in Machine Learning, min-max optimization of objective functions that are nonconvex-nonconcave remains elusive. Not only are there no known first-order methods converging to even approximate local min-max equilibria (a.k.a. approximate saddle points), but the computational complexity of identifying them is also poorly understood. In this paper, we provide a characterization of the computational complexity as well as of the limitations of first-order methods in this problem. Specifically, we show that in linearly constrained min-max optimization problems with nonconvex-nonconcave objectives an approximate local min-max equilibrium of large enough approximation is guaranteed to exist, but computing such a point is PPAD-complete. The same is true of computing an approximate fixed point of the (Projected) Gradient Descent/Ascent update dynamics, which is computationally equivalent to computing approximate local min-max equilibria. An important byproduct of our proof is to establish an unconditional hardness result in the Nemirovsky-Yudin 1983 oracle optimization model, where we are given oracle access to the values of some function f : P → [−1, 1] and its gradient ∇ f , where P ⊆ [0, 1] d is a known convex polytope. We show that any algorithm that uses such first-order oracle access to f and finds an ε-approximate local min-max equilibrium needs to make a number of oracle queries that is exponential in at least one of 1/ε, L , G , or d , where L and G are respectively the smoothness and Lipschitzness of f . This comes in sharp contrast to minimization problems, where finding approximate local minima in the same setting can be done with Projected Gradient Descent using O ( L /ε) many queries. Our result is the first to show an exponential separation between these two fundamental optimization problems in the oracle model.

NeurIPS Conference 2020 Conference Paper

Constant-Expansion Suffices for Compressed Sensing with Generative Priors

  • Constantinos Daskalakis
  • Dhruv Rohatgi
  • Emmanouil Zampetakis

Generative neural networks have been empirically found very promising in providing effective structural priors for compressed sensing, since they can be trained to span low-dimensional data manifolds in high-dimensional signal spaces. Despite the non-convexity of the resulting optimization problem, it has also been shown theoretically that, for neural networks with random Gaussian weights, a signal in the range of the network can be efficiently, approximately recovered from a few noisy measurements. However, a major bottleneck of these theoretical guarantees is a network \emph{expansivity} condition: that each layer of the neural network must be larger than the previous by a logarithmic factor. Our main contribution is to break this strong expansivity assumption, showing that \emph{constant} expansivity suffices to get efficient recovery algorithms, besides it also being information-theoretically necessary. To overcome the theoretical bottleneck in existing approaches we prove a novel uniform concentration theorem for random functions that might not be Lipschitz but satisfy a relaxed notion which we call ``pseudo-Lipschitzness. '' Using this theorem we can show that a matrix concentration inequality known as the \emph{Weight Distribution Condition (WDC)}, which was previously only known to hold for Gaussian matrices with logarithmic aspect ratio, in fact holds for constant aspect ratios too. Since WDC is a fundamental matrix concentration inequality in the heart of all existing theoretical guarantees on this problem, our tighter bound immediately yields improvements in all known results in the literature on compressed sensing with deep generative priors, including one-bit recovery, phase retrieval, and more.

NeurIPS Conference 2020 Conference Paper

Independent Policy Gradient Methods for Competitive Reinforcement Learning

  • Constantinos Daskalakis
  • Dylan J. Foster
  • Noah Golowich

We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents (i. e. , zero-sum stochastic games). We consider an episodic setting where in each episode, each player independently selects a policy and observes only their own actions and rewards, along with the state. We show that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule (which is necessary). To the best of our knowledge, this constitutes the first finite-sample convergence result for independent learning in competitive RL, as prior work has largely focused on centralized/coordinated procedures for equilibrium computation.

ICML Conference 2020 Conference Paper

SGD Learns One-Layer Networks in WGANs

  • Qi Lei
  • Jason D. Lee
  • Alexandros G. Dimakis
  • Constantinos Daskalakis

Generative adversarial networks (GANs) are a widely used framework for learning generative models. Wasserstein GANs (WGANs), one of the most successful variants of GANs, require solving a minmax optimization problem to global optimality, but are in practice successfully trained using stochastic gradient descent-ascent. In this paper, we show that, when the generator is a one-layer network, stochastic gradient descent-ascent converges to a global solution with polynomial time and sample complexity.

NeurIPS Conference 2020 Conference Paper

Tight last-iterate convergence rates for no-regret learning in multi-player games

  • Noah Golowich
  • Sarath Pattathil
  • Constantinos Daskalakis

We study the question of obtaining last-iterate convergence rates for no-regret learning algorithms in multi-player games. We show that the optimistic gradient (OG) algorithm with a constant step-size, which is no-regret, achieves a last-iterate rate of O(1/√T) with respect to the gap function in smooth monotone games. This result addresses a question of Mertikopoulos & Zhou (2018), who asked whether extra-gradient approaches (such as OG) can be applied to achieve improved guarantees in the multi-agent learning setting. The proof of our upper bound uses a new technique centered around an adaptive choice of potential function at each iteration. We also show that the O(1/√T) rate is tight for all p-SCLI algorithms, which includes OG as a special case. As a byproduct of our lower bound analysis we additionally present a proof of a conjecture of Arjevani et al. (2015) which is more direct than previous approaches.

NeurIPS Conference 2020 Conference Paper

Truncated Linear Regression in High Dimensions

  • Constantinos Daskalakis
  • Dhruv Rohatgi
  • Emmanouil Zampetakis

As in standard linear regression, in truncated linear regression, we are given access to observations (A i, y i) i whose dependent variable equals y i= A i^{\rm T} \cdot x^* + \eta i, where x^* is some fixed unknown vector of interest and \eta i is independent noise; except we are only given an observation if its dependent variable y i lies in some "truncation set" S \subset \mathbb{R}. The goal is to recover x^* under some favorable conditions on the A i's and the noise distribution. We prove that there exists a computationally and statistically efficient method for recovering k-sparse n-dimensional vectors x^* from m truncated samples, which attains an optimal \ell 2 reconstruction error of O(\sqrt{(k \log n)/m}). As a corollary, our guarantees imply a computationally efficient and information-theoretically optimal algorithm for compressed sensing with truncation, such as that which may arise from measurement saturation effects. Our result follows from a statistical and computational analysis of the Stochastic Gradient Descent (SGD) algorithm for solving a natural adaption of the LASSO optimization problem that accommodates truncation. This generalizes the works of both: (1) [Daskalakis et al. 2018], where no regularization is needed due to the low dimensionality of the data, and (2) [Wainright 2009], where the objective function is simple due to the absence of truncation. In order to deal with both truncation and high-dimensionality at the same time, we develop new techniques that not only generalize the existing ones but we believe are of independent interest.

STOC Conference 2018 Conference Paper

A converse to Banach's fixed point theorem and its CLS-completeness

  • Constantinos Daskalakis
  • Christos Tzamos
  • Manolis Zampetakis

Banach's fixed point theorem for contraction maps has been widely used to analyze the convergence of iterative methods in non-convex problems. It is a common experience, however, that iterative maps fail to be globally contracting under the natural metric in their domain, making the applicability of Banach's theorem limited. We explore how generally we can apply Banach's fixed point theorem to establish the convergence of iterative methods when pairing it with carefully designed metrics. Our first result is a strong converse of Banach's theorem, showing that it is a universal analysis tool for establishing global convergence of iterative methods to unique fixed points, and for bounding their convergence rate. In other words, we show that, whenever an iterative map globally converges to a unique fixed point, there exists a metric under which the iterative map is contracting and which can be used to bound the number of iterations until convergence. We illustrate our approach in the widely used power method, providing a new way of bounding its convergence rate through contraction arguments. We next consider the computational complexity of Banach's fixed point theorem. Making the proof of our converse theorem constructive, we show that computing a fixed point whose existence is guaranteed by Banach's fixed point theorem is CLS-complete. We thus provide the first natural complete problem for the class CLS, which was defined in [DP11] to capture the complexity of problems such as P-matrix LCP, computing KKT-points, and finding mixed Nash equilibria in congestion and network coordination games.

FOCS Conference 2018 Conference Paper

Efficient Statistics, in High Dimensions, from Truncated Samples

  • Constantinos Daskalakis
  • Themis Gouleakis
  • Christos Tzamos
  • Manolis Zampetakis

We provide an efficient algorithm for the classical problem, going back to Galton, Pearson, and Fisher, of estimating, with arbitrary accuracy the parameters of a multivariate normal distribution from truncated samples. Truncated samples from a d-variate normal N(mu, Sigma) means a samples is only revealed if it falls in some subset S of the d-dimensional Euclidean space; otherwise the samples are hidden and their count in proportion to the revealed samples is also hidden. We show that the mean mu and covariance matrix Sigma can be estimated with arbitrary accuracy in polynomial-time, as long as we have oracle access to S, and S has non-trivial measure under the unknown d-variate normal distribution. Additionally we show that without oracle access to S, any non-trivial estimation is impossible.

NeurIPS Conference 2018 Conference Paper

HOGWILD!-Gibbs can be PanAccurate

  • Constantinos Daskalakis
  • Nishanth Dikkala
  • Siddhartha Jayanti

Asynchronous Gibbs sampling has been recently shown to be fast-mixing and an accurate method for estimating probabilities of events on a small number of variables of a graphical model satisfying Dobrushin's condition~\cite{DeSaOR16}. We investigate whether it can be used to accurately estimate expectations of functions of {\em all the variables} of the model. Under the same condition, we show that the synchronous (sequential) and asynchronous Gibbs samplers can be coupled so that the expected Hamming distance between their (multivariate) samples remains bounded by $O(\tau \log n), $ where $n$ is the number of variables in the graphical model, and $\tau$ is a measure of the asynchronicity. A similar bound holds for any constant power of the Hamming distance. Hence, the expectation of any function that is Lipschitz with respect to a power of the Hamming distance, can be estimated with a bias that grows logarithmically in $n$. Going beyond Lipschitz functions, we consider the bias arising from asynchronicity in estimating the expectation of polynomial functions of all variables in the model. Using recent concentration of measure results~\cite{DaskalakisDK17, GheissariLP17, GotzeSS18}, we show that the bias introduced by the asynchronicity is of smaller order than the standard deviation of the function value already present in the true model. We perform experiments on a multi-processor machine to empirically illustrate our theoretical findings.

NeurIPS Conference 2018 Conference Paper

Learning and Testing Causal Models with Interventions

  • Jayadev Acharya
  • Arnab Bhattacharyya
  • Constantinos Daskalakis
  • Saravanan Kandasamy

We consider testing and learning problems on causal Bayesian networks as defined by Pearl (Pearl, 2009). Given a causal Bayesian network M on a graph with n discrete variables and bounded in-degree and bounded ``confounded components'', we show that O(log n) interventions on an unknown causal Bayesian network X on the same graph, and O(n/epsilon^2) samples per intervention, suffice to efficiently distinguish whether X=M or whether there exists some intervention under which X and M are farther than epsilon in total variation distance. We also obtain sample/time/intervention efficient algorithms for: (i) testing the identity of two unknown causal Bayesian networks on the same graph; and (ii) learning a causal Bayesian network on a given graph. Although our algorithms are non-adaptive, we show that adaptivity does not help in general: Omega(log n) interventions are necessary for testing the identity of two unknown causal Bayesian networks on the same graph, even adaptively. Our algorithms are enabled by a new subadditivity inequality for the squared Hellinger distance between two causal Bayesian networks.

NeurIPS Conference 2018 Conference Paper

Smoothed Analysis of Discrete Tensor Decomposition and Assemblies of Neurons

  • Nima Anari
  • Constantinos Daskalakis
  • Wolfgang Maass
  • Christos Papadimitriou
  • Amin Saberi
  • Santosh Vempala

We analyze linear independence of rank one tensors produced by tensor powers of randomly perturbed vectors. This enables efficient decomposition of sums of high-order tensors. Our analysis builds upon [BCMV14] but allows for a wider range of perturbation models, including discrete ones. We give an application to recovering assemblies of neurons. Assemblies are large sets of neurons representing specific memories or concepts. The size of the intersection of two assemblies has been shown in experiments to represent the extent to which these memories co-occur or these concepts are related; the phenomenon is called association of assemblies. This suggests that an animal's memory is a complex web of associations, and poses the problem of recovering this representation from cognitive data. Motivated by this problem, we study the following more general question: Can we reconstruct the Venn diagram of a family of sets, given the sizes of their l-wise intersections? We show that as long as the family of sets is randomly perturbed, it is enough for the number of measurements to be polynomially larger than the number of nonempty regions of the Venn diagram to fully reconstruct the diagram.

NeurIPS Conference 2018 Conference Paper

The Limit Points of (Optimistic) Gradient Descent in Min-Max Optimization

  • Constantinos Daskalakis
  • Ioannis Panageas

Motivated by applications in Optimization, Game Theory, and the training of Generative Adversarial Networks, the convergence properties of first order methods in min-max problems have received extensive study. It has been recognized that they may cycle, and there is no good understanding of their limit points when they do not. When they converge, do they converge to local min-max solutions? We characterize the limit points of two basic first order methods, namely Gradient Descent/Ascent (GDA) and Optimistic Gradient Descent Ascent (OGDA). We show that both dynamics avoid unstable critical points for almost all initializations. Moreover, for small step sizes and under mild assumptions, the set of OGDA-stable critical points is a superset of GDA-stable critical points, which is a superset of local min-max solutions (strict in some cases). The connecting thread is that the behavior of these dynamics can be studied from a dynamical systems perspective.

SODA Conference 2018 Conference Paper

Which Distribution Distances are Sublinearly Testable?

  • Constantinos Daskalakis
  • Gautam Kamath 0001
  • John Wright 0004

Given samples from an unknown distribution p and a description of a distribution q, are p and q close or far? This question of “identity testing” has received significant attention in the case of testing whether p and q are equal or far in total variation distance. However, in recent work [VV11a, ADK15, DP17], the following questions have been been critical to solving problems at the frontiers of distribution testing: Alternative Distances: Can we test whether p and q are far in other distances, say Hellinger? Tolerance: Can we test when p and q are close, rather than equal? And if so, close in which distances? Motivated by these questions, we characterize the complexity of distribution testing under a variety of distances, including total variation, ℓ 2, Hellinger, Kullback-Leibler, and χ 2. For each pair of distances d 1 and d 2, we study the complexity of testing if p and q are close in d 1 versus far in d 2, with a focus on identifying which problems allow strongly sublinear testers (i. e. , those with complexity O ( n 1– γ ) for some γ > 0 where n is the size of the support of the distributions p and q ). We provide matching upper and lower bounds for each case. We also study these questions in the case where we only have samples from q (equivalence testing), showing qualitative differences from identity testing in terms of when tolerance can be achieved. Our algorithms fall into the classical paradigm of χ 2 -statistics, but require crucial changes to handle the challenges introduced by each distance we consider. Finally, we survey other recent results in an attempt to serve as a reference for the complexity of various distribution testing problems.

NeurIPS Conference 2017 Conference Paper

Concentration of Multilinear Functions of the Ising Model with Applications to Network Data

  • Constantinos Daskalakis
  • Nishanth Dikkala
  • Gautam Kamath

We prove near-tight concentration of measure for polynomial functions of the Ising model, under high temperature, improving the radius of concentration guaranteed by known results by polynomial factors in the dimension (i. e. ~the number of nodes in the Ising model). We show that our results are optimal up to logarithmic factors in the dimension. We obtain our results by extending and strengthening the exchangeable-pairs approach used to prove concentration of measure in this setting by Chatterjee. We demonstrate the efficacy of such functions as statistics for testing the strength of interactions in social networks in both synthetic and real world data.

FOCS Conference 2017 Conference Paper

Learning Multi-Item Auctions with (or without) Samples

  • Yang Cai 0001
  • Constantinos Daskalakis

We provide algorithms that learn simple auctions whose revenue is approximately optimal in multi-item multi-bidder settings, for a wide range of bidder valuations including unit-demand, additive, constrained additive, XOS, and subadditive. We obtain our learning results in two settings. The first is the commonly studied setting where sample access to the bidders' distributions over valuations is given, for both regular distributions and arbitrary distributions with bounded support. Here, our algorithms require polynomially many samples in the number of items and bidders. The second is a more general max-min learning setting that we introduce, where we are given “approximate distributions, ” and we seek to compute a mechanism whose revenue is approximately optimal simultaneously for all “true distributions” that are close to the ones we were given. These results are more general in that they imply the sample-based results, and are also applicable in settings where we have no sample access to the underlying distributions but have estimated them indirectly via market research or by observation of bidder behavior in previously run, potentially non-truthful auctions. All our results hold for valuation distributions satisfying the standard (and necessary) independence-across-items property. They also generalize and improve upon recent works of Goldner and Karlin [25] and Morgenstern and Roughgarden [32], which have provided algorithms that learn approximately optimal multi-item mechanisms in more restricted settings with additive, subadditive and unit-demand valuations using sample access to distributions. We generalize these results to the complete unit-demand, additive, and XOS setting, to i. i. d. subadditive bidders, and to the max-min setting. Our results are enabled by new uniform convergence bounds for hypotheses classes under product measures. Our bounds result in exponential savings in sample complexity compared to bounds derived by bounding the VC dimension and are of independent interest.

ICML Conference 2017 Conference Paper

Priv'IT: Private and Sample Efficient Identity Testing

  • Bryan Cai
  • Constantinos Daskalakis
  • Gautam Kamath 0001

We develop differentially private hypothesis testing methods for the small sample regime. Given a sample $\mathcal{D}$ from a categorical distribution $p$ over some domain $\Sigma$, an explicitly described distribution $q$ over $\Sigma$, some privacy parameter $\epsilon$, accuracy parameter $\alpha$, and requirements $\beta_\mathrm{I}$ and $\beta_\mathrm{II}$ for the type I and type II errors of our test, the goal is to distinguish between $p=q$ and $d_\mathrm{tv}(p, q) \ge \alpha$. We provide theoretical bounds for the sample size $|\mathcal{D}|$ so that our method both satisfies $(\epsilon, 0)$-differential privacy, and guarantees $\beta_\mathrm{I}$ and $\beta_\mathrm{II}$ type I and type II errors. We show that differential privacy may come for free in some regimes of parameters, and we always beat the sample complexity resulting from running the $\chi^2$-test with noisy counts, or standard approaches such as repetition for endowing non-private $\chi^2$-style statistics with differential privacy guarantees. We experimentally compare the sample complexity of our method to that of recently proposed methods for private hypothesis testing.

FOCS Conference 2016 Conference Paper

Learning in Auctions: Regret is Hard, Envy is Easy

  • Constantinos Daskalakis
  • Vasilis Syrgkanis

An extensive body of recent work studies the welfare guarantees of simple and prevalent combinatorial auction formats, such as selling m items via simultaneous second price auctions (SiSPAs) [1], [2], [3]. These guarantees hold even when the auctions are repeatedly executed and the players use no-regret learning algorithms to choose their actions. Unfortunately, off-the-shelf no-regret learning algorithms for these auctions are computationally inefficient as the number of actions available to the players becomes exponential. We show that this obstacle is inevitable: there are no polynomial-time no-regret learning algorithms for SiSPAs, unless RP ⊇ NP, even when the bidders are unit-demand. Our lower bound raises the question of how good outcomes polynomially-bounded bidders may discover in such auctions. To answer this question, we propose a novel concept of learning in auctions, termed "no-envy learning. " This notion is founded upon Walrasian equilibrium, and we show that it is both efficiently implementable and results in approximately optimal welfare, even when the bidders have valuations from the broad class of fractionally subadditive (XOS) valuations (assuming demand oracle access to the valuations) or coverage valuations (even without demand oracles). No-envy learning outcomes are a relaxation of no-regret learning outcomes, which maintain their approximate welfare optimality while endowing them with computational tractability. Our positive and negative results extend to several auction formats that have been studied in the literature via the smoothness paradigm. Our positive results for XOS valuations are enabled by a novel Follow-The-Perturbed-Leader algorithm for settings where the number of experts and states of nature are both infinite, and the payoff function of the learner is non-linear. We show that this algorithm has applications outside of auction settings, establishing significant gains in a recent application of no-regret learning in security games. Our efficient learning result for coverage valuations is based on a novel use of convex rounding schemes and a reduction to online convex optimization.

SODA Conference 2015 Conference Paper

Bayesian Truthful Mechanisms for Job Scheduling from Bi-criterion Approximation Algorithms

  • Constantinos Daskalakis
  • S. Matthew Weinberg

We provide polynomial-time approximately optimal Bayesian mechanisms for makespan minimization on unrelated machines as well as for max-min fair allocations of indivisible goods, with approximation factors of 2 and respectively, matching the approximation ratios of best known polynomial-time algorithms (for max-min fairness, the latter claim is true for certain ratios of the number of goods m to people k ). Our mechanisms are obtained by establishing a polynomial-time approximation-sensitive reduction from the problem of designing approximately optimal mechanisms for some arbitrary objective to that of designing bi-criterion approximation algorithms for the same objective plus a linear allocation cost term. Our reduction is itself enabled by extending the celebrated “equivalence of separation and optimization” [27, 32] to also accommodate bi-criterion approximations. Moreover, to apply the reduction to the specific problems of makespan and max-min fairness we develop polynomial-time bi-criterion approximation algorithms for makespan minimization with costs and max-min fairness with costs, adapting the algorithms of [45], [10] and [4] to the type of bi-criterion approximation that is required by the reduction.

FOCS Conference 2015 Conference Paper

On the Structure, Covering, and Learning of Poisson Multinomial Distributions

  • Constantinos Daskalakis
  • Gautam Kamath 0001
  • Christos Tzamos

An (n, k)-Poisson Multinomial Distribution (PMD) is the distribution of the sum of n independent random vectors supported on the set Bk={e1, , ek} of standard basis vectors in Rk. We prove a structural characterization of these distributions, showing that, for all ε > 0, any (n, k)-Poisson multinomial random vector is ε-close, in total variation distance, to the sum of a discretized multidimensional Gaussian and an independent (poly(k/ε), k)-Poisson multinomial random vector. Our structural characterization extends the multi-dimensional CLT of Valiant and Valiant, by simultaneously applying to all approximation requirements ε. In particular, it overcomes factors depending on log n and, importantly, the minimum Eigen value of the PMD's covariance matrix. We use our structural characterization to obtain an ε-cover, in total variation distance, of the set of all (n, k)-PMDs, significantly improving the cover size of Daskalakis and Papadimitriou, and obtaining the same qualitative dependence of the cover size on n and ε as the k=2 cover of Daskalakis and Papadimitriou. We further exploit this structure to show that (n, k)-PMDs can be learned to within ε in total variation distance from Õk(1/ε2) samples, which is near-optimal in terms of dependence on ε and independent of n. In particular, our result generalizes the single-dimensional result of Daskalakis, Diakonikolas and Servedio for Poisson binomials to arbitrary dimension. Finally, as a corollary of our results on PMDs, we give a Õk(1/ε2) sample algorithm for learning (n, k)-sums of independent integer random variables (SIIRVs), which is near-optimal for constant k.

NeurIPS Conference 2015 Conference Paper

Optimal Testing for Properties of Distributions

  • Jayadev Acharya
  • Constantinos Daskalakis
  • Gautam Kamath

Given samples from an unknown distribution, p, is it possible to distinguish whether p belongs to some class of distributions C versus p being far from every distribution in C? This fundamental question has receivedtremendous attention in Statistics, albeit focusing onasymptotic analysis, as well as in Computer Science, wherethe emphasis has been on small sample size and computationalcomplexity. Nevertheless, even for basic classes ofdistributions such as monotone, log-concave, unimodal, and monotone hazard rate, the optimal sample complexity is unknown. We provide a general approach via which we obtain sample-optimal and computationally efficient testers for all these distribution families. At the core of our approach is an algorithm which solves the following problem: Given samplesfrom an unknown distribution p, and a known distribution q, are p and q close in Chi^2-distance, or far in total variation distance? The optimality of all testers is established by providing matching lower bounds. Finally, a necessary building block for our tester and important byproduct of our work are the first known computationally efficient proper learners for discretelog-concave and monotone hazard rate distributions. We exhibit the efficacy of our testers via experimental analysis.

FOCS Conference 2014 Conference Paper

A Counter-example to Karlin's Strong Conjecture for Fictitious Play

  • Constantinos Daskalakis
  • Qinxuan Pan

Fictitious play is a natural dynamic for equilibrium play in zero-sum games, proposed by Brown [6], and shown to converge by Robinson [33]. Samuel Karlin conjectured in 1959 that fictitious play converges at rate O(t -1/2 ) with respect to the number of steps t. We disprove this conjecture by showing that, when the payoff matrix of the row player is the n × n identity matrix, fictitious play may converge (for some tie-breaking) at rate as slow as Ω(t -1/n ).

SODA Conference 2014 Conference Paper

A Polynomial-time Approximation Scheme for Fault-tolerant Distributed Storage

  • Constantinos Daskalakis
  • Anindya De
  • Ilias Diakonikolas
  • Ankur Moitra
  • Rocco A. Servedio

We consider a problem which has received considerable attention in systems literature because of its applications to routing in delay tolerant networks and replica placement in distributed storage systems. In abstract terms the problem can be stated as follows: Given a random variable X generated by a known product distribution over {0, 1} n and a target value 0 ≤ θ ≤ 1, output a non-negative vector w, with ‖ w ‖ 1 ≤ 1, which maximizes the probability of the event w · X ≥ θ. This is a challenging non-convex optimization problem for which even computing the value Pr[ w · X ≥ θ ] of a proposed solution vector w is #P-hard. We provide an additive EPTAS for this problem which, for constant-bounded product distributions, runs in poly( n ) · 2 poly(1/∊) time and outputs an ∊-approximately optimal solution vector w for this problem. Our approach is inspired by, and extends, recent structural results from the complexity-theoretic study of linear threshold functions. Furthermore, in spite of the objective function being non-smooth, we give a unicriterion PTAS while previous work for such objective functions has typically led to a bicriterion PTAS. We believe our techniques may be applicable to get unicriterion PTAS for other non-smooth objective functions.

SODA Conference 2014 Conference Paper

The Complexity of Optimal Mechanism Design

  • Constantinos Daskalakis
  • Alan Deckelbaum
  • Christos Tzamos

Myerson's seminal work provides a computationally efficient revenue-optimal auction for selling one item to multiple bidders [18]. Generalizing this work to selling multiple items at once has been a central question in economics and algorithmic game theory, but its complexity has remained poorly understood. We answer this question by showing that a revenue-optimal auction in multi-item settings cannot be found and implemented computationally efficiently, unless zpp ⊇ P #P. This is true even for a single additive bidder whose values for the items are independently distributed on two rational numbers with rational probabilities. Our result is very general: we show that it is hard to compute any encoding of an optimal auction of any format (direct or indirect, truthful or non-truthful) that can be implemented in expected polynomial time. In particular, under well-believed complexity-theoretic assumptions, revenue-optimization in very simple multi-item settings can only be tractably approximated. We note that our hardness result applies to randomized mechanisms in a very simple setting, and is not an artifact of introducing combinatorial structure to the problem by allowing correlation among item values, introducing combinatorial valuations, or requiring the mechanism to be deterministic (whose structure is readily combinatorial). Our proof is enabled by a flow-interpretation of the solutions of an exponential-size linear program for revenue maximization with an additional supermodularity constraint.

FOCS Conference 2013 Conference Paper

Learning Sums of Independent Integer Random Variables

  • Constantinos Daskalakis
  • Ilias Diakonikolas
  • Ryan O'Donnell
  • Rocco A. Servedio
  • Li-Yang Tan

Let bS = bX_1 + ·s + bX_n be a sum of n independent integer random variables bX_i, where each bX_i is supported on 0, 1, ·, k-1 but otherwise may have an arbitrary distribution (in particular the bX_i's need not be identically distributed). How many samples are required to learn the distribution bS to high accuracy? In this paper we show that the answer is completely independent of n, and moreover we give a computationally efficient algorithm which achieves this low sample complexity. More precisely, our algorithm learns any such bS to ε-accuracy (with respect to the total variation distance between distributions) using poly(k, 1/ε) samples, independent of n. Its running time is poly(k, 1/ε) in the standard word RAM model. Thus we give a broad generalization of the main result of DDS12stoc which gave a similar learning result for the special case k=2 (when the distribution bS is a Poisson Binomial Distribution). Prior to this work, no nontrivial results were known for learning these distributions even in the case k=3. A key difficulty is that, in contrast to the case of k = 2, sums of independent 0, 1, 2-valued random variables may behave very differently from (discretized) normal distributions, and in fact may be rather complicated - they are not log-concave, they can be θ(n)-modal, there is no relationship between Kolmogorov distance and total variation distance for the class, etc. Nevertheless, the heart of our learning result is a new limit theorem which characterizes what the sum of an arbitrary number of arbitrary independent 0, 1, ·, k-1-valued random variables may look like. Previous limit theorems in this setting made strong assumptions on the "shift invariance" of the random variables bX_i in order to force a discretized normal limit. We believe that our new limit theorem, as the first result for truly arbitrary sums of independent 0, 1, ·, k-1-valued random variables, is of independent interest.

SODA Conference 2013 Conference Paper

Optimal and Efficient Parametric Auctions

  • Pablo Daniel Azar
  • Constantinos Daskalakis
  • Silvio Micali
  • S. Matthew Weinberg

Consider a seller who seeks to provide service to a collection of interested parties, subject to feasibility constraints on which parties may be simultaneously served. Assuming that a distribution is known on the value of each party for service—arguably a strong assumption—Myerson's seminal work provides revenue optimizing auctions [12]. We show instead that, for very general feasibility constraints, only knowledge of the median of each party's value distribution, or any other quantile of these distributions, or approximations thereof, suffice for designing simple auctions that simultaneously approximate both the optimal revenue and the optimal welfare. Our results apply to all downward-closed feasibility constraints under the assumption that the underlying, unknown value distributions are monotone hazard rate, and to all matroid feasibility constraints under the weaker assumption of regularity of the underlying distributions. Our results jointly generalize the single-item results obtained by Azar and Micali [2] on parametric auctions, and Daskalakis and Pierrakos [6] for simultaneously approximately optimal and efficient auctions.

SODA Conference 2013 Conference Paper

Testing k -Modal Distributions: Optimal Algorithms via Reductions

  • Constantinos Daskalakis
  • Ilias Diakonikolas
  • Rocco A. Servedio
  • Gregory Valiant
  • Paul Valiant

We give highly efficient algorithms, and almost matching lower bounds, for a range of basic statistical problems that involve testing and estimating the L 1 (total variation) distance between two k -modal distributions p and q over the discrete domain {1, …, n }. More precisely, we consider the following four problems: given sample access to an unknown k -modal distribution p, T esting identity to a known or unknown distribution: 1. Determine whether p = q (for an explicitly given k -modal distribution q ) versus p is e-far from q; 2. Determine whether p = q (where q is available via sample access) versus p is ε-far from q; E stimating L 1 distance (“ tolerant testing ”) against a known or unknown distribution: 3. Approximate d TV ( p, q ) to within additive ε where q is an explicitly given k -modal distribution q; 4. Approximate d TV ( p, q ) to within additive ε where q is available via sample access. For each of these four problems we give sub-logarithmic sample algorithms, and show that our algorithms have optimal sample complexity up to additive poly ( k ) and multiplicative polylog log n + polylog k factors. Our algorithms significantly improve the previous results of [BKR04], which were for testing identity of distributions (items (1) and (2) above) in the special cases k = 0 (monotone distributions) and k = 1 (unimodal distributions) and required O ((log n ) 3 ) samples. As our main conceptual contribution, we introduce a new reduction-based approach for distribution-testing problems that lets us obtain all the above results in a unified way. Roughly speaking, this approach enables us to transform various distribution testing problems for k -modal distributions over {1, …, n } to the corresponding distribution testing problems for unrestricted distributions over a much smaller domain {1, …, ℓ} where ℓ = O ( k log n ).

FOCS Conference 2013 Conference Paper

Understanding Incentives: Mechanism Design Becomes Algorithm Design

  • Yang Cai 0001
  • Constantinos Daskalakis
  • S. Matthew Weinberg

We provide a computationally efficient black-box reduction from mechanism design to algorithm design in very general settings. Specifically, we give an approximation-preserving reduction from truthfully maximizing any objective under arbitrary feasibility constraints with arbitrary bidder types to (not necessarily truthfully) maximizing the same objective plus virtual welfare (under the same feasibility constraints). Our reduction is based on a fundamentally new approach: we describe a mechanism's behavior indirectly only in terms of the expected value it awards bidders for certain behavior, and never directly access the allocation rule at all. Applying our new approach to revenue, we exhibit settings where our reduction holds both ways. That is, we also provide an approximation-sensitive reduction from (non-truthfully) maximizing virtual welfare to (truthfully) maximizing revenue, and therefore the two problems are computationally equivalent. With this equivalence in hand, we show that both problems are NP-hard to approximate within any polynomial factor, even for a single monotone sub modular bidder. We further demonstrate the applicability of our reduction by providing a truthful mechanism maximizing fractional max-min fairness.

STOC Conference 2012 Conference Paper

An algorithmic characterization of multi-dimensional mechanisms

  • Yang Cai 0001
  • Constantinos Daskalakis
  • S. Matthew Weinberg

We show that every feasible, Bayesian, multi-item multi-bidder mechanism for independent, additive bidders can be implemented as a mechanism that: (a) allocates every item independently of the other items; (b) for the allocation of each item it uses a strict ordering of all bidders' types; and allocates the item using a distribution over hierarchical mechanisms that iron this ordering into a non-strict ordering, and give the item uniformly at random to the bidders whose reported types dominate all other reported types according to the non-strict ordering. Combined with cyclic-monotonicity our results provide a characterization of feasible, Bayesian Incentive Compatible mechanisms in this setting. Our characterization is enabled by a new, constructive proof of Border's theorem [Border 1991], and a new generalization of this theorem to independent (but not necessarily identically distributed) bidders, improving upon the results of [Border 2007, Che-Kim-Mierendorf 2011]. For a single item and independent bidders, we show that every feasible reduced form auction can be implemented as a distribution over hierarchical mechanisms that are consistent with the same strict ordering of all bidders' types, which every mechanism in the support of the distribution irons to a non-strict ordering. We also give a polynomial-time algorithm for determining feasibility of a reduced form auction, or providing a separation hyperplane from the set of feasible reduced forms. To complete the picture, we provide polynomial-time algorithms to find and exactly sample from a distribution over hierarchical mechanisms consistent with a given feasible reduced form. All these results generalize to multi-item reduced form auctions for independent, additive bidders. Finally, for multiple items, additive bidders with hard demand constraints, and arbitrary value correlation across items or bidders, we give a proper generalization of Border's Theorem, and characterize feasible reduced form auctions as multi-commodity flows in related multi-commodity flow instances. We also show that our generalization holds for a broader class of feasibility constraints, including the intersection of any two matroids. As a corollary of our results we compute revenue-optimal, Bayesian Incentive Compatible (BIC) mechanisms in multi-item multi-bidder settings, when each bidder has arbitrarily correlated values over the items and additive valuations over bundles of items, and the bidders are independent. Our mechanisms run in time polynomial in the total number of bidder types (and {not} type profiles). This running time is polynomial in the number of bidders, but potentially exponential in the number of items. We improve the running time to polynomial in both the number of items and the number of bidders by using recent structural results on optimal BIC auctions in item-symmetric settings [Daskalakis-Weinberg 2011].

STOC Conference 2012 Conference Paper

Learning poisson binomial distributions

  • Constantinos Daskalakis
  • Ilias Diakonikolas
  • Rocco A. Servedio

We consider a basic problem in unsupervised learning: learning an unknown Poisson Binomial Distribution . A Poisson Binomial Distribution (PBD) over {0,1,...,n} is the distribution of a sum of n independent Bernoulli random variables which may have arbitrary, potentially non-equal, expectations. These distributions were first studied by S. Poisson in 1837 and are a natural n-parameter generalization of the familiar Binomial Distribution. Surprisingly, prior to our work this basic learning problem was poorly understood, and known results for it were far from optimal. We essentially settle the complexity of the learning problem for this basic class of distributions. As our main result we give a highly efficient algorithm which learns to ε-accuracy using O(1/ε 3 ) samples independent of n . The running time of the algorithm is quasilinear in the size of its input data, i.e. ~O(log(n)/ε 3 ) bit-operations (observe that each draw from the distribution is a log(n)-bit string). This is nearly optimal since any algorithm must use Ω(1/ε 2 ) samples. We also give positive and negative results for some extensions of this learning problem.

FOCS Conference 2012 Conference Paper

Optimal Multi-dimensional Mechanism Design: Reducing Revenue to Welfare Maximization

  • Yang Cai 0001
  • Constantinos Daskalakis
  • S. Matthew Weinberg

We provide a reduction from revenue maximization to welfare maximization in multidimensional Bayesian auctions with arbitrary - possibly combinatorial - feasibility constraints and independent bidders with arbitrary - possibly combinatorial-demand constraints, appropriately extending Myerson's single-dimensional result [21] to this setting. We also show that every feasible Bayesian auction - including in particular the revenue-optimal one - can be implemented as a distribution over virtual VCG allocation rules. A virtual VCG allocation rule has the following simple form: Every bidder's type ti is transformed into a virtual type fi(ti), via a bidder-specific function. Then, the allocation maximizing virtual welfare is chosen. Using this characterization, we show how to find and run the revenue-optimal auction given only black-box access to an implementation of the VCG allocation rule. We generalize this result to arbitrarily correlated bidders, introducing the notion of a second-order VCG allocation rule. Our results are computationally efficient for all multidimensional settings where the bidders are additive, or can be efficiently mapped to be additive, albeit the feasibility and demand constraints may still remain arbitrary combinatorial. In this case, our mechanisms run in time polynomial in the number of items and the total number of bidder types, but not type profiles. This is polynomial in the number of items, the number of bidders, and the cardinality of the support of each bidder's value distribution. For generic correlated distributions, this is the natural description complexity of the problem. The runtime can be further improved to polynomial in only the number of items and the number of bidders in itemsymmetric settings by making use of techniques from [15].

SODA Conference 2011 Conference Paper

Continuous Local Search

  • Constantinos Daskalakis
  • Christos H. Papadimitriou

We introduce CLS, for continuous local search, a class of polynomial-time checkable total functions that lies at the intersection of PPAD and PLS, and captures a particularly benign kind of local optimization in which the domain is continuous, as opposed to combinatorial, and the functions involved are continuous. We show that this class contains several well known intriguing problems which were heretofore known to lie in the intersection of PLS and PPAD but were otherwise unclassifiable: Finding fixpoints of contraction maps, the linear complementarity problem for P matrices, finding a stationary point of a low-degree polynomial objective, the simple stochastic games of Shapley and Condon, and finding a mixed Nash equilibrium in congestion, implicit congestion, and network coordination games. The last four problems belong to CCLS, for convex CLS, another subclass of PPAD ∩ PLS seeking the componentwise local minimum of a componentwise convex function. It is open whether any or all of these problems are complete for the corresponding classes.

FOCS Conference 2011 Conference Paper

Extreme-Value Theorems for Optimal Multidimensional Pricing

  • Yang Cai 0001
  • Constantinos Daskalakis

We provide a Polynomial Time Approximation Scheme for the multi-dimensional unit-demand pricing problem, when the buyer's values are independent (but not necessarily identically distributed.) For all ϵ >; 0, we obtain a (1 + ϵ)-factor approximation to the optimal revenue in time polynomial, when the values are sampled from Monotone Hazard Rate (MHR) distributions, quasi-polynomial, when sampled from regular distributions, and polynomial in n poly(log r) when sampled from general distributions supported on a set [u min, ru min ]. We also provide an additive PTAS for all bounded distributions. Our algorithms are based on novel extreme value theorems for MHR and regular distributions, and apply probabilistic techniques to understand the statistical properties of revenue distributions, as well as to reduce the size of the search space of the algorithm. As a byproduct of our techniques, we establish structural properties of optimal solutions. We show that, for all ϵ >; 0, g(1/ϵ) distinct prices suffice to obtain a (1 + ϵ)-factor approximation to the optimal revenue for MHR distributions, where g(1/ϵ) is a quasi-linear function of 1/ϵ that does not depend on the number of items. Similarly, for all ϵ >; 0 and n >; 0, g(1/ϵ · log n) distinct prices suffice for regular distributions, where n is the number of items and g(·) is a polynomial function. Finally, in the i. i. d. MHR case, we show that, as long as the number of items is a sufficiently large function of 1/ϵ, a single price suffices to achieve a (1 + ϵ)-factor approximation. Our results represent significant progress to the single-bidder case of the multidimensional optimal mechanism design problem, following Myerson's celebrated work on optimal mechanism design [Myerson 1981].

SODA Conference 2011 Conference Paper

Near-Optimal No-Regret Algorithms for Zero-Sum Games

  • Constantinos Daskalakis
  • Alan Deckelbaum
  • Anthony Kim

We propose a new no-regret learning algorithm. When used against an adversary, our algorithm achieves average regret that scales as with the number T of rounds. This regret bound is optimal but not rare, as there are a multitude of learning algorithms with this regret guarantee. However, when our algorithm is used by both players of a zero-sum game, their average regret scales as, guaranteeing a near-linear rate of convergence to the value of the game. This represents an almost-quadratic improvement on the rate of convergence to the value of a game known to be achieved by any no-regret learning algorithm, and is essentially optimal as we show a lower bound of. Moreover, the dynamics produced by our algorithm in the game setting are strongly-uncoupled in that each player is oblivious to the payoff matrix of the game and the number of strategies of the other player, has limited private storage, and is not allowed funny bit arithmetic that can trivialize the problem; instead he only observes the performance of his strategies against the actions of the other player and can use private storage to remember past played strategies and observed payoffs, or cumulative information thereof. Here, too, our rate of convergence is nearly-optimal and represents an almost-quadratic improvement over the best previously known strongly-uncoupled dynamics.

TCS Journal 2009 Journal Article

A note on approximate Nash equilibria

  • Constantinos Daskalakis
  • Aranyak Mehta
  • Christos Papadimitriou

In view of the intractability of finding a Nash equilibrium, it is important to understand the limits of approximation in this context. A subexponential approximation scheme is known [Richard J. Lipton, Evangelos Markakis, Aranyak Mehta, Playing large games using simple strategies, in: EC, 2003], and no approximation better than 1 4 is possible by any algorithm that examines equilibria involving fewer than log n strategies [Ingo Althöfer, On sparse approximations to randomized strategies and convex combinations, Linear Algebra and its Applications (1994) 199]. We give a simple, linear-time algorithm examining just two strategies per player and resulting in a 1 2 -approximate Nash equilibrium in any 2-player game. For the more demanding notion of approximately well supported Nash equilibrium due to [Constantinos Daskalakis, Paul W. Goldberg, Christos H. Papadimitriou, The complexity of computing a Nash equilibrium, SIAM Journal on Computing (in press) Preliminary version appeared in STOC (2006)] no nontrivial bound is known; we show that the problem can be reduced to the case of win-lose games (games with all utilities 0 or 1 ), and that an approximation of 5 6 is possible, contingent upon a graph-theoretic conjecture. Subsequent work extends the 1 4 impossibility result of Ingo Althöfer’s paper, as mentioned above, to 1 2 [Tomás Feder, Hamid Nazerzadeh, Amin Saberi, Approximating nash equilibria using small-support strategies, in: EC, 2007], making our 1 2 -approximate Nash equilibrium algorithm optimal among the algorithms that only consider mixed strategies of sublogarithmic size support. Moreover, techniques similar to our techniques for approximately well supported Nash equilibria are used in [Spyros Kontogiannis, Paul G. Spirakis, Efficient algorithms for constant well supported approximate equilibria in bimatrix games, in: ICALP, 2007] for obtaining an efficient algorithm for 0. 658-approximately well supported Nash equilibria, unconditionally.

SODA Conference 2009 Conference Paper

Sorting and selection in posets

  • Constantinos Daskalakis
  • Richard M. Karp
  • Elchanan Mossel
  • Samantha J. Riesenfeld
  • Elad Verbin

Classical problems of sorting and searching assume an underlying linear ordering of the objects being compared. In this paper, we study these problems in the context of partially ordered sets, in which some pairs of objects are incomparable. This generalization is interesting from a combinatorial perspective, and it has immediate applications in ranking scenarios where there is no underlying linear ordering, e. g. , conference submissions. It also has applications in reconstructing certain types of networks, including biological networks. Our results represent significant progress over previous results from two decades ago by Faigle and Turán. In particular, we present the first algorithm that sorts a width- w poset of size n with optimal query complexity O ( n ( w + log n )). We also describe a variant of Mergesort with query complexity and total complexity; an algorithm with the same query complexity was given by Faigle and Turán, but no efficient implementation of that algorithm is known. Both our sorting algorithms can be applied with negligible overhead to the more general problem of reconstructing transitive relations. We also consider two related problems: finding the minimal elements, and its generalization to finding the bottom k “levels”, called the k-selection problem. We give efficient deterministic and randomized algorithms for finding the minimal elements with O ( wn ) query and total complexity. We provide matching lower bounds for the query complexity up to a factor of 2 and generalize the results to the k -selection problem. Finally, we present efficient algorithms for computing a linear extension of a poset and computing the heights of all elements.

FOCS Conference 2008 Conference Paper

Discretized Multinomial Distributions and Nash Equilibria in Anonymous Games

  • Constantinos Daskalakis
  • Christos H. Papadimitriou

We show that there is a polynomial-time approximation scheme for computing Nash equilibria in anonymous games with any fixed number of strategies (a very broad and important class of games), extending the two-strategy result of Daskalakis and Papadimitriou 2007. The approximation guarantee follows from a probabilistic result of more general interest: The distribution of the sum of n independent unit vectors with values ranging over {e 1, .. ., ek}, where e i is the unit vector along dimension i of the k-dimensional Euclidean space, can be approximated by the distribution of the sum of another set of independent unit vectors whose probabilities of obtaining each value are multiples of 1/z for some integer z, and so that the variational distance of the two distributions is at most eps, where eps is bounded by an inverse polynomial in z and a function of k, but with no dependence on n. Our probabilistic result specifies the construction of a surprisingly sparse epsi-cover- under the total variation distance - of the set of distributions of sums of independent unit vectors, which is of interest on its own right.

FOCS Conference 2007 Conference Paper

Computing Equilibria in Anonymous Games

  • Constantinos Daskalakis
  • Christos H. Papadimitriou

We present efficient approximation algorithms for finding Nash equilibria in anonymous games, that is, games in which the players utilities, though different, do not differentiate between other players. Our results pertain to such games with many players but few strategies. We show that any such game has an approximate pure Nash equilibrium, computable in polynomial time, with approximation O(s 2 lambda), where s is the number of strategies and lambda is the Lipschitz constant of the utilities. Finally, we show that there is a PTAS for finding an isin-approximate Nash equilibrium when the number of strategies is two.

STOC Conference 2006 Conference Paper

The complexity of computing a Nash equilibrium

  • Constantinos Daskalakis
  • Paul W. Goldberg
  • Christos H. Papadimitriou

We resolve the question of the complexity of Nash equilibrium by showing that the problem of computing a Nash equilibrium in a game with 4 or more players is complete for the complexity class PPAD. Our proof uses ideas from the recently-established equivalence between polynomial time solvability of normal form games and graphical games, establishing that these kinds of games can simulate a PPAD-complete class of Brouwer functions.