Author name cluster

Themis Gouleakis

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers

2 author rows

ICML Conference 2025 Conference Paper

Learning multivariate Gaussians with imperfect advice

Arnab Bhattacharyya 0001
Davin Choo
Philips George John
Themis Gouleakis

We revisit the problem of distribution learning within the framework of learning-augmented algorithms. In this setting, we explore the scenario where a probability distribution is provided as potentially inaccurate advice on the true, unknown distribution. Our objective is to develop learning algorithms whose sample complexity decreases as the quality of the advice improves, thereby surpassing standard learning lower bounds when the advice is sufficiently accurate. Specifically, we demonstrate that this outcome is achievable for the problem of learning a multivariate Gaussian distribution $N(\mu, \Sigma)$ in the PAC learning setting. Classically, in the advice-free setting, $\widetilde{\Theta}(d^2/\varepsilon^2)$ samples are sufficient and, in the worst case, necessary to learn $d$-dimensional Gaussians up to TV distance $\varepsilon$ with constant probability. When we are additionally given a parameter $\widetilde{\Sigma}$ as advice, we show that $\widetilde{\mathcal{O}}(d^{2-\beta}/\varepsilon^2)$ samples suffice whenever $\| \widetilde{\Sigma}^{-1/2} \Sigma \widetilde{\Sigma}^{-1/2} - I_d \|_1 \leq \varepsilon d^{1-\beta}$ (where $\|\cdot\|_1$ denotes the entrywise $\ell_1$ norm) for any $\beta > 0$, yielding a polynomial improvement over the advice-free setting.

NeurIPS Conference 2025 Conference Paper

Product Distribution Learning with Imperfect Advice

Arnab Bhattacharyya
Choo, XianJun, Davin
Philips George John
Themis Gouleakis

Given i.i.d. samples from an unknown distribution $P$, the goal of distribution learning is to recover the parameters of a distribution that is close to $P$. When $P$ belongs to the class of product distributions on the Boolean hypercube $\{0, 1\}^d$, it is known that $\Omega(d/\epsilon^2)$ samples are necessary to learn $P$ within total variation (TV) distance $\epsilon$. We revisit this problem when the learner is also given as advice the parameters of a product distribution $Q$. We show that there is an efficient algorithm to learn $P$ within TV distance $\epsilon$ that has sample complexity $\tilde{O}(d^{1-\eta}/\epsilon^2)$, if $\|\mathbf{p} - \mathbf{q}\|_1 < \epsilon d^{0.5 - \Omega(\eta)}$. Here, $\mathbf{p}$ and $\mathbf{q}$ are the mean vectors of $P$ and $Q$ respectively, and no bound on $\|\mathbf{p} - \mathbf{q}\|_1$ is known to the algorithm a priori.

ICML Conference 2024 Conference Paper

Online bipartite matching with imperfect advice

Davin Choo
Themis Gouleakis
Chun Kai Ling
Arnab Bhattacharyya 0001

We study the problem of online unweighted bipartite matching with $n$ offline vertices and $n$ online vertices where one wishes to be competitive against the optimal offline algorithm. While the classic RANKING algorithm of Karp et al. (1990) provably attains a competitive ratio of $1-1/e > 1/2$, we show that no learning-augmented method can be both 1-consistent and strictly better than 1/2-robust under the adversarial arrival model. Meanwhile, under the random arrival model, we show how one can utilize methods from distribution testing to design an algorithm that takes in external advice about the online vertices and provably achieves a competitive ratio interpolating between any ratio attainable by advice-free methods and the optimal ratio of 1, depending on the advice quality.

ICML Conference 2023 Conference Paper

Active causal structure learning with advice

Davin Choo
Themis Gouleakis
Arnab Bhattacharyya 0001

We introduce the problem of active causal structure learning with advice. In the typical well-studied setting, the learning algorithm is given the essential graph for the observational distribution and is asked to recover the underlying causal directed acyclic graph (DAG) $G^*$ while minimizing the number of interventions made. In our setting, we are additionally given side information about $G^*$ as advice, e.g., a DAG $G$ purported to be $G^*$. We ask whether the learning algorithm can benefit from the advice when it is close to being correct, while still having worst-case guarantees even when the advice is arbitrarily bad. Our work is in the same space as the growing body of research on algorithms with predictions. When the advice is a DAG $G$, we design an adaptive search algorithm to recover $G^*$ whose intervention cost is at most $\mathcal{O}(\max\{1, \log \psi\})$ times the cost for verifying $G^*$; here, $\psi$ is a distance measure between $G$ and $G^*$ that is upper bounded by the number of variables $n$, and is exactly 0 when $G=G^*$. Our approximation factor matches the state-of-the-art for the advice-less setting.

STOC Conference 2021 Conference Paper

Optimal testing of discrete distributions with high probability

Ilias Diakonikolas
Themis Gouleakis
Daniel M. Kane
John Peebles
Eric Price 0001

NeurIPS Conference 2020 Conference Paper

Secretary and Online Matching Problems with Machine Learned Advice

Antonios Antoniadis
Themis Gouleakis
Pieter Kleer
Pavel Kolev

The classical analysis of online algorithms, due to its worst-case nature, can be quite pessimistic when the input instance at hand is far from worst-case. Often this is not an issue with machine learning approaches, which shine in exploiting patterns in past inputs in order to predict the future. However, such predictions, although usually accurate, can be arbitrarily poor. Inspired by a recent line of work, we augment three well-known online settings with machine learned predictions about the future, and develop algorithms that take them into account. In particular, we study the following online selection problems: (i) the classical secretary problem, (ii) online bipartite matching and (iii) the graphic matroid secretary problem. Our algorithms still come with a worst-case performance guarantee in the case that predictions are subpar while obtaining an improved competitive ratio (over the best-known classical online algorithm for each problem) when the predictions are sufficiently accurate. For each algorithm, we establish a trade-off between the competitive ratios obtained in the two respective cases.

NeurIPS Conference 2019 Conference Paper

Distribution-Independent PAC Learning of Halfspaces with Massart Noise

Ilias Diakonikolas
Themis Gouleakis
Christos Tzamos

We study the problem of distribution-independent PAC learning of halfspaces in the presence of Massart noise. Specifically, we are given a set of labeled examples $(\mathbf{x}, y)$ drawn from a distribution $\mathcal{D}$ on $\mathbb{R}^{d+1}$ such that the marginal distribution on the unlabeled points $\mathbf{x}$ is arbitrary and the labels $y$ are generated by an unknown halfspace corrupted with Massart noise at noise rate $\eta<1/2$. The goal is to find a hypothesis $h$ that minimizes the misclassification error $\mathbf{Pr}_{(\mathbf{x}, y) \sim \mathcal{D}} \left[ h(\mathbf{x}) \neq y \right]$. We give a $\mathrm{poly}\left(d, 1/\epsilon\right)$ time algorithm for this problem with misclassification error $\eta+\epsilon$. We also provide evidence that improving on the error guarantee of our algorithm might be computationally hard. Prior to our work, no efficient weak (distribution-independent) learner was known in this model, even for the class of disjunctions. The existence of such an algorithm for halfspaces (or even disjunctions) has been posed as an open question in various works, starting with Sloan (1988), Cohen (1997), and was most recently highlighted in Avrim Blum's FOCS 2003 tutorial.

FOCS Conference 2018 Conference Paper

Efficient Statistics, in High Dimensions, from Truncated Samples

Constantinos Daskalakis
Themis Gouleakis
Christos Tzamos
Manolis Zampetakis

We provide an efficient algorithm for the classical problem, going back to Galton, Pearson, and Fisher, of estimating, with arbitrary accuracy, the parameters of a multivariate normal distribution from truncated samples. Truncated samples from a $d$-variate normal $N(\mu, \Sigma)$ means that a sample is only revealed if it falls in some subset $S$ of the $d$-dimensional Euclidean space; otherwise the samples are hidden and their count in proportion to the revealed samples is also hidden. We show that the mean $\mu$ and covariance matrix $\Sigma$ can be estimated with arbitrary accuracy in polynomial time, as long as we have oracle access to $S$, and $S$ has non-trivial measure under the unknown $d$-variate normal distribution. Additionally, we show that without oracle access to $S$, any non-trivial estimation is impossible.

SODA Conference 2017 Conference Paper

Faster Sublinear Algorithms using Conditional Sampling

Themis Gouleakis
Christos Tzamos
Manolis Zampetakis
