Author name cluster

Uri Stemmer

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

35 papers

2 author rows

ICML Conference 2025 Conference Paper

Breaking the Quadratic Barrier: Robust Cardinality Sketches for Adaptive Queries

Edith Cohen
Mihir Singhal
Uri Stemmer

Cardinality sketches are compact data structures that efficiently estimate the number of distinct elements across multiple queries while minimizing storage, communication, and computational costs. However, recent research has shown that these sketches can fail under adaptively chosen queries, breaking down after approximately $\tilde{O}(k^2)$ queries, where $k$ is the sketch size. In this work, we overcome this quadratic barrier by designing robust estimators with fine-grained guarantees. Specifically, our constructions can handle an exponential number of adaptive queries, provided that each element participates in at most $\tilde{O}(k^2)$ queries. This effectively shifts the quadratic barrier from the total number of queries to the number of queries sharing the same element, which can be significantly smaller. Beyond cardinality sketches, our approach expands the toolkit for robust algorithm design.

ICML Conference 2025 Conference Paper

Nearly Optimal Sample Complexity for Learning with Label Proportions

Róbert Busa-Fekete
Travis Dick
Claudio Gentile
Haim Kaplan
Tomer Koren
Uri Stemmer

We investigate Learning from Label Proportions (LLP), a partial information setting where examples in a training set are grouped into bags, and only aggregate label values in each bag are available. Despite the partial observability, the goal is still to achieve small regret at the level of individual examples. We give results on the sample complexity of LLP under square loss, showing that our sample complexity is essentially optimal. From an algorithmic viewpoint, we rely on carefully designed variants of Empirical Risk Minimization, and Stochastic Gradient Descent algorithms, combined with ad hoc variance reduction techniques. On one hand, our theoretical results improve in important ways on the existing literature on LLP, specifically in the way the sample complexity depends on the bag size. On the other hand, we validate our algorithmic solutions on several datasets, demonstrating improved empirical performance (better accuracy for less samples) against recent baselines.

STOC Conference 2025 Conference Paper

On Differentially Private Linear Algebra

Haim Kaplan
Yishay Mansour
Shay Moran
Uri Stemmer
Nitzan Tur

We introduce efficient differentially private (DP) algorithms for several linear algebraic tasks, including solving linear equalities over arbitrary fields, linear inequalities over the reals, and computing affine spans and convex hulls. As an application, we obtain efficient DP algorithms for learning halfspaces and affine subspaces. Our algorithms addressing equalities are strongly polynomial, whereas those addressing inequalities are weakly polynomial. Furthermore, this distinction is inevitable: no DP algorithm for linear programming can be strongly polynomial-time efficient.

NeurIPS Conference 2025 Conference Paper

Private Set Union with Multiple Contributions

Travis Dick
Haim Kaplan
Alex Kulesza
Uri Stemmer
Ziteng Sun
Ananda Theertha Suresh

In the private set union problem each user owns a bag of at most $k$ items (from some large universe of items), and we are interested in computing the union of the items in the bags of all of the users. This is trivial without privacy, but a differentially private algorithm must be careful about reporting items contained in only a small number of bags. We consider differentially private algorithms that always report a subset of the union, and define the utility of an algorithm to be the expected size of the subset that it reports. Because the achievable utility varies significantly with the dataset, we introduce the *utility ratio*, which normalizes utility by a dataset-specific upper bound and characterizes a mechanism by its lowest normalized utility across all datasets. We then develop algorithms with guaranteed utility ratios and complement them with bounds on the best possible utility ratio. Prior work has shown that a single algorithm can be simultaneously optimal for all datasets when $k=1$, but we show that instance-optimal algorithms do not exist when $k>1$, and characterize how performance degrades as $k$ grows. At the same time, we design a private algorithm that achieves the maximum possible utility, regardless of $k$, when the item histogram matches a prior prediction (for instance, from a previous data release) and degrades gracefully with the $L_\infty$ distance between the prediction and the actual histogram when the prediction is imperfect.

NeurIPS Conference 2025 Conference Paper

The Cost of Compression: Tight Quadratic Black-Box Attacks on Sketches for $\ell_2$ Norm Estimation

Sara Ahmadian
Edith Cohen
Uri Stemmer

Dimensionality reduction via linear sketching is a powerful and widely used technique, but it is known to be vulnerable to adversarial inputs. We study the \emph{black-box adversarial setting}, where a fixed, hidden sketching matrix $A \in \mathbb{R}^{k \times n}$ maps high-dimensional vectors $\boldsymbol{v} \in \mathbb{R}^n$ to lower-dimensional sketches $A\boldsymbol{v} \in \mathbb{R}^k$, and an adversary can query the system to obtain approximate $\ell_2$-norm estimates that are computed from the sketch. We present a \emph{universal, nonadaptive attack} that, using $\tilde{O}(k^2)$ queries, either causes a failure in norm estimation or constructs an adversarial input on which the optimal estimator for the query distribution (used by the attack) fails. The attack is completely agnostic to the sketching matrix and to the estimator—it applies to \emph{any} linear sketch and \emph{any} query responder, including those that are randomized, adaptive, or tailored to the query distribution. Our lower bound construction tightly matches the known upper bounds of $\tilde{\Omega}(k^2)$, achieved by specialized estimators for Johnson–Lindenstrauss transforms and AMS sketches. Beyond sketching, our results uncover structural parallels to adversarial attacks in image classification, highlighting fundamental vulnerabilities of compressed representations.

NeurIPS Conference 2025 Conference Paper

Tight Bounds for Answering Adaptively Chosen Concentrated Queries

Emma Rapoport
Edith Cohen
Uri Stemmer

Most work on adaptive data analysis assumes that samples in the dataset are independent. When correlations are allowed, even the non-adaptive setting can become intractable, unless some structural constraints are imposed. To address this, Bassily and Freund [2016] introduced the elegant framework of *concentrated queries*, which requires the analyst to restrict itself to queries that are concentrated around their expected value. While this assumption makes the problem trivial in the non-adaptive setting, in the adaptive setting it remains quite challenging. In fact, all known algorithms in this framework support significantly fewer queries than in the independent case: At most $O(n)$ queries for a sample of size $n$, compared to $O(n^2)$ in the independent setting. In this work, we prove that this utility gap is inherent under the current formulation of the concentrated queries framework, assuming some natural conditions on the algorithm. Additionally, we present a simplified version of the best-known algorithms that match our impossibility result.

ICML Conference 2024 Conference Paper

Private Truly-Everlasting Robust-Prediction

Uri Stemmer

Private everlasting prediction (PEP), recently introduced by Naor et al. [2023], is a model for differentially private learning in which the learner never publicly releases a hypothesis. Instead, it provides black-box access to a "prediction oracle" that can predict the labels of an endless stream of unlabeled examples drawn from the underlying distribution. Importantly, PEP provides privacy both for the initial training set and for the endless stream of classification queries. We present two conceptual modifications to the definition of PEP, as well as new constructions exhibiting significant improvements over prior work. Specifically, we incorporate robustness against poisoning attacks into the definition of PEP; we present a relaxed privacy definition, suitable for PEP, that allows us to disconnect the privacy parameter $\delta$ from the number of total time steps $T$; and we present new constructions for axis-aligned rectangles and decision-stumps exhibiting improved sample complexity and runtime.

NeurIPS Conference 2023 Conference Paper

Adaptive Data Analysis in a Balanced Adversarial Model

Kobbi Nissim
Uri Stemmer
Eliad Tsfadia

In adaptive data analysis, a mechanism gets $n$ i. i. d. samples from an unknown distribution $\cal{D}$, andis required to provide accurate estimations to a sequence of adaptively chosen statistical queries with respect to $\cal{D}$. Hardt and Ullman (FOCS 2014) and Steinke and Ullman (COLT 2015) showed that in general, it is computationally hard to answer more than $\Theta(n^2)$ adaptive queries, assuming the existence of one-way functions. However, these negative results strongly rely on an adversarial model that significantly advantages the adversarial analyst over the mechanism, as the analyst, who chooses the adaptive queries, also chooses the underlying distribution $\cal{D}$. This imbalance raises questions with respect to the applicability of the obtained hardness results -- an analyst who has complete knowledge of the underlying distribution $\cal{D}$ would have little need, if at all, to issue statistical queries to a mechanism which only holds a finite number of samples from $\cal{D}$. We consider more restricted adversaries, called \emph{balanced}, where each such adversary consists of two separated algorithms: The \emph{sampler} who is the entity that chooses the distribution and provides the samples to the mechanism, and the \emph{analyst} who chooses the adaptive queries, but has no prior knowledge of the underlying distribution (and hence has no a priori advantage with respect to the mechanism). We improve the quality of previous lower bounds by revisiting them using an efficient \emph{balanced} adversary, under standard public-key cryptography assumptions. We show that these stronger hardness assumptions are unavoidable in the sense that any computationally bounded \emph{balanced} adversary that has the structure of all known attacks, implies the existence of public-key cryptography.

NeurIPS Conference 2023 Conference Paper

Black-Box Differential Privacy for Interactive ML

Haim Kaplan
Yishay Mansour
Shay Moran
Kobbi Nissim
Uri Stemmer

In this work we revisit an interactive variant of joint differential privacy, recently introduced by Naor et al. [2023], and generalize it towards handling online processes in which existing privacy definitions seem too restrictive. We study basic properties of this definition and demonstrate that it satisfies (suitable variants) of group privacy, composition, and post processing. In order to demonstrate the advantages of this privacy definition compared to traditional forms of differential privacy, we consider the basic setting of online classification. We show that any (possibly non-private) learning rule can be effectively transformed to a private learning rule with only a polynomial overhead in the mistake bound. This demonstrates a stark difference with traditional forms of differential privacy, such as the one studied by Golowich and Livni [2021], where only a double exponential overhead in the mistake bound is known (via an information theoretic upper bound).

ICML Conference 2023 Conference Paper

Concurrent Shuffle Differential Privacy Under Continual Observation

Jay Tenenbaum
Haim Kaplan
Yishay Mansour
Uri Stemmer

We introduce the concurrent shuffle model of differential privacy. In this model we have multiple concurrent shufflers permuting messages from different, possibly overlapping, batches of users. Similarly to the standard (single) shuffler model, the privacy requirement is that the concatenation of all shuffled messages should be differentially private. We study the private continual summation problem (a. k. a. the counter problem) and show that the concurrent shuffle model allows for significantly improved error compared to a standard (single) shuffler model. Specifically, we give a summation algorithm with error $\tilde{O}(n^{1/(2k+1)})$ with $k$ concurrent shufflers on a sequence of length $n$. Furthermore, we prove that this bound is tight for any $k$, even if the algorithm can choose the sizes of the batches adaptively. For $k=\log n$ shufflers, the resulting error is polylogarithmic, much better than $\tilde{\Theta}(n^{1/3})$ which we show is the smallest possible with a single shuffler. We use our online summation algorithm to get algorithms with improved regret bounds for the contextual linear bandit problem. In particular we get optimal $\tilde{O}(\sqrt{n})$ regret with $k= \tilde{\Omega}(\log n)$ concurrent shufflers.

STOC Conference 2023 Conference Paper

Optimal Differentially Private Learning of Thresholds and Quasi-Concave Optimization

Edith Cohen
Xin Lyu 0002
Jelani Nelson
Tamás Sarlós
Uri Stemmer

The problem of learning threshold functions is a fundamental one in machine learning. Classical learning theory implies sample complexity of O (ξ −1 log(1/β)) (for generalization error ξ with confidence 1−β). The private version of the problem, however, is more challenging and in particular, the sample complexity must depend on the size | X | of the domain. Progress on quantifying this dependence, via lower and upper bounds, was made in a line of works over the past decade. In this paper, we finally close the gap for approximate-DP and provide a nearly tight upper bound of O (log * | X |), which matches a lower bound by Alon et al (that applies even with improper learning) and improves over a prior upper bound of O ((log * | X |) 1.5 ) by Kaplan et al. We also provide matching upper and lower bounds of Θ(2 log * | X | ) for the additive error of private quasi-concave optimization (a related and more general problem). Our improvement is achieved via the novel Reorder-Slice-Compute paradigm for private data analysis which we believe will have further applications.

NeurIPS Conference 2023 Conference Paper

Private Everlasting Prediction

Moni Naor
Kobbi Nissim
Uri Stemmer
Chao Yan

A private learner is trained on a sample of labeled points and generates a hypothesis that can be used for predicting the labels of newly sampled points while protecting the privacy of the training set [Kasiviswannathan et al. , FOCS 2008]. Past research uncovered that private learners may need to exhibit significantly higher sample complexity than non-private learners as is the case of learning of one-dimensional threshold functions [Bun et al. , FOCS 2015, Alon et al. , STOC 2019]. We explore prediction as an alternative to learning. A predictor answers a stream of classification queries instead of outputting a hypothesis. Earlier work has considered a private prediction model with a single classification query [Dwork and Feldman, COLT 2018]. We observe that when answering a stream of queries, a predictor must modify the hypothesis it uses over time, and in a manner that cannot rely solely on the training set. We introduce {\em private everlasting prediction} taking into account the privacy of both the training set {\em and} the (adaptively chosen) queries made to the predictor. We then present a generic construction of private everlasting predictors in the PAC model. The sample complexity of the initial training sample in our construction is quadratic (up to polylog factors) in the VC dimension of the concept class. Our construction allows prediction for all concept classes with finite VC dimension, and in particular threshold functions over infinite domains, for which (traditional) private learning is known to be impossible.

AAAI Conference 2023 Conference Paper

Tricking the Hashing Trick: A Tight Lower Bound on the Robustness of CountSketch to Adaptive Inputs

Edith Cohen
Jelani Nelson
Tamas Sarlos
Uri Stemmer

CountSketch and Feature Hashing (the ``hashing trick'') are popular randomized dimensionality reduction methods that support recovery of l2 -heavy hitters and approximate inner products. When the inputs are not adaptive (do not depend on prior outputs), classic estimators applied to a sketch of size O(l / epsilon) are accurate for a number of queries that is exponential in l. When inputs are adaptive, however, an adversarial input can be constructed after O(l) queries with the classic estimator and the best known robust estimator only supports ~O(l^2) queries. In this work we show that this quadratic dependence is in a sense inherent: We design an attack that after O(l^2) queries produces an adversarial input vector whose sketch is highly biased. Our attack uses ``natural'' non-adaptive inputs (only the final adversarial input is chosen adaptively) and universally applies with any correct estimator, including one that is unknown to the attacker. In that, we expose inherent vulnerability of this fundamental method.

PDF Details DOI

ICML Conference 2022 Conference Paper

Adaptive Data Analysis with Correlated Observations

Aryeh Kontorovich
Menachem Sadigurschi
Uri Stemmer

The vast majority of the work on adaptive data analysis focuses on the case where the samples in the dataset are independent. Several approaches and tools have been successfully applied in this context, such as differential privacy, max-information, compression arguments, and more. The situation is far less well-understood without the independence assumption. We embark on a systematic study of the possibilities of adaptive data analysis with correlated observations. First, we show that, in some cases, differential privacy guarantees generalization even when there are dependencies within the sample, which we quantify using a notion we call Gibbs-dependence. We complement this result with a tight negative example. % Second, we show that the connection between transcript-compression and adaptive data analysis can be extended to the non-iid setting.

ICML Conference 2022 Conference Paper

Differentially Private Approximate Quantiles

Haim Kaplan
Shachar Schnapp
Uri Stemmer

In this work we study the problem of differentially private (DP) quantiles, in which given dataset $X$ and quantiles $q_1, .. ., q_m \in [0, 1]$, we want to output $m$ quantile estimations which are as close as possible to the true quantiles and preserve DP. We describe a simple recursive DP algorithm, which we call Approximate Quantiles (AQ), for this task. We give a worst case upper bound on its error, and show that its error is much lower than of previous implementations on several different datasets. Furthermore, it gets this low error while running time two orders of magnitude faster that the best previous implementation.

STOC Conference 2022 Conference Paper

Dynamic algorithms against an adaptive adversary: generic constructions and lower bounds

Amos Beimel
Haim Kaplan
Yishay Mansour
Kobbi Nissim
Thatchaphol Saranurak
Uri Stemmer

ICML Conference 2022 Conference Paper

FriendlyCore: Practical Differentially Private Aggregation

Eliad Tsfadia
Edith Cohen
Haim Kaplan
Yishay Mansour
Uri Stemmer

Differentially private algorithms for common metric aggregation tasks, such as clustering or averaging, often have limited practicality due to their complexity or to the large number of data points that is required for accurate results. We propose a simple and practical tool $\mathsf{FriendlyCore}$ that takes a set of points ${\cal D}$ from an unrestricted (pseudo) metric space as input. When ${\cal D}$ has effective diameter $r$, $\mathsf{FriendlyCore}$ returns a “stable” subset ${\cal C} \subseteq {\cal D}$ that includes all points, except possibly few outliers, and is guaranteed to have diameter $r$. $\mathsf{FriendlyCore}$ can be used to preprocess the input before privately aggregating it, potentially simplifying the aggregation or boosting its accuracy. Surprisingly, $\mathsf{FriendlyCore}$ is light-weight with no dependence on the dimension. We empirically demonstrate its advantages in boosting the accuracy of mean estimation and clustering tasks such as $k$-means and $k$-GMM, outperforming tailored methods.

ICML Conference 2022 Conference Paper

On the Robustness of CountSketch to Adaptive Inputs

Edith Cohen
Xin Lyu 0002
Jelani Nelson
Tamás Sarlós
Moshe Shechner
Uri Stemmer

The last decade saw impressive progress towards understanding the performance of algorithms in adaptive settings, where subsequent inputs may depend on the output from prior inputs. Adaptive settings arise in processes with feedback or with adversarial attacks. Existing designs of robust algorithms are generic wrappers of non-robust counterparts and leave open the possibility of better tailored designs. The lowers bounds (attacks) are similarly worst-case and their significance to practical setting is unclear. Aiming to understand these questions, we study the robustness of \texttt{CountSketch}, a popular dimensionality reduction technique that maps vectors to a lower dimension using randomized linear measurements. The sketch supports recovering $\ell_2$-heavy hitters of a vector (entries with $v[i]^2 \geq \frac{1}{k}\|\boldsymbol{v}\|^2_2$). We show that the classic estimator is not robust, and can be attacked with a number of queries of the order of the sketch size. We propose a robust estimator (for a slightly modified sketch) that allows for quadratic number of queries in the sketch size, which is an improvement factor of $\sqrt{k}$ (for $k$ heavy hitters) over prior "blackbox" approaches.

NeurIPS Conference 2021 Conference Paper

Differentially Private Multi-Armed Bandits in the Shuffle Model

Jay Tenenbaum
Haim Kaplan
Yishay Mansour
Uri Stemmer

We give an $(\varepsilon, \delta)$-differentially private algorithm for the Multi-Armed Bandit (MAB) problem in the shuffle model with a distribution-dependent regret of $O\left(\left(\sum_{a: \Delta_a>0}\frac{\log T}{\Delta_a}\right)+\frac{k\sqrt{\log\frac{1}{\delta}}\log T}{\varepsilon}\right)$, and a distribution-independent regret of $O\left(\sqrt{kT\log T}+\frac{k\sqrt{\log\frac{1}{\delta}}\log T}{\varepsilon}\right)$, where $T$ is the number of rounds, $\Delta_a$ is the suboptimality gap of the action $a$, and $k$ is the total number of actions. Our upper bound almost matches the regret of the best known algorithms for the centralized model, and significantly outperforms the best known algorithm in the local model.

ICML Conference 2021 Conference Paper

Differentially-Private Clustering of Easy Instances

Edith Cohen
Haim Kaplan
Yishay Mansour
Uri Stemmer
Eliad Tsfadia

Clustering is a fundamental problem in data analysis. In differentially private clustering, the goal is to identify k cluster centers without disclosing information on individual data points. Despite significant research progress, the problem had so far resisted practical solutions. In this work we aim at providing simple implementable differentrially private clustering algorithms when the the data is "easy, " e. g. , when there exists a significant separation between the clusters. For the easy instances we consider, we have a simple implementation based on utilizing non-private clustering algorithms, and combining them privately. We are able to get improved sample complexity bounds in some cases of Gaussian mixtures and k-means. We complement our theoretical algorithms with experiments of simulated data.

JMLR Journal 2021 Journal Article

Locally Private k-Means Clustering

Uri Stemmer

We design a new algorithm for the Euclidean $k$-means problem that operates in the local model of differential privacy. Unlike in the non-private literature, differentially private algorithms for the $k$-means objective incur both additive and multiplicative errors. Our algorithm significantly reduces the additive error while keeping the multiplicative error the same as in previous state-of-the-art results. Specifically, on a database of size $n$, our algorithm guarantees $O(1)$ multiplicative error and $\approx n^{1/2+a}$ additive error for an arbitrarily small constant $a>0$. All previous algorithms in the local model had additive error $\approx n^{2/3+a}$. Our techniques extend to $k$-median clustering. We show that the additive error we obtain is almost optimal in terms of its dependency on the database size $n$. Specifically, we give a simple lower bound showing that every locally-private algorithm for the $k$-means objective must have additive error at least $\approx\sqrt{n}$. [abs] [ pdf ][ bib ] &copy JMLR 2021. ( edit, beta )

NeurIPS Conference 2021 Conference Paper

On the Sample Complexity of Privately Learning Axis-Aligned Rectangles

Menachem Sadigurschi
Uri Stemmer

We revisit the fundamental problem of learning Axis-Aligned-Rectangles over a finite grid $X^d\subseteq\mathbb{R}^d$ with differential privacy. Existing results show that the sample complexity of this problem is at most $\min\left\{ d{\cdot}\log|X| \; ,\; d^{1. 5}{\cdot}\left(\log^*|X| \right)^{1. 5}\right\}$. That is, existing constructions either require sample complexity that grows linearly with $\log|X|$, or else it grows super linearly with the dimension $d$. We present a novel algorithm that reduces the sample complexity to only $\tilde{O}\left\{d{\cdot}\left(\log^*|X|\right)^{1. 5}\right\}$, attaining a dimensionality optimal dependency without requiring the sample complexity to grow with $\log|X|$. The technique used in order to attain this improvement involves the deletion of "exposed" data-points on the go, in a fashion designed to avoid the cost of the adaptive composition theorems. The core of this technique may be of individual interest, introducing a new method for constructing statistically-efficient private algorithms.

NeurIPS Conference 2020 Conference Paper

Adversarially Robust Streaming Algorithms via Differential Privacy

Avinatan Hasidim
Haim Kaplan
Yishay Mansour
Yossi Matias
Uri Stemmer

A streaming algorithm is said to be adversarially robust if its accuracy guarantees are maintained even when the data stream is chosen maliciously, by an adaptive adversary. We establish a connection between adversarial robustness of streaming algorithms and the notion of differential privacy. This connection allows us to design new adversarially robust streaming algorithms that outperform the current state-of-the-art constructions for many interesting regimes of parameters.

SODA Conference 2020 Conference Paper

Locally Private k -Means Clustering

Uri Stemmer

We design a new algorithm for the Euclidean k -means problem that operates in the local model of differential privacy. Unlike in the non-private literature, differentially private algorithms for the k -means incur both additive and multiplicative errors. Our algorithm significantly reduces the additive error while keeping the multiplicative error the same as in previous state-of-the-art results. Specifically, on a database of size n, our algorithm guarantees O (1) multiplicative error and ≈ n 1/2 + α additive error for an arbitrarily small constant a > 0. All previous algorithms in the local model had additive error ≈ n 2/3+ a. We show that the additive error we obtain is almost optimal in terms of its dependency in the database size n. Specifically, we give a simple lower bound showing that every locally-private algorithm for the k -means must have additive error at least ≈.

JMLR Journal 2020 Journal Article

Practical Locally Private Heavy Hitters

Raef Bassily
Kobbi Nissim
Uri Stemmer
Abhradeep Thakurta

We present new practical local differentially private heavy hitters algorithms achieving optimal or near-optimal worst-case error and running time -- TreeHist and Bitstogram. In both algorithms, server running time is $\tilde O(n)$ and user running time is $\tilde O(1)$, hence improving on the prior state-of-the-art result of Bassily and Smith [STOC 2015] requiring $O(n^{5/2})$ server time and $O(n^{3/2})$ user time. With a typically large number of participants in local algorithms (in the millions), this reduction in time complexity, in particular at the user side, is crucial for making locally private heavy hitters algorithms usable in practice. We implemented Algorithm TreeHist to verify our theoretical analysis and compared its performance with the performance of Google's RAPPOR code. [abs] [ pdf ][ bib ] &copy JMLR 2020. ( edit, beta )

NeurIPS Conference 2020 Conference Paper

Private Learning of Halfspaces: Simplifying the Construction and Reducing the Sample Complexity

Haim Kaplan
Yishay Mansour
Uri Stemmer
Eliad Tsfadia

We present a differentially private learner for halfspaces over a finite grid $G$ in $\R^d$ with sample complexity $\approx d^{2. 5}\cdot 2^{\log^*|G|}$, which improves the state-of-the-art result of [Beimel et al. , COLT 2019] by a $d^2$ factor. The building block for our learner is a new differentially private algorithm for approximately solving the linear feasibility problem: Given a feasible collection of $m$ linear constraints of the form $Ax\geq b$, the task is to {\em privately} identify a solution $x$ that satisfies {\em most} of the constraints. Our algorithm is iterative, where each iteration determines the next coordinate of the constructed solution $x$.

JMLR Journal 2019 Journal Article

Characterizing the Sample Complexity of Pure Private Learners

Amos Beimel
Kobbi Nissim
Uri Stemmer

Kasiviswanathan et al. (FOCS 2008) defined private learning as a combination of PAC learning and differential privacy. Informally, a private learner is applied to a collection of labeled individual information and outputs a hypothesis while preserving the privacy of each individual. Kasiviswanathan et al. left open the question of characterizing the sample complexity of private learners. We give a combinatorial characterization of the sample size sufficient and necessary to learn a class of concepts under pure differential privacy. This characterization is analogous to the well known characterization of the sample complexity of non-private learning in terms of the VC dimension of the concept class. We introduce the notion of probabilistic representation of a concept class, and our new complexity measure $RepDim$ corresponds to the size of the smallest probabilistic representation of the concept class. We show that any private learning algorithm for a concept class $C$ with sample complexity $m$ implies $RepDim(C)=O(m)$, and that there exists a private learning algorithm with sample complexity $m=O(RepDim(C))$. We further demonstrate that a similar characterization holds for the database size needed for computing a large class of optimization problems under pure differential privacy, and also for the well studied problem of private data release. [abs] [ pdf ][ bib ] &copy JMLR 2019. ( edit, beta )

ICML Conference 2019 Conference Paper

Differentially Private Learning of Geometric Concepts

Haim Kaplan
Yishay Mansour
Yossi Matias
Uri Stemmer

We present differentially private efficient algorithms for learning union of polygons in the plane (which are not necessarily convex). Our algorithms achieve $(\alpha, \beta)$-PAC learning and $(\epsilon, \delta)$-differential privacy using a sample of size $\tilde{O}\left(\frac{1}{\alpha\epsilon}k\log d\right)$, where the domain is $[d]\times[d]$ and $k$ is the number of edges in the union of polygons.

JMLR Journal 2019 Journal Article

Simultaneous Private Learning of Multiple Concepts

Mark Bun
Kobbi Nissim
Uri Stemmer

We investigate the direct-sum problem in the context of differentially private PAC learning: What is the sample complexity of solving $k$ learning tasks simultaneously under differential privacy, and how does this cost compare to that of solving $k$ learning tasks without privacy? In our setting, an individual example consists of a domain element $x$ labeled by $k$ unknown concepts $(c_1,\ldots,c_k)$. The goal of a multi-learner is to output $k$ hypotheses $(h_1,\ldots,h_k)$ that generalize the input examples. Without concern for privacy, the sample complexity needed to simultaneously learn $k$ concepts is essentially the same as needed for learning a single concept. Under differential privacy, the basic strategy of learning each hypothesis independently yields sample complexity that grows polynomially with $k$. For some concept classes, we give multi-learners that require fewer samples than the basic strategy. Unfortunately, however, we also give lower bounds showing that even for very simple concept classes, the sample cost of private multi-learning must grow polynomially in $k$. [abs] [ pdf ][ bib ] &copy JMLR 2019. ( edit, beta )

NeurIPS Conference 2018 Conference Paper

Differentially Private k-Means with Constant Multiplicative Error

Uri Stemmer
Haim Kaplan

We design new differentially private algorithms for the Euclidean k-means problem, both in the centralized model and in the local model of differential privacy. In both models, our algorithms achieve significantly improved error guarantees than the previous state-of-the-art. In addition, in the local model, our algorithm significantly reduces the number of interaction rounds. Although the problem has been widely studied in the context of differential privacy, all of the existing constructions achieve only super constant approximation factors. We present, for the first time, efficient private algorithms for the problem with constant multiplicative error. Furthermore, we show how to modify our algorithms so they compute private coresets for k-means clustering in both models.

NeurIPS Conference 2018 Conference Paper

The Limits of Post-Selection Generalization

Jonathan Ullman
Adam Smith
Kobbi Nissim
Uri Stemmer
Thomas Steinke

While statistics and machine learning offers numerous methods for ensuring generalization, these methods often fail in the presence of post selection ---the common practice in which the choice of analysis depends on previous interactions with the same dataset. A recent line of work has introduced powerful, general purpose algorithms that ensure a property called post hoc generalization (Cummings et al. , COLT'16), which says that no person when given the output of the algorithm should be able to find any statistic for which the data differs significantly from the population it came from. In this work we show several limitations on the power of algorithms satisfying post hoc generalization. First, we show a tight lower bound on the error of any algorithm that satisfies post hoc generalization and answers adaptively chosen statistical queries, showing a strong barrier to progress in post selection data analysis. Second, we show that post hoc generalization is not closed under composition, despite many examples of such algorithms exhibiting strong composition properties.

NeurIPS Conference 2017 Conference Paper

Practical Locally Private Heavy Hitters

Raef Bassily
Kobbi Nissim
Uri Stemmer
Abhradeep Guha Thakurta

We present new practical local differentially private heavy hitters algorithms achieving optimal or near-optimal worst-case error -- TreeHist and Bitstogram. In both algorithms, server running time is $\tilde O(n)$ and user running time is $\tilde O(1)$, hence improving on the prior state-of-the-art result of Bassily and Smith [STOC 2015] requiring $\tilde O(n^{5/2})$ server time and $\tilde O(n^{3/2})$ user time. With a typically large number of participants in local algorithms ($n$ in the millions), this reduction in time complexity, in particular at the user side, is crucial for the use of such algorithms in practice. We implemented Algorithm TreeHist to verify our theoretical analysis and compared its performance with the performance of Google's RAPPOR code.

STOC Conference 2016 Conference Paper

Algorithmic stability for adaptive data analysis

Raef Bassily
Kobbi Nissim
Adam Smith 0006
Thomas Steinke 0002
Uri Stemmer
Jonathan R. Ullman

Adaptivity is an important feature of data analysis - the choice of questions to ask about a dataset often depends on previous interactions with the same dataset. However, statistical validity is typically studied in a nonadaptive model, where all questions are specified before the dataset is drawn. Recent work by Dwork et al. (STOC, 2015) and Hardt and Ullman (FOCS, 2014) initiated a general formal study of this problem, and gave the first upper and lower bounds on the achievable generalization error for adaptive data analysis. Specifically, suppose there is an unknown distribution P and a set of n independent samples x is drawn from P . We seek an algorithm that, given x as input, accurately answers a sequence of adaptively chosen ``queries'' about the unknown distribution P . How many samples n must we draw from the distribution, as a function of the type of queries, the number of queries, and the desired level of accuracy? In this work we make two new contributions towards resolving this question: We give upper bounds on the number of samples n that are needed to answer statistical queries . The bounds improve and simplify the work of Dwork et al. (STOC, 2015), and have been applied in subsequent work by those authors (Science, 2015; NIPS, 2015). We prove the first upper bounds on the number of samples required to answer more general families of queries. These include arbitrary low-sensitivity queries and an important class of optimization queries (alternatively, risk minimization queries ). As in Dwork et al., our algorithms are based on a connection with algorithmic stability in the form of differential privacy . We extend their work by giving a quantitatively optimal, more general, and simpler proof of their main theorem that the stability notion guaranteed by differential privacy implies low generalization error. We also show that weaker stability guarantees such as bounded KL divergence and total variation distance lead to correspondingly weaker generalization guarantees.

FOCS Conference 2015 Conference Paper

Differentially Private Release and Learning of Threshold Functions

Mark Bun
Kobbi Nissim
Uri Stemmer
Salil P. Vadhan

We prove new upper and lower bounds on the sample complexity of (ε, δ) differentially private algorithms for releasing approximate answers to threshold functions. A threshold function c over a totally ordered domain X evaluates to c z (y) = 1 if y ≤ x, and evaluates to 0 otherwise. We give the first nontrivial lower bound for releasing thresholds with (ε, δ) differential privacy, showing that the task is impossible over an infinite domain X, and moreover requires sample complexity n ≥ Ω(log* |X|), which grows with the size of the domain. Inspired by the techniques used to prove this lower bound, we give an algorithm for releasing thresholds with n ≤ 2 (1+ο(1)) log* |X| samples. This improves the previous best upper bound of 8 (1+ο(1)) log* |X| (Beimel et al. , RANDOM '13). Our sample complexity upper and lower bounds also apply to the tasks of learning distributions with respect to Kolmogorov distance and of properly PAC learning thresholds with differential privacy. The lower bound gives the first separation between the sample complexity of properly learning a concept class with (ε, δ) differential privacy and learning without privacy. For properly learning thresholds in ℓ dimensions, this lower bound extends to n ≥ Ω(ℓ · log* |X|). To obtain our results, we give reductions in both directions from releasing and properly learning thresholds and the simpler interior point problem. Given a database D of elements from X, the interior point problem asks for an element between the smallest and largest elements in D. We introduce new recursive constructions for bounding the sample complexity of the interior point problem, as well as further reductions and techniques for proving impossibility results for other basic problems in differential privacy.

SODA Conference 2015 Conference Paper

Learning Privately with Labeled and Unlabeled Examples

Amos Beimel
Kobbi Nissim
Uri Stemmer

A private learner is an algorithm that given a sample of labeled individual examples outputs a generalizing hypothesis while preserving the privacy of each individual. In 2008, Kasiviswanathan et al. (FOCS 2008) gave a generic construction of private learners, in which the sample complexity is (generally) higher than what is needed for non-private learners. This gap in the sample complexity was then further studied in several followup papers, showing that (at least in some cases) this gap is unavoidable. Moreover, those papers considered ways to overcome the gap, by relaxing either the privacy or the learning guarantees of the learner. We suggest an alternative approach, inspired by the (non-private) models of semi-supervised learning and active-learning, where the focus is on the sample complexity of labeled examples whereas unlabeled examples are of a significantly lower cost. We consider private semi-supervised learners that operate on a random sample, where only a (hopefully small) portion of this sample is labeled. The learners have no control over which of the sample elements are labeled. Our main result is that the labeled sample complexity of private learners is characterized by the VC dimension. We present two generic constructions of private semi-supervised learners. The first construction is of learners where the labeled sample complexity is proportional to the VC dimension of the concept class, however, the unlabeled sample complexity of the algorithm is as big as the representation length of domain elements. Our second construction presents a new technique for decreasing the labeled sample complexity of a given private learner, while roughly maintaining its unlabeled sample complexity. In addition, we show that in some settings the labeled sample complexity does not depend on the privacy parameters of the learner.