Arrow Research search

Author name cluster

Mark Bun

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

22 papers
2 author rows

Possible papers

22

NeurIPS Conference 2024 Conference Paper

Optimal Hypothesis Selection in (Almost) Linear Time

  • Maryam Aliakbarpour
  • Mark Bun
  • Adam Smith

Hypothesis selection, also known as density estimation, is a fundamental problem in statistics and learning theory. Suppose we are given a sample set from an unknown distribution $P$ and a finite class of candidate distributions (called hypotheses) $\mathcal{H} \coloneqq \{H_1, H_2, \ldots, H_n\}$. The aim is to design an algorithm that selects a distribution $\hat H$ in $\mathcal{H}$ that best fits the data. The algorithm's accuracy is measured based on the distance between $\hat{H}$ and $P$ compared to the distance of the closest distribution in $\mathcal{H}$ to $P$ (denoted by $OPT$). Concretely, we aim for $\|\hat{H} - P\|_{TV}$ to be at most $\alpha \cdot OPT + \epsilon$ for some small $\epsilon$ and $\alpha$. While it is possible to decrease the value of $\epsilon$ as the number of samples increases, $\alpha$ is an inherent characteristic of the algorithm. In fact, one cannot hope to achieve $\alpha < 3$ even when there are only two candidate hypotheses, unless the number of samples is proportional to the domain size of $P$ [Bousquet, Kane, Moran '19]. Finding the best $\alpha$ has been a central focus of study of the problem since the early work of [Devroye, Lugosi '01]. Prior to our work, no algorithm was known that achieves $\alpha = 3$ in near-linear time. We provide the first algorithm that operates in almost linear time ($\tilde{O}(n/\epsilon^3)$ time) and achieves $\alpha = 3$. This result improves upon a long list of results in hypothesis selection. Previously known algorithms either had worse time complexity, a larger factor $\alpha$, or extra assumptions about the problem setting. In addition to this algorithm, we provide another (almost) linear-time algorithm with better dependency on the additive accuracy parameter $\epsilon$, albeit with a slightly worse accuracy parameter, $\alpha = 4$.
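
For context on what the paper speeds up: the classical route to $\alpha = 3$ is the Scheffé-style minimum-distance tournament of [Devroye, Lugosi '01], which compares all $\Theta(n^2)$ pairs of hypotheses. Below is a minimal sketch of that quadratic-time baseline for discrete distributions; it is not the paper's almost-linear-time algorithm, and the discrete pmf representation is an assumption made for illustration.

```python
import numpy as np

def scheffe_select(samples, hypotheses):
    """Classical minimum-distance (Scheffe) selection in Theta(n^2) time.

    samples    : integer array of draws from the unknown distribution P
    hypotheses : list of pmfs over {0, ..., k-1}, one per candidate H_i
    Returns the index minimizing the worst-case disagreement with the
    empirical measure over Scheffe sets W_ij = {x : H_i(x) > H_j(x)}.
    """
    scores = np.zeros(len(hypotheses))
    for i, Hi in enumerate(hypotheses):
        for j, Hj in enumerate(hypotheses):
            if i == j:
                continue
            W = Hi > Hj                    # Scheffe set for the pair (i, j)
            emp = np.mean(W[samples])      # empirical mass of W under P
            scores[i] = max(scores[i], abs(Hi[W].sum() - emp))
    return int(np.argmin(scores))
```

This baseline attains $\alpha = 3$ but spends $\Theta(n^2)$ pairwise comparisons; matching the same $\alpha$ in $\tilde{O}(n/\epsilon^3)$ time is the paper's contribution.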

NeurIPS Conference 2024 Conference Paper

Oracle-Efficient Differentially Private Learning with Public Data

  • Adam Block
  • Mark Bun
  • Rathin Desai
  • Abhishek Shetty
  • Zhiwei S. Wu

Due to statistical lower bounds on the learnability of many function classes under privacy constraints, there has been recent interest in leveraging public data to improve the performance of private learning algorithms. In this model, algorithms must always guarantee differential privacy with respect to the private samples while also ensuring learning guarantees when the private data distribution is sufficiently close to that of the public data. Previous work has demonstrated that when sufficient public, unlabelled data is available, private learning can be made statistically tractable, but the resulting algorithms have all been computationally inefficient. In this work, we present the first computationally efficient algorithms that provably leverage public data to learn privately whenever a function class is learnable non-privately, where our notion of computational efficiency is with respect to the number of calls to an optimization oracle for the function class. In addition to this general result, we provide specialized algorithms with improved sample complexities in the special cases when the function class is convex or when the task is binary classification.
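
To make "oracle-efficient" concrete: the learner is only allowed to touch the function class through calls of the shape below, and efficiency is measured in the number of such calls. This is a schematic interface with all names hypothetical, not the paper's API.

```python
from typing import Callable, Protocol, Sequence, Tuple

Example = Tuple[Sequence[float], int]          # (features, label)
Hypothesis = Callable[[Sequence[float]], int]  # h: features -> label

class ERMOracle(Protocol):
    """Hypothetical weighted empirical-risk-minimization oracle for a class H.

    Returns some h in H minimizing the weighted 0-1 loss on `data`.
    A learner built on this interface is oracle-efficient if it makes
    polynomially many calls, regardless of how the oracle itself is
    implemented (e.g., by an integer-program solver).
    """
    def __call__(self, data: Sequence[Example],
                 weights: Sequence[float]) -> Hypothesis: ...
```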

NeurIPS Conference 2023 Conference Paper

Hypothesis Selection with Memory Constraints

  • Maryam Aliakbarpour
  • Mark Bun
  • Adam Smith

Hypothesis selection is a fundamental problem in learning theory and statistics. Given a dataset and a finite set of candidate distributions, the goal is to select a distribution that matches the data as well as possible. More specifically, suppose we have sample access to an unknown distribution $P$ over a domain $\mathcal{X}$ that we know is well-approximated by one of a class of $n$ distributions (a.k.a. hypotheses), $\mathcal{H} \coloneqq \{H_1, H_2, \ldots, H_n\}$. The goal is to design an algorithm that outputs a distribution $\hat{H} \in \mathcal{H}$ whose total variation distance from $P$ is nearly minimal. In this work, we study the hypothesis selection problem under memory constraints. We consider a model where samples from $P$ are presented in a stream and we access each sample $x$ via ``PDF-comparison'' queries that allow us to compare the probability densities of any pair of hypotheses at the domain point $x$ (i.e., is $H_i(x) < H_j(x)$?). This model allows us to study how much memory is needed at any point in time to store information about the portion of the stream seen so far. Our main result is an algorithm that achieves a nearly optimal tradeoff between memory usage and the number of samples required. In particular, given $b$ bits of memory (for $b$ roughly between $\log n$ and $n$), our algorithm solves the hypothesis selection problem with $s$ samples, where $b \cdot s = O(n \log n)$. This result is optimal up to an $O(\log n)$ factor, for all $b$.
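
At the extremes, the stated tradeoff $b \cdot s = O(n \log n)$ recovers the familiar regimes: with $b = \Theta(n)$ bits the algorithm gets by with $s = O(\log n)$ samples' worth of tournament state, while with only $b = \Theta(\log n)$ bits it pays $s = O(n)$ samples. The query primitive itself is simple; here is a schematic with hypothetical names, assuming discrete pmfs for illustration.

```python
def pdf_compare(hypotheses, i: int, j: int, x) -> bool:
    """PDF-comparison query on the current stream element x:
    is H_i(x) < H_j(x)?

    A memory-bounded streaming algorithm may issue such queries only
    about the sample currently in hand, and must decide which <= b bits
    of state to retain before the next sample arrives.
    """
    return hypotheses[i][x] < hypotheses[j][x]
```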

STOC Conference 2023 Conference Paper

Stability Is Stable: Connections between Replicability, Privacy, and Adaptive Generalization

  • Mark Bun
  • Marco Gaboardi
  • Max Hopkins
  • Russell Impagliazzo
  • Rex Lei
  • Toniann Pitassi
  • Satchit Sivakumar
  • Jessica Sorrell

The notion of replicable algorithms was introduced by Impagliazzo, Lei, Pitassi, and Sorrell (STOC’22) to describe randomized algorithms that are stable under the resampling of their inputs. More precisely, a replicable algorithm gives the same output with high probability when its randomness is fixed and it is run on a new i.i.d. sample drawn from the same distribution. Using replicable algorithms for data analysis can facilitate the verification of published results by ensuring that the results of an analysis will be the same with high probability, even when that analysis is performed on a new data set. In this work, we establish new connections and separations between replicability and standard notions of algorithmic stability. In particular, we give sample-efficient algorithmic reductions between perfect generalization, approximate differential privacy, and replicability for a broad class of statistical problems. Conversely, we show any such equivalence must break down computationally: there exist statistical problems that are easy under differential privacy, but that cannot be solved replicably without breaking public-key cryptography. Furthermore, these results are tight: our reductions are statistically optimal, and we show that any computational separation between DP and replicability must imply the existence of one-way functions. Our statistical reductions give a new algorithmic framework for translating between notions of stability, which we instantiate to answer several open questions in replicability and privacy. This includes giving sample-efficient replicable algorithms for various PAC learning, distribution estimation, and distribution testing problems, algorithmic amplification of δ in approximate DP, conversions from item-level to user-level privacy, and the existence of private agnostic-to-realizable learning reductions under structured distributions.
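
A canonical device behind replicability (from Impagliazzo, Lei, Pitassi, and Sorrell's framework, shown here generically rather than as this paper's construction) is randomized rounding against a shared random grid: two runs on fresh samples produce the same output unless their empirical estimates straddle a randomly placed cell boundary.

```python
import numpy as np

def replicable_mean(samples, shared_seed, cell=0.1):
    """Replicably estimate a mean by snapping it to a randomly offset grid.

    Two executions that reuse `shared_seed` (the algorithm's internal
    randomness) draw the same offset, so their outputs agree whenever
    their empirical means land in the same grid cell; for means within
    gamma of each other this fails with probability at most gamma/cell.
    """
    rng = np.random.default_rng(shared_seed)   # shared internal randomness
    offset = rng.uniform(0.0, cell)            # random grid offset
    m = float(np.mean(samples))
    return offset + cell * np.floor((m - offset) / cell)
```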

ICML Conference 2021 Conference Paper

Differentially Private Correlation Clustering

  • Mark Bun
  • Marek Eliás 0001
  • Janardhan Kulkarni

Correlation clustering is a widely used technique in unsupervised machine learning. Motivated by applications where individual privacy is a concern, we initiate the study of differentially private correlation clustering. We propose an algorithm that achieves subquadratic additive error compared to the optimal cost. In contrast, straightforward adaptations of existing non-private algorithms all lead to a trivial quadratic error. Finally, we give a lower bound showing that any pure differentially private algorithm for correlation clustering requires additive error $\Omega(n)$.

NeurIPS Conference 2021 Conference Paper

Multiclass versus Binary Differentially Private PAC Learning

  • Satchit Sivakumar
  • Mark Bun
  • Marco Gaboardi

We show a generic reduction from multiclass differentially private PAC learning to binary private PAC learning. We apply this transformation to a recently proposed binary private PAC learner to obtain a private multiclass learner with sample complexity that has a polynomial dependence on the multiclass Littlestone dimension and a poly-logarithmic dependence on the number of classes. This yields a doubly exponential improvement in the dependence on both parameters over learners from previous work. Our proof extends the notion of $\Psi$-dimension defined in work of Ben-David et al. [JCSS, 1995] to the online setting and explores its general properties.

NeurIPS Conference 2020 Conference Paper

A Computational Separation between Private Learning and Online Learning

  • Mark Bun

A recent line of work has shown a qualitative equivalence between differentially private PAC learning and online learning: A concept class is privately learnable if and only if it is online learnable with a finite mistake bound. However, both directions of this equivalence incur significant losses in both sample and computational efficiency. Studying a special case of this connection, Gonen, Hazan, and Moran (NeurIPS 2019) showed that uniform or highly sample-efficient pure-private learners can be time-efficiently compiled into online learners. We show that, assuming the existence of one-way functions, such an efficient conversion is impossible even for general pure-private learners with polynomial sample complexity. This resolves a question of Neel, Roth, and Wu (FOCS 2019).

FOCS Conference 2020 Conference Paper

An Equivalence Between Private Classification and Online Prediction

  • Mark Bun
  • Roi Livni
  • Shay Moran

We prove that every concept class with finite Littlestone dimension can be learned by an (approximate) differentially-private algorithm. This answers an open question of Alon et al. (STOC 2019) who proved the converse statement (this question was also asked by Neel et al. (FOCS 2019)). Together these two results yield an equivalence between online learnability and private PAC learnability. We introduce a new notion of algorithmic stability called “global stability” which is essential to our proof and may be of independent interest. We also discuss an application of our results to boosting the privacy and accuracy parameters of differentially-private learners.

ICML Conference 2020 Conference Paper

New Oracle-Efficient Algorithms for Private Synthetic Data Release

  • Giuseppe Vietri
  • Grace Tian
  • Mark Bun
  • Thomas Steinke 0002
  • Zhiwei Steven Wu

We present three new algorithms for constructing differentially private synthetic data, a sanitized version of a sensitive dataset that approximately preserves the answers to a large collection of statistical queries. All three algorithms are \emph{oracle-efficient} in the sense that they are computationally efficient when given access to an optimization oracle. Such an oracle can be implemented using many existing (non-private) optimization tools such as sophisticated integer program solvers. While the accuracy of the synthetic data is contingent on the oracle's optimization performance, the algorithms satisfy differential privacy even in the worst case. For all three algorithms, we provide theoretical guarantees for both accuracy and privacy. Through empirical evaluation, we demonstrate that our methods scale well with both the dimensionality of the data and the number of queries. Compared to the state-of-the-art method High-Dimensional Matrix Mechanism (McKenna et al., VLDB 2018), our algorithms provide better accuracy in the large workload and high privacy regime (corresponding to low privacy loss $\varepsilon$).

NeurIPS Conference 2019 Conference Paper

Average-Case Averages: Private Algorithms for Smooth Sensitivity and Mean Estimation

  • Mark Bun
  • Thomas Steinke

The simplest and most widely applied method for guaranteeing differential privacy is to add instance-independent noise to a statistic of interest that is scaled to its global sensitivity. However, global sensitivity is a worst-case notion that is often too conservative for realized dataset instances. We provide methods for scaling noise in an instance-dependent way and demonstrate that they provide greater accuracy under average-case distributional assumptions. Specifically, we consider the basic problem of privately estimating the mean of a real distribution from i.i.d. samples. The standard empirical mean estimator can have arbitrarily high global sensitivity. We propose the trimmed mean estimator, which interpolates between the mean and the median, as a way of attaining much lower sensitivity on average while losing very little in terms of statistical accuracy. To privately estimate the trimmed mean, we revisit the smooth sensitivity framework of Nissim, Raskhodnikova, and Smith (STOC 2007) for using instance-dependent sensitivity. We propose three new additive noise distributions which provide concentrated differential privacy when scaled to smooth sensitivity. We provide theoretical and experimental evidence showing that our noise distributions compare favorably to others in the literature, in particular when applied to the mean estimation problem.
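
As a concrete (and deliberately crude) reference point: on data clipped to $[0, 1]$, the trimmed mean can be privatized with plain global-sensitivity calibration, since swapping one data point changes the retained middle block by at most one element, giving global sensitivity $1/(\text{number of retained points})$. The sketch below uses that simplification with Laplace noise; the paper's approach instead calibrates to smooth sensitivity with new, less conservative noise distributions.

```python
import numpy as np

def private_trimmed_mean(x, eps, trim=0.1):
    """Simplified eps-DP trimmed mean on data clipped to [0, 1].

    Trims the lowest and highest `trim` fraction, averages the rest,
    and adds Laplace noise scaled to the trimmed mean's global
    sensitivity. (The paper calibrates to *smooth* sensitivity, which
    is far less conservative on typical instances.)
    """
    x = np.clip(np.sort(np.asarray(x, dtype=float)), 0.0, 1.0)
    n = len(x)
    k = int(trim * n)
    kept = x[k : n - k]                # middle order statistics
    sensitivity = 1.0 / len(kept)      # one swapped point moves the mean this much
    noise = np.random.default_rng().laplace(scale=sensitivity / eps)
    return kept.mean() + noise
```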

NeurIPS Conference 2019 Conference Paper

Private Hypothesis Selection

  • Mark Bun
  • Gautam Kamath
  • Thomas Steinke
  • Steven Wu

We provide a differentially private algorithm for hypothesis selection. Given samples from an unknown probability distribution $P$ and a set of $m$ probability distributions $\mathcal{H}$, the goal is to output, in an $\varepsilon$-differentially private manner, a distribution from $\mathcal{H}$ whose total variation distance to $P$ is comparable to that of the best such distribution (which we denote by $\alpha$). The sample complexity of our basic algorithm is $O\left(\frac{\log m}{\alpha^2} + \frac{\log m}{\alpha \varepsilon}\right)$, representing a minimal cost for privacy when compared to the non-private algorithm. We can also handle infinite hypothesis classes $\mathcal{H}$ by relaxing to $(\varepsilon, \delta)$-differential privacy. We apply our hypothesis selection algorithm to give learning algorithms for a number of natural distribution classes, including Gaussians, product distributions, sums of independent random variables, piecewise polynomials, and mixture classes. Our hypothesis selection procedure allows us to generically convert a cover for a class to a learning algorithm, complementing known learning lower bounds which are in terms of the size of the packing number of the class. As the covering and packing numbers are often closely related, for constant $\alpha$, our algorithms achieve the optimal sample complexity for many classes of interest. Finally, we describe an application to private distribution-free PAC learning.
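
A selection primitive of this flavor is worth seeing in isolation: the exponential mechanism picks a candidate with probability exponentially tilted toward higher score. The generic sketch below conveys the shape of such a private selection step; the paper's scores come from Scheffé-style pairwise comparisons on the sample, and its algorithm differs in detail.

```python
import numpy as np

def exponential_mechanism(scores, sensitivity, eps, rng=None):
    """Select an index with probability proportional to
    exp(eps * score / (2 * sensitivity)); a standard eps-DP primitive.

    `sensitivity` bounds how much one individual's data can change any
    single score.
    """
    rng = rng or np.random.default_rng()
    scores = np.asarray(scores, dtype=float)
    logits = eps * scores / (2.0 * sensitivity)
    logits -= logits.max()             # stabilize the softmax numerically
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(scores), p=probs))
```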

SODA Conference 2019 Conference Paper

Quantum algorithms and approximating polynomials for composed functions with shared inputs

  • Mark Bun
  • Robin Kothari
  • Justin Thaler

We give new quantum algorithms for evaluating composed functions whose inputs may be shared between bottom-level gates. Let $f$ be a Boolean function and consider a function $F$ obtained by applying $f$ to conjunctions of possibly overlapping subsets of $n$ variables. If $f$ has quantum query complexity $Q(f)$, we give an algorithm for evaluating $F$ using $\widetilde{O}(\sqrt{Q(f) \cdot n})$ quantum queries. This improves on the bound of $O(Q(f) \cdot \sqrt{n})$ that follows by treating each conjunction independently, and is tight for worst-case choices of $f$. Using completely different techniques, we prove a similar tight composition theorem for the approximate degree of $f$. By recursively applying our composition theorems, we obtain a nearly optimal $\widetilde{O}(n^{1-2^{-d}})$ upper bound on the quantum query complexity and approximate degree of linear-size depth-$d$ $\mathsf{AC}^0$ circuits. As a consequence, such circuits can be PAC learned in subexponential time, even in the challenging agnostic setting. Prior to our work, a subexponential-time algorithm was not known even for linear-size depth-3 $\mathsf{AC}^0$ circuits. We also show that any substantially faster learning algorithm will require fundamentally new techniques.
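
One way to see the shape of the $\widetilde{O}(n^{1-2^{-d}})$ bound: under the simplified recurrence suggested by applying the single-level composition bound once per circuit level,

$$D_1 = \widetilde{O}(\sqrt{n}), \qquad D_d = \widetilde{O}\!\left(\sqrt{n \cdot D_{d-1}}\right),$$

induction gives $D_d = \widetilde{O}(n^{1-2^{-d}})$, since $D_{d-1} = n^{1-2^{-(d-1)}}$ yields $\sqrt{n \cdot D_{d-1}} = n^{(2 - 2^{-(d-1)})/2} = n^{1-2^{-d}}$. (This is a back-of-envelope reading of the stated bound, not the paper's exact recurrence.)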

JMLR Journal 2019 Journal Article

Simultaneous Private Learning of Multiple Concepts

  • Mark Bun
  • Kobbi Nissim
  • Uri Stemmer

We investigate the direct-sum problem in the context of differentially private PAC learning: What is the sample complexity of solving $k$ learning tasks simultaneously under differential privacy, and how does this cost compare to that of solving $k$ learning tasks without privacy? In our setting, an individual example consists of a domain element $x$ labeled by $k$ unknown concepts $(c_1,\ldots,c_k)$. The goal of a multi-learner is to output $k$ hypotheses $(h_1,\ldots,h_k)$ that generalize the input examples. Without concern for privacy, the sample complexity needed to simultaneously learn $k$ concepts is essentially the same as needed for learning a single concept. Under differential privacy, the basic strategy of learning each hypothesis independently yields sample complexity that grows polynomially with $k$. For some concept classes, we give multi-learners that require fewer samples than the basic strategy. Unfortunately, however, we also give lower bounds showing that even for very simple concept classes, the sample cost of private multi-learning must grow polynomially in $k$.

STOC Conference 2018 Conference Paper

Composable and versatile privacy via truncated CDP

  • Mark Bun
  • Cynthia Dwork
  • Guy N. Rothblum
  • Thomas Steinke 0002

We propose truncated concentrated differential privacy (tCDP), a refinement of differential privacy and of concentrated differential privacy. This new definition provides robust and efficient composition guarantees, supports powerful algorithmic techniques such as privacy amplification via sub-sampling, and enables more accurate statistical analyses. In particular, we show a central task for which the new definition enables exponential accuracy improvement.
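
For reference, tCDP truncates the Rényi-divergence condition of concentrated DP at a finite order; in the standard formulation (stated here as an assumption to be checked against the paper), a mechanism $M$ is $(\rho, \omega)$-tCDP if for all neighboring datasets $x, x'$,

$$D_\alpha\big(M(x) \,\|\, M(x')\big) \le \rho\,\alpha \quad \text{for all } 1 < \alpha < \omega,$$

whereas zCDP imposes the same bound for every $\alpha > 1$; the truncation at $\omega$ is what enables guarantees like privacy amplification by subsampling.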

FOCS Conference 2017 Conference Paper

A Nearly Optimal Lower Bound on the Approximate Degree of AC^0

  • Mark Bun
  • Justin Thaler

The approximate degree of a Boolean function $f \colon \{-1,1\}^n \to \{-1,1\}$ is the least degree of a real polynomial that approximates $f$ pointwise to error at most $1/3$. We introduce a generic method for increasing the approximate degree of a given function, while preserving its computability by constant-depth circuits. Specifically, we show how to transform any Boolean function $f$ with approximate degree $d$ into a function $F$ on $O(n \cdot \mathrm{polylog}(n))$ variables with approximate degree at least $D = \Omega(n^{1/3} \cdot d^{2/3})$. In particular, if $d = n^{1-\Omega(1)}$, then $D$ is polynomially larger than $d$. Moreover, if $f$ is computed by a constant-depth polynomial-size Boolean circuit, then so is $F$. By recursively applying our transformation, for any constant $\delta > 0$ we exhibit an $\mathsf{AC}^0$ function of approximate degree $\Omega(n^{1-\delta})$. This improves over the best previous lower bound of $\Omega(n^{2/3})$ due to Aaronson and Shi (J. ACM 2004), and nearly matches the trivial upper bound of $n$ that holds for any function. Our lower bounds also apply to (quasipolynomial-size) DNFs of polylogarithmic width. We describe several applications of these results. We give: (1) for any constant $\delta > 0$, an $\Omega(n^{1-\delta})$ lower bound on the quantum communication complexity of a function in $\mathsf{AC}^0$; (2) a Boolean function $f$ with approximate degree at least $C(f)^{2-o(1)}$, where $C(f)$ is the certificate complexity of $f$, a separation that is optimal up to the $o(1)$ term in the exponent; (3) improved secret sharing schemes with reconstruction procedures in $\mathsf{AC}^0$.
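
To see why recursing on the transformation yields $\Omega(n^{1-\delta})$: writing the approximate degree as $n^{e}$ (and ignoring the polylog blowup in the number of variables), one application maps the exponent

$$e \;\longmapsto\; \tfrac{1}{3} + \tfrac{2}{3}\,e, \qquad\text{so}\qquad e_k = 1 - (1 - e_0)\left(\tfrac{2}{3}\right)^{k} \longrightarrow 1.$$

Starting from the Aaronson–Shi exponent $e_0 = 2/3$, a constant number $k = O(\log(1/\delta))$ of applications already gives exponent $1 - \delta$.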

ICML Conference 2017 Conference Paper

Differentially Private Submodular Maximization: Data Summarization in Disguise

  • Marko Mitrovic
  • Mark Bun
  • Andreas Krause 0001
  • Amin Karbasi

Many data summarization applications are captured by the general framework of submodular maximization. As a consequence, a wide range of efficient approximation algorithms have been developed. However, when such applications involve sensitive data about individuals, their privacy concerns are not automatically addressed. To remedy this problem, we propose a general and systematic study of differentially private submodular maximization. We present privacy-preserving algorithms for both monotone and non-monotone submodular maximization under cardinality, matroid, and p-extendible system constraints, with guarantees that are competitive with optimal. Along the way, we analyze a new algorithm for non-monotone submodular maximization, which is the first (even non-privately) to achieve a constant approximation ratio while running in linear time. We additionally provide two concrete experiments to validate the efficacy of these algorithms.
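
For the monotone case under a cardinality constraint, the natural template is greedy selection with each exact argmax replaced by an exponential-mechanism draw. The sketch below shows that template in spirit only; the per-step budget split, the `gain_sensitivity` bound, and all names are illustrative assumptions, and the paper's algorithms and accounting are sharper.

```python
import numpy as np

def private_greedy(ground_set, marginal_gain, k, eps, gain_sensitivity, rng=None):
    """DP greedy sketch for monotone submodular maximization under |S| <= k.

    Each of the k greedy steps samples an element via the exponential
    mechanism at budget eps/k (simple composition). `gain_sensitivity`
    is an assumed bound on how much one individual's data can change
    any marginal gain.
    """
    rng = rng or np.random.default_rng()
    S = []
    for _ in range(k):
        candidates = [e for e in ground_set if e not in S]
        gains = np.array([marginal_gain(S, e) for e in candidates])
        logits = (eps / k) * gains / (2.0 * gain_sensitivity)
        logits -= logits.max()             # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum()
        S.append(candidates[int(rng.choice(len(candidates), p=probs))])
    return S
```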

FOCS Conference 2015 Conference Paper

Differentially Private Release and Learning of Threshold Functions

  • Mark Bun
  • Kobbi Nissim
  • Uri Stemmer
  • Salil P. Vadhan

We prove new upper and lower bounds on the sample complexity of $(\varepsilon, \delta)$ differentially private algorithms for releasing approximate answers to threshold functions. A threshold function $c_x$ over a totally ordered domain $X$ evaluates to $c_x(y) = 1$ if $y \le x$, and evaluates to $0$ otherwise. We give the first nontrivial lower bound for releasing thresholds with $(\varepsilon, \delta)$ differential privacy, showing that the task is impossible over an infinite domain $X$, and moreover requires sample complexity $n \ge \Omega(\log^* |X|)$, which grows with the size of the domain. Inspired by the techniques used to prove this lower bound, we give an algorithm for releasing thresholds with $n \le 2^{(1+o(1)) \log^* |X|}$ samples. This improves the previous best upper bound of $8^{(1+o(1)) \log^* |X|}$ (Beimel et al., RANDOM '13). Our sample complexity upper and lower bounds also apply to the tasks of learning distributions with respect to Kolmogorov distance and of properly PAC learning thresholds with differential privacy. The lower bound gives the first separation between the sample complexity of properly learning a concept class with $(\varepsilon, \delta)$ differential privacy and learning without privacy. For properly learning thresholds in $\ell$ dimensions, this lower bound extends to $n \ge \Omega(\ell \cdot \log^* |X|)$. To obtain our results, we give reductions in both directions between releasing and properly learning thresholds and the simpler interior point problem. Given a database $D$ of elements from $X$, the interior point problem asks for an element between the smallest and largest elements in $D$. We introduce new recursive constructions for bounding the sample complexity of the interior point problem, as well as further reductions and techniques for proving impossibility results for other basic problems in differential privacy.
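
To calibrate how weak a $\log^*$ dependence is: $\log^*$ counts logarithm iterations until the value drops to at most $1$, so for the domain of 64-bit integers,

$$\log^*\!\big(2^{64}\big) = 5, \qquad 2^{64} \xrightarrow{\;\log_2\;} 64 \xrightarrow{\;\log_2\;} 6 \xrightarrow{\;\log_2\;} 2.59 \xrightarrow{\;\log_2\;} 1.37 \xrightarrow{\;\log_2\;} 0.46,$$

so the separation from the privacy-free setting, while real, enters only through an extraordinarily slowly growing function.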

STOC Conference 2014 Conference Paper

Fingerprinting codes and the price of approximate differential privacy

  • Mark Bun
  • Jonathan R. Ullman
  • Salil P. Vadhan

We show new lower bounds on the sample complexity of $(\varepsilon, \delta)$-differentially private algorithms that accurately answer large sets of counting queries. A counting query on a database $D \in (\{0,1\}^d)^n$ has the form "What fraction of the individual records in the database satisfy the property $q$?" We show that in order to answer an arbitrary set $Q$ of $\gg nd$ counting queries on $D$ to within error $\pm \alpha$ it is necessary that $n \ge \tilde{\Omega}\!\left(\frac{\sqrt{d}\, \log |Q|}{\alpha^2 \varepsilon}\right)$. This bound is optimal up to poly-logarithmic factors, as demonstrated by the Private Multiplicative Weights algorithm (Hardt and Rothblum, FOCS '10). It is also the first to show that the sample complexity required for $(\varepsilon, \delta)$-differential privacy is asymptotically larger than what is required merely for accuracy, which is $O(\log |Q| / \alpha^2)$. In addition, we show that our lower bound holds for the specific case of $k$-way marginal queries (where $|Q| = 2^k \binom{d}{k}$) when $\alpha$ is a constant. Our results rely on the existence of short fingerprinting codes (Boneh and Shaw, CRYPTO '95; Tardos, STOC '03), which we show are closely connected to the sample complexity of differentially private data release. We also give a new method for combining certain types of sample complexity lower bounds into stronger lower bounds.