Author name cluster

Guy N. Rothblum

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

31 papers
1 author row

Possible papers

ICLR Conference 2025 Conference Paper

How to Verify Any (Reasonable) Distribution Property: Computationally Sound Argument Systems for Distributions

  • Tal Herman
  • Guy N. Rothblum

As statistical analyses become more central to science, industry and society, there is a growing need to ensure correctness of their results. Approximate correctness can be verified by replicating the entire analysis, but can we verify without replication? We focus on distribution testing problems: verifying that an unknown distribution is close to having a claimed property. Our main contribution is an interactive protocol between a verifier and an untrusted prover, which can be used to verify any distribution property that can be decided in polynomial time given a full and explicit description of the distribution. If the distribution is at statistical distance $\varepsilon$ from having the property, then the verifier rejects with high probability. This soundness property holds against any polynomial-time strategy that a cheating prover might follow, assuming the existence of collision-resistant hash functions (a standard assumption in cryptography). For distributions over a domain of size $N$, the protocol consists of $4$ messages and the communication complexity and verifier runtime are roughly $\widetilde{O}\left(\sqrt{N} / \varepsilon^2 \right)$. The verifier's sample complexity is $\widetilde{O}\left(\sqrt{N} / \varepsilon^2 \right)$, and this is optimal up to $\text{polylog}(N)$ factors (for any protocol, regardless of its communication complexity). Even for simple properties, approximately deciding whether an unknown distribution has the property can require quasi-linear sample complexity and running time. For any such property, our protocol provides a quadratic speedup over replicating the analysis.
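
As a quick sense of scale for the bounds quoted above, here is a small Python sketch (ours, not from the paper) comparing the verifier's $\widetilde{O}(\sqrt{N}/\varepsilon^2)$ sample budget against the quasi-linear cost of replicating the analysis. Constants and polylog factors are suppressed, so these are illustrative orders of magnitude only.

```python
# Illustrative orders of magnitude only: polylog(N) factors and constants
# are suppressed, so these numbers are not the paper's exact bounds.
import math

def verifier_samples(N: int, eps: float) -> float:
    """Verifier's sample budget, ~sqrt(N) / eps^2."""
    return math.sqrt(N) / eps**2

def replication_samples(N: int, eps: float) -> float:
    """Quasi-linear cost of re-running a hard distribution test, ~N / eps^2."""
    return N / eps**2

for N in (10**6, 10**9):
    v = verifier_samples(N, 0.1)
    r = replication_samples(N, 0.1)
    print(f"N = {N:>13,}: verify ~{v:,.0f} samples vs. replicate ~{r:,.0f}")
```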

ICML Conference 2025 Conference Paper

Local Pan-privacy for Federated Analytics

  • Vitaly Feldman
  • Audra McMillan
  • Guy N. Rothblum
  • Kunal Talwar

Pan-privacy was proposed by Dwork et al. (2010) as an approach to designing a private analytics system that retains its privacy properties in the face of intrusions that expose the system’s internal state. Motivated by federated telemetry applications, we study local pan-privacy, where privacy should be retained under repeated unannounced intrusions on the local state. We consider the problem of monitoring the count of an event in a federated system, where event occurrences on a local device should be hidden even from an intruder on that device. We show that under reasonable constraints, the goal of providing information-theoretic differential privacy under intrusion is incompatible with collecting telemetry information. We then show that this problem can be solved in a scalable way using standard cryptographic primitives.

FOCS Conference 2025 Conference Paper

Proving Natural Distribution Properties is Harder than Testing Them

  • Tal Herman
  • Guy N. Rothblum

Suppose that an untrusted analyst claims that it ran a distribution tester and determined that an unknown distribution has a certain property. Can the untrusted analyst prove that its assertion is correct to a verifier that does not have sufficient samples and computational resources to run the tester on its own? In this work, we are interested in proofs that can be generated very efficiently, with minimal overhead over running the distribution tester. In particular, since the distribution tester is sublinear (in the domain size), at the very least we also want the sample complexity for generating the proof to be sublinear. Do natural properties that have sublinear testers admit such proof systems? Our main result answers this question negatively for several natural properties. For these properties, if the verifier’s sample complexity is non-trivial (smaller than just running the tester on its own), then the (honest) prover must draw a linear number of samples. We show this result for the problem of testing whether the distribution is uniform over its support, for specifying the distribution’s k-collision probability (or its $L_k$ norm), and for other natural properties. Our results shed light on a recent line of work showing that if we allow the prover to draw a quasi-linear number of samples, then many distribution properties have proof systems with very efficient verification. Our negative results imply that the super-linear sample complexity of the prover in those proof systems is inherent.
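
For concreteness, the k-collision probability of a distribution p is $\sum_x p(x)^k$. The sketch below (ours, not from the paper) computes the naive plug-in estimate from samples; the paper's lower bounds concern how many samples an honest prover fundamentally needs, not the merits of this particular estimator.

```python
# Naive plug-in estimate of the k-collision probability sum_x p(x)^k from
# samples (our sketch; biased for small sample sizes, and unrelated to the
# paper's prover/verifier protocols; it only makes the quantity concrete).
import random
from collections import Counter

def empirical_k_collision(samples, k: int) -> float:
    m = len(samples)
    counts = Counter(samples)
    return sum((c / m) ** k for c in counts.values())

random.seed(0)
uniform4 = [random.randrange(4) for _ in range(100_000)]
print(empirical_k_collision(uniform4, k=2))  # close to 1/4 for uniform on 4 elements
```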

FOCS Conference 2024 Conference Paper

Interactive Proofs for General Distribution Properties

  • Tal Herman
  • Guy N. Rothblum

Suppose Alice has collected a small number of samples from an unknown distribution, and would like to learn about the distribution. Bob, an untrusted data analyst, claims that he ran a sophisticated data analysis on the distribution, and makes assertions about its properties. Can Alice efficiently verify Bob's claims using fewer resources (say in terms of samples and computation) than would be needed to run the analysis herself? We construct an interactive proof system for any distribution property for which the distance from the property can be computed by (log-space) uniform polynomial size circuits of depth D, where the circuit gets a complete description of the distribution. Taking N to be an upper bound on the size of the distribution's support, the verifier's sample complexity, the running time, and the communication complexity are all sublinear in N: they are bounded by $O(N^{1-a} + D)$ for a constant $a > 0$. The honest prover runs in poly(N) time and has quasi-linear sample complexity. Moreover, the proof system is tolerant: it can be used to approximate the distribution's distance from the property. We show similar results for any distribution property for which the distance from the property can be approximated by a bounded-space Turing machine (that gets as input a complete description of the distribution). We remark that even for simple properties, deciding the property without a prover requires quasi-linear sample complexity and running time. Prior work [Herman and Rothblum, FOCS 2023] demonstrated sublinear interactive proof systems, but only for the much more restricted class of label-invariant distribution properties.

STOC Conference 2023 Conference Paper

Constant-Round Arguments from One-Way Functions

  • Noga Amit
  • Guy N. Rothblum

We study the following question: what cryptographic assumptions are needed for obtaining constant-round computationally-sound argument systems? We focus on argument systems with almost-linear verification time for subclasses of P, such as depth-bounded computations. Kilian’s celebrated work [STOC 1992] provides such 4-message arguments for P (actually, for NP) using collision-resistant hash functions. We show that one-way functions suffice for obtaining constant-round arguments of almost-linear verification time for languages in P that have log-space uniform circuits of linear depth and polynomial size. More generally, the complexity of the verifier scales with the circuit depth. Furthermore, our argument systems (like Kilian’s) are doubly-efficient; that is, the honest prover strategy can be implemented in polynomial-time. Unconditionally sound interactive proofs for this class of computations do not rely on any cryptographic assumptions, but they require a linear number of rounds [Goldwasser, Kalai and Rothblum, STOC 2008]. Constant-round interactive proof systems of linear verification complexity are not known even for NC (indeed, even for AC$^1$).

FOCS Conference 2023 Conference Paper

Doubly-Efficient Interactive Proofs for Distribution Properties

  • Tal Herman
  • Guy N. Rothblum

Suppose we have access to a small number of samples from an unknown distribution, and would like to learn facts about the distribution. An untrusted data server claims to have studied the distribution and makes assertions about its properties. Can the untrusted data server prove that its assertions are approximately correct? Can a short, efficiently verifiable proof be generated in polynomial time? We study doubly-efficient interactive proof systems that can be used to verify properties of an unknown distribution over a domain $[N]$. On top of efficient verification, our focus is on proofs that the honest prover can generate in polynomial time. More generally, the complexity of generating the proof should be as close as possible to the complexity of simply running a standalone analysis to determine whether the distribution has the property. Our main result is a new 2-message doubly-efficient interactive proof protocol for verifying any label-invariant distribution property (any property that is invariant to relabeling of the elements in the domain of the distribution). The sample complexity, communication complexity and verifier runtime are all $\widetilde{O}(\sqrt{N})$. The proof can be generated in quasi-linear $\widetilde{O}(N)$ time and sample complexities (the runtimes of the verifier and the honest prover hold under a mild assumption about the property’s computational complexity). This improves on prior work, where constructing the proof required super-polynomial time [Herman and Rothblum, STOC 2022]. Our new proof system is directly applicable to proving (and verifying) several natural and widely-studied properties, such as a distribution’s support size, its Shannon entropy, and its distance from the uniform distribution. For these (and many other) properties, the runtime and sample complexities for generating the proof are within $\text{polylog}(N)$ factors of the complexities for simply determining whether the property holds.
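
To illustrate what "label-invariant" means, the toy sketch below (ours, not the paper's protocol) evaluates the three properties named in the abstract on an explicit distribution; relabeling the domain elements leaves all three values unchanged.

```python
# Toy illustration (ours): label-invariant properties depend only on the
# multiset of probabilities, not on which element carries which probability.
import math

def shannon_entropy(p):          # in bits
    return -sum(q * math.log2(q) for q in p if q > 0)

def support_size(p):
    return sum(1 for q in p if q > 0)

def tv_distance_from_uniform(p):
    N = len(p)
    return 0.5 * sum(abs(q - 1 / N) for q in p)

p = [0.5, 0.25, 0.25, 0.0]
print(shannon_entropy(p), support_size(p), tv_distance_from_uniform(p))
# Permuting the entries of p leaves all three values unchanged.
```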

ICML Conference 2021 Conference Paper

Multi-group Agnostic PAC Learnability

  • Guy N. Rothblum
  • Gal Yona

An agnostic PAC learning algorithm finds a predictor that is competitive with the best predictor in a benchmark hypothesis class, where competitiveness is measured with respect to a given loss function. However, its predictions might be quite sub-optimal for structured subgroups of individuals, such as protected demographic groups. Motivated by such fairness concerns, we study “multi-group agnostic PAC learnability”: fixing a measure of loss, a benchmark class $\mathcal{H}$ and a (potentially) rich collection of subgroups $\mathcal{G}$, the objective is to learn a single predictor such that the loss experienced by every group $g \in \mathcal{G}$ is not much larger than the best possible loss for this group within $\mathcal{H}$. Under natural conditions, we provide a characterization of the loss functions for which such a predictor is guaranteed to exist. For any such loss function we construct a learning algorithm whose sample complexity is logarithmic in the size of the collection $\mathcal{G}$. Our results unify and extend previous positive and negative results from the multi-group fairness literature, which applied to specific loss functions.

SODA Conference 2019 Conference Paper

Fine-grained Complexity Meets IP = PSPACE

  • Lijie Chen 0001
  • Shafi Goldwasser
  • Kaifeng Lyu
  • Guy N. Rothblum
  • Aviad Rubinstein

In this paper we study the fine-grained complexity of finding exact and approximate solutions to problems in P. Our main contribution is showing reductions from an exact to an approximate solution for a host of such problems. As one (notable) example, we show that the Closest-LCS-Pair problem (given two sets of strings A and B, compute exactly the maximum LCS(a, b) with (a, b) ∈ A × B) is equivalent to its approximation version (under near-linear time reductions, and with a constant approximation factor). More generally, we identify a class of problems, which we call BP-Pair-Class, comprising both exact and approximate solutions, and show that they are all equivalent under near-linear time reductions. Exploring this class and its properties, we also show: Under the NC-SETH assumption (a significantly more relaxed assumption than SETH), solving any of the problems in this class requires essentially quadratic time. Modest improvements on the running time of known algorithms (shaving log factors) would imply that NEXP is not in non-uniform NC$^1$. Finally, we leverage our techniques to show new barriers for deterministic approximation algorithms for LCS. A very important consequence of our results is that they continue to hold in the data structure setting. In particular, it shows that a data structure for approximate Nearest Neighbor Search for LCS (NNS$_{LCS}$) implies a data structure for exact NNS$_{LCS}$ and a data structure for answering regular expression queries with essentially the same complexity. At the heart of these new results is a deep connection between interactive proof systems for bounded-space computations and the fine-grained complexity of exact and approximate solutions to problems in P. In particular, our results build on the proof techniques from the classical IP = PSPACE result.
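
For concreteness, here is the naive quadratic-time baseline for Closest-LCS-Pair (a sketch of the textbook approach, ours, not the paper's reductions): a standard dynamic-programming LCS run on every pair in A × B.

```python
# Brute-force Closest-LCS-Pair: a textbook O(|a||b|) LCS dynamic program,
# run on every pair in A x B.  The paper shows that, under NC-SETH, even
# approximating this problem is essentially as hard as solving it exactly.
def lcs(a: str, b: str) -> int:
    dp = [0] * (len(b) + 1)            # rolling row of the LCS table
    for ch in a:
        prev = 0
        for j, bj in enumerate(b, 1):
            cur = dp[j]
            dp[j] = prev + 1 if ch == bj else max(dp[j], dp[j - 1])
            prev = cur
    return dp[-1]

def closest_lcs_pair(A, B):
    return max((lcs(a, b), a, b) for a in A for b in B)

print(closest_lcs_pair(["abcde", "xyz"], ["azcye", "abXde"]))  # (4, 'abcde', 'abXde')
```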

FOCS Conference 2019 Conference Paper

Learning from Outcomes: Evidence-Based Rankings

  • Cynthia Dwork
  • Michael P. Kim
  • Omer Reingold
  • Guy N. Rothblum
  • Gal Yona

Many selection procedures involve ordering candidates according to their qualifications. For example, a university might order applicants according to a perceived probability of graduation within four years, and then select the top 1000 applicants. In this work, we address the problem of ranking members of a population according to their "probability" of success, based on a training set of historical binary outcome data (e.g., graduated in four years or not). We show how to obtain rankings that satisfy a number of desirable accuracy and fairness criteria, despite the coarseness of the training data. As the task of ranking is global (the rank of every individual depends not only on their own qualifications, but also on every other individual's qualifications), ranking is more subtle and vulnerable to manipulation than standard prediction tasks. Towards mitigating unfair discrimination caused by inaccuracies in rankings, we develop two parallel definitions of evidence-based rankings. The first definition relies on a semantic notion of domination-compatibility: if the training data suggest that members of a set S are more qualified (on average) than the members of T, then a ranking that favors T over S (i.e., where T dominates S) is blatantly inconsistent with the evidence, and likely to be discriminatory. The definition asks for domination-compatibility, not just for a pair of sets, but rather for every pair of sets from a rich collection C of subpopulations. The second definition aims at precluding even more general forms of discrimination; this notion of evidence-consistency requires that the ranking must be justified on the basis of consistency with the expectations for every set in the collection C. Somewhat surprisingly, while evidence-consistency is a strictly stronger notion than domination-compatibility when the collection C is predefined, the two notions are equivalent when the collection C may depend on the ranking in question.
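
As a toy rendering of the first definition (our simplification, with hypothetical helper names), the sketch below flags pairs (S, T) from a collection where the ranking lets T dominate S even though the training outcomes suggest S is more qualified on average.

```python
# Simplified audit for domination-compatibility (illustrative only; the
# paper's definitions quantify over a rich collection C and handle noise).
from statistics import mean

def dominates(ranking, S, T):
    """True if every member of T is ranked strictly above every member of S."""
    pos = {x: i for i, x in enumerate(ranking)}   # index 0 = top of the ranking
    return max(pos[t] for t in T) < min(pos[s] for s in S)

def domination_violations(ranking, outcomes, collection):
    """Pairs (S, T) where T dominates S yet S's average outcome is higher."""
    return [(S, T)
            for S in collection for T in collection
            if S != T and dominates(ranking, S, T)
            and mean(outcomes[s] for s in S) > mean(outcomes[t] for t in T)]

outcomes = {"a": 1, "b": 1, "c": 0, "d": 0}       # historical binary outcomes
print(domination_violations(["c", "d", "a", "b"], outcomes,
                            [("a", "b"), ("c", "d")]))  # flags (('a','b'), ('c','d'))
```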

STOC Conference 2018 Conference Paper

Composable and versatile privacy via truncated CDP

  • Mark Bun
  • Cynthia Dwork
  • Guy N. Rothblum
  • Thomas Steinke 0002

We propose truncated concentrated differential privacy (tCDP), a refinement of differential privacy and of concentrated differential privacy. This new definition provides robust and efficient composition guarantees, supports powerful algorithmic techniques such as privacy amplification via sub-sampling, and enables more accurate statistical analyses. In particular, we show a central task for which the new definition enables exponential accuracy improvement.

FOCS Conference 2018 Conference Paper

Counting t-Cliques: Worst-Case to Average-Case Reductions and Direct Interactive Proof Systems

  • Oded Goldreich 0001
  • Guy N. Rothblum

We study two aspects of the complexity of counting the number of t-cliques in a graph: 1) Worst-case to average-case reductions: Our main result reduces counting t-cliques in any n-vertex graph to counting t-cliques in typical n-vertex graphs that are drawn from a simple distribution of min-entropy $\Omega(n^2)$. For any constant t, the reduction runs in $O(n^2)$ time, and yields a correct answer (w.h.p.) even when the “average-case solver” only succeeds with probability 1/poly(log n). 2) Direct interactive proof systems: We present a direct and simple interactive proof system for counting t-cliques in n-vertex graphs. The proof system uses $t-2$ rounds, the verifier runs in $O(t^2 n^2)$ time, and the prover can be implemented in $t^{O(1)} \cdot n^2$ time when given oracle access to counting $(t-1)$-cliques in $t^{O(1)} \cdot n$-vertex graphs. The results are both obtained by considering weighted versions of the t-clique problem, where weights are assigned to vertices and/or to edges, and the weight of cliques is defined as the product of the corresponding weights. These weighted problems are shown to be easily reducible to the unweighted problem.
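
For reference, the brute-force counter below (ours; exponential in t, and nothing like the paper's reductions) makes the counting problem itself concrete.

```python
# Baseline brute-force t-clique counter: check every t-subset of vertices,
# so it runs in roughly O(n^t) time; fine for illustrating the problem.
from itertools import combinations

def count_t_cliques(n, edges, t):
    adj = [[False] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = True
    return sum(
        all(adj[u][v] for u, v in combinations(subset, 2))
        for subset in combinations(range(n), t)
    )

# The complete graph K4 contains 4 triangles.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(count_t_cliques(4, k4, 3))  # 4
```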

ICML Conference 2018 Conference Paper

Multicalibration: Calibration for the (Computationally-Identifiable) Masses

  • Úrsula Hébert-Johnson
  • Michael P. Kim
  • Omer Reingold
  • Guy N. Rothblum

We develop and study multicalibration as a new measure of fairness in machine learning that aims to mitigate inadvertent or malicious discrimination that is introduced at training time (even from ground truth data). Multicalibration guarantees meaningful (calibrated) predictions for every subpopulation that can be identified within a specified class of computations. The specified class can be quite rich; in particular, it can contain many overlapping subgroups of a protected group. We demonstrate that in many settings this strong notion of protection from discrimination is provably attainable and aligned with the goal of obtaining accurate predictions. Along the way, we present algorithms for learning a multicalibrated predictor, study the computational complexity of this task, and illustrate tight connections to the agnostic learning model.
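
The sketch below is a minimal calibration *audit* (our illustration, not the paper's learning algorithm): it buckets a predictor's outputs and checks, within each subgroup of a specified class, that average outcomes track predicted values.

```python
# Minimal multicalibration audit (ours).  groups: dict mapping a subgroup
# name to a boolean membership list over the same individuals as preds/labels.
def calibration_violations(preds, labels, groups, bins=10, tol=0.1):
    """Return (group, bin, avg_pred, avg_label) tuples that violate tolerance."""
    bad = []
    for name, member in groups.items():
        for b in range(bins):
            lo, hi = b / bins, (b + 1) / bins
            idx = [i for i, m in enumerate(member)
                   if m and lo <= preds[i] < hi]
            if idx:
                avg_pred = sum(preds[i] for i in idx) / len(idx)
                avg_label = sum(labels[i] for i in idx) / len(idx)
                if abs(avg_pred - avg_label) > tol:
                    bad.append((name, b, avg_pred, avg_label))
    return bad
```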

ICML Conference 2018 Conference Paper

Probably Approximately Metric-Fair Learning

  • Gal Yona
  • Guy N. Rothblum

The seminal work of Dwork et al. [ITCS 2012] introduced a metric-based notion of individual fairness: given a task-specific similarity metric, their notion required that every pair of similar individuals should be treated similarly. In the context of machine learning, however, individual fairness does not generalize from a training set to the underlying population. We show that this can lead to computational intractability even for simple fair-learning tasks. With this motivation in mind, we introduce and study a relaxed notion of approximate metric-fairness: for a random pair of individuals sampled from the population, with all but a small probability of error, if they are similar then they should be treated similarly. We formalize the goal of achieving approximate metric-fairness simultaneously with best-possible accuracy as Probably Approximately Correct and Fair (PACF) Learning. We show that approximate metric-fairness does generalize, and leverage these generalization guarantees to construct polynomial-time PACF learning algorithms for the classes of linear and logistic predictors.

FOCS Conference 2012 Conference Paper

How to Compute in the Presence of Leakage

  • Shafi Goldwasser
  • Guy N. Rothblum

We address the following problem: how to execute any algorithm P, for an unbounded number of executions, in the presence of an adversary who observes partial information on the internal state of the computation during executions. The security guarantee is that the adversary learns nothing, beyond P's input/output behavior. This general problem is important for running cryptographic algorithms in the presence of side-channel attacks, as well as for running non-cryptographic algorithms, such as a proprietary search algorithm or a game, on a cloud server where parts of the execution's internals might be observed. Our main result is a compiler, which takes as input an algorithm P and a security parameter κ, and produces a functionally equivalent algorithm P'. The running time of P' is a factor of poly(κ) slower than P. P' will be composed of a series of calls to poly(κ)-time computable sub-algorithms. During the executions of P', an adversary algorithm A, which can choose the inputs of P', can learn the results of adaptively chosen leakage functions - each of bounded output size Ω̃(κ) - on the sub-algorithms of P' and the randomness they use. We prove that any computationally unbounded A observing the results of computationally unbounded leakage functions, will learn no more from its observations than it could given black-box access only to the input-output behavior of P. This result is unconditional and does not rely on any secure hardware components.

FOCS Conference 2010 Conference Paper

A Multiplicative Weights Mechanism for Privacy-Preserving Data Analysis

  • Moritz Hardt
  • Guy N. Rothblum

We consider statistical data analysis in the interactive setting. In this setting a trusted curator maintains a database of sensitive information about individual participants, and releases privacy-preserving answers to queries as they arrive. Our primary contribution is a new differentially private multiplicative weights mechanism for answering a large number of interactive counting (or linear) queries that arrive online and may be adaptively chosen. This is the first mechanism with worst-case accuracy guarantees that can answer large numbers of interactive queries and is efficient (in terms of the runtime's dependence on the data universe size). The error is asymptotically optimal in its dependence on the number of participants, and depends only logarithmically on the number of queries being answered. The running time is nearly linear in the size of the data universe. As a further contribution, when we relax the utility requirement and require accuracy only for databases drawn from a rich class of databases, we obtain exponential improvements in running time. Even in this relaxed setting we continue to guarantee privacy for any input database. Only the utility requirement is relaxed. Specifically, we show that when the input database is drawn from a smooth distribution - a distribution that does not place too much weight on any single data item - accuracy remains as above, and the running time becomes poly-logarithmic in the data universe size. The main technical contributions are the application of multiplicative weights techniques to the differential privacy setting, a new privacy analysis for the interactive setting, and a technique for reducing data dimensionality for databases drawn from smooth distributions.
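
A heavily simplified single step in the spirit of the mechanism (our sketch; the real mechanism's noise calibration and noisy-threshold bookkeeping are omitted, and the Gaussian noise here merely stands in for properly calibrated privacy noise):

```python
# One multiplicative-weights update toward consistency with a noisy answer.
# x: true normalized histogram over a universe of size N; q: a linear query
# given as a weight vector in [0,1]^N; weights: the synthetic histogram.
import math, random

def mw_answer_query(weights, x, q, eta=0.1, sigma=1.0):
    true_ans = sum(qi * xi for qi, xi in zip(q, x))
    noisy_ans = true_ans + random.gauss(0, sigma)        # stand-in privacy noise
    synth_ans = sum(qi * wi for qi, wi in zip(q, weights))
    # Multiplicative-weights step: move the synthetic histogram toward
    # agreement with the noisy answer on this query.
    sign = 1.0 if noisy_ans > synth_ans else -1.0
    new_w = [wi * math.exp(eta * sign * qi) for wi, qi in zip(weights, q)]
    total = sum(new_w)
    return noisy_ans, [wi / total for wi in new_w]
```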

FOCS Conference 2010 Conference Paper

Boosting and Differential Privacy

  • Cynthia Dwork
  • Guy N. Rothblum
  • Salil P. Vadhan

Boosting is a general method for improving the accuracy of learning algorithms. We use boosting to construct improved privacy-preserving synopses of an input database. These are data structures that yield, for a given set Q of queries over an input database, reasonably accurate estimates of the responses to every query in Q, even when the number of queries is much larger than the number of rows in the database. Given a base synopsis generator that takes a distribution on Q and produces a "weak" synopsis that yields "good" answers for a majority of the weight in Q, our Boosting for Queries algorithm obtains a synopsis that is good for all of Q. We ensure privacy for the rows of the database, but the boosting is performed on the queries. We also provide the first synopsis generators for arbitrary sets of arbitrary low-sensitivity queries, i.e., queries whose answers do not vary much under the addition or deletion of a single row. In the execution of our algorithm certain tasks, each incurring some privacy loss, are performed many times. To analyze the cumulative privacy loss, we obtain an $O(\varepsilon^2)$ bound on the expected privacy loss from a single ε-differentially private mechanism. Combining this with evolution of confidence arguments from the literature, we get stronger bounds on the expected cumulative privacy loss due to multiple mechanisms, each of which provides ε-differential privacy or one of its relaxations, and each of which operates on (potentially) different, adaptively chosen, databases.
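
The $O(\varepsilon^2)$ expected-loss bound in this abstract underlies what is now commonly called the advanced composition theorem. A worked instance (formula as usually stated in the later literature; the code is ours):

```python
# Advanced composition: k mechanisms, each eps-DP, jointly satisfy
# (eps', delta')-DP with eps' = sqrt(2k ln(1/delta')) * eps + k*eps*(e^eps - 1).
import math

def advanced_composition(eps: float, k: int, delta: float) -> float:
    return math.sqrt(2 * k * math.log(1 / delta)) * eps + k * eps * (math.e**eps - 1)

eps, k, delta = 0.1, 100, 1e-6
print(f"basic composition:    eps_total = {k * eps:.2f}")                       # 10.00
print(f"advanced composition: eps_total = {advanced_composition(eps, k, delta):.2f}")  # ~6.31
```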

STOC Conference 2008 Conference Paper

Delegating computation: interactive proofs for muggles

  • Shafi Goldwasser
  • Yael Tauman Kalai
  • Guy N. Rothblum

In this work we study interactive proofs for tractable languages. The (honest) prover should be efficient and run in polynomial time, or in other words a "muggle". The verifier should be super-efficient and run in nearly-linear time. These proof systems can be used for delegating computation: a server can run a computation for a client and interactively prove the correctness of the result. The client can verify the result's correctness in nearly-linear time (instead of running the entire computation itself). Previously, related questions were considered in the Holographic Proof setting by Babai, Fortnow, Levin and Szegedy, in the argument setting under computational assumptions by Kilian, and in the random oracle model by Micali. Our focus, however, is on the original interactive proof model where no assumptions are made on the computational power or adaptiveness of dishonest provers. Our main technical theorem gives a public-coin interactive proof for any language computable by a log-space uniform boolean circuit with depth d and input length n. The verifier runs in time (n+d) · polylog(n) and space O(log(n)), the communication complexity is d · polylog(n), and the prover runs in time poly(n). In particular, for languages computable by log-space uniform NC (circuits of polylog(n) depth), the prover is efficient, the verifier runs in time n · polylog(n) and space O(log(n)), and the communication complexity is polylog(n). Using this theorem we make progress on several questions:

  • We show how to construct short (polylog size) computationally sound non-interactive certificates of correctness for any log-space uniform NC computation, in the public-key model. The certificates can be verified in quasi-linear time and are for a designated verifier: each certificate is tailored to the verifier's public key. This result uses a recent transformation of Kalai and Raz from public-coin interactive proofs to one-round arguments. The soundness of the certificates is based on the existence of a PIR scheme with polylog communication.
  • Interactive proofs with public-coin, log-space, poly-time verifiers for all of P. This settles an open question regarding the expressive power of proof systems with such verifiers.
  • Zero-knowledge interactive proofs with communication complexity that is quasi-linear in the witness length, for any NP language verifiable in NC, based on the existence of one-way functions.
  • Probabilistically checkable arguments (a model due to Kalai and Raz) of size polynomial in the witness length (rather than the instance length) for any NP language verifiable in NC, under computational assumptions.
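
Proof systems in this line of work build on sum-check-style interaction. Below is a toy, self-contained sum-check over a prime field (our sketch, with our own toy choices of f, field, and degree bound; not the paper's full protocol): the prover convinces the verifier of $S = \sum_{x \in \{0,1\}^n} f(x)$ while the verifier evaluates f at only a single random point.

```python
# Toy sum-check: the honest prover sends, in each round, a low-degree
# univariate polynomial (as evaluations); the verifier checks consistency
# and sends a random field challenge.  Final check is one evaluation of f.
import random

P = 2**61 - 1          # prime field modulus (a Mersenne prime)
def f(x):              # example polynomial, degree <= 2 in each variable
    return (x[0] * x[1] + x[2] * x[2] + 1) % P

N_VARS, DEG = 3, 2

def partial_sum(prefix, xi):
    """Honest prover: sum f over boolean suffixes, current variable = xi."""
    free = N_VARS - len(prefix) - 1
    return sum(f(prefix + [xi] + [(m >> b) & 1 for b in range(free)])
               for m in range(2 ** free)) % P

def interpolate_eval(points, x):
    """Evaluate the unique degree-DEG polynomial through `points` at x."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * ((x - xj) % P) % P
                den = den * ((xi - xj) % P) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

claim = sum(f([(m >> b) & 1 for b in range(N_VARS)]) for m in range(8)) % P
prefix = []
for i in range(N_VARS):
    # Prover sends g_i via its values at 0..DEG; verifier checks consistency.
    g = [(t, partial_sum(prefix, t)) for t in range(DEG + 1)]
    assert (g[0][1] + g[1][1]) % P == claim, "verifier rejects"
    r = random.randrange(P)                   # verifier's random challenge
    claim = interpolate_eval(g, r)
    prefix.append(r)
assert f(prefix) % P == claim                 # final oracle check on f
print("sum-check accepted")
```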

STOC Conference 2007 Conference Paper

Verifying and decoding in constant depth

  • Shafi Goldwasser
  • Dan Gutfreund
  • Alexander Healy
  • Tali Kaufman
  • Guy N. Rothblum

We develop a general approach for improving the efficiency of a computationally bounded receiver interacting with a powerful and possibly malicious sender. The key idea we use is that of delegating some of the receiver's computation to the (potentially malicious) sender. This idea was recently introduced by Goldwasser et al. [14] in the area of program checking. A classic example of such a sender-receiver setting is interactive proof systems. By taking the sender to be a (potentially malicious) prover and the receiver to be a verifier, we show that (p-prover) interactive proofs with k rounds of interaction are equivalent to (p-prover) interactive proofs with k+O(1) rounds, where the verifier is in NC$^0$. That is, each round of the verifier's computation can be implemented in constant parallel time. As a corollary, we obtain interactive proof systems, with (optimally) constant soundness, for languages in AM and NEXP, where the verifier runs in constant parallel-time. Another, less immediate sender-receiver setting arises in considering error correcting codes. By taking the sender to be a (potentially corrupted) codeword and the receiver to be a decoder, we obtain explicit families of codes that are locally (list-)decodable by constant-depth circuits of size polylogarithmic in the length of the codeword. Using the tight connection between locally list-decodable codes and average-case complexity, we obtain a new, more efficient, worst-case to average-case reduction for languages in EXP.

FOCS Conference 2005 Conference Paper

The Complexity of Online Memory Checking

  • Moni Naor
  • Guy N. Rothblum

We consider the problem of storing a large file on a remote and unreliable server. To verify that the file has not been corrupted, a user could store a small private (randomized) "fingerprint" on his own computer. This is the setting for the well-studied authentication problem in cryptography, and the required fingerprint size is well understood. We study the problem of sub-linear authentication: suppose the user would like to encode and store the file in a way that allows him to verify that it has not been corrupted, but without reading the entire file. If the user only wants to read t bits of the file, how large does the size s of the private fingerprint need to be? We define this problem formally, and show a tight lower bound on the relationship between s and t when the adversary is not computationally bounded, namely: s × t = Ω(n), where n is the file size. This is an easier case of the online memory checking problem, introduced by Blum et al. in 1991, and hence the same (tight) lower bound applies also to that problem. It was previously shown that when the adversary is computationally bounded, under the assumption that one-way functions exist, it is possible to construct much better online memory checkers and sub-linear authentication schemes. We show that the existence of one-way functions is also a necessary condition: even slightly breaking the s × t = Ω(n) lower bound in a computational setting implies the existence of one-way functions.