Arrow Research search

Author name cluster

Frank McSherry

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers
2 author rows

Possible papers

14

NeurIPS Conference 2012 Conference Paper

A Simple and Practical Algorithm for Differentially Private Data Release

  • Moritz Hardt
  • Katrina Ligett
  • Frank Mcsherry

We present a new algorithm for differentially private data release, based on a simple combination of the Exponential Mechanism with the Multiplicative Weights update rule. Our MWEM algorithm achieves the best known, and nearly optimal, theoretical guarantees, while at the same time being simple to implement and experimentally more accurate on actual data sets than existing techniques.
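The combination described above can be sketched in a few lines: maintain a synthetic histogram, pick a badly-approximated query with the Exponential Mechanism, measure it with Laplace noise, and apply a Multiplicative Weights update. This is a minimal toy sketch, not the paper's implementation; all function and variable names are illustrative.

```python
import math
import random

def mwem(true_hist, queries, eps, T, seed=0):
    """Toy MWEM sketch: true_hist is a histogram over the data domain,
    queries are 0/1 indicator vectors (subset-count queries)."""
    rng = random.Random(seed)
    n = sum(true_hist)                      # number of records
    d = len(true_hist)
    synth = [n / d] * d                     # start from the uniform distribution
    eps_round = eps / (2 * T)               # split the privacy budget across rounds

    def answer(hist, q):
        return sum(h for h, bit in zip(hist, q) if bit)

    for _ in range(T):
        # Exponential Mechanism: score each query by its current error
        scores = [abs(answer(true_hist, q) - answer(synth, q)) for q in queries]
        weights = [math.exp(eps_round * s / 2) for s in scores]
        q = rng.choices(queries, weights=weights, k=1)[0]
        # Laplace measurement of the chosen query (inverse-CDF sampling)
        u = rng.random() - 0.5
        noise = math.copysign(math.log(1 - 2 * abs(u)), u) / eps_round
        noisy = answer(true_hist, q) - noise
        # Multiplicative Weights update toward the noisy answer
        err = noisy - answer(synth, q)
        synth = [h * math.exp(err * bit / (2 * n)) for h, bit in zip(synth, q)]
        total = sum(synth)
        synth = [h * n / total for h in synth]  # renormalize to n records
    return synth
```

With a fixed seed the output is a nonnegative synthetic histogram over the same domain, summing to the original record count.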

NeurIPS Conference 2010 Conference Paper

Probabilistic Inference and Differential Privacy

  • Oliver Williams
  • Frank Mcsherry

We identify and investigate a strong connection between probabilistic inference and differential privacy, the latter being a recent privacy definition that permits only indirect observation of data through noisy measurement. Previous research on differential privacy has focused on designing measurement processes whose output is likely to be useful on its own. We consider the potential of applying probabilistic inference to the measurements and measurement process to derive posterior distributions over the data sets and model parameters thereof. We find that probabilistic inference can improve accuracy, integrate multiple observations, measure uncertainty, and even provide posterior distributions over quantities that were not directly measured.
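The idea of applying probabilistic inference to a noisy measurement can be illustrated in the simplest case: a posterior over a true count given one Laplace-noised observation, computed on a grid. This is a minimal sketch under assumed names, not the paper's inference machinery.

```python
import math

def laplace_posterior(noisy, prior, b):
    """Posterior over a true count k given one Laplace(b)-noised
    measurement: posterior(k) proportional to prior(k) * exp(-|noisy - k| / b).
    prior[k] is the prior mass on count k."""
    lik = [math.exp(-abs(noisy - k) / b) for k in range(len(prior))]
    unnorm = [p * l for p, l in zip(prior, lik)]
    z = sum(unnorm)
    return [u / z for u in unnorm]
```

Because the Laplace likelihood is explicit, the same recipe integrates multiple noisy measurements by multiplying their likelihoods before normalizing.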

FOCS Conference 2007 Conference Paper

Mechanism Design via Differential Privacy

  • Frank McSherry
  • Kunal Talwar

We study the role that privacy-preserving algorithms, which prevent the leakage of specific information about participants, can play in the design of mechanisms for strategic agents, which must encourage players to honestly report information. Specifically, we show that the recent notion of differential privacy, in addition to its own intrinsic virtue, can ensure that participants have limited effect on the outcome of the mechanism, and as a consequence have limited incentive to lie. More precisely, mechanisms with differential privacy are approximately dominant-strategy under arbitrary player utility functions, are automatically resilient to coalitions, and easily allow repeatability. We study several special cases of the unlimited supply auction problem, providing new results for digital goods auctions, attribute auctions, and auctions with arbitrary structural constraints on the prices. As an important prelude to developing a privacy-preserving auction mechanism, we introduce and study a generalization of previous privacy work that accommodates the high sensitivity of the auction setting, where a single participant may dramatically alter the optimal fixed price, and a slight change in the offered price may take the revenue from optimal to zero.
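The core tool can be sketched concretely: an Exponential Mechanism that selects a fixed price for a digital-goods auction with probability exponentially weighted by revenue. A minimal sketch with illustrative names, assuming the score's sensitivity is bounded by the largest candidate price (one bidder changes revenue at any price p by at most p).

```python
import math
import random

def exp_mech_price(bids, prices, eps, seed=0):
    """Exponential Mechanism over candidate fixed prices:
    score(p) = revenue at price p, sensitivity = max(prices)."""
    rng = random.Random(seed)
    revenue = lambda p: p * sum(1 for b in bids if b >= p)
    delta = max(prices)
    weights = [math.exp(eps * revenue(p) / (2 * delta)) for p in prices]
    return rng.choices(prices, weights=weights, k=1)[0]
```

As epsilon grows the sampled price concentrates on the revenue-maximizing candidate; as it shrinks toward zero the choice approaches uniform, which is the privacy/utility trade-off the abstract describes.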

STOC Conference 2007 Conference Paper

The price of privacy and the limits of LP decoding

  • Cynthia Dwork
  • Frank McSherry
  • Kunal Talwar

This work is at the intersection of two lines of research. One line, initiated by Dinur and Nissim, investigates the price, in accuracy, of protecting privacy in a statistical database. The second, growing from an extensive literature on compressed sensing (see in particular the work of Donoho and collaborators [4,7,13,11]) and explicitly connected to error-correcting codes by Candès and Tao ([4]; see also [5,3]), is the use of linear programming for error correction. Our principal result is the discovery of a sharp threshold ρ* ≈ 0.239, so that if ρ < ρ* and A is a random m × n encoding matrix of independently chosen standard Gaussians, where m = O(n), then with overwhelming probability over the choice of A, for all x ∈ ℝⁿ, LP decoding corrects ⌊ρm⌋ arbitrary errors in the encoding Ax, while decoding can be made to fail if the error rate exceeds ρ*. Our bound resolves an open question of Candès, Rudelson, Tao, and Vershynin [3] and (oddly, but explicably) refutes empirical conclusions of Donoho [11] and Candès et al. [3]. By scaling and rounding we can easily transform these results to obtain polynomial-time decodable random linear codes with polynomial-sized alphabets tolerating any ρ < ρ* ≈ 0.239 fraction of arbitrary errors. In the context of privacy-preserving data mining our results say that any privacy mechanism, interactive or non-interactive, providing reasonably accurate answers to a 0.761 fraction of randomly generated weighted subset sum queries, and arbitrary answers on the remaining 0.239 fraction, is blatantly non-private.

UAI Conference 2005 Conference Paper

On Privacy-Preserving Histograms

  • Shuchi Chawla 0001
  • Cynthia Dwork
  • Frank McSherry
  • Kunal Talwar

We advance the approach initiated by Chawla et al. for sanitizing (census) data so as to preserve the privacy of respondents while simultaneously extracting "useful" statistical information. First, we extend the scope of their techniques to a broad and rich class of distributions, specifically, mixtures of high-dimensional balls, spheres, Gaussians, and other "nice" distributions. Second, we randomize the histogram constructions to preserve spatial characteristics of the data, allowing us to approximate various quantities of interest, e.g., cost of the minimum spanning tree on the data, in a privacy-preserving fashion.

STOC Conference 2004 Conference Paper

A decentralized algorithm for spectral analysis

  • David Kempe 0001
  • Frank McSherry

In many large network settings, such as computer networks, social networks, or hyperlinked text documents, much information can be obtained from the network's spectral properties. However, traditional centralized approaches for computing eigenvectors struggle with at least two obstacles: the data may be difficult to obtain (both due to technical reasons and because of privacy concerns), and the sheer size of the networks makes the computation expensive. A decentralized, distributed algorithm addresses both of these obstacles: it utilizes the computational power of all nodes in the network and their ability to communicate, thus speeding up the computation with the network size. And as each node knows its incident edges, the data collection problem is avoided as well. Our main result is a simple decentralized algorithm for computing the top k eigenvectors of a symmetric weighted adjacency matrix, and a proof that it converges essentially in O(τ_mix log² n) rounds of communication and computation, where τ_mix is the mixing time of a random walk on the network. An additional contribution of our work is a decentralized way of actually detecting convergence, and diagnosing the current error. Our protocol scales well, in that the amount of computation performed at any node in any one round, and the sizes of messages sent, depend polynomially on k, but not on the (typically much larger) number n of nodes.
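For the k = 1 case, the decentralized computation reduces to power iteration expressed as local message passing: each node repeatedly replaces its value with the sum of its neighbors' values. The sketch below normalizes globally for brevity, whereas the paper also decentralizes normalization and convergence detection; the names are illustrative.

```python
def decentralized_top_eigvec(adj, rounds=50):
    """Power iteration via neighbor sums on an undirected graph.
    adj[i] is the list of neighbors of node i."""
    n = len(adj)
    x = [1.0 if i == 0 else 0.0 for i in range(n)]  # arbitrary starting vector
    for _ in range(rounds):
        # each node sums the values held by its neighbors
        x = [sum(x[j] for j in adj[i]) for i in range(n)]
        # global normalization (decentralized in the actual protocol)
        norm = sum(v * v for v in x) ** 0.5
        x = [v / norm for v in x]
    return x
```

On the complete graph K4 the top eigenvector of the adjacency matrix is uniform, so the iterates converge to (0.5, 0.5, 0.5, 0.5).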

FOCS Conference 2004 Conference Paper

Spectral Analysis of Random Graphs with Skewed Degree Distributions

  • Anirban Dasgupta 0001
  • John E. Hopcroft
  • Frank McSherry

We extend spectral methods to random graphs with skewed degree distributions through a degree-based normalization closely connected to the normalized Laplacian. The normalization is based on intuition drawn from perturbation theory of random matrices, and has the effect of boosting the expectation of the random adjacency matrix without increasing the variances of its entries, leading to better perturbation bounds. The primary implication of this result lies in the realm of spectral analysis of random graphs with skewed degree distributions, such as the ubiquitous "power law graphs". Mihail and Papadimitriou (2002) argued that for randomly generated graphs satisfying a power law degree distribution, spectral analysis of the adjacency matrix simply produces the neighborhoods of the high degree nodes as its eigenvectors, and thus misses any embedded structure. We present a generalization of their model, incorporating latent structure, and prove that after applying our transformation, spectral analysis succeeds in recovering the latent structure with high probability.
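A degree-based normalization in the spirit of the normalized Laplacian scales entry (i, j) of the adjacency matrix by 1/√(dᵢ·dⱼ), damping the dominance of high-degree nodes. A minimal sketch under that assumption; the paper's exact transformation may differ in details.

```python
def degree_normalize(adj):
    """Scale adjacency entry (i, j) by 1 / sqrt(d_i * d_j),
    where d_i is the degree (row sum) of node i."""
    deg = [sum(row) for row in adj]
    return [[a / (deg[i] * deg[j]) ** 0.5 if deg[i] and deg[j] else 0.0
             for j, a in enumerate(row)]
            for i, row in enumerate(adj)]
```

On a star graph with center degree 2 and leaf degree 1, each edge entry becomes 1/√2, pulling the hub's weight down relative to the raw adjacency matrix.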

STOC Conference 2001 Conference Paper

Fast computation of low rank matrix approximations

  • Dimitris Achlioptas
  • Frank McSherry

Given a matrix A it is often desirable to find an approximation to A that has low rank. We introduce a simple technique for accelerating the computation of such approximations when A has strong spectral structure, i.e., when the singular values of interest are significantly greater than those of a random matrix with size and entries similar to A. Our technique amounts to independently sampling and/or quantizing the entries of A, thus speeding up computation by reducing the number of non-zero entries and/or the length of their representation. Our analysis is based on observing that the acts of sampling and quantization can be viewed as adding a random matrix E to A, whose entries are independent random variables with zero mean and bounded variance. Since, with high probability, E has very weak spectral structure, we can prove that the effect of sampling and quantization nearly vanishes when a low rank approximation to A+E is computed. In fact, the stronger the spectral structure of A, the more of its entries we can afford to discard and, ultimately, the faster we can discover that structure. We give bounds on the quality of our approximation in both the L2 and the Frobenius norm.
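The sampling step is simple to state in code: keep each entry of A independently with probability p and scale the kept entries by 1/p, so the sparsified matrix equals A in expectation and the error matrix E has independent zero-mean entries. A minimal sketch with illustrative names.

```python
import random

def sparsify(A, p, seed=0):
    """Keep each entry of A with probability p, scaled by 1/p
    (an unbiased sparsification: E[sparsify(A)] = A)."""
    rng = random.Random(seed)
    return [[a / p if rng.random() < p else 0.0 for a in row] for row in A]
```

With p = 1 the matrix is unchanged; for smaller p every surviving entry is the original scaled by 1/p, and the rest are zero, so a low-rank factorization of the sparse result runs faster while targeting nearly the same spectrum.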

NeurIPS Conference 2001 Conference Paper

Sampling Techniques for Kernel Methods

  • Dimitris Achlioptas
  • Frank Mcsherry
  • Bernhard Schölkopf

We propose randomized techniques for speeding up Kernel Principal Component Analysis on three levels: sampling and quantization of the Gram matrix in training, randomized rounding in evaluating the kernel expansions, and random projections in evaluating the kernel itself. In all three cases, we give sharp bounds on the accuracy of the obtained approximations. Rather intriguingly, all three techniques can be viewed as instantiations of the following idea: replace the kernel function by a “randomized kernel” which behaves like
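The third level, random projections, can be sketched directly: project inputs to d dimensions with i.i.d. Gaussian entries scaled by 1/√d, so projected inner products match the original ones in expectation. A sketch with illustrative names, not the paper's construction.

```python
import random

def random_projection(xs, d, seed=0):
    """Project each vector in xs to d dimensions using a shared
    Gaussian matrix with entries N(0, 1) / sqrt(d)."""
    rng = random.Random(seed)
    dim = len(xs[0])
    R = [[rng.gauss(0, 1) / d ** 0.5 for _ in range(dim)] for _ in range(d)]
    return [[sum(r[i] * x[i] for i in range(dim)) for r in R] for x in xs]
```

Dot products of the projected vectors then approximate the original kernel values for linear (and, via feature maps, some nonlinear) kernels, with accuracy improving as d grows.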

FOCS Conference 2001 Conference Paper

Spectral Partitioning of Random Graphs

  • Frank McSherry

Problems such as bisection, graph coloring, and clique are generally believed hard in the worst case. However, they can be solved if the input data is drawn randomly from a distribution over graphs containing acceptable solutions. In this paper we show that a simple spectral algorithm can solve all three problems above in the average case, as well as a more general problem of partitioning graphs based on edge density. In nearly all cases our approach meets or exceeds previous parameters, while introducing substantial generality. We apply spectral techniques, using foremost the observation that in all of these problems, the expected adjacency matrix is a low rank matrix wherein the structure of the solution is evident.

FOCS Conference 2001 Conference Paper

Web Search via Hub Synthesis

  • Dimitris Achlioptas
  • Amos Fiat
  • Anna R. Karlin
  • Frank McSherry

We present a model for web search that captures in a unified manner three critical components of the problem: how the link structure of the web is generated, how the content of a web document is generated, and how a human searcher generates a query. The key to this unification lies in capturing the correlations between these components in terms of proximity in a shared latent semantic space. Given such a combined model, the correct answer to a search query is well defined, and thus it becomes possible to evaluate web search algorithms rigorously. We present a new web search algorithm, based on spectral techniques, and prove that it is guaranteed to produce an approximately correct answer in our model. The algorithm assumes no knowledge of the model, and is well-defined regardless of the model's accuracy.