Arrow Research search

Author name cluster

David Durfee

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers
2 author rows

Possible papers

12

NeurIPS Conference 2024 Conference Paper

Instance-Specific Asymmetric Sensitivity in Differential Privacy

  • David Durfee

We provide a new algorithmic framework for differentially private estimation of general functions that adapts to the hardness of the underlying dataset. We build upon previous work that gives a paradigm for selecting an output through the exponential mechanism based upon closeness of the inverse to the underlying dataset, termed the inverse sensitivity mechanism. Our framework will slightly modify the closeness metric and instead give a simple and efficient application of the sparse vector technique. While the inverse sensitivity mechanism was shown to be instance optimal, it was only with respect to a class of unbiased mechanisms such that the most likely outcome matches the underlying data. We break this assumption in order to more naturally navigate the bias-variance tradeoff, which will also critically allow for extending our method to unbounded data. In consideration of this tradeoff, we provide theoretical guarantees and empirical validation that our technique will be particularly effective when the distances to the underlying dataset are asymmetric. This asymmetry is inherent to a range of important problems including fundamental statistics such as variance, as well as commonly used machine learning performance metrics for both classification and regression tasks. We efficiently instantiate our method in $O(n)$ time for these problems and empirically show that our techniques will give substantially improved differentially private estimations.

NeurIPS Conference 2023 Conference Paper

Unbounded Differentially Private Quantile and Maximum Estimation

  • David Durfee

In this work we consider the problem of differentially private computation ofquantiles for the data, especially the highest quantiles such as maximum, butwith an unbounded range for the dataset. We show that this can be doneefficiently through a simple invocation of $\texttt{AboveThreshold}$, asubroutine that is iteratively called in the fundamental Sparse VectorTechnique, even when there is no upper bound on the data. In particular, weshow that this procedure can give more accurate and robust estimates on thehighest quantiles with applications towards clipping that is essential fordifferentially private sum and mean estimation. In addition, we show how twoinvocations can handle the fully unbounded data setting. Within our study, weshow that an improved analysis of $\texttt{AboveThreshold}$ can improve theprivacy guarantees for the widely used Sparse Vector Technique that is ofindependent interest. We give a more general characterization of privacy lossfor $\texttt{AboveThreshold}$ which we immediately apply to our method forimproved privacy guarantees. Our algorithm only requires one $O(n)$ passthrough the data, which can be unsorted, and each subsequent query takes $O(1)$time. We empirically compare our unbounded algorithm with the state-of-the-artalgorithms in the bounded setting. For inner quantiles, we find that our methodoften performs better on non-synthetic datasets. For the maximal quantiles, which we apply to differentially private sum computation, we find that ourmethod performs significantly better.

ICML Conference 2020 Conference Paper

Optimal Differential Privacy Composition for Exponential Mechanisms

  • Jinshuo Dong
  • David Durfee
  • Ryan M. Rogers

Composition is one of the most important properties of differential privacy (DP), as it allows algorithm designers to build complex private algorithms from DP primitives. We consider precise composition bounds of the overall privacy loss for exponential mechanisms, one of the fundamental classes of mechanisms in DP. Exponential mechanism has also become a fundamental building block in private machine learning, e. g. private PCA and hyper-parameter selection. We give explicit formulations of the optimal privacy loss for both the adaptive and non-adaptive composition of exponential mechanism. For the non-adaptive setting in which each mechanism has the same privacy parameter, we give an efficiently computable formulation of the optimal privacy loss. In the adaptive case, we derive a recursive formula and an efficiently computable upper bound. These precise understandings about the problem lead to a 40% saving of the privacy budget in a practical application. Furthermore, the algorithm-specific analysis shows a difference in privacy parameters of adaptive and non-adaptive composition, which was widely believed to not exist based on the evidence from general analysis.

SODA Conference 2020 Conference Paper

Parallel Batch-Dynamic Graphs: Algorithms and Lower Bounds

  • Laxman Dhulipala
  • David Durfee
  • Janardhan Kulkarni
  • Richard Peng
  • Saurabh Sawlani
  • Xiaorui Sun

In this paper we study the problem of dynamically maintaining graph properties under batches of edge insertions and deletions in the massively parallel model of computation. In this setting, the graph is stored on a number of machines, each having space strongly sublinear with respect to the number of vertices, that is, n ϵ for some constant 0 < ϵ < 1. Our goal is to handle batches of updates and queries where the data for each batch fits onto one machine in constant rounds of parallel computation, as well as to reduce the total communication between the machines. This objective corresponds to the gradual buildup of databases over time, while the goal of obtaining constant rounds of communication for problems in the static setting has been elusive for problems as simple as undirected graph connectivity. We give an algorithm for dynamic graph connectivity in this setting with constant communication rounds and communication cost almost linear in terms of the batch size. Our techniques combine a new graph contraction technique, an independent random sample extractor from correlated samples, as well as distributed data structures supporting parallel updates and queries in batches. We also illustrate the power of dynamic algorithms in the MPC model by showing that the batched version of the adaptive connectivity problem is P-complete in the centralized setting, but sub-linear sized batches can be handled in a constant number of rounds. Due to the wide applicability of our approaches, we believe it represents a practically-motivated workaround to the current difficulties in designing more efficient massively parallel static graph algorithms.

NeurIPS Conference 2019 Conference Paper

Practical Differentially Private Top-k Selection with Pay-what-you-get Composition

  • David Durfee
  • Ryan Rogers

We study the problem of top-k selection over a large domain universe subject to user-level differential privacy. Typically, the exponential mechanism or report noisy max are the algorithms used to solve this problem. However, these algorithms require querying the database for the count of each domain element. We focus on the setting where the data domain is unknown, which is different than the setting of frequent itemsets where an apriori type algorithm can help prune the space of domain elements to query. We design algorithms that ensures (approximate) differential privacy and only needs access to the true top-k' elements from the data for any chosen k' ≥ k. This is a highly desirable feature for making differential privacy practical, since the algorithms require no knowledge of the domain. We consider both the setting where a user's data can modify an arbitrary number of counts by at most 1, i. e. unrestricted sensitivity, and the setting where a user's data can modify at most some small, fixed number of counts by at most 1, i. e. restricted sensitivity. Additionally, we provide a pay-what-you-get privacy composition bound for our algorithms. That is, our algorithms might return fewer than k elements when the top-k elements are queried, but the overall privacy budget only decreases by the size of the outcome set.

FOCS Conference 2017 Conference Paper

Determinant-Preserving Sparsification of SDDM Matrices with Applications to Counting and Sampling Spanning Trees

  • David Durfee
  • John Peebles
  • Richard Peng
  • Anup B. Rao

We show variants of spectral sparsification routines can preserve the total spanning tree counts of graphs, which by Kirchhoff's matrix-tree theorem, is equivalent to determinant of a graph Laplacian minor, or equivalently, of any SDDM matrix. Our analyses utilizes this combinatorial connection to bridge between statistical leverage scores/effective resistances and the analysis of random graphs by [Janson, Combinatorics, Probability and Computing `94]. This leads to a routine that in quadratic time, sparsifies a graph down to about n1. 5 edges in ways that preserve both the determinant and the distribution of spanning trees (provided the sparsified graph is viewed as a random object). Extending this algorithm to work with Schur complements and approximate Cholesky factorizations leads to algorithms for counting and sampling spanning trees which are nearly optimal for dense graphs. We give an algorithm that computes a (1±δ) approximation to the determinant of any SDDM matrix with constant probability in about n 2 δ -2 time. This is the first routine for graphs that outperforms general-purpose routines for computing determinants of arbitrary matrices. We also give an algorithm that generates in about n 2 δ -2 time a spanning tree of a weighted undirected graph from a distribution with total variation distance of δ from the w-uniform distribution.

STOC Conference 2017 Conference Paper

Sampling random spanning trees faster than matrix multiplication

  • David Durfee
  • Rasmus Kyng
  • John Peebles
  • Anup B. Rao
  • Sushant Sachdeva

We present an algorithm that, with high probability, generates a random spanning tree from an edge-weighted undirected graph in ( n 5/3 m 1/3 ) time. The tree is sampled from a distribution where the probability of each tree is proportional to the product of its edge weights. This improves upon the previous best algorithm due to Colbourn et al. that runs in matrix multiplication time, O ( n ω ). For the special case of unweighted graphs, this improves upon the best previously known running time of Õ(min{ n ω , m √ n , m 4/3 }) for m ⪢ n 7/4 (Colbourn et al. '96, Kelner-Madry '09, Madry et al. '15). The effective resistance metric is essential to our algorithm, as in the work of Madry et al., but we eschew determinant-based and random walk-based techniques used by previous algorithms. Instead, our algorithm is based on Gaussian elimination, and the fact that effective resistance is preserved in the graph resulting from eliminating a subset of vertices (called a Schur complement). As part of our algorithm, we show how to compute -approximate effective resistances for a set S of vertex pairs via approximate Schur complements in Õ( m +( n + | S |)ε -2 ) time, without using the Johnson-Lindenstrauss lemma which requires Õ( min{( m + | S |) ε2 , m + n ε -4 +| S |ε 2 }) time. We combine this approximation procedure with an error correction procedure for handling edges where our estimate isn't sufficiently accurate.

FOCS Conference 2016 Conference Paper

On Fully Dynamic Graph Sparsifiers

  • Ittai Abraham
  • David Durfee
  • Ioannis Koutis
  • Sebastian Forster
  • Richard Peng

We initiate the study of fast dynamic algorithms for graph sparsification problems and obtain fully dynamic algorithms, allowing both edge insertions and edge deletions, that take polylogarithmic time after each update in the graph. Our three main results are as follows. First, we give a fully dynamic algorithm for maintaining a (1 ± ϵ)-spectral sparsifier with amortized update time poly(log n, ϵ -1 ). Second, we give a fully dynamic algorithm for maintaining a (1 ± ϵ)-cut sparsifier with worst-case update time poly(log n, ϵ -1 ). Both sparsifiers have size n · poly(log n, ϵ -1 ). Third, we apply our dynamic sparsifier algorithm to obtain a fully dynamic algorithm for maintaining a (1 - ϵ)-approximation to the value of the maximum flow in an unweighted, undirected, bipartite graph with amortized update time poly(log n, ϵ -1 ).