Arrow Research search

Author name cluster

Graham Cormode

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers
2 author rows

Possible papers (18)

NeurIPS Conference 2025 Conference Paper

Sum Estimation under Personalized Local Differential Privacy

  • Dajun Sun
  • Wei Dong
  • Yuan Qiu
  • Ke Yi
  • Graham Cormode

People have diverse privacy requirements. This is best modeled using a personalized local differential privacy model where each user privatizes their data using a possibly different privacy parameter. While the model of personalized local differential privacy is a natural and important one, prior work has failed to give meaningful error bounds. In this paper, we study the foundational sum/mean estimation problem under this model. We present two novel protocols that achieve strong error guarantees. The first gives a guarantee based on the radius of the data, suiting inputs that are centered around zero. The second extends the guarantee to the diameter of the data, capturing the case when the points are situated arbitrarily. Experimental results on both synthetic and real data show that our protocols significantly outperform existing methods in terms of accuracy while providing a strong level of privacy.
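As a point of reference, the simplest personalized-LDP baseline (not the paper's protocol; the function names and parameter choices here are illustrative) has each user add Laplace noise calibrated to their own privacy parameter, with the server summing the noisy reports:

```python
import math
import random

def sample_laplace(scale):
    # Inverse-CDF sampling from the Laplace(0, scale) distribution.
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def privatize(x, eps, radius=1.0):
    # Each user perturbs their value with noise scaled to their OWN eps:
    # values in [-radius, radius] have sensitivity 2 * radius.
    return x + sample_laplace(2.0 * radius / eps)

def estimate_sum(reports):
    # The server simply sums the noisy reports. The estimate is unbiased,
    # but its variance is dominated by the smallest-eps (most private) users.
    return sum(reports)
```

The point of the abstract is that this naive aggregation wastes the looser budgets of less privacy-sensitive users; the paper's protocols instead achieve error bounds in terms of the radius or diameter of the data.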

ICML Conference 2023 Conference Paper

Sketch-Flip-Merge: Mergeable Sketches for Private Distinct Counting

  • Jonathan Hehir
  • Daniel Ting
  • Graham Cormode

Data sketching is a critical tool for distinct counting, enabling multisets to be represented by compact summaries that admit fast cardinality estimates. Because sketches may be merged to summarize multiset unions, they are a basic building block in data warehouses. Although many practical sketches for cardinality estimation exist, none provide privacy when merging. We propose the first practical cardinality sketches that are simultaneously mergeable, differentially private (DP), and have low empirical errors. These introduce a novel randomized algorithm for performing logical operations on noisy bits, a tight privacy analysis, and provably optimal estimation. Our sketches dramatically outperform existing theoretical solutions in simulations and on real-world data.
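To make the "flip" idea concrete, here is a hedged sketch, not the paper's construction: randomized response applied to the bits of a linear-counting-style bitmap (the sketch choice and parameters are illustrative), with server-side debiasing:

```python
import hashlib
import math
import random

def bitmap_sketch(items, m=4096):
    # Linear-counting-style bitmap: hash each item to one of m bits.
    bits = [0] * m
    for item in items:
        h = int(hashlib.sha256(str(item).encode()).hexdigest(), 16) % m
        bits[h] = 1
    return bits

def flip(bits, eps):
    # Randomized response: keep each bit with probability e^eps / (1 + e^eps),
    # flip it otherwise. One item affects a single bit, so the released
    # sketch satisfies eps-differential privacy.
    keep = math.exp(eps) / (1.0 + math.exp(eps))
    return [b if random.random() < keep else 1 - b for b in bits]

def debiased_ones(noisy_bits, eps):
    # Invert the expected effect of the flipping to estimate the true
    # number of set bits, from which cardinality can be derived.
    keep = math.exp(eps) / (1.0 + math.exp(eps))
    m = len(noisy_bits)
    return (sum(noisy_bits) - m * (1.0 - keep)) / (2.0 * keep - 1.0)
```

Merging such flipped sketches with a plain bitwise OR is exactly what breaks both the privacy accounting and the accuracy; the paper's randomized algorithm for logical operations on noisy bits addresses that gap.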

ICLR Conference 2022 Conference Paper

On the Importance of Difficulty Calibration in Membership Inference Attacks

  • Lauren Watson
  • Chuan Guo 0001
  • Graham Cormode
  • Alexandre Sablayrolles

The vulnerability of machine learning models to membership inference attacks has received much attention in recent years. However, existing attacks mostly remain impractical due to having high false positive rates, where non-member samples are often erroneously predicted as members. This type of error makes the predicted membership signal unreliable, especially since most samples are non-members in real world applications. In this work, we argue that membership inference attacks can benefit drastically from difficulty calibration, where an attack's predicted membership score is adjusted to the difficulty of correctly classifying the target sample. We show that difficulty calibration can significantly reduce the false positive rate of a variety of existing attacks without a loss in accuracy.
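The calibration idea can be stated in a few lines. This is a schematic, not the paper's attack; the reference-model loss below stands in for any per-sample difficulty estimate:

```python
def calibrated_score(target_loss, reference_loss):
    # Raw attacks threshold on -target_loss: low loss => predicted member.
    # Calibration subtracts the sample's intrinsic difficulty, proxied by
    # the loss of a reference model that never trained on the sample.
    return reference_loss - target_loss

# An easy non-member: both models fit it well. The raw signal (low loss)
# would wrongly flag it as a member, but the calibrated score stays near zero.
easy_nonmember = calibrated_score(target_loss=0.1, reference_loss=0.15)

# A memorized member: hard for the reference model, easy for the target
# model, so the calibrated score is large.
memorized_member = calibrated_score(target_loss=0.1, reference_loss=2.0)
```

This is how calibration suppresses the false positives the abstract describes: hard non-members no longer look like members just because the target model happens to fit them poorly or well.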

ICML Conference 2018 Conference Paper

Leveraging Well-Conditioned Bases: Streaming and Distributed Summaries in Minkowski p-Norms

  • Graham Cormode
  • Charlie Dickens
  • David P. Woodruff

Work on approximate linear algebra has led to efficient distributed and streaming algorithms for problems such as approximate matrix multiplication, low rank approximation, and regression, primarily for the Euclidean norm $\ell_2$. We study other $\ell_p$ norms, which are more robust for $p < 2$. Unlike previous algorithms for such norms, we give algorithms that are (1) deterministic, (2) work simultaneously for every $p \geq 1$, including $p = \infty$, and (3) can be implemented in both distributed and streaming environments. We study $\ell_p$-regression, entrywise $\ell_p$-low rank approximation, and versions of approximate matrix multiplication.
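The robustness of small p is already visible in the one-dimensional location problem; the grid search below is purely illustrative (the paper's contribution is well-conditioned bases for general $\ell_p$ problems, not this toy):

```python
def lp_location(xs, p, step=0.01):
    # Brute-force the minimizer of sum_i |x_i - c|^p over a grid of c values.
    best_c, best_cost = None, float("inf")
    c = min(xs)
    while c <= max(xs):
        cost = sum(abs(x - c) ** p for x in xs)
        if cost < best_cost:
            best_c, best_cost = c, cost
        c += step
    return best_c

data = [1.0, 2.0, 3.0, 100.0]   # three inliers and one gross outlier
l1_center = lp_location(data, p=1)  # near the median: ignores the outlier
l2_center = lp_location(data, p=2)  # near the mean: dragged toward 100
```

The $\ell_1$ minimizer stays with the inliers while the $\ell_2$ minimizer is pulled far away by a single corrupted point, which is the sense in which $\ell_p$ norms with $p < 2$ are more robust.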

SODA Conference 2016 Conference Paper

Kernelization via Sampling with Applications to Finding Matchings and Related Problems in Dynamic Graph Streams

  • Rajesh Chitnis
  • Graham Cormode
  • Hossein Esfandiari
  • MohammadTaghi Hajiaghayi
  • Andrew McGregor 0001
  • Morteza Monemizadeh
  • Sofya Vorotnikova

In this paper we present a simple but powerful subgraph sampling primitive that is applicable in a variety of computational models including dynamic graph streams (where the input graph is defined by a sequence of edge/hyperedge insertions and deletions) and distributed systems such as MapReduce. In the case of dynamic graph streams, we use this primitive to prove the following results.

Matching: Our main result for matchings is an Õ(k^2)-space algorithm that returns the edges of a maximum matching, on the assumption that its cardinality is at most k. The best previous algorithm used Õ(kn) space, where n is the number of vertices in the graph, and we prove our result is optimal up to logarithmic factors. Our algorithm has Õ(1) update time. We also show an Õ(n^2/α^3)-space algorithm that returns an α-approximation for matchings of arbitrary size. In independent work, Assadi et al. (SODA 2016) proved this approximation algorithm is optimal and provided an alternative algorithm. We generalize our exact and approximate algorithms to weighted matching. For graphs with low arboricity, such as planar graphs, the space required for constant approximation can be further reduced. While there has been a substantial amount of work on approximate matching in insert-only graph streams, these are the first nontrivial results in the dynamic setting.

Vertex Cover and Hitting Set: There exists an Õ(kd)-space algorithm that solves the minimum hitting set problem, where d is the cardinality of the input sets and k is an upper bound on the size of the minimum hitting set. We prove this is optimal up to logarithmic factors. Our algorithm has Õ(1) update time. The case d = 2 corresponds to minimum vertex cover.

Finally, we consider a larger family of parameterized problems (including b-matching, disjoint paths, and vertex coloring, among others) for which our subgraph sampling primitive yields fast, small-space dynamic graph stream algorithms. We then show lower bounds for natural problems outside this family.
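The flavor of the sampling primitive can be sketched in the insertion-only setting. This is a simplified illustration under assumed parameters: the dynamic-stream version replaces the explicit edge store with linear sketches such as ℓ0-samplers, and a greedy maximal matching stands in here for exact maximum-matching computation on the sampled subgraph:

```python
import random

def sampled_matching(edge_stream, k, seed=0):
    # Color vertices with O(k) random colors and keep one edge per unordered
    # color pair: O(k^2) edges in total. If a matching of size <= k exists,
    # then with constant probability its <= 2k endpoints receive distinct
    # colors, so one edge per matched pair survives the sampling; repeating
    # with fresh colors boosts the success probability.
    rng = random.Random(seed)
    num_colors = 10 * k
    color = {}
    kept = {}  # (c1, c2) -> one representative edge between those colors
    for u, v in edge_stream:
        for w in (u, v):
            if w not in color:
                color[w] = rng.randrange(num_colors)
        pair = tuple(sorted((color[u], color[v])))
        if pair[0] != pair[1]:
            kept.setdefault(pair, (u, v))
    # Greedy maximal matching on the sampled subgraph.
    matched, matching = set(), []
    for u, v in kept.values():
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching
```

The space depends only on k (one stored edge per color pair), never on n, which is the source of the Õ(k^2) bounds above.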

ICML Conference 2015 Conference Paper

Correlation Clustering in Data Streams

  • Kook Jin Ahn
  • Graham Cormode
  • Sudipto Guha
  • Andrew McGregor 0001
  • Anthony Wirth

In this paper, we address the problem of correlation clustering in the dynamic data stream model. The stream consists of updates to the edge weights of a graph on n nodes, and the goal is to find a node-partition such that the end-points of negative-weight edges are typically in different clusters whereas the end-points of positive-weight edges are typically in the same cluster. We present polynomial-time, O(n · polylog n)-space approximation algorithms for natural problems that arise. We first develop data structures based on linear sketches that allow the “quality” of a given node-partition to be measured. We then combine these data structures with convex programming and sampling techniques to solve the relevant approximation problem. However, the standard LP and SDP formulations are not obviously solvable in O(n · polylog n) space. Our work presents space-efficient algorithms for the convex programming required, as well as approaches to reduce the adaptivity of the sampling. Note that the improved space and running-time bounds achieved from streaming algorithms are also useful for offline settings such as MapReduce models.
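The "quality" being measured is the disagreement cost of a partition, which is easy to state directly (an illustrative scorer; the paper's contribution is estimating this with polylog-space sketches, not this exhaustive computation):

```python
def disagreements(signed_edges, cluster_of):
    # Correlation-clustering cost of a node-partition: a positive edge whose
    # endpoints land in different clusters, or a negative edge whose
    # endpoints share a cluster, each contribute one disagreement.
    cost = 0
    for u, v, sign in signed_edges:
        same = cluster_of[u] == cluster_of[v]
        if (sign > 0) != same:
            cost += 1
    return cost
```

On a triangle with two positive edges and one negative edge, every partition disagrees with at least one edge, which is why the objective is phrased as minimizing disagreements rather than achieving zero.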

SODA Conference 2015 Conference Paper

Parameterized Streaming: Maximal Matching and Vertex Cover

  • Rajesh Chitnis
  • Graham Cormode
  • MohammadTaghi Hajiaghayi
  • Morteza Monemizadeh

As graphs continue to grow in size, we seek ways to effectively process such data at scale. The model of streaming graph processing, in which a compact summary is maintained as each edge insertion/deletion is observed, is an attractive one. However, few results are known for optimization problems over such dynamic graph streams. In this paper, we introduce a new approach to handling graph streams, by instead seeking solutions for the parameterized versions of these problems. Here, we are given a parameter k and the objective is to decide whether there is a solution bounded by k. By combining kernelization techniques with randomized sketch structures, we obtain the first streaming algorithms for the parameterized versions of Maximal Matching and Vertex Cover. We consider various models for a graph stream on n nodes: the insertion-only model, where edges can only be added, and the dynamic model, where edges can be both inserted and deleted. More formally, we show the following results:

  • In the insertion-only model, there is a one-pass deterministic algorithm for the parameterized Vertex Cover problem which computes a sketch using Õ(k^2) space such that, at each timestamp, in time Õ(2^k) it can either extract a solution of size at most k for the current instance, or report that no such solution exists. We also show a tight lower bound of Ω(k^2) on the space complexity of any (randomized) streaming algorithm for parameterized Vertex Cover, even in the insertion-only model.
  • In the dynamic model, under the promise that at each timestamp there is a maximal matching of size at most k, there is a one-pass Õ(k^2)-space (sketch-based) dynamic algorithm that maintains a maximal matching with worst-case update time Õ(k^2). This algorithm partially solves Open Problem 64 from [1]. An application of this dynamic matching algorithm is a one-pass Õ(k^2)-space streaming algorithm for the parameterized Vertex Cover problem that, in time Õ(2^k), extracts a solution for the final instance with probability 1 − δ, where δ < 1. To the best of our knowledge, this is the first graph streaming algorithm that combines linear sketching with sequential operations that depend on the graph at the current time.
  • In the dynamic model without any promise, there is a one-pass randomized algorithm for the parameterized Vertex Cover problem which computes a sketch using Õ(nk) space such that in time Õ(nk + 2^k) it can either extract a solution of size at most k for the final instance, or report that no such solution exists.
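One kernelization observation behind such algorithms: every vertex cover must contain at least one endpoint of each matched edge, so a maximal matching larger than k immediately certifies that no size-k cover exists. A minimal insertion-only illustration (not the paper's sketch-based algorithm):

```python
def vc_feasibility_filter(edge_stream, k):
    # Maintain a greedy maximal matching, capped at k + 1 edges.
    # A matching of size > k rules out any vertex cover of size <= k,
    # since a cover needs one endpoint per matched edge.
    matched, matching = set(), []
    for u, v in edge_stream:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
            if len(matching) > k:
                return None  # certificate: no vertex cover of size <= k
    # Otherwise the <= 2k matched endpoints form a kernel: every stream
    # edge touches at least one of them, so an exact branching search
    # (exponential in k only) can decide the instance on this small set.
    return matching
```

This captures only the easy half; the kernel of at most 2k vertices is what makes an Õ(2^k)-time extraction step possible in the insertion-only result above.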

SODA Conference 2014 Conference Paper

Annotations for Sparse Data Streams

  • Amit Chakrabarti
  • Graham Cormode
  • Navin Goyal
  • Justin Thaler

Motivated by the surging popularity of commercial cloud computing services, a number of recent works have studied annotated data streams and variants thereof. In this setting, a computationally weak verifier (cloud user), lacking the resources to store and manipulate his massive input locally, accesses a powerful but untrusted prover (cloud service). The verifier must work within the restrictive data streaming paradigm. The prover, who can annotate the data stream as it is read, must not just supply the final answer but also convince the verifier of its correctness. Ideally, both the amount of annotation from the prover and the space used by the verifier should be sublinear in the relevant input size parameters. A rich theory of such algorithms, which we call schemes, has started to emerge. Prior work has shown how to leverage the prover's power to efficiently solve problems that have no non-trivial standard data stream algorithms. However, even though optimal schemes are now known for several basic problems, such optimality holds only for streams whose length is commensurate with the size of the data universe. In contrast, many real-world data sets are relatively sparse, including graphs that contain only o(n^2) edges, and IP traffic streams that contain far fewer than the total number of possible IP addresses, 2^128 in IPv6. Here we design the first annotation schemes that allow both the annotation and the space usage to be sublinear in the total number of stream updates rather than the size of the data universe. We solve significant problems, including variations of INDEX, SET-DISJOINTNESS, and FREQUENCY-MOMENTS, plus several natural problems on graphs. On the other hand, we give a new lower bound that, for the first time, rules out smooth tradeoffs between annotation and space usage for a specific problem. Our technique brings out new nuances in Merlin-Arthur communication complexity models, and provides a separation between online versions of the MA and AMA models.
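A core ingredient in such verification schemes is algebraic fingerprinting, which lets a small-space verifier compare large multisets. A minimal sketch of the standard technique (not any specific protocol from the paper):

```python
PRIME = (1 << 61) - 1  # a Mersenne prime; all arithmetic is done mod PRIME

def fingerprint(multiset, r):
    # Evaluate prod_x (r - x) mod PRIME, one update per stream element.
    # Equal multisets always produce equal fingerprints; multisets of size s
    # that differ collide with probability at most s / PRIME over a random
    # choice of the evaluation point r.
    fp = 1
    for x in multiset:
        fp = fp * ((r - x) % PRIME) % PRIME
    return fp

# A fixed evaluation point for the demo; a real verifier draws r uniformly
# at random and keeps it secret from the prover.
r = 123456789
```

In an annotated-stream protocol, the verifier maintains only such a fingerprint of the input as it streams by, and checks the prover's claims by comparing fingerprints rather than the data itself.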

FOCS Conference 2010 Conference Paper

Information Cost Tradeoffs for Augmented Index and Streaming Language Recognition

  • Amit Chakrabarti
  • Graham Cormode
  • Ranganath Kondapally
  • Andrew McGregor 0001

This paper makes three main contributions to the theory of communication complexity and stream computation. First, we present new bounds on the information complexity of AUGMENTED-INDEX. In contrast to analogous results for INDEX by Jain, Radhakrishnan and Sen [J. ACM, 2009], we have to overcome the significant technical challenge that protocols for AUGMENTED-INDEX may violate the "rectangle property" due to the inherent input sharing. Second, we use these bounds to resolve an open problem of Magniez, Mathieu and Nayak [STOC, 2010] on the multi-pass complexity of recognizing Dyck languages. This results in a natural separation between the standard multi-pass model and the multi-pass model that permits reverse passes. Third, we present the first passive memory checkers that verify the interaction transcripts of priority queues, stacks, and double-ended queues. We obtain tight upper and lower bounds for these problems, thereby addressing an important sub-class of the memory checking framework of Blum et al. [Algorithmica, 1994].