ICML 2025

Differentially Private Space-Efficient Algorithms for Counting Distinct Elements in the Turnstile Model

Conference Paper Accept (poster) Artificial Intelligence · Machine Learning

Abstract

The turnstile continual release model of differential privacy captures scenarios where a privacy-preserving real-time analysis is sought for a dataset evolving through additions and deletions. In typical applications of real-time data analysis, both the length of the stream $T$ and the size of the universe $|\mathcal{U}|$ from which data come can be extremely large. This motivates the study of private algorithms in the turnstile setting using space sublinear in both $T$ and $|\mathcal{U}|$. In this paper, we give the first sublinear space differentially private algorithms for the fundamental problem of counting distinct elements in the turnstile streaming model. Our algorithm achieves, on arbitrary streams, $O_{\eta}(T^{1/3})$ space and additive error, and a $(1+\eta)$-relative approximation for all $\eta \in (0, 1)$. Our result significantly improves upon the space requirements of the state-of-the-art algorithms for this problem, which are linear, while approaching the known $\Omega(T^{1/4})$ additive error lower bound for arbitrary streams. Moreover, when a bound $W$ on the number of times an item appears in the stream is known, our algorithm provides $O_{\eta}(\sqrt{W})$ additive error, using $O_{\eta}(\sqrt{W})$ space. This additive error asymptotically matches that of prior work, which instead required linear space. Our results address an open question posed by Jain et al. about designing low-memory mechanisms for this problem. We complement these results with a space lower bound for this problem, which shows that any algorithm that uses similar techniques must use space $\Omega(T^{1/3})$.
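To make the problem statement concrete, the sketch below computes the exact number of distinct elements after every update of a turnstile stream. It is only an illustration of the quantity being released, not the paper's algorithm: it is not differentially private and it uses space linear in the number of live items, which is exactly the cost the paper aims to avoid. The function name `distinct_counts`, the `(item, delta)` update encoding, and the convention that an item counts as present when its net count is positive are assumptions made for this example.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def distinct_counts(stream: Iterable[Tuple[Hashable, int]]) -> List[int]:
    """Exact, non-private baseline (hypothetical helper, not from the paper).

    Each update is (item, delta) with delta in {+1, -1, 0}: an insertion,
    a deletion, or a no-op time step. Returns the number of items with a
    positive net count after every update (continual release).
    """
    counts: Counter = Counter()  # net count per item: linear space
    distinct = 0                 # items whose net count is > 0
    released = []
    for item, delta in stream:
        if delta not in (-1, 0, 1):
            raise ValueError("delta must be -1, 0, or +1")
        before = counts[item]    # a missing key reads as 0
        after = before + delta
        if before <= 0 < after:
            distinct += 1        # item just became present
        elif after <= 0 < before:
            distinct -= 1        # item just disappeared
        if after == 0:
            counts.pop(item, None)  # drop zero entries to save memory
        else:
            counts[item] = after
        released.append(distinct)
    return released


if __name__ == "__main__":
    updates = [("a", 1), ("b", 1), ("a", 1), ("a", -1), ("a", -1), ("b", -1)]
    print(distinct_counts(updates))
```

On the six updates in the usage example, the released sequence is `[1, 2, 2, 2, 1, 0]`: the second insertion of `a` does not change the count, and `a` only disappears once both of its copies are deleted.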

Authors

Keywords

  • differential privacy
  • counting distinct elements
  • streaming data
  • turnstile
  • continual observation
  • continual release
  • sublinear space

Context

Venue
International Conference on Machine Learning