Arrow Research search
Back to SODA

SODA 2009

Finding duplicates in a data stream

Conference Paper Accepted Paper Algorithms and Complexity ยท Theoretical Computer Science

Abstract

Given a data stream of length n over an alphabet [ m ] where n > m, we consider the problem of finding a duplicate in a single pass. We give a randomized algorithm for this problem that uses O ((log m ) 3 ) space. This answers a question of Muthukrishnan [Mut05] and Tarui [Tar07], who asked if this problem could be solved using sub-linear space and one pass over the input. Our algorithm solves the more general problem of finding a positive frequency element in a stream given by frequency updates where the sum of all frequencies is positive. Our main tool is an Isolation Lemma that reduces this problem to the task of detecting and identifying a Dictatorial variable in a Boolean halfspace. We present various relaxations of the condition n > m, under which one can find duplicates efficiently.

Authors

Keywords

No keywords are indexed for this paper.

Context

Venue
ACM-SIAM Symposium on Discrete Algorithms
Archive span
1990-2025
Indexed papers
4674
Paper id
70286223537745388