Arrow Research search
Back to FOCS

FOCS 2000

Clustering Data Streams

Conference Paper Session 8 Algorithms and Complexity ยท Theoretical Computer Science

Abstract

We study clustering under the data stream model of computation where: given a sequence of points, the objective is to maintain a consistently good clustering of the sequence observed so far, using a small amount of memory and time. The data stream model is relevant to new classes of applications involving massive data sets, such as Web click stream analysis and multimedia data analysis. We give constant-factor approximation algorithms for the k-median problem in the data stream model of computation in a single pass. We also show negative results implying that our algorithms cannot be improved in a certain sense.

Authors

Keywords

  • Clustering algorithms
  • Streaming media
  • Approximation algorithms
  • Computer science
  • Computational modeling
  • Data analysis
  • Application software
  • Telephony
  • Web pages
  • Laboratories
  • Data Stream Clustering
  • Time And Space
  • Running Time
  • Constant Factor
  • Primal-dual Algorithm
  • Data Streams
  • Estimation Algorithm
  • Sequence Of Points
  • Single Pass
  • Clickstream
  • Deterministic
  • Lower Bound
  • Performance Of Algorithm
  • Clustering Algorithm
  • Points In Space
  • Small Space
  • Optimum Solution
  • Pair Of Points
  • Random Access
  • Linear Space
  • Space Requirements
  • Input Point
  • Cost Of Solution
  • Local Search Algorithm
  • Divide-and-conquer Approach
  • Facility Location Problem
  • Linear Scan
  • Constant Probability
  • Arbitrary Point

Context

Venue
IEEE Symposium on Foundations of Computer Science
Archive span
1975-2025
Indexed papers
3809
Paper id
631618087649962391