Compressed K-Means for Large-Scale Clustering

Xiaobo Shen; Weiwei Liu; Ivor Tsang; Fumin Shen; Quan-Sen Sun

AAAI 2017

Conference Paper; Machine Learning Methods; Artificial Intelligence

Abstract

Large-scale clustering has been widely used in many applications and has received much attention. Most existing clustering methods suffer from high computation and memory costs when applied to large-scale datasets. In this paper, we propose a novel clustering method, dubbed compressed k-means (CKM), for fast large-scale clustering. Specifically, high-dimensional data are compressed into short binary codes, which are well suited for fast clustering. CKM enjoys two key benefits: 1) storage can be significantly reduced by representing data points as binary codes; 2) distance computation is very efficient using the Hamming metric between binary codes. We propose to jointly learn binary codes and clusters within one framework. Extensive experimental results on four large-scale datasets, including two million-scale datasets, demonstrate that CKM outperforms the state-of-the-art large-scale clustering methods in terms of both computation and memory cost, while achieving comparable clustering accuracy.
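
The two benefits named in the abstract, compact storage and cheap Hamming distances, can be illustrated with a minimal sketch. This is not the CKM algorithm itself: it assumes the binary codes and binary cluster centers are already given, and only shows how packed codes are stored and how points are assigned to the nearest center under the Hamming metric. The function names (pack_codes, hamming_distances, assign_clusters) and the toy data are hypothetical and chosen for illustration.

```python
import numpy as np

# Lookup table: number of set bits in each possible byte value (0..255).
POPCOUNT = np.array([bin(v).count("1") for v in range(256)], dtype=np.uint8)


def pack_codes(bits):
    """Pack an (n, b) array of 0/1 values into (n, ceil(b / 8)) bytes."""
    return np.packbits(np.asarray(bits, dtype=np.uint8), axis=1)


def hamming_distances(packed_points, packed_centers):
    """Return the (n, k) matrix of Hamming distances between packed codes."""
    # XOR marks the differing bits; the lookup table counts them per byte.
    xor = packed_points[:, None, :] ^ packed_centers[None, :, :]
    return POPCOUNT[xor].sum(axis=2, dtype=np.int64)


def assign_clusters(packed_points, packed_centers):
    """Assign each point to the center with the smallest Hamming distance."""
    return hamming_distances(packed_points, packed_centers).argmin(axis=1)


# Toy data (assumed, not from the paper): four 8-bit codes, two centers.
bits = np.array([
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 0],
])
points = pack_codes(bits)
centers = points[:2]

dists = hamming_distances(points, centers)
labels = assign_clusters(points, centers)

# Storage: one byte per 8-bit code versus 32 bytes for 8 float32 features.
packed_bytes = points.nbytes
float_bytes = bits.astype(np.float32).nbytes
```

In this sketch each 8-bit code occupies a single byte, and the distance between two codes reduces to an XOR followed by a bit count, which is what makes the assignment step inexpensive compared with Euclidean distances on floating-point vectors.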
