Arrow Research search
Back to ICML

ICML 2015

Binary Embedding: Fundamental Limits and Fast Algorithm

Conference Paper Accepted Paper Artificial Intelligence · Machine Learning

Abstract

Binary embedding is a nonlinear dimension reduction methodology where high dimensional data are embedded into the Hamming cube while preserving the structure of the original space. Specifically, for an arbitrary N distinct points in \mathbbS^p-1, our goal is to encode each point using m-dimensional binary strings such that we can reconstruct their geodesic distance up to δuniform distortion. Existing binary embedding algorithms either lack theoretical guarantees or suffer from running time O(mp). We make three contributions: (1) we establish a lower bound that shows any binary embedding oblivious to the set of points requires m =Ω(\frac1δ^2\logN) bits and a similar lower bound for non-oblivious embeddings into Hamming distance; (2) we propose a novel fast binary embedding algorithm with provably optimal bit complexity m = O(\frac1 δ^2\logN) and near linear running time O(p \log p) whenever \log N ≪δ\sqrtp, with a slightly worse running time for larger \log N; (3) we also provide an analytic result about embedding a general set of points K ⊆\mathbbS^p-1 with even infinite size. Our theoretical findings are supported through experiments on both synthetic and real data sets.

Authors

Keywords

No keywords are indexed for this paper.

Context

Venue
International Conference on Machine Learning
Archive span
1993-2025
Indexed papers
16471
Paper id
379053746241183732