SODA 2009
An improved approximation algorithm for the column subset selection problem
Abstract
We consider the problem of selecting the “best” subset of exactly k columns from an m × n matrix A. In particular, we present and analyze a novel two-stage algorithm that runs in O(min{mn², m²n}) time and returns as output an m × k matrix C consisting of exactly k columns of A. In the first stage (the randomized stage), the algorithm randomly selects O(k log k) columns according to a judiciously chosen probability distribution that depends on information in the top-k right singular subspace of A. In the second stage (the deterministic stage), the algorithm applies a deterministic column-selection procedure to select and return exactly k columns from the set of columns selected in the first stage. Let C be the m × k matrix containing those k columns, let P_C denote the projection matrix onto the span of those columns, and let A_k denote the “best” rank-k approximation to the matrix A as computed with the singular value decomposition. Then, we prove that ‖A − P_C A‖_2 ≤ O(k^{3/4} log^{1/2}(k) (n − k)^{1/4}) ‖A − A_k‖_2 with probability at least 0.7. This spectral norm bound improves upon the best previously existing result (of Gu and Eisenstat [21]) for the spectral norm version of this Column Subset Selection Problem. We also prove that ‖A − P_C A‖_F ≤ O(k log^{1/2}(k)) ‖A − A_k‖_F with the same probability. This Frobenius norm bound is only a factor of √(k log k) worse than the best previously existing existential result and is roughly O(√(k log k)) better than the best previous algorithmic result (both of Deshpande et al. [11]) for the Frobenius norm version of this Column Subset Selection Problem.
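The two-stage structure described in the abstract can be illustrated with a short sketch. This is not the paper's exact procedure: the sample size constant, the sampling-without-replacement choice, and the use of column-pivoted QR (as a simple stand-in for the strong rank-revealing QR of Gu and Eisenstat used in the deterministic stage) are all illustrative assumptions.

```python
import numpy as np
from scipy.linalg import qr, svd

def two_stage_css(A, k, c=None, seed=None):
    """Sketch of a two-stage column subset selection heuristic.

    Stage 1 (randomized): sample ~O(k log k) columns with probabilities
    proportional to the squared column norms of V_k^T, i.e. information
    from the top-k right singular subspace of A.
    Stage 2 (deterministic): keep exactly k of the sampled columns via
    column-pivoted QR (a stand-in for strong RRQR).
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    if c is None:
        # Illustrative O(k log k) sample size; the constant 4 is arbitrary.
        c = min(n, int(np.ceil(4 * k * np.log(max(k, 2)))))
    # Top-k right singular subspace of A.
    _, _, Vt = svd(A, full_matrices=False)
    Vk = Vt[:k, :]                       # k x n, orthonormal rows
    # Squared column norms of Vk sum to k, so this is a distribution.
    p = (Vk ** 2).sum(axis=0) / k
    # Stage 1: randomized sampling (without replacement, for simplicity).
    sampled = rng.choice(n, size=c, replace=False, p=p)
    # Stage 2: pivoted QR on the sampled columns of Vk picks k of them.
    _, _, piv = qr(Vk[:, sampled], pivoting=True)
    return np.sort(sampled[piv[:k]])
```

A caller would form C = A[:, two_stage_css(A, k)] and compare ‖A − P_C A‖ against ‖A − A_k‖; the projection error of any k chosen columns can never beat the rank-k SVD error, and the paper's theorems bound how far above it this kind of procedure can land.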
Authors
Keywords
No keywords are indexed for this paper.
Context
- Venue
- ACM-SIAM Symposium on Discrete Algorithms
- Archive span
- 1990-2025
- Indexed papers
- 4674
- Paper id
- 472701188053638577