STOC 2024 Conference Paper
Sparsifying Generalized Linear Models
- Arun Jambulapati
- James R. Lee
- Yang P. Liu
- Aaron Sidford
We consider the sparsification of sums $F : \mathbb{R}^n \to \mathbb{R}_+$ where $F(x) = f_1(\langle a_1, x \rangle) + \cdots + f_m(\langle a_m, x \rangle)$ for vectors $a_1, \ldots, a_m \in \mathbb{R}^n$ and functions $f_1, \ldots, f_m : \mathbb{R} \to \mathbb{R}_+$. We show that $(1+\varepsilon)$-approximate sparsifiers of $F$ with support size $\frac{n}{\varepsilon^2} (\log \frac{n}{\varepsilon})^{O(1)}$ exist whenever the functions $f_1, \ldots, f_m$ are symmetric, monotone, and satisfy natural growth bounds. Additionally, we give efficient algorithms to compute such a sparsifier assuming each $f_i$ can be evaluated efficiently.

Our results generalize the classical case of $\ell_p$ sparsification, where $f_i(z) = |z|^p$ for $p \in (0, 2]$, and give the first near-linear size sparsifiers in the well-studied setting of the Huber loss function and its generalizations, e.g., $f_i(z) = \min\{|z|^p, |z|^2\}$ for $0 < p \le 2$. Our sparsification algorithm can be applied to give near-optimal reductions for optimizing a variety of generalized linear models, including $\ell_p$ regression for $p \in (1, 2]$ to high accuracy, via solving $(\log n)^{O(1)}$ sparse regression instances with $m \le n (\log n)^{O(1)}$, plus runtime proportional to the number of nonzero entries in the vectors $a_1, \ldots, a_m$.
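To make the objects above concrete, here is a minimal Python sketch of the sum $F(x)$ with the Huber-type loss $f(z) = \min\{|z|^p, |z|^2\}$, and of what a sparsifier is: a reweighted subset of the terms whose value approximates $F(x)$ for all $x$. The function names, the uniform subsampling, and the chosen weights are illustrative assumptions for exposition; they are not the paper's importance-sampling scheme.

```python
import numpy as np

def huber_like(z, p=1.0):
    """Huber-type loss f(z) = min{|z|^p, |z|^2} for 0 < p <= 2."""
    return np.minimum(np.abs(z) ** p, z ** 2)

def glm_sum(A, x, f):
    """F(x) = f(<a_1, x>) + ... + f(<a_m, x>) for the rows a_i of A."""
    return f(A @ x).sum()

def sparsified_sum(A, x, f, support, weights):
    """A sparsifier keeps a reweighted subset S of the terms:
    F~(x) = sum_{i in S} w_i * f(<a_i, x>).
    The paper shows supports of size n/eps^2 * (log n/eps)^O(1)
    suffice when f is symmetric, monotone, and satisfies growth
    bounds; here we only evaluate a given (support, weights) pair."""
    return (weights * f(A[support] @ x)).sum()

# Toy usage with uniform subsampling (NOT the paper's algorithm):
rng = np.random.default_rng(0)
m, n = 2000, 10
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)

k = 400                              # number of kept terms
S = rng.choice(m, size=k, replace=False)
w = np.full(k, m / k)                # unbiased uniform reweighting

exact = glm_sum(A, x, huber_like)
approx = sparsified_sum(A, x, huber_like, S, w)
print(f"F(x) = {exact:.1f}, sparsified estimate = {approx:.1f}")
```

Uniform sampling gives an unbiased estimate at a single $x$ but no uniform $(1+\varepsilon)$ guarantee over all $x$; obtaining that guarantee with near-linear support size is precisely the contribution of the paper.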