TURF: Two-Factor, Universal, Robust, Fast Distribution Learning Algorithm

Yi Hao; Ayush Jain 0001; Alon Orlitsky; Vaishakh Ravindrakumar

Back to ICML

ICML 2022

TURF: Two-Factor, Universal, Robust, Fast Distribution Learning Algorithm

Conference Paper Accepted Paper Artificial Intelligence · Machine Learning

Details

Abstract

Approximating distributions from their samples is a canonical statistical-learning problem. One of its most powerful and successful modalities approximates every distribution to an $\ell_1$ distance essentially at most a constant times larger than its closest $t$-piece degree-$d$ polynomial, where $t\ge1$ and $d\ge0$. Letting $c_{t, d}$ denote the smallest such factor, clearly $c_{1, 0}=1$, and it can be shown that $c_{t, d}\ge 2$ for all other $t$ and $d$. Yet current computationally efficient algorithms show only $c_{t, 1}\le 2. 25$ and the bound rises quickly to $c_{t, d}\le 3$ for $d\ge 9$. We derive a near-linear-time and essentially sample-optimal estimator that establishes $c_{t, d}=2$ for all $(t, d)\ne(1, 0)$. Additionally, for many practical distributions, the lowest approximation distance is achieved by polynomials with vastly varying number of pieces. We provide a method that estimates this number near-optimally, hence helps approach the best possible approximation. Experiments combining the two techniques confirm improved performance over existing methodologies.

TURF: Two-Factor, Universal, Robust, Fast Distribution Learning Algorithm

Abstract

Authors

Keywords

Context