Implicit High-Order Moment Tensor Estimation and Learning Latent Variable Models

Ilias Diakonikolas; Daniel M. Kane

Back to FOCS

FOCS 2025

Implicit High-Order Moment Tensor Estimation and Learning Latent Variable Models

Conference Paper Accepted Paper Algorithms and Complexity · Theoretical Computer Science

Details

Abstract

We study the general task of learning latent-variable models on ℝ d with k hidden parameters. A common technique to address this task algorithmically is (some version of) the method of moments. Unfortunately, moment-based approaches are often hampered by the fact that the moment tensors of super-constant degree cannot even be written down in polynomial time. Motivated by such learning applications, we develop a general efficient algorithm for implicit moment tensor computation. Roughly speaking, our algorithm computes in poly(d, k) time a succinct approximate description of tensors of the form ${M_m} = \sum\nolimits_{i = 1}^k {{w_i}} v_i^{ \otimes m}$, for w i ∈ ℝ + —even for m = ω(1)—assuming that there exists an unbiased estimator for M m with small variance that takes an appropriately nice form. Our framework broadly generalizes, both conceptually and technically, the work of [1] which developed an efficient algorithm for the specific moment tensors that arise in the task of clustering mixtures of spherical Gaussians. By leveraging our implicit moment estimation algorithm, we obtain the first poly(d, k)-time learning algorithms for the following classical latent-variable models—thereby resolving or making significant progress towards a number of important open problems in the literature. • Mixtures of Linear Regressions Given i. i. d. samples (x, y) with x ∼ N(0, I) and such that the joint distribution on (x, y) is an unknown k-mixture of linear regressions on ℝ d+1 corrupted with Gaussian noise, the goal is to learn the underlying distribution in total variation distance. We give a poly(d, k, 1/ϵ)-time algorithm for this task, where ϵ is the desired error. The previously best algorithm has super-polynomial complexity in k. • Mixtures of Spherical Gaussians Given i. i. d. samples from a k-mixture of identity covariance Gaussians on ℝ d, the goal is to learn the target mixture. For density estimation, we give a poly(d, k, 1/ ϵ)-time learning algorithm, where ϵ is the desired total variation error, under the condition that the means lie in a ball of radius $O(\sqrt {\log k} )$. Prior algorithms incur super-polynomial complexity in k. For parameter estimation, we give a poly(d, k, 1/ ϵ)-time algorithm where ϵ is the target accuracy, under the optimal mean separation of Ω(log 1/2 (k/ϵ)) and the condition that the largest distance is comparable to the smallest. Prior polynomial-time parameter estimation algorithms require separation Ω(log 1/2+c (k/ϵ)), for c > 0. • Positive Linear Combinations of Non-Linear Activations Given i. i. d. samples (x, y) with x ∼ N(0, I) and y = F(x), where F is a positive linear combination of k reasonable non-linear activations on ℝ d, the goal is to learn the target function in L 2 -norm. Our main result is a general algorithm for this task with complexity poly(d, k)g(ϵ), where ϵ is the desired error and the function g depends on the Hermite concentration of the target class of functions. Specifically, for positive linear combinations of ReLU activations, our algorithm has complexity poly(d, k)2 poly(1/ϵ). This is the first algorithm for this class that runs in poly(d, k) time for sub-constant values of ϵ = o k, d (1). Finally, for positive linear combinations of cosine activations with bounded frequency, our algorithm runs in poly(d, k, 1/ ϵ) time.

Authors

Keywords

Time-frequency analysis
Tensors
Parameter estimation
Linear regression
Estimation
Clustering algorithms
Total variance
Approximation algorithms
Polynomials
Complexity theory
Latent Variable Model
Moment Tensor
Parameter Estimates
Learning Algorithms
Efficient Algorithm
Density Estimation
Functional Class
Mixture Model
Estimation Algorithm
Generation Algorithm
Learning Applications
Largest Distance
Ball Of Radius
Optimal Separation
Problem In The Literature
Parameter Estimation Algorithm
Total Variation Distance
Succinct Description
Neural Network
High Probability
Hermite Polynomials
Second Moment
Probability 1
Computational Sequence
Isometry
Parameter Estimation Problem
Production Of Space
Computational Efficiency
Rejection Sampling
Results In Context
learning theory
latent variable models
mixture models
neural networks
efficient algorithms

Context

Venue: IEEE Symposium on Foundations of Computer Science
Archive span: 1975-2025
Indexed papers: 3809
Paper id: 375866144525197317