Arrow Research search
Back to FOCS

FOCS 2024

Replicability in High Dimensional Statistics

Conference Paper Accepted Paper Algorithms and Complexity ยท Theoretical Computer Science

Abstract

The replicability crisis is a major issue across nearly all areas of empirical science, calling for the formal study of replicability in statistics. Motivated in this context, [Impagliazzo, Lei, Pitassi, and Sorrell STOC 2022] introduced the notion of replicable learning algorithms, and gave basic procedures for 1-dimensional tasks including statistical queries. In this work, we study the computational and statistical cost of replicability for several fundamental high dimensional statistical tasks, including multi-hypothesis testing and mean estimation. Our main contribution establishes a computational and statistical equivalence between optimal replicable algorithms and high dimensional isoperimetric tilings. As a consequence, we obtain matching sample complexity upper and lower bounds for replicable mean estimation of distributions with bounded covariance, resolving an open problem of [Bun, Gaboardi, Hopkins, Impagliazzo, Lei, Pitassi, Sivakumar, and Sorrell, STOC 2023] and for the $N$ -Coin Problem, resolving a problem of [Karbasi, Velegkas, Yang, and Zhou, NeurIPS 2023] up to log factors. While our equivalence is computational, allowing us to shave $\log$ factors in sample complexity from the best known efficient algorithms, efficient isoperimetric tilings are not known. To circumvent this, we introduce several relaxed paradigms that do allow for sample and computationally efficient algorithms, including allowing pre-processing, adaptivity, and approximate replicability. In these cases we give efficient algorithms matching or beating the best known sample complexity for mean estimation and the coin problem, including a generic procedure that reduces the standard quadratic overhead of replicability to linear in expectation.

Authors

Keywords

  • Computer science
  • Costs
  • Estimation
  • Approximation algorithms
  • Complexity theory
  • Computational efficiency
  • Replicability
  • Standards
  • Testing
  • High-dimensional
  • High-dimensional Statistics
  • Lower Bound
  • Efficient Algorithm
  • Areas Of Science
  • Replication Crisis
  • Statistical Equivalence
  • High Probability
  • Hypothesis Testing
  • Adaptive Algorithm
  • Low Surface Area
  • Independent Estimates
  • Polynomial-time Algorithm
  • Exponential Time
  • Random String
  • Fraction Of Points
  • Equivalence Theorem
  • high dimensional statistics
  • foams
  • isoperimetry
  • learning theory

Context

Venue
IEEE Symposium on Foundations of Computer Science
Archive span
1975-2025
Indexed papers
3809
Paper id
407044717261544973