Arrow Research

Author name cluster

Allen Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

24 papers
2 author rows

Possible papers (24)

FOCS 2024 · Conference Paper

High-Temperature Gibbs States are Unentangled and Efficiently Preparable

  • Ainesh Bakshi
  • Allen Liu
  • Ankur Moitra
  • Ewin Tang

We show that thermal states of local Hamiltonians are separable above a constant temperature. Specifically, for a local Hamiltonian $H$ on a graph with degree $\mathfrak{g}$, its Gibbs state at inverse temperature $\beta$, denoted by $\rho=e^{-\beta H}/\text{tr}(e^{-\beta H})$, is a classical distribution over product states for all $\beta < 1/(c\mathfrak{g})$, where $c$ is a constant. This sudden death of thermal entanglement upends conventional wisdom about the presence of short-range quantum correlations in Gibbs states. Moreover, we show that we can efficiently sample from the distribution over product states. In particular, for any $\beta < 1/(c\mathfrak{g}^{3})$, we can prepare a state $\varepsilon$-close to $\rho$ in trace distance with a depth-one quantum circuit and $\text{poly}(n)\log(1/\varepsilon)$ classical overhead. (In independent and concurrent work, Rouzé, França, and Alhambra [37] obtain an efficient quantum algorithm for preparing high-temperature Gibbs states via a dissipative evolution.)
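
As a toy numerical illustration of the object being studied (not of the paper's sampling algorithm), the sketch below constructs $\rho=e^{-\beta H}/\text{tr}(e^{-\beta H})$ for a small two-qubit Hamiltonian; the Hamiltonian, $\beta$, and all sizes are invented for the example.

```python
# Toy illustration of the Gibbs state rho = exp(-beta*H)/tr(exp(-beta*H)).
# The two-qubit Hamiltonian below is an arbitrary example, not from the paper.
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

# H = Z (x) Z + 0.5 * (X (x) I + I (x) X): a 2-local Hamiltonian on 2 qubits.
H = np.kron(Z, Z) + 0.5 * (np.kron(X, I2) + np.kron(I2, X))

def gibbs_state(H, beta):
    """Return exp(-beta*H) / tr(exp(-beta*H)) via eigendecomposition."""
    evals, evecs = np.linalg.eigh(H)
    # Shift eigenvalues before exponentiating, for numerical stability.
    w = np.exp(-beta * (evals - evals.min()))
    rho = (evecs * w) @ evecs.conj().T
    return rho / np.trace(rho).real

beta = 0.1  # "high temperature" means small beta
rho = gibbs_state(H, beta)
print(np.trace(rho).real)              # 1.0
print(np.allclose(rho, rho.conj().T))  # Hermitian
```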

NeurIPS 2024 · Conference Paper

Semi-Random Matrix Completion via Flow-Based Adaptive Reweighting

  • Jonathan A. Kelner
  • Jerry Li
  • Allen Liu
  • Aaron Sidford
  • Kevin Tian

We consider the well-studied problem of completing a rank-$r$, $\mu$-incoherent matrix $\mathbf{M} \in \mathbb{R}^{d \times d}$ from incomplete observations. We focus on this problem in the semi-random setting where each entry is independently revealed with probability at least $p = \frac{\textup{poly}(r, \mu, \log d)}{d}$. Whereas multiple nearly-linear time algorithms have been established in the more specialized fully-random setting where each entry is revealed with probability exactly $p$, the only known nearly-linear time algorithm in the semi-random setting is due to [CG18], whose sample complexity has a polynomial dependence on the inverse accuracy and condition number and thus cannot achieve high-accuracy recovery. Our main result is the first high-accuracy nearly-linear time algorithm for solving semi-random matrix completion, and an extension to the noisy observation setting. Our result builds upon the recent short-flat decomposition framework of [KLLST23a, KLLST23b] and leverages fast algorithms for flow problems on graphs to solve adaptive reweighting subproblems efficiently.
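
To make the observation model concrete, here is a minimal sketch of the semi-random setting: every entry is revealed independently with probability at least $p$, but an adversary may inflate individual reveal probabilities. The instance sizes and the stand-in adversary are assumptions for illustration.

```python
# Sketch of the semi-random reveal model: each entry of a d x d rank-r matrix
# is revealed independently with probability AT LEAST p; an (here: arbitrary)
# adversary may inflate individual reveal probabilities. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
d, r, p = 200, 3, 0.2

# Rank-r ground truth M = U V^T.
U, V = rng.standard_normal((d, r)), rng.standard_normal((d, r))
M = U @ V.T

# Adversary chooses per-entry probabilities q_ij >= p
# (the fully-random setting would be q_ij = p exactly).
q = p + (1 - p) * (rng.random((d, d)) < 0.1)  # some entries revealed with prob 1
mask = rng.random((d, d)) < q                 # the observed entries
observations = np.where(mask, M, np.nan)
print(mask.mean())  # empirical reveal rate, >= p on average
```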

FOCS 2024 · Conference Paper

Structure Learning of Hamiltonians from Real-Time Evolution

  • Ainesh Bakshi
  • Allen Liu
  • Ankur Moitra
  • Ewin Tang

We study the problem of Hamiltonian structure learning from real-time evolution: given the ability to apply $e^{-\mathrm{i}Ht}$ for an unknown local Hamiltonian $H=\sum_{a=1}^{m}\lambda_{a}E_{a}$ on $n$ qubits, the goal is to recover $H$. This problem is already well-understood under the assumption that the interaction terms, $E_{a}$, are given, and only the interaction strengths, $\lambda_{a}$, are unknown. But how efficiently can we learn a local Hamiltonian without prior knowledge of its interaction structure? We present a new, general approach to Hamiltonian learning that not only solves the challenging structure learning variant, but also resolves other open questions in the area, all while achieving the gold standard of Heisenberg-limited scaling. In particular, our algorithm recovers the Hamiltonian to $\varepsilon$ error with total evolution time $\mathcal{O}(\log(n)/\varepsilon)$, and has the following appealing properties: 1) it does not need to know the Hamiltonian terms; 2) it works beyond the short-range setting, extending to any Hamiltonian $H$ where the sum of terms interacting with a qubit has bounded norm; 3) it evolves according to $H$ in constant time $t$ increments, thus achieving constant time resolution. As an application, we can also learn Hamiltonians exhibiting power-law decay up to accuracy $\varepsilon$ with total evolution time beating the standard limit of $1/\varepsilon^{2}$.
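
As a small illustration of the access model only (not the learning algorithm), one can build $H=\sum_{a}\lambda_{a}E_{a}$ from Pauli strings and apply $e^{-\mathrm{i}Ht}$ numerically; the 3-qubit Hamiltonian, its terms, and its strengths below are invented for the example.

```python
# Toy model of the access model in Hamiltonian learning: H = sum_a lambda_a E_a
# is built from Pauli strings, and we may apply e^{-iHt} to a state. The terms
# and strengths are arbitrary assumptions; this is NOT the learning algorithm.
import numpy as np
from functools import reduce

PAULI = {"I": np.eye(2, dtype=complex),
         "X": np.array([[0, 1], [1, 0]], dtype=complex),
         "Z": np.array([[1, 0], [0, -1]], dtype=complex)}

def pauli_string(s):
    """Tensor product of single-qubit Paulis, e.g. 'XZI'."""
    return reduce(np.kron, (PAULI[c] for c in s))

terms = {"ZZI": 1.0, "IZZ": 0.7, "XII": 0.3}  # {E_a: lambda_a}
H = sum(lam * pauli_string(s) for s, lam in terms.items())

def evolve(H, psi, t):
    """Apply e^{-iHt} via eigendecomposition of the Hermitian H."""
    evals, evecs = np.linalg.eigh(H)
    return evecs @ (np.exp(-1j * evals * t) * (evecs.conj().T @ psi))

psi0 = np.zeros(8, dtype=complex); psi0[0] = 1.0  # |000>
psi_t = evolve(H, psi0, t=0.5)
print(np.vdot(psi_t, psi_t).real)  # norm preserved: 1.0
```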

NeurIPS 2023 · Conference Paper

Constant Approximation for Individual Preference Stable Clustering

  • Anders Aamand
  • Justin Chen
  • Allen Liu
  • Sandeep Silwal
  • Pattara Sukprasert
  • Ali Vakilian
  • Fred Zhang

Individual preference (IP) stability, introduced by Ahmadi et al. (ICML 2022), is a natural clustering objective inspired by stability and fairness constraints. A clustering is $\alpha$-IP stable if the average distance of every data point to its own cluster is at most $\alpha$ times the average distance to any other cluster. Unfortunately, determining if a dataset admits a $1$-IP stable clustering is NP-hard. Moreover, before this work, it was unknown if an $o(n)$-IP stable clustering always exists, as the prior state of the art only guaranteed an $O(n)$-IP stable clustering. We close this gap in understanding and show that an $O(1)$-IP stable clustering always exists for general metrics, and we give an efficient algorithm which outputs such a clustering. We also introduce generalizations of IP stability beyond average distance and give efficient near-optimal algorithms in the cases where we consider the maximum and minimum distances within and between clusters.
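
The definition translates directly into a brute-force checker. The sketch below, on an invented toy dataset, computes the smallest $\alpha$ for which a given clustering is $\alpha$-IP stable; it is meant only to make the objective concrete, not to reproduce the paper's algorithm.

```python
# Sketch: compute the smallest alpha for which a clustering is alpha-IP stable,
# directly from the definition (average distance to own cluster vs. the average
# distance to every other cluster). Brute force, illustrative only.
import numpy as np

def ip_stability_alpha(X, labels):
    """Max over points of (avg dist to own cluster) / (min avg dist to another cluster)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = {c: np.flatnonzero(labels == c) for c in np.unique(labels)}
    worst = 0.0
    for i in range(len(X)):
        own = clusters[labels[i]]
        own = own[own != i]   # exclude the point itself
        if len(own) == 0:     # singleton cluster: trivially stable
            continue
        own_avg = D[i, own].mean()
        other_avg = min(D[i, idx].mean() for c, idx in clusters.items() if c != labels[i])
        worst = max(worst, own_avg / other_avg)
    return worst

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(6, 1, (20, 2))])
labels = np.array([0] * 20 + [1] * 20)
print(ip_stability_alpha(X, labels))  # well-separated clusters => small alpha
```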

FOCS 2023 · Conference Paper

Matrix Completion in Almost-Verification Time

  • Jonathan A. Kelner
  • Jerry Li 0001
  • Allen Liu
  • Aaron Sidford
  • Kevin Tian

We give a new framework for solving the fundamental problem of low-rank matrix completion, i.e., approximating a rank-$r$ matrix $\mathbf{M} \in \mathbb{R}^{m \times n}$ (where $m \geq n$) from random observations. First, we provide an algorithm which completes $\mathbf{M}$ on $99\%$ of rows and columns under no further assumptions on $\mathbf{M}$ from $\approx mr$ samples and using $\approx mr^{2}$ time. Then, assuming the row and column spans of $\mathbf{M}$ satisfy additional regularity properties, we show how to boost this partial completion guarantee to a full matrix completion algorithm by aggregating solutions to regression problems involving the observations. In the well-studied setting where $\mathbf{M}$ has incoherent row and column spans, our algorithms complete $\mathbf{M}$ to high precision from $mr^{2+o(1)}$ observations in $mr^{3+o(1)}$ time (omitting logarithmic factors in problem parameters), improving upon the prior state-of-the-art [JN15] which used $\approx mr^{5}$ samples and $\approx mr^{7}$ time. Under an assumption on the row and column spans of $\mathbf{M}$ we introduce (which is satisfied by random subspaces with high probability), our sample complexity improves to an almost information-theoretically optimal $mr^{1+o(1)}$, and our runtime improves to $mr^{2+o(1)}$. Our runtimes have the appealing property of matching the best known runtime to verify that a rank-$r$ decomposition $\mathbf{U}\mathbf{V}^{\top}$ agrees with the sampled observations. We also provide robust variants of our algorithms that, given random observations from $\mathbf{M}+\mathbf{N}$ with $\|\mathbf{N}\|_{\mathrm{F}} \leq \Delta$, complete $\mathbf{M}$ to Frobenius norm distance $\approx r^{1.5}\Delta$ in the same runtimes as the noiseless setting. Prior noisy matrix completion algorithms [CP10] only guaranteed a distance of $\approx \sqrt{n}\Delta$.
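
The runtime benchmark in the abstract is "verification": checking that a candidate rank-$r$ decomposition $\mathbf{U}\mathbf{V}^{\top}$ agrees with the sampled observations. Here is a minimal sketch of that check (the instance below is an arbitrary assumption); note it runs in $O(|\Omega| r)$ time without ever forming $\mathbf{U}\mathbf{V}^{\top}$.

```python
# Minimal sketch of "verification" for matrix completion: check that a
# candidate rank-r decomposition U V^T matches the sampled observations.
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 300, 200, 4
U, V = rng.standard_normal((m, r)), rng.standard_normal((n, r))
M = U @ V.T

# Sample observations Omega as (row, col, value) triples.
num_samples = 5000
rows, cols = rng.integers(0, m, num_samples), rng.integers(0, n, num_samples)
vals = M[rows, cols]

def verify(U, V, rows, cols, vals, tol=1e-8):
    """Check max over Omega of |(U V^T)_{ij} - M_{ij}| <= tol, in O(|Omega| r) time."""
    preds = np.einsum("kr,kr->k", U[rows], V[cols])  # (U V^T)_{ij} on Omega only
    return np.max(np.abs(preds - vals)) <= tol

print(verify(U, V, rows, cols, vals))  # True for the exact factors
```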

ICML 2023 · Conference Paper

Tensor Decompositions Meet Control Theory: Learning General Mixtures of Linear Dynamical Systems

  • Ainesh Bakshi
  • Allen Liu
  • Ankur Moitra
  • Morris Yau

Recently, Chen and Poor initiated the study of learning mixtures of linear dynamical systems. While linear dynamical systems already have wide-ranging applications in modeling time-series data, using mixture models can lead to a better fit or even a richer understanding of underlying subpopulations represented in the data. In this work we give a new approach to learning mixtures of linear dynamical systems that is based on tensor decompositions. As a result, our algorithm succeeds without strong separation conditions on the components, and can be used to compete with the Bayes optimal clustering of the trajectories. Moreover our algorithm works in the challenging partially-observed setting. Our starting point is the simple but powerful observation that the classic Ho-Kalman algorithm is a relative of modern tensor decomposition methods for learning latent variable models. This gives us a playbook for how to extend it to work with more complicated generative models.

FOCS 2023 · Conference Paper

The Full Landscape of Robust Mean Testing: Sharp Separations between Oblivious and Adaptive Contamination

  • Clément L. Canonne
  • Samuel B. Hopkins 0001
  • Jerry Li 0001
  • Allen Liu
  • Shyam Narayanan

We consider the question of Gaussian mean testing, a fundamental task in high-dimensional distribution testing and signal processing, subject to adversarial corruptions of the samples. We focus on the relative power of different adversaries, and show that, in contrast to the common wisdom in robust statistics, there exists a strict separation between adaptive adversaries (strong contamination) and oblivious ones (weak contamination) for this task. Specifically, we resolve both the information-theoretic and computational landscapes for robust mean testing. In the exponential-time setting, we establish the tight sample complexity of testing $\mathcal{N}(0, I)$ against $\mathcal{N}(\alpha v, I)$, where $\|v\|_{2}=1$, with an $\varepsilon$-fraction of adversarial corruptions, to be $\tilde{\Theta}\left(\max \left(\frac{\sqrt{d}}{\alpha^{2}}, \frac{d \varepsilon^{3}}{\alpha^{4}}, \min \left(\frac{d^{2 / 3} \varepsilon^{2 / 3}}{\alpha^{8 / 3}}, \frac{d \varepsilon}{\alpha^{2}}\right)\right)\right)$, while the complexity against adaptive adversaries is $\tilde{\Theta}\left(\max \left(\frac{\sqrt{d}}{\alpha^{2}}, \frac{d \varepsilon^{2}}{\alpha^{4}}\right)\right)$, which is strictly worse for a large range of vanishing $\varepsilon, \alpha$. To the best of our knowledge, ours is the first separation in sample complexity between the strong and weak contamination models. In the polynomial-time setting, we close a gap in the literature by providing a polynomial-time algorithm against adaptive adversaries achieving the above sample complexity $\tilde{\Theta}\left(\max \left(\sqrt{d} / \alpha^{2}, d \varepsilon^{2} / \alpha^{4}\right)\right)$, and a low-degree lower bound (which complements an existing reduction from planted clique) suggesting that all efficient algorithms require this many samples, even in the oblivious-adversary setting.
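
To fix the setup, here is a sketch of the testing instance under an oblivious adversary; all parameters and the particular corruption are invented for illustration, and only the contamination model, not any tester, is shown.

```python
# Sketch of the testing instance: n samples from N(0, I) (null) or N(alpha*v, I)
# (alternative), with an eps-fraction replaced by an OBLIVIOUS adversary, i.e.
# one that chooses corruptions without looking at the clean samples.
import numpy as np

rng = np.random.default_rng(3)
d, n, alpha, eps = 100, 5000, 0.5, 0.05

v = rng.standard_normal(d); v /= np.linalg.norm(v)  # unit direction, ||v||_2 = 1
samples = rng.standard_normal((n, d)) + alpha * v   # alternative hypothesis

# Oblivious corruption: indices and fake points fixed independently of the draw.
bad = rng.choice(n, size=int(eps * n), replace=False)
samples[bad] = rng.standard_normal((len(bad), d)) * 3.0  # arbitrary oblivious noise

# (An adaptive adversary would instead inspect `samples` before choosing `bad`
# and the replacements, which the paper shows is strictly more powerful.)
print(samples.mean(axis=0) @ v)  # naive projected mean, biased by corruptions
```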

FOCS 2023 · Conference Paper

When Does Adaptivity Help for Quantum State Learning?

  • Sitan Chen
  • Brice Huang
  • Jerry Li 0001
  • Allen Liu
  • Mark Sellke

We consider the classic question of state tomography: given copies of an unknown quantum state $\rho \in \mathbb{C}^{d \times d}$, output $\widehat{\rho}$ which is close to $\rho$ in some sense, e.g., trace distance or fidelity. When one is allowed to make coherent measurements entangled across all copies, $\Theta\left(d^{2} / \varepsilon^{2}\right)$ copies are necessary and sufficient to get trace distance $\varepsilon$ [18], [29]. Unfortunately, the protocols achieving this rate incur large quantum memory overheads that preclude implementation on near-term devices. On the other hand, the best known protocol using incoherent (single-copy) measurements uses $O\left(d^{3} / \varepsilon^{2}\right)$ copies [24], and multiple papers have posed it as an open question to understand whether or not this rate is tight [6], [18]. In this work, we fully resolve this question, by showing that any protocol using incoherent measurements, even if they are chosen adaptively, requires $\Omega\left(d^{3} / \varepsilon^{2}\right)$ copies, matching the upper bound of [24]. We do so by a new proof technique which directly bounds the “tilt” of the posterior distribution after measurements, which yields a surprisingly short proof of our lower bound, and which we believe may be of independent interest. While this implies that adaptivity does not help for tomography with respect to trace distance, we show that it actually does help for tomography with respect to infidelity. We give an adaptive algorithm that outputs a state which is $\gamma$-close in infidelity to $\rho$ using only $\widetilde{O}\left(d^{3} / \gamma\right)$ copies, which is optimal for incoherent measurements. In contrast, it is known [18] that any nonadaptive algorithm requires $\Omega\left(d^{3} / \gamma^{2}\right)$ copies. While it is folklore that in 2 dimensions one can achieve a scaling of $O(1 / \gamma)$, to the best of our knowledge, our algorithm is the first to achieve the optimal rate in all dimensions.

FOCS 2022 · Conference Paper

Minimax Rates for Robust Community Detection

  • Allen Liu
  • Ankur Moitra

In this work, we study the problem of community detection in the stochastic block model with adversarial node corruptions. Our main result is an efficient algorithm that can tolerate an $\epsilon$-fraction of corruptions and achieves error $O(\epsilon)+e^{-\frac{C}{2}(1\pm o(1))}$ where $C=(\sqrt{a}-\sqrt{b})^{2}$ is the signal-to-noise ratio and $a/n$ and $b/n$ are the intra-community and inter-community connection probabilities respectively. These bounds essentially match the minimax rates for the SBM without corruptions. We also give robust algorithms for $\mathbb{Z}_{2}$-synchronization. At the heart of our algorithm is a new semidefinite program that uses global information to robustly boost the accuracy of a rough clustering. Moreover, we show that our algorithms are doubly-robust in the sense that they work in an even more challenging noise model that mixes adversarial corruptions with unbounded monotone changes from the semi-random model.
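
For concreteness, the sketch below generates a two-community SBM instance with intra-/inter-community probabilities $a/n$ and $b/n$ and corrupts an $\epsilon$-fraction of nodes; the parameters and the particular (crude) corruption are assumptions, and a real adversary may rewire corrupted nodes arbitrarily.

```python
# Sketch of the model: a two-community stochastic block model where same-
# community pairs connect with probability a/n and cross-community pairs with
# probability b/n, plus an eps-fraction of adversarially controlled nodes.
import numpy as np

rng = np.random.default_rng(4)
n, a, b, eps = 400, 20.0, 5.0, 0.02

labels = rng.integers(0, 2, n)                    # hidden communities
same = labels[:, None] == labels[None, :]
P = np.where(same, a / n, b / n)                  # intra vs. inter probabilities
A = (rng.random((n, n)) < P).astype(int)
A = np.triu(A, 1); A = A + A.T                    # undirected, no self-loops

# Node corruptions: an adversary may arbitrarily rewire eps*n rows/columns;
# zeroing them out is just one crude choice for illustration.
corrupt = rng.choice(n, size=int(eps * n), replace=False)
A[corrupt, :] = 0; A[:, corrupt] = 0
C = (np.sqrt(a) - np.sqrt(b)) ** 2                # signal-to-noise ratio from the abstract
print(C)
```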

NeurIPS 2022 · Conference Paper

Robust Model Selection and Nearly-Proper Learning for GMMs

  • Allen Liu
  • Jerry Li
  • Ankur Moitra

In learning theory, a standard assumption is that the data is generated from a finite mixture model. But what happens when the number of components is not known in advance? The problem of estimating the number of components, also called model selection, is important in its own right but there are essentially no known efficient algorithms with provable guarantees. In this work, we study the problem of model selection for univariate Gaussian mixture models (GMMs). Given $\textsf{poly}(k/\epsilon)$ samples from a distribution that is $\epsilon$-close in TV distance to a GMM with $k$ components, we can construct a GMM with $\widetilde{O}(k)$ components that approximates the distribution to within $\widetilde{O}(\epsilon)$ in $\textsf{poly}(k/\epsilon)$ time. Thus we are able to approximately determine the minimum number of components needed to fit the distribution within a logarithmic factor. Moreover, by adapting the techniques we obtain similar results for reconstructing Fourier-sparse signals. Prior to our work, the only known algorithms for learning arbitrary univariate GMMs either output significantly more than $k$ components (e.g., $k/\epsilon^2$ components for kernel density estimates) or run in time exponential in $k$.

FOCS 2022 · Conference Paper

Tight Bounds for Quantum State Certification with Incoherent Measurements

  • Sitan Chen
  • Jerry Li 0001
  • Brice Huang
  • Allen Liu

We consider the problem of quantum state certification, where we are given the description of a mixed state $\sigma\in\mathbb{C}^{d\times d}$, $n$ copies of a mixed state $\rho\in\mathbb{C}^{d\times d}$, and $\varepsilon > 0$, and we are asked to determine whether $\rho=\sigma$ or whether $\|\rho-\sigma\|_{1} > \varepsilon$. When $\sigma$ is the maximally mixed state $\frac{1}{d}I_{d}$, this is known as mixedness testing. We focus on algorithms which use incoherent measurements, i.e., which only measure one copy of $\rho$ at a time. Unlike those that use entangled, multi-copy measurements, these can be implemented without persistent quantum memory and thus represent a large class of protocols that can be run on current or near-term devices. For mixedness testing, there is a folklore algorithm which uses incoherent measurements and only needs $O(d^{3/2}/\varepsilon^{2})$ copies. The algorithm is non-adaptive, that is, its measurements are fixed ahead of time, and is known to be optimal for non-adaptive algorithms. However, when the algorithm can make arbitrary incoherent measurements, the best known lower bound is only $\Omega(d^{4/3}/\varepsilon^{2})$ [5], and it has been an outstanding open problem to close this polynomial gap. In this work:

  • We settle the copy complexity of mixedness testing with incoherent measurements and show that $\Omega(d^{3/2}/\varepsilon^{2})$ copies are necessary. This fully resolves open questions of [15] and [5].
  • We show that the instance-optimal bounds for state certification to general $\sigma$ first derived in [7] for non-adaptive measurements also hold for arbitrary incoherent measurements.

Qualitatively, our results say that adaptivity does not help at all for these problems. Our results are based on new techniques that allow us to reduce the problem to understanding the concentration of certain matrix martingales, which we believe may be of independent interest.

NeurIPS 2021 · Conference Paper

Margin-Independent Online Multiclass Learning via Convex Geometry

  • Guru Guruganesh
  • Allen Liu
  • Jon Schneider
  • Joshua Wang

We consider the problem of multi-class classification, where a stream of adversarially chosen queries arrive and must be assigned a label online. Unlike traditional bounds which seek to minimize the misclassification rate, we minimize the total distance from each query to the region corresponding to its assigned label. When the true labels are determined via a nearest neighbor partition -- i.e., the label of a point is given by which of $k$ centers it is closest to in Euclidean distance -- we show that one can achieve a loss that is independent of the total number of queries. We complement this result by showing that learning general convex sets requires an almost linear loss per query. Our results build on regret guarantees for the problem of contextual search. In addition, we develop a novel reduction technique from multiclass classification to binary classification which may be of independent interest.

SODA 2021 · Conference Paper

Optimal Contextual Pricing and Extensions

  • Allen Liu
  • Renato Paes Leme
  • Jon Schneider

In the contextual pricing problem a seller repeatedly obtains products described by an adversarially chosen feature vector in $\mathbb{R}^d$ and only observes the purchasing decisions of a buyer with a fixed but unknown linear valuation over the products. The regret measures the difference between the revenue the seller could have obtained knowing the buyer valuation and what can be obtained by the learning algorithm. We give a poly-time algorithm for contextual pricing with $O(d \log \log T + d \log d)$ regret, which matches the $\Omega(d \log \log T)$ lower bound up to the $d \log d$ additive factor. If we replace the pricing loss by the symmetric loss, we obtain an algorithm with nearly optimal regret of $O(d \log d)$, matching the $\Omega(d)$ lower bound up to $\log d$. These algorithms are based on a novel technique of bounding the value of the Steiner polynomial of a convex region at various scales; the Steiner polynomial is a degree-$d$ polynomial with the intrinsic volumes as its coefficients. We also study a generalized version of contextual search where the hidden linear function over Euclidean space is replaced by a hidden function $f$ in a certain hypothesis class $\mathcal{H}$. We provide a generic algorithm with $O(d^2)$ regret, where $d$ is the covering dimension of this class. This leads in particular to a $\tilde{O}(s^2)$ regret algorithm for linear contextual search if the linear function is guaranteed to be $s$-sparse. Finally, we also extend our results to the noisy feedback model, where each round our feedback is flipped with a fixed probability $p < 1/2$.
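
The sketch below simulates the interaction model only: each round the seller sees a feature vector, posts a price, and observes just the purchase decision of a buyer with a fixed hidden linear valuation. The fixed-price policy is a naive placeholder for illustration, not the paper's Steiner-polynomial-based algorithm, and all parameters are assumptions.

```python
# Sketch of the contextual pricing feedback model. Pricing loss per round is
# value - price on a sale and the full value on a missed sale.
import numpy as np

rng = np.random.default_rng(5)
d, T = 5, 1000
theta = rng.random(d); theta /= np.sum(theta)  # hidden valuation; values land in [0, 1]

total_loss = 0.0
for _ in range(T):
    x = rng.random(d)              # adversarial in general; random here
    value = theta @ x              # unknown to the seller
    price = 0.5                    # naive fixed price, for illustration only
    bought = price <= value        # the ONLY feedback the seller observes
    total_loss += (value - price) if bought else value
print(total_loss / T)              # average pricing loss of the naive policy
```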

NeurIPS 2020 · Conference Paper

Myersonian Regression

  • Allen Liu
  • Renato Leme
  • Jon Schneider

Motivated by pricing applications in online advertising, we study a variant of linear regression with a discontinuous loss function that we term Myersonian regression. In this variant, we wish to find a linear function $f: \mathbb{R}^d \rightarrow \mathbb{R}$ that well approximates a set of points $(x_i, v_i) \in \mathbb{R}^d \times [0, 1]$ in the following sense: we receive a loss of $v_i$ when $f(x_i) > v_i$ and a loss of $v_i - f(x_i)$ when $f(x_i) \leq v_i$. This arises naturally in the economic application of designing a pricing policy for differentiated items (where the loss is the gap between the performance of our policy and the optimal Myerson prices). We show that Myersonian regression is NP-hard to solve exactly and furthermore that no fully polynomial-time approximation scheme exists for Myersonian regression conditioned on the Exponential Time Hypothesis being true. In contrast to this, we demonstrate a polynomial-time approximation scheme for Myersonian regression that obtains an $\epsilon m$ additive approximation to the optimal possible revenue and can be computed in time $O(\exp(\mathrm{poly}(1/\epsilon)) \cdot \mathrm{poly}(m, n))$. We show that this algorithm is stable and generalizes well over distributions of samples.
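
Since the loss is given explicitly in the abstract, it transcribes directly into code. The sketch below evaluates the Myersonian loss of a candidate linear predictor on invented data (the points, predictor, and sizes are all assumptions), which also makes the discontinuity at $f(x_i)=v_i$ visible.

```python
# The Myersonian regression loss, transcribed from the definition: predicting
# above the value forfeits the sale (loss v_i); predicting at or below it
# leaves v_i - f(x_i) of revenue on the table.
import numpy as np

def myersonian_loss(w, X, v):
    """Total loss of the linear pricing rule f(x) = <w, x> on points (x_i, v_i)."""
    preds = X @ w
    per_point = np.where(preds > v, v, v - preds)
    return per_point.sum()

rng = np.random.default_rng(6)
n, d = 100, 5
X = rng.random((n, d))
w_true = rng.random(d) / d
v = np.clip(X @ w_true + 0.05 * rng.standard_normal(n), 0, 1)

print(myersonian_loss(w_true, X, v))       # near-optimal prices: small loss
print(myersonian_loss(np.zeros(d), X, v))  # pricing at zero forgoes all revenue
```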

NeurIPS 2020 · Conference Paper

Tensor Completion Made Practical

  • Allen Liu
  • Ankur Moitra

Tensor completion is a natural higher-order generalization of matrix completion where the goal is to recover a low-rank tensor from sparse observations of its entries. Existing algorithms are either heuristic without provable guarantees, based on solving large semidefinite programs which are impractical to run, or make strong assumptions such as requiring the factors to be nearly orthogonal. In this paper we introduce a new variant of alternating minimization, which in turn is inspired by understanding how the progress measures that guide convergence of alternating minimization in the matrix setting need to be adapted to the tensor setting. We show strong provable guarantees, including showing that our algorithm converges linearly to the true tensor even when the factors are highly correlated, and can be implemented in nearly linear time. Moreover our algorithm is also highly practical, and we show that we can complete third-order tensors with a thousand dimensions from observing a tiny fraction of their entries. In contrast, and somewhat surprisingly, we show that the standard version of alternating minimization, without our new twist, can converge at a drastically slower rate in practice.
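
For orientation, here is a minimal sketch of the classic matrix-setting alternating minimization that the abstract takes as its starting point; this is not the paper's tensor algorithm, and the instance is an invented toy.

```python
# Minimal sketch of CLASSIC alternating minimization for *matrix* completion:
# each step fixes one factor and solves least squares on the observed entries.
import numpy as np

rng = np.random.default_rng(7)
d, r, p = 100, 2, 0.3
U_true, V_true = rng.standard_normal((d, r)), rng.standard_normal((d, r))
M = U_true @ V_true.T
mask = rng.random((d, d)) < p   # each entry observed with probability p

U, V = rng.standard_normal((d, r)), rng.standard_normal((d, r))
for _ in range(30):
    # Fix V, re-solve each row of U on that row's observed entries; then swap.
    for i in range(d):
        obs = np.flatnonzero(mask[i])
        if obs.size:
            U[i] = np.linalg.lstsq(V[obs], M[i, obs], rcond=None)[0]
    for j in range(d):
        obs = np.flatnonzero(mask[:, j])
        if obs.size:
            V[j] = np.linalg.lstsq(U[obs], M[obs, j], rcond=None)[0]

err = np.linalg.norm(U @ V.T - M) / np.linalg.norm(M)
print(err)  # typically tiny when p is well above the sampling threshold
```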

FOCS 2018 · Conference Paper

Efficiently Learning Mixtures of Mallows Models

  • Allen Liu
  • Ankur Moitra

Mixtures of Mallows models are a popular generative model for ranking data coming from a heterogeneous population. They have a variety of applications including social choice, recommendation systems and natural language processing. Here we give the first polynomial-time algorithm for provably learning the parameters of a mixture of Mallows models with any constant number of components. Prior to our work, only the two-component case had been settled. Our analysis revolves around a determinantal identity of Zagier, which was proven in the context of mathematical physics, and which we use to show polynomial identifiability and ultimately to construct test functions to peel off one component at a time. To complement our upper bounds, we show information-theoretic lower bounds on the sample complexity as well as lower bounds against restricted families of algorithms that make only local queries. Together, these results demonstrate various impediments to improving the dependence on the number of components. They also motivate the study of learning mixtures of Mallows models from the perspective of beyond worst-case analysis. In this direction, we show that when the scaling parameters of the Mallows models have separation, there are much faster learning algorithms.
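
As background on the generative model (not the learning algorithm), a single Mallows component can be sampled by the standard repeated-insertion method; the central ranking and dispersion below are invented for illustration, and a mixture would first draw a component at random.

```python
# Sketch: sampling one Mallows model via repeated insertion (RIM). Item i of
# the central ranking is inserted at position j in {0..i} with probability
# proportional to phi^(i-j), which yields rankings with probability
# proportional to phi^(Kendall-tau distance to the center).
import numpy as np

def sample_mallows(center, phi, rng):
    """Draw one ranking from Mallows(center, phi) by repeated insertion."""
    ranking = []
    for i, item in enumerate(center):            # i = 0, 1, ..., n-1
        weights = phi ** (i - np.arange(i + 1))  # weight for each insert position j
        j = rng.choice(i + 1, p=weights / weights.sum())
        ranking.insert(j, item)
    return ranking

rng = np.random.default_rng(8)
center, phi = list(range(6)), 0.3   # small phi concentrates near the center
print(sample_mallows(center, phi, rng))
print(sample_mallows(center, 1.0, rng))  # phi = 1 is uniform over rankings
```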