Arrow Research

Author name cluster

Aravindan Vijayaraghavan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

30 papers
2 author rows

Possible papers

NeurIPS Conference 2025 Conference Paper

Guarantees for Alternating Least Squares in Overparameterized Tensor Decompositions

  • Dionysis Arvanitakis
  • Vaidehi Srinivas
  • Aravindan Vijayaraghavan

Tensor decomposition is a canonical non-convex optimization problem that is computationally challenging, and yet important due to applications in factor analysis and parameter estimation of latent variable models. In practice, scalable iterative methods, particularly Alternating Least Squares (ALS), remain the workhorse for tensor decomposition despite the lack of global convergence guarantees. A popular approach to tackle challenging non-convex optimization problems is overparameterization: on input an $n \times n \times n$ tensor of rank $r$, the algorithm may output a decomposition of rank $k$, potentially larger than $r$. On the theoretical side, overparameterization for iterative methods is challenging to reason about and requires new techniques. The work of Wang et al. (NeurIPS 2020) makes progress by showing that a variant of gradient descent globally converges when overparameterized to $k = O(r^{7.5} \log n)$. Our main result shows that overparameterization provably enables global convergence of ALS: on input a third-order $n \times n \times n$ tensor with a decomposition of rank $r \ll n$, ALS overparameterized with rank $k = O(r^2)$ achieves global convergence with high probability under random initialization. Moreover, our analysis also gives guarantees for the more general low-rank approximation problem. The analysis introduces new techniques for understanding iterative methods in the overparameterized regime, based on new matrix anticoncentration arguments.
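
As a concrete illustration of the method under study, here is a minimal numpy sketch of overparameterized ALS for third-order CP decomposition; the initialization scale, iteration count, and use of pseudoinverses are our illustrative choices, not the paper's exact setup.

```python
import numpy as np

def khatri_rao(B, C):
    # Column-wise Kronecker product: (n*n) x k.
    return np.einsum('ik,jk->ijk', B, C).reshape(-1, B.shape[1])

def als_decompose(T, k, iters=100, seed=0):
    """Fit rank-k factors A, B, C to an n x n x n tensor T by alternating
    least squares: fix two factors, solve for the third in closed form."""
    n = T.shape[0]
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((n, k)) / np.sqrt(n) for _ in range(3))
    T1 = T.reshape(n, -1)                      # mode-1 unfolding
    T2 = T.transpose(1, 0, 2).reshape(n, -1)   # mode-2 unfolding
    T3 = T.transpose(2, 0, 1).reshape(n, -1)   # mode-3 unfolding
    for _ in range(iters):
        A = T1 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = T2 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = T3 @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C
```

Overparameterization here just means calling als_decompose with k larger than the true rank r, e.g. on the order of r^2 as in the main result.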

ICML Conference 2025 Conference Paper

Volume Optimality in Conformal Prediction with Structured Prediction Sets

  • Chao Gao
  • Liren Shan
  • Vaidehi Srinivas
  • Aravindan Vijayaraghavan

Conformal Prediction is a widely studied technique to construct prediction sets of future observations. Most conformal prediction methods focus on achieving the necessary coverage guarantees, but do not provide formal guarantees on the size (volume) of the prediction sets. We first prove an impossibility result for volume optimality: any distribution-free method can only find a trivial solution. We then introduce a new notion of volume optimality by restricting the prediction sets to belong to a set family (of finite VC-dimension), specifically a union of $k$-intervals. Our main contribution is an efficient distribution-free algorithm based on dynamic programming (DP) to find a union of $k$-intervals that is guaranteed, for any distribution, to have near-optimal volume among all unions of $k$-intervals satisfying the desired coverage property. By adopting the framework of distributional conformal prediction (Chernozhukov et al., 2021), the new DP-based conformity score can also be applied to achieve approximate conditional coverage and conditional restricted volume optimality, as long as a reasonable estimator of the conditional CDF is available. While the theoretical results already establish volume-optimality guarantees, they are complemented by experiments that demonstrate that our method can significantly outperform existing methods in many settings.
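
To make the algorithmic core tangible, here is a rough one-dimensional dynamic-programming sketch that minimizes the total length of at most k intervals (with endpoints restricted, for illustration, to the calibration points) while covering at least m of n points; the paper's actual conformity score and recurrence differ in the details.

```python
import numpy as np

def k_interval_dp(x, k, m):
    """Minimum total length of a union of at most k intervals, with endpoints
    at data points, covering at least m of the points in x."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    base = np.full((k + 1, m + 1), np.inf)
    base[0, 0] = 0.0
    tables = [base]                  # tables[i]: best values over x[:i]
    for i in range(1, n + 1):
        cur = tables[i - 1].copy()   # option: leave x[i-1] uncovered
        for a in range(1, i + 1):    # option: close interval [x[a-1], x[i-1]]
            cov, length = i - a + 1, x[i - 1] - x[a - 1]
            prev = tables[a - 1]
            for j in range(1, k + 1):
                for t in range(m + 1):
                    cand = prev[j - 1, max(0, t - cov)] + length
                    if cand < cur[j, t]:
                        cur[j, t] = cand
        tables.append(cur)
    return tables[n][:, m].min()

# Example: calibration scores from a bimodal sample, 90% coverage.
scores = np.concatenate([np.random.default_rng(0).normal(0, 1, 50),
                         np.random.default_rng(1).normal(8, 1, 50)])
print(k_interval_dp(scores, k=2, m=90))
```

With k = 2 the DP prefers two short intervals around the two modes, which is exactly where a single-interval prediction set wastes volume.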

FOCS Conference 2024 Conference Paper

Efficient Certificates of Anti-Concentration Beyond Gaussians

  • Ainesh Bakshi
  • Pravesh K. Kothari
  • Goutham Rajendran
  • Madhur Tulsiani
  • Aravindan Vijayaraghavan

A set of high-dimensional points $X = \{x_{1}, x_{2}, \ldots, x_{n}\} \subseteq \mathbb{R}^{d}$ in isotropic position is said to be $\delta$-anti-concentrated if for every direction $v$, the fraction of points in $X$ satisfying $\left|\left\langle x_i, v\right\rangle\right| \leqslant \delta$ is at most $O(\delta)$. Motivated by applications to list-decodable learning and clustering, three recent works [7], [44], [71] considered the problem of constructing efficient certificates of anti-concentration in the average case, when the set of points $X$ corresponds to samples from a Gaussian distribution. Their certificates played a crucial role in several subsequent works in algorithmic robust statistics on list-decodable learning and settling the robust learnability of arbitrary Gaussian mixtures. Unlike related efficient certificates of concentration properties that are known for a wide class of distributions [52], the aforementioned approach has been limited to rotationally invariant distributions (and their affine transformations), with the only prominent example being Gaussian distributions. This work presents a new (and arguably the most natural) formulation of anti-concentration. Using this formulation, we give quasi-polynomial time verifiable sum-of-squares certificates of anti-concentration that hold for a wide class of non-Gaussian distributions, including anti-concentrated bounded product distributions and uniform distributions over $L_{p}$ balls (and their affine transformations). Consequently, our method upgrades and extends results in algorithmic robust statistics, e.g., list-decodable learning and clustering, to such distributions. As in the case of previous works, our certificates are also obtained via relaxations in the sum-of-squares hierarchy. However, the nature of our argument differs significantly from prior works that formulate anti-concentration as the non-negativity of an explicit polynomial. Our argument constructs a canonical integer program for anti-concentration and analyzes an SoS relaxation of it, independent of the intended application. The explicit polynomials appearing in prior works can be seen as specific dual certificates to this program. From a technical standpoint, unlike existing works that explicitly construct sum-of-squares certificates, our argument relies on duality and analyzes a pseudo-expectation on large subsets of the input points that take a small value in some direction. Our analysis uses the method of polynomial reweightings to reduce the problem to analyzing only analytically dense or sparse directions.
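
The certificates in the paper are sum-of-squares proofs that hold against all directions simultaneously; as a lightweight contrast, the sketch below does only a sampled-direction empirical check of $\delta$-anti-concentration (the number of sampled directions is an arbitrary choice and gives no worst-case guarantee).

```python
import numpy as np

def empirical_anticoncentration(X, delta, num_dirs=1000, seed=0):
    """Max over sampled unit directions v of the fraction of points with
    |<x_i, v>| <= delta; small values are consistent with anti-concentration."""
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((num_dirs, X.shape[1]))
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    frac = (np.abs(X @ V.T) <= delta).mean(axis=0)   # fraction per direction
    return frac.max()

# Isotropic Gaussian points should give a value on the order of delta.
X = np.random.default_rng(1).standard_normal((5000, 20))
print(empirical_anticoncentration(X, delta=0.1))
```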

SODA Conference 2024 Conference Paper

Higher-Order Cheeger Inequality for Partitioning with Buffers

  • Konstantin Makarychev
  • Yury Makarychev
  • Liren Shan
  • Aravindan Vijayaraghavan

We prove a new generalization of the higher-order Cheeger inequality for partitioning with buffers. Consider a graph $G = (V, E)$. The buffered expansion of a set $S \subseteq V$ with a buffer $B \subseteq V \setminus S$ is the edge expansion of $S$ after removing all the edges from set $S$ to its buffer $B$. An $\varepsilon$-buffered $k$-partitioning is a partitioning of a graph into disjoint components $P_i$ and buffers $B_i$, in which the size of buffer $B_i$ for $P_i$ is small relative to the size of $P_i$: $|B_i| \le \varepsilon |P_i|$. The buffered expansion of a buffered partition is the maximum of the buffered expansions of the $k$ sets $P_i$ with buffers $B_i$. Let $h^{k,\varepsilon}_G$ be the buffered expansion of the optimal $\varepsilon$-buffered $k$-partitioning; then for every $\delta > 0$, $h^{k,\varepsilon}_G \le O_\delta(1) \cdot \frac{\log k}{\varepsilon} \cdot \lambda_{\lfloor (1+\delta)k \rfloor}$, where $\lambda_{\lfloor (1+\delta)k \rfloor}$ is the $\lfloor (1+\delta)k \rfloor$-th smallest eigenvalue of the normalized Laplacian of $G$. Our inequality is constructive and avoids the "square-root loss" that is present in the standard Cheeger inequalities (even for $k = 2$). We also provide a complementary lower bound, and a novel generalization to the setting with arbitrary vertex weights and edge costs. Moreover, our result implies and generalizes the standard higher-order Cheeger inequalities and another recent Cheeger-type inequality by Kwok, Lau, and Lee (2017) involving robust vertex expansion.
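
A small sketch of the quantity being bounded, computing the buffered expansion of a given buffered partition from an adjacency matrix; normalizing by the volume (degree sum) of each part is our assumption about the expansion convention.

```python
import numpy as np

def buffered_expansion(A, parts, buffers):
    """For each part P_i with buffer B_i, count edges from P_i that land
    outside P_i and B_i, normalized by the volume of P_i; return the max."""
    deg = A.sum(axis=1)
    n = A.shape[0]
    vals = []
    for P, B in zip(parts, buffers):
        P = np.asarray(P)
        outside = np.setdiff1d(np.arange(n), np.union1d(P, np.asarray(B)))
        cut = A[np.ix_(P, outside)].sum()   # edges that get past the buffer
        vals.append(cut / deg[P].sum())
    return max(vals)
```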

NeurIPS Conference 2024 Conference Paper

Theoretical Analysis of Weak-to-Strong Generalization

  • Hunter Lang
  • David Sontag
  • Aravindan Vijayaraghavan

Strong student models can learn from weaker teachers: when trained on the predictions of a weaker model, a strong pretrained student can learn to correct the weak model’s errors and generalize to examples where the teacher is not confident, even when these examples are excluded from training. This enables learning from cheap, incomplete, and possibly incorrect label information, such as coarse logical rules or the generations of a language model. We show that existing weak supervision theory results fail to account for both of these effects, which we call pseudolabel correction and coverage expansion, respectively. We give a new bound based on expansion properties of the data distribution and student hypothesis class that directly accounts for pseudolabel correction and coverage expansion. Our bound generalizes results from the co-training and self-training literature and captures the intuition that weak-to-strong generalization occurs when the mistakes of the weak model are hard for the strong model to fit without incurring additional error. We show that these expansion properties can be checked from finite data and give empirical evidence that they hold in practice.

ICLR Conference 2023 Conference Paper

Agnostic Learning of General ReLU Activation Using Gradient Descent

  • Pranjal Awasthi
  • Alex Tang
  • Aravindan Vijayaraghavan

We provide a convergence analysis of gradient descent for the problem of agnostically learning a single ReLU function under Gaussian distributions. Unlike prior work that studies the setting of zero bias, we consider the more challenging scenario when the bias of the ReLU function is non-zero. Our main result establishes that starting from random initialization, in a polynomial number of iterations gradient descent outputs, with high probability, a ReLU function that achieves an error that is within a constant factor of optimal, i.e., it is guaranteed to achieve an error of $O(OPT)$, where $OPT$ is the error of the best ReLU function. This is a significant improvement over existing guarantees for gradient descent, which only guarantee error of $O(\sqrt{d \cdot OPT})$ even in the zero-bias case (Frei et al., 2020). We also provide finite sample guarantees, and obtain similar guarantees for a broader class of marginal distributions beyond Gaussians.
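
For concreteness, a minimal sketch of the analyzed procedure: gradient descent on the squared loss of a single ReLU with bias under Gaussian inputs. The step size, iteration count, and initialization scale are illustrative, not the paper's tuned constants.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def fit_relu_gd(X, y, lr=0.1, iters=2000, seed=0):
    """Minimize the mean squared error of x -> relu(w.x + b) by full-batch
    gradient descent from random initialization."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.standard_normal(d) / np.sqrt(d)
    b = 0.0
    for _ in range(iters):
        z = X @ w + b
        grad_z = (relu(z) - y) * (z > 0)   # chain rule through the ReLU
        w -= lr * (X.T @ grad_z) / n
        b -= lr * grad_z.mean()
    return w, b

# Labels from a ground-truth ReLU with non-zero bias, plus agnostic noise.
rng = np.random.default_rng(1)
X = rng.standard_normal((4000, 10))
y = relu(X @ rng.standard_normal(10) - 0.5) + 0.1 * rng.standard_normal(4000)
w, b = fit_relu_gd(X, y)
```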

FOCS Conference 2023 Conference Paper

Computing linear sections of varieties: quantum entanglement, tensor decompositions and beyond

  • Nathaniel Johnston
  • Benjamin Lovitz
  • Aravindan Vijayaraghavan

We study the problem of finding elements in the intersection of an arbitrary conic variety in $\mathbb{F}^{n}$ with a given linear subspace (where $\mathbb{F}$ can be the real or complex field). This problem captures a rich family of algorithmic problems under different choices of the variety. The special case of the variety consisting of rank-1 matrices already has strong connections to central problems in different areas like quantum information theory and tensor decompositions. This problem is known to be NP-hard in the worst case, even for the variety of rank-1 matrices. In this work, we propose and analyze an algorithm for solving this problem. Surprisingly, despite the above hardness results we show that our algorithm solves this problem efficiently for "typical" subspaces. Here, the subspace $\mathcal{U} \subseteq \mathbb{F}^{n}$ is chosen generically of a certain dimension, potentially with some generic elements of the variety contained in it. Our main result is a guarantee that our algorithm recovers all the elements of $\mathcal{U}$ that lie in the variety, under some mild non-degeneracy assumptions on the variety. As corollaries, we obtain the following new results:
  • Polynomial time algorithms for several entangled subspaces problems in quantum entanglement, including determining $r$-entanglement, complete entanglement, and genuine entanglement of a subspace. While all of these problems are NP-hard in the worst case, our algorithm solves them in polynomial time for generic subspaces of dimension up to a constant multiple of the maximum possible.
  • Uniqueness results and polynomial time algorithmic guarantees for generic instances of a broad class of low-rank decomposition problems that go beyond tensor decompositions. Here, we recover a decomposition of the form $\sum_{i=1}^{R} v_{i} \otimes w_{i}$, where the $v_{i}$ are elements of the given variety $\mathcal{X}$. This implies new uniqueness results and genericity guarantees even in the special case of tensor decompositions.

NeurIPS Conference 2022 Conference Paper

The Burer-Monteiro SDP method can fail even above the Barvinok-Pataki bound

  • Liam O'Carroll
  • Vaidehi Srinivas
  • Aravindan Vijayaraghavan

The most widely used technique for solving large-scale semidefinite programs (SDPs) in practice is the non-convex Burer-Monteiro method, which explicitly maintains a low-rank SDP solution for memory efficiency. There has been much recent interest in obtaining a better theoretical understanding of the Burer-Monteiro method. When the maximum allowed rank $p$ of the SDP solution is above the Barvinok-Pataki bound (where a globally optimal solution of rank at most $p$ is guaranteed to exist), a recent line of work established convergence to a global optimum for generic or smoothed instances of the problem. However, it was open whether there even exists an instance in this regime where the Burer-Monteiro method fails. We prove that the Burer-Monteiro method can fail for the Max-Cut SDP on $n$ vertices when the rank is above the Barvinok-Pataki bound ($p \ge \sqrt{2n}$). We provide a family of instances that have spurious local minima even when the rank $p = n/2$. Combined with existing guarantees, this settles the question of the existence of spurious local minima for the Max-Cut formulation in all ranges of the rank and justifies the use of beyond worst-case paradigms like smoothed analysis to obtain guarantees for the Burer-Monteiro method.
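
To fix notation, a minimal Burer-Monteiro sketch for the Max-Cut SDP: write $X = VV^{\mathsf{T}}$ with $V \in \mathbb{R}^{n \times p}$ having unit rows, and run a local method on $V$. Projected gradient ascent and the step size below are illustrative choices, not the specific dynamics studied in the paper.

```python
import numpy as np

def burer_monteiro_maxcut(W, p, lr=0.05, iters=3000, seed=0):
    """Locally maximize <L, V V^T>/4 over n x p matrices V with unit rows,
    the Burer-Monteiro factorization of the Max-Cut SDP."""
    L = np.diag(W.sum(axis=1)) - W                     # graph Laplacian
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((W.shape[0], p))
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    for _ in range(iters):
        V += lr * (L @ V)                              # ascend the objective
        V /= np.linalg.norm(V, axis=1, keepdims=True)  # re-project the rows
    return V, np.trace(L @ V @ V.T) / 4.0
```

The paper's point is about the landscape this procedure navigates: for some instances it has spurious second-order critical points even at rank p = n/2, so the value returned here need not be globally optimal.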

NeurIPS Conference 2022 Conference Paper

Training Subset Selection for Weak Supervision

  • Hunter Lang
  • Aravindan Vijayaraghavan
  • David Sontag

Existing weak supervision approaches use all the data covered by weak signals to train a classifier. We show both theoretically and empirically that this is not always optimal. Intuitively, there is a tradeoff between the amount of weakly-labeled data and the precision of the weak labels. We explore this tradeoff by combining pretrained data representations with the cut statistic to select (hopefully) high-quality subsets of the weakly-labeled training data. Subset selection applies to any label model and classifier and is very simple to plug in to existing weak supervision pipelines, requiring just a few lines of code. We show our subset selection method improves the performance of weak supervision for a wide range of label models, classifiers, and datasets. Using less weakly-labeled data improves the accuracy of weak supervision pipelines by up to 19% (absolute) on benchmark tasks.
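
A rough sketch of the selection step, assuming a k-NN-disagreement version of the cut statistic over pretrained representations; the exact statistic, its normalization, and the selection threshold in the paper differ.

```python
import numpy as np

def cut_statistic_scores(Z, weak_labels, k=10):
    """Score each example by how often its k nearest neighbors (in
    representation space) disagree with its weak label; lower is better.
    The O(n^2) distance matrix keeps the sketch simple for small n."""
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    nbrs = np.argsort(d2, axis=1)[:, :k]
    return (weak_labels[nbrs] != weak_labels[:, None]).mean(axis=1)

def select_subset(Z, weak_labels, frac=0.5, k=10):
    """Keep the frac of weakly-labeled examples with the best scores."""
    scores = cut_statistic_scores(Z, weak_labels, k)
    return np.argsort(scores)[: int(frac * len(scores))]
```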

NeurIPS Conference 2021 Conference Paper

Efficient Algorithms for Learning Depth-2 Neural Networks with General ReLU Activations

  • Pranjal Awasthi
  • Alex Tang
  • Aravindan Vijayaraghavan

We present polynomial time and sample efficient algorithms for learning an unknown depth-2 feedforward neural network with general ReLU activations, under mild non-degeneracy assumptions. In particular, we consider learning an unknown network of the form $f(x) = {a}^{\mathsf{T}}\sigma({W}^\mathsf{T}x+b)$, where $x$ is drawn from the Gaussian distribution, and $\sigma(t) = \max(t, 0)$ is the ReLU activation. Prior works for learning networks with ReLU activations assume that the bias ($b$) is zero. In order to deal with the presence of the bias terms, our proposed algorithm consists of robustly decomposing multiple higher order tensors arising from the Hermite expansion of the function $f(x)$. Using these ideas we also establish identifiability of the network parameters under very mild assumptions.
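
A sketch of the kind of moment tensor this approach estimates: by Stein's identity, for Gaussian $x$ the tensor $T = \mathbb{E}[f(x) H_3(x)]$ (with $H_3$ the third Hermite tensor) is a combination of rank-one terms in the rows of $W$, so a robust tensor decomposition of the empirical estimate exposes the weight directions. Constants and the exact estimator here are illustrative, not the paper's.

```python
import numpy as np

def hermite3_tensor(X, fX):
    """Empirical estimate of E[f(x) H3(x)], where
    H3(x)_{ijk} = x_i x_j x_k - x_i d_{jk} - x_j d_{ik} - x_k d_{ij}."""
    n, d = X.shape
    T = np.einsum('n,ni,nj,nk->ijk', fX, X, X, X) / n   # E[f(x) x^(3)]
    m1 = (fX[:, None] * X).mean(axis=0)                 # E[f(x) x]
    I = np.eye(d)
    T -= (np.einsum('i,jk->ijk', m1, I)
          + np.einsum('j,ik->ijk', m1, I)
          + np.einsum('k,ij->ijk', m1, I))
    return T
```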

ICML Conference 2021 Conference Paper

Graph Cuts Always Find a Global Optimum for Potts Models (With a Catch)

  • Hunter Lang
  • David A. Sontag
  • Aravindan Vijayaraghavan

We prove that the alpha-expansion algorithm for MAP inference always returns a globally optimal assignment for Markov Random Fields with Potts pairwise potentials, with a catch: the returned assignment is only guaranteed to be optimal for an instance within a small perturbation of the original problem instance. In other words, all local minima with respect to expansion moves are global minima to slightly perturbed versions of the problem. On "real-world" instances, MAP assignments of small perturbations of the problem should be very similar to the MAP assignment(s) of the original problem instance. We design an algorithm that can certify whether this is the case in practice. On several MAP inference problem instances from computer vision, this algorithm certifies that MAP solutions to all of these perturbations are very close to solutions of the original instance. These results taken together give a cohesive explanation for the good performance of "graph cuts" algorithms in practice. Every local expansion minimum is a global minimum in a small perturbation of the problem, and all of these global minima are close to the original solution.
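
The statements can be poked at directly on tiny instances; below is a toy Potts-energy sketch with brute-force global minimization, usable to compare global minima of an instance and of small perturbations of its pairwise weights. Sizes and the perturbation scheme are illustrative, and brute force stands in for the min-cut machinery of alpha-expansion.

```python
import itertools
import numpy as np

def potts_energy(unary, W, labels):
    """unary[i, l]: cost of label l at node i; W[i, j] (upper triangle):
    Potts weight, paid whenever i and j take different labels."""
    n = len(labels)
    e = sum(unary[i, labels[i]] for i in range(n))
    e += sum(W[i, j] for i in range(n) for j in range(i + 1, n)
             if labels[i] != labels[j])
    return e

def global_minimum(unary, W):
    n, num_labels = unary.shape
    return min(itertools.product(range(num_labels), repeat=n),
               key=lambda lab: potts_energy(unary, W, lab))

# Compare the optimum with that of a small multiplicative perturbation.
rng = np.random.default_rng(0)
unary, W = rng.random((5, 3)), np.triu(rng.random((5, 5)), k=1)
print(global_minimum(unary, W))
print(global_minimum(unary, W * (1 + 0.01 * rng.random((5, 5)))))
```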

NeurIPS Conference 2020 Conference Paper

Adversarial robustness via robust low rank representations

  • Pranjal Awasthi
  • Himanshu Jain
  • Ankit Singh Rawat
  • Aravindan Vijayaraghavan

Adversarial robustness measures the susceptibility of a classifier to imperceptible perturbations made to the inputs at test time. In this work we highlight the benefits of natural low rank representations that often exist for real data such as images, for training neural networks with certified robustness guarantees. Our first contribution is for certified robustness to perturbations measured in the $\ell_2$ norm. We exploit low rank data representations to provide improved guarantees over state-of-the-art randomized smoothing-based approaches on standard benchmark datasets such as CIFAR-10 and CIFAR-100. Our second contribution is for the more challenging setting of certified robustness to perturbations measured in the $\ell_\infty$ norm. We demonstrate empirically that natural low rank representations have inherent robustness properties that can be leveraged to provide significantly better guarantees for certified robustness to $\ell_\infty$ perturbations in those representations. Our certificate of $\ell_\infty$ robustness relies on a natural quantity involving the $\infty \to 2$ matrix operator norm associated with the representation, to translate robustness guarantees from $\ell_2$ to $\ell_\infty$ perturbations. A key technical ingredient for our certification guarantees is a fast algorithm with provable guarantees, based on the multiplicative weights update method, to provide upper bounds on the above matrix norm. Our algorithmic guarantees improve upon the state of the art for this problem, and may be of independent interest.
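
To illustrate the norm-translation step: since $\|Px\|_2 \le \|P\|_{\infty\to 2}\,\|x\|_\infty$, any $\ell_2$ certificate in the representation yields an $\ell_\infty$ certificate in the input after dividing by an upper bound on $\|P\|_{\infty\to 2}$. The sketch below uses a crude entrywise bound in place of the paper's multiplicative-weights algorithm.

```python
import numpy as np

def inf_to_2_upper_bound(P):
    """Valid upper bound on ||P||_{inf->2}: its square is
    max over x in {-1, +1}^n of x^T (P^T P) x <= sum_ij |(P^T P)_ij|."""
    return np.sqrt(np.abs(P.T @ P).sum())

def linf_certified_radius(P, l2_radius):
    """If the classifier on z = P x is certified for ||dz||_2 <= l2_radius,
    it is certified for all input perturbations ||dx||_inf <= this radius."""
    return l2_radius / inf_to_2_upper_bound(P)
```

A looser upper bound on the norm only shrinks the certified radius, so the certificate stays sound; the paper's algorithm matters because it makes the bound, and hence the radius, much tighter.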

FOCS Conference 2020 Conference Paper

Scheduling Precedence-Constrained Jobs on Related Machines with Communication Delay

  • Biswaroop Maiti
  • Rajmohan Rajaraman
  • David Stalfa
  • Zoya Svitkina
  • Aravindan Vijayaraghavan

We consider the problem of scheduling precedence-constrained jobs on uniformly-related machines in the presence of an arbitrary, fixed communication delay. Communication delay is the amount of time that must pass between the completion of a job on one machine and the start of any successor of that job on a different machine. We consider a model that allows job duplication, i.e., processing of the same job on multiple machines, which, as we show, can reduce the length of a schedule (i.e., its makespan) by a logarithmic factor. Our main result is an approximation algorithm for makespan with approximation ratio polylogarithmic in the number of machines and the length of the communication delay, assuming the minimum makespan is at least the delay. Our algorithm is based on rounding a linear programming relaxation for the problem, which includes carefully designed constraints capturing the interaction among communication delay, precedence requirements, varying speeds, and job duplication. To derive a schedule from a solution to the linear program, we balance the benefits of duplication in satisfying precedence constraints early against its drawbacks in increasing overall system load. Our result builds on two previous lines of work, one with communication delay but identical machines (Lepere, Rapine 2002), and the other with uniformly-related machines but no communication delay (Chudak, Shmoys 1999). We next show that the integrality gap of our mathematical program is polylogarithmic in the communication delay. Our gap construction employs expander graphs and exploits a property of robust expansion and its generalization to paths of longer length, which may be of independent interest. Finally, we quantify the advantage of duplication in scheduling with communication delay. We show that the best schedule without duplication can have a larger makespan than the optimal with duplication by a logarithmic factor. Nevertheless, we present a polynomial time algorithm to transform any schedule to a schedule without duplication at the cost of an increase in makespan polylogarithmic in the number of jobs and machines. Together with our makespan approximation algorithm for schedules allowing duplication, this also yields a polylogarithmic-approximation algorithm for the setting where duplication is not allowed.

NeurIPS Conference 2019 Conference Paper

On Robustness to Adversarial Examples and Polynomial Optimization

  • Pranjal Awasthi
  • Abhratanu Dutta
  • Aravindan Vijayaraghavan

We study the design of computationally efficient algorithms with provable guarantees that are robust to adversarial (test time) perturbations. While there has been an explosion of recent work on this topic due to its connections to test time robustness of deep networks, there is limited theoretical understanding of several basic questions like (i) when and how can one design provably robust learning algorithms? (ii) what is the price of achieving robustness to adversarial examples in a computationally efficient manner? The main contribution of this work is to exhibit a strong connection between achieving robustness to adversarial examples and a rich class of polynomial optimization problems, thereby making progress on the above questions. In particular, we leverage this connection to (a) design computationally efficient robust algorithms with provable guarantees for a large class of hypotheses, namely linear classifiers and degree-2 polynomial threshold functions (PTFs), (b) give a precise characterization of the price of achieving robustness in a computationally efficient manner for these classes, (c) design efficient algorithms to certify robustness and generate adversarial attacks in a principled manner for 2-layer neural networks. We empirically demonstrate the effectiveness of these attacks on real data.
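
For the linear classifiers in (a), the robust loss is explicit: against $\ell_\infty$ perturbations of budget $\epsilon$, the worst perturbation shrinks the margin of a linear classifier by exactly $\epsilon\|w\|_1$. A tiny sketch of the resulting robust error (variable names are ours):

```python
import numpy as np

def robust_error_linear(w, b, X, y, eps):
    """Fraction of points (y in {-1, +1}) that some ||d||_inf <= eps
    perturbation can misclassify: margin <= eps * ||w||_1."""
    margin = y * (X @ w + b)
    return (margin <= eps * np.abs(w).sum()).mean()
```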

FOCS Conference 2019 Conference Paper

Smoothed Analysis in Unsupervised Learning via Decoupling

  • Aditya Bhaskara
  • Aidao Chen
  • Aidan Perreault
  • Aravindan Vijayaraghavan

Smoothed analysis is a powerful paradigm for overcoming worst-case intractability in unsupervised learning and high-dimensional data analysis. While polynomial time smoothed analysis guarantees have been obtained for worst-case intractable problems like tensor decompositions and learning mixtures of Gaussians, such guarantees have been hard to obtain for several other important problems in unsupervised learning. A core technical challenge in analyzing algorithms is obtaining lower bounds on the least singular value for random matrix ensembles with dependent entries that are given by low-degree polynomials of a few base underlying random variables. In this work, we address this challenge by obtaining high-confidence lower bounds on the least singular value of new classes of structured random matrix ensembles of the above kind. We then use these bounds to design algorithms with polynomial time smoothed analysis guarantees for the following three important problems in unsupervised learning: (1) Robust subspace recovery, when the fraction of inliers in the $d$-dimensional subspace $T$ of the $n$-dimensional Euclidean space is at least $(d/n)^t$ for any positive integer $t$. This contrasts with the known worst-case intractability when the fraction of inliers is at most $d/n$, and the previous smoothed analysis result (Hardt and Moitra, 2013). (2) Learning overcomplete hidden Markov models, where the size of the state space is any polynomial in the dimension of the observations. This gives the first polynomial time guarantees for learning overcomplete HMMs in the smoothed analysis model. (3) Higher-order tensor decompositions, where we generalize and analyze the so-called FOOBI algorithm of Cardoso to find order-$t$ rank-one tensors in a subspace. This gives polynomially robust decomposition algorithms for order-$2t$ tensors with rank $n^t$.
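
An empirical companion to the key quantity: the least singular value of a structured ensemble whose columns are low-degree polynomials (here, tensor squares) of smoothed base vectors. The dimensions and perturbation size are arbitrary choices for illustration.

```python
import numpy as np

def least_singular_value_tensored(n=30, m=100, sigma=0.1, seed=0):
    """sigma-smooth m arbitrary vectors in R^n, form the matrix whose columns
    are their tensor squares u (x) u, and return its least singular value."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((n, m))            # stand-in for worst-case vectors
    U += sigma * rng.standard_normal((n, m))   # smoothing perturbation
    cols = np.stack([np.outer(U[:, i], U[:, i]).ravel() for i in range(m)],
                    axis=1)                    # n^2 x m, dependent entries
    return np.linalg.svd(cols, compute_uv=False)[-1]
```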

ICML Conference 2018 Conference Paper

Clustering Semi-Random Mixtures of Gaussians

  • Pranjal Awasthi
  • Aravindan Vijayaraghavan

Gaussian mixture models (GMM) are the most widely used statistical model for the k-means clustering problem and form a popular framework for clustering in machine learning and data analysis. In this paper, we propose a natural robust model for k-means clustering that generalizes the Gaussian mixture model, and that we believe will be useful in identifying robust algorithms. Our first contribution is a polynomial time algorithm that provably recovers the ground-truth up to small classification error w.h.p., assuming certain separation between the components. Perhaps surprisingly, the algorithm we analyze is the popular Lloyd’s algorithm for k-means clustering that is the method-of-choice in practice. Our second result complements the upper bound by giving a nearly matching lower bound on the number of misclassified points incurred by any k-means clustering algorithm on the semi-random model.
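
Since the algorithm analyzed is Lloyd's itself, a minimal reference sketch is easy to state (the initialization below is plain random sampling; the paper's guarantees involve separation conditions and an appropriate initializer):

```python
import numpy as np

def lloyds(X, k, iters=100, seed=0):
    """Plain Lloyd's algorithm: alternate nearest-center assignment and
    recomputing each center as the mean of its cluster."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels
```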

FOCS Conference 2018 Conference Paper

Towards Learning Sparsely Used Dictionaries with Arbitrary Supports

  • Pranjal Awasthi
  • Aravindan Vijayaraghavan

Dictionary learning is a popular approach for inferring a hidden basis in which data has a sparse representation. There is a hidden dictionary or basis $A$, which is an $n \times m$ matrix, with $m > n$ typically (this is called the over-complete setting). Data generated from the dictionary is given by $Y = AX$ where $X$ is a matrix whose columns have supports chosen from a distribution over $k$-sparse vectors, and the non-zero values chosen from a symmetric distribution. Given $Y$, the goal is to recover $A$ and $X$ in polynomial time (in $m, n$). Existing algorithms give polynomial time guarantees for recovering incoherent dictionaries, under strong distributional assumptions both on the supports of the columns of $X$ and on the values of the non-zero entries. In this work, we study the following question: can we design efficient algorithms for recovering dictionaries when the supports of the columns of $X$ are arbitrary? To address this question while circumventing the issue of non-identifiability, we study a natural semirandom model for dictionary learning. In this model, there are a large number of samples $y = Ax$ with arbitrary $k$-sparse supports for $x$, along with a few samples where the sparse supports are chosen uniformly at random. While the presence of a few samples with random supports ensures identifiability, the support distribution can look almost arbitrary in aggregate. Hence, existing algorithmic techniques seem to break down as they make strong assumptions on the supports. Our main contribution is a new polynomial time algorithm for learning incoherent over-complete dictionaries that provably works under the semirandom model. Additionally, the same algorithm provides polynomial time guarantees in new parameter regimes when the supports are fully random. Finally, as a byproduct of our techniques, we also identify a minimal set of conditions on the supports under which the dictionary can be (information-theoretically) recovered from polynomially many samples for almost linear sparsity, i.e., $k = \tilde{O}(n)$.

SODA Conference 2017 Conference Paper

Approximation Algorithms for Label Cover and The Log-Density Threshold

  • Eden Chlamtac
  • Pasin Manurangsi
  • Dana Moshkovitz
  • Aravindan Vijayaraghavan

Many known optimal NP-hardness of approximation results are reductions from a problem called Label Cover. The input is a bipartite graph $G = (L, R, E)$, and each edge $e = (x, y) \in E$ carries a projection $\pi_e$ that maps labels of $x$ to labels of $y$. The objective is to find a labeling of the vertices that satisfies as many of the projections as possible. It is believed that the best approximation ratio efficiently achievable for Label-Cover is of the form $N^{-c}$, where $N = nk$, $n$ is the number of vertices, $k$ is the number of labels, and $0 \le c < 1$ is some constant. Inspired by a framework originally developed for Densest $k$-Subgraph, we propose a "log-density threshold" for the approximability of Label-Cover. Specifically, we suggest the possibility that the Label-Cover approximation problem undergoes a computational phase transition at the same threshold at which local algorithms for its random counterpart fail. We then design, for any $\epsilon > 0$, a polynomial-time approximation algorithm for semirandom Label-Cover whose approximation ratio comes within a factor of $N^{\epsilon}$ of this threshold. In our semi-random model, the input graph is random (or even just expanding), and the projections on the edges are arbitrary. For worst-case Label-Cover we show a polynomial-time algorithm whose approximation ratio is roughly $N^{-0.233}$. The previous best efficient approximation ratio was $N^{-0.25}$. We present some evidence towards an $N^{-c}$ threshold by constructing integrality gaps for $N^{\Omega(1)}$ rounds of the Sum-of-Squares/Lasserre hierarchy of the natural relaxation of Label Cover. For general 2CSPs the "log-density threshold" is $N^{-0.25}$, and we give a polynomial-time algorithm in the semi-random model whose approximation ratio is $N^{-0.25+\epsilon}$ for any $\epsilon > 0$.

NeurIPS Conference 2017 Conference Paper

Clustering Stable Instances of Euclidean k-means

  • Aravindan Vijayaraghavan
  • Abhratanu Dutta
  • Alex Wang

The Euclidean k-means problem is arguably the most widely-studied clustering problem in machine learning. While the k-means objective is NP-hard in the worst case, practitioners have enjoyed remarkable success in applying heuristics like Lloyd's algorithm for this problem. To address this disconnect, we study the following question: what properties of real-world instances will enable us to design efficient algorithms and prove guarantees for finding the optimal clustering? We consider a natural notion called additive perturbation stability that we believe captures many practical instances of Euclidean k-means clustering. Stable instances have a unique optimal k-means solution that does not change even when each point is perturbed a little (in Euclidean distance). This captures the property that the optimal k-means solution should be tolerant to measurement errors and uncertainty in the points. We design efficient algorithms that provably recover the optimal clustering for instances that are additive perturbation stable. When the instance has some additional separation, we can design a simple, efficient algorithm with provable guarantees that is also robust to outliers. We also complement these results by studying the amount of stability in real datasets, and demonstrating that our algorithm performs well on these benchmark datasets.

FOCS Conference 2017 Conference Paper

On Learning Mixtures of Well-Separated Gaussians

  • Oded Regev 0001
  • Aravindan Vijayaraghavan

We consider the problem of efficiently learning mixtures of a large number of spherical Gaussians, when the components of the mixture are well separated. In the most basic form of this problem, we are given samples from a uniform mixture of $k$ standard spherical Gaussians with means $\mu_1, \ldots, \mu_k \in \mathbb{R}^d$, and the goal is to estimate the means up to accuracy $\delta$ using $\mathrm{poly}(k, d, 1/\delta)$ samples. In this work, we study the following question: what is the minimum separation needed between the means for solving this task? The best known algorithm due to Vempala and Wang [JCSS 2004] requires a separation of roughly $\min\{k, d\}^{1/4}$. On the other hand, Moitra and Valiant [FOCS 2010] showed that with separation $o(1)$, exponentially many samples are required. We address the significant gap between these two bounds by showing the following results. First, with separation $o(\sqrt{\log k})$, superpolynomially many samples are required; in fact, this holds even when the $k$ means of the Gaussians are picked at random in $d = O(\log k)$ dimensions. Second, with separation $\Omega(\sqrt{\log k})$, $\mathrm{poly}(k, d, 1/\delta)$ samples suffice. Notice that the bound on the separation is independent of $\delta$. This result is based on a new and efficient "accuracy boosting" algorithm that takes as input coarse estimates of the true means and, in time (and samples) $\mathrm{poly}(k, d, 1/\delta)$, outputs estimates of the means up to arbitrarily good accuracy $\delta$, assuming the separation between the means is $\Omega(\min\{\sqrt{\log k}, \sqrt{d}\})$ (independently of $\delta$). The idea of the algorithm is to iteratively solve a "diagonally dominant" system of non-linear equations. We also (1) present a computationally efficient algorithm in $d = O(1)$ dimensions with only $\Omega(\sqrt{d})$ separation, and (2) extend our results to the case that components might have different weights and variances. These results together essentially characterize the optimal order of separation between components that is needed to learn a mixture of $k$ spherical Gaussians with polynomial samples.

STOC Conference 2014 Conference Paper

Constant factor approximation for balanced cut in the PIE model

  • Konstantin Makarychev
  • Yury Makarychev
  • Aravindan Vijayaraghavan

We propose and study a new semi-random semi-adversarial model for Balanced Cut, a planted model with permutation-invariant random edges (PIE). Our model is much more general than planted models considered previously. Consider a set of vertices $V$ partitioned into two clusters $L$ and $R$ of equal size. Let $G$ be an arbitrary graph on $V$ with no edges between $L$ and $R$. Let $E_{\mathrm{random}}$ be a set of edges sampled from an arbitrary permutation-invariant distribution (a distribution that is invariant under permutations of the vertices in $L$ and in $R$). Then we say that $G + E_{\mathrm{random}}$ is a graph with permutation-invariant random edges.

NeurIPS Conference 2014 Conference Paper

Learning Mixtures of Ranking Models

  • Pranjal Awasthi
  • Avrim Blum
  • Or Sheffet
  • Aravindan Vijayaraghavan

This work concerns learning probabilistic models for ranking data in a heterogeneous population. The specific problem we study is learning the parameters of a {\em Mallows Mixture Model}. Despite being widely studied, current heuristics for this problem do not have theoretical guarantees and can get stuck in bad local optima. We present the first polynomial time algorithm which provably learns the parameters of a mixture of two Mallows models. A key component of our algorithm is a novel use of tensor decomposition techniques to learn the top-$k$ prefix in both the rankings. Before this work, even the question of {\em identifiability} in the case of a mixture of two Mallows models was unresolved.

SODA Conference 2012 Conference Paper

Polynomial integrality gaps for strong SDP relaxations of Densest k -subgraph

  • Aditya Bhaskara
  • Moses Charikar
  • Aravindan Vijayaraghavan
  • Venkatesan Guruswami
  • Yuan Zhou 0007

The Densest $k$-subgraph problem (i.e., find a size-$k$ subgraph with the maximum number of edges) is one of the notorious problems in approximation algorithms. There is a significant gap between known upper and lower bounds for Densest $k$-subgraph: the current best algorithm gives an $\approx O(n^{1/4})$ approximation, while even showing a small constant factor hardness requires significantly stronger assumptions than P ≠ NP. In addition to interest in designing better algorithms, a number of recent results have exploited the conjectured hardness of Densest $k$-subgraph and its variants. Thus, understanding the approximability of Densest $k$-subgraph is an important challenge. In this work, we give evidence for the hardness of approximating Densest $k$-subgraph within polynomial factors. Specifically, we expose the limitations of strong semidefinite programs from SDP hierarchies in solving Densest $k$-subgraph. Our results include:
  • A lower bound of $\Omega(n^{1/4}/\log^3 n)$ on the integrality gap for $\Omega(\log n/\log\log n)$ rounds of the Sherali-Adams relaxation for Densest $k$-subgraph. This also holds for the relaxation obtained from Sherali-Adams with an added SDP constraint. Our gap instances are in fact Erdős-Rényi random graphs.
  • For every $\epsilon > 0$, a lower bound of $n^{2/53-\epsilon}$ on the integrality gap of $n^{\Omega(\epsilon)}$ rounds of the Lasserre SDP relaxation for Densest $k$-subgraph, and an $n^{\Omega_\epsilon(1)}$ gap for $n^{1-\epsilon}$ rounds. Our construction proceeds via a reduction from random instances of a certain Max-CSP over large domains.
In the absence of inapproximability results for Densest $k$-subgraph, our results show that beating a factor of $n^{\Omega(1)}$ is a barrier for even the most powerful SDPs, and in fact even beating the best known $n^{1/4}$ factor is a barrier for current techniques. Our results indicate that approximating Densest $k$-subgraph within a polynomial factor might be a harder problem than Unique Games or Small Set Expansion, since these problems were recently shown to be solvable using $n^{\epsilon^{\Omega(1)}}$ rounds of the Lasserre hierarchy, where $\epsilon$ is the completeness parameter in Unique Games and Small Set Expansion.

SODA Conference 2011 Conference Paper

Approximating Matrix p-norms

  • Aditya Bhaskara
  • Aravindan Vijayaraghavan

We consider the problem of computing the $q \to p$ norm of a matrix $A$, defined for $p, q \ge 1$ as $\|A\|_{q \to p} = \max_{x \ne 0} \|Ax\|_p / \|x\|_q$. This is in general a non-convex optimization problem, and is a natural generalization of the well-studied question of computing singular values (this corresponds to $p = q = 2$). Different settings of the parameters give rise to a variety of known interesting problems (such as the Grothendieck problem when $p = 1$ and $q = \infty$). However, very little is understood about the approximability of the problem for different values of $p, q$. Our first result is an efficient algorithm for computing the $q \to p$ norm of matrices with non-negative entries when $q \ge p \ge 1$. The algorithm we analyze is based on a natural fixed-point iteration, which can be seen as an analog of power iteration for computing eigenvalues. We then present an application of our techniques to the problem of constructing a scheme for oblivious routing in the $\ell_p$ norm. This makes constructive a recent existential result of Englert and Räcke [ER09] on $O(\log n)$-competitive oblivious routing schemes (which they make constructive only for $p = 2$). On the other hand, when we do not have any restrictions on the entries (such as non-negativity), we prove that the problem is NP-hard to approximate to any constant factor for $2 < p \le q$ and $p \le q < 2$ (these are precisely the ranges of $p, q$ with $p \le q$ where constant factor approximations are not known). In this range, our techniques also show that, if NP $\not\subseteq$ DTIME$(n^{\mathrm{polylog}(n)})$, the problem cannot be approximated within a factor $2^{(\log n)^{1-\epsilon}}$ for any constant $\epsilon > 0$.
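
The fixed-point iteration the paper analyzes has a compact form for non-negative matrices; a sketch (the iteration count is illustrative, and $q > 1$ is assumed to keep the exponent finite):

```python
import numpy as np

def q_to_p_norm_nonneg(A, q, p, iters=500):
    """Fixed-point iteration for ||A||_{q->p} = max ||Ax||_p / ||x||_q with
    A >= 0 entrywise and q >= p >= 1; reduces to power iteration at p=q=2."""
    assert (A >= 0).all() and q >= p >= 1 and q > 1
    x = np.full(A.shape[1], A.shape[1] ** (-1.0 / q))  # unit q-norm start
    for _ in range(iters):
        y = A @ x
        x = (A.T @ y ** (p - 1)) ** (1.0 / (q - 1))    # fixed-point map
        x /= np.linalg.norm(x, q)
    return np.linalg.norm(A @ x, p)
```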

STOC Conference 2010 Conference Paper

Detecting high log-densities: an O ( n 1/4 ) approximation for densest k -subgraph

  • Aditya Bhaskara
  • Moses Charikar
  • Eden Chlamtac
  • Uriel Feige
  • Aravindan Vijayaraghavan

In the Densest k-Subgraph problem, given a graph $G$ and a parameter $k$, one needs to find a subgraph of $G$ induced on $k$ vertices that contains the largest number of edges. There is a significant gap between the best known upper and lower bounds for this problem. It is NP-hard, and does not have a PTAS unless NP has subexponential time algorithms. On the other hand, the current best known algorithm of Feige, Kortsarz and Peleg gives an approximation ratio of $n^{1/3-c}$ for some fixed $c > 0$ (later estimated at around $c = 1/90$). We present an algorithm that for every $\epsilon > 0$ approximates the Densest k-Subgraph problem within a ratio of $n^{1/4+\epsilon}$ in time $n^{O(1/\epsilon)}$. If allowed to run for time $n^{O(\log n)}$, the algorithm achieves an approximation ratio of $O(n^{1/4})$. Our algorithm is inspired by studying an average-case version of the problem where the goal is to distinguish random graphs from random graphs with planted dense subgraphs; the approximation ratio we achieve for the general case matches the "distinguishing ratio" we obtain for this planted problem. At a high level, our algorithms involve cleverly counting appropriately defined trees of constant size in $G$, and using these counts to identify the vertices of the dense subgraph. We say that a graph $G(V, E)$ has log-density $\alpha$ if its average degree is $\Theta(|V|^\alpha)$. The algorithmic core of our result is a procedure to output a $k$-subgraph of "nontrivial" density whenever the log-density of the densest $k$-subgraph is larger than the log-density of the host graph. We outline an extension to our approximation algorithm which achieves an $O(n^{1/4-\epsilon})$-approximation in $O(2^{n^{O(\epsilon)}})$ time. We also show that, for certain parameter ranges, eigenvalue and SDP based techniques can outperform our basic distinguishing algorithm for random instances (in polynomial time), though without improving upon the $O(n^{1/4})$ guarantee overall.
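
A toy rendering of the log-density viewpoint: compute the host graph's log-density (average degree $\Theta(|V|^\alpha)$) and use short-walk counts as a crude stand-in for the paper's constant-size tree counts when hunting for vertices of a planted dense subgraph. This illustrates the idea, not the actual algorithm.

```python
import numpy as np

def log_density(A):
    """alpha such that the average degree is about n^alpha."""
    n = A.shape[0]
    return np.log(A.sum() / n) / np.log(n)

def walk_scores(A, length=2):
    """Walks of a fixed short length starting at each vertex; vertices of a
    planted dense k-subgraph tend to have outlying counts."""
    return np.linalg.matrix_power(A, length).sum(axis=1)

# Example: G(n, p) host graph with a dense subgraph planted on k vertices.
rng = np.random.default_rng(0)
n, k = 400, 40
A = (rng.random((n, n)) < n ** -0.5).astype(float)
A[:k, :k] = (rng.random((k, k)) < 0.5).astype(float)
A = np.triu(A, 1)
A = A + A.T
suspects = np.argsort(walk_scores(A))[-k:]   # top-k by walk count
```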