Arrow Research search

Author name cluster

Aditya Bhaskara

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

31 papers
2 author rows

Possible papers


TMLR Journal 2025 Journal Article

An Efficient Sparse Fine-Tuning with Low Quantization Error via Neural Network Pruning

  • Cen-Jhih Li
  • Aditya Bhaskara

Fine-tuning is an important step in adapting foundation models such as large language models to downstream tasks. To make this step more accessible to users with limited computational budgets, it is crucial to develop fine-tuning methods that are memory and computationally efficient. Sparse Fine-tuning (SpFT) and Low-rank adaptation (LoRA) are two frameworks that have emerged for addressing this problem and have been adopted widely in practice. In this work, we develop a new SpFT framework, based on ideas from neural network pruning. At a high level, we first identify "important" neurons/nodes using feature importance metrics from network pruning (specifically, we use the structural pruning method), and then perform fine-tuning by restricting to weights involving these neurons. Experiments on common language tasks show our method improves SpFT’s memory efficiency by 20–50% while matching the accuracy of state-of-the-art methods such as variants of LoRA.
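
To make the recipe concrete (score neurons, then restrict updates to weights touching the top-scoring ones), here is a minimal numpy sketch. The row-norm importance score and the function names are illustrative stand-ins, not the paper's structural-pruning metric.

```python
import numpy as np

def select_neurons(W, frac=0.2):
    # Stand-in importance score: rank neurons (rows of W) by L2 norm.
    # The paper derives importance from a structural pruning method instead.
    scores = np.linalg.norm(W, axis=1)
    k = max(1, int(frac * W.shape[0]))
    return np.argsort(scores)[-k:]

def sparse_finetune_step(W, grad, kept, lr=1e-2):
    # Update only rows for the selected neurons; all other weights stay
    # frozen, which is the sparse fine-tuning constraint.
    W = W.copy()
    W[kept] -= lr * grad[kept]
    return W

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))        # one layer's weight matrix
grad = rng.normal(size=W.shape)      # gradient from the downstream task
kept = select_neurons(W)
W = sparse_finetune_step(W, grad, kept)
print(f"updated {len(kept)} of {W.shape[0]} neuron rows")
```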

TMLR Journal 2025 Journal Article

Counting Hours, Counting Losses: The Toll of Unpredictable Work Schedules on Financial Security

  • Pegah Nokhiz
  • Aravinda Kanchana Ruwanpathirana
  • Aditya Bhaskara
  • Suresh Venkatasubramanian

Financial instability is a pressing concern in the United States, with drivers that include growing employment disparities and insufficient wages. While research typically focuses on financial aspects such as income inequality in precarious work environments, there is a tendency to overlook the time-related aspect of unstable work schedules. The inability to rely on a consistent work schedule not only leads to burnout and conflicts between work and family life but also results in financial shocks that directly impact workers' income and assets. Unforeseen fluctuations in earnings pose challenges in financial planning, affecting decisions regarding savings and spending, and ultimately undermining individuals' long-term financial stability and well-being. Our objective in this study is to understand how unforeseen fluctuations in earnings exacerbate financial fragility by investigating the extent to which individuals' financial management depends on their ability to anticipate and plan for future events. To answer this question, we present a computational framework to model real-time consumption decisions under income uncertainty, drawing on advances in online planning and reinforcement learning (RL) with lookahead. We introduce a novel online algorithm that enables utility-maximizing agents to dynamically adapt consumption choices in response to financial shocks, leveraging partial deterministic information about future income. This approach forms the basis of our simulation framework, which models how workers manage consumption in the face of variable work schedules and the imperative to avoid financial ruin. Through theoretical analysis, we quantify the utility advantage conferred by varying levels of lookahead. Empirical simulations demonstrate how increased lookahead improves financial utility. That is, with this framework, we demonstrate both theoretically and empirically how a worker's capacity to anticipate schedule changes enhances their long-term utility. Conversely, the inability to predict future events can worsen workers' financial instability. Moreover, our framework enables us to explore policy interventions aimed at mitigating the problem of schedule uncertainty. By modeling both individual behavior and potential policy interventions (e.g., advance scheduling regulations), our framework draws on ideas from machine learning and reinforcement learning to inform economic questions surrounding information access in financial planning.
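
The core claim, that an agent with partial lookahead on income attains higher utility, can be illustrated with a toy simulation. This is a deliberately simplified stand-in (log utility, a naive smoothing rule, hand-picked horizon and lookahead), not the paper's online algorithm or economic model.

```python
import numpy as np

rng = np.random.default_rng(1)
T, k = 200, 5                          # horizon and lookahead window (toy)
income = rng.choice([0.0, 2.0], size=T, p=[0.4, 0.6])  # volatile schedule

def run(lookahead):
    wealth, utility = 1.0, 0.0
    for t in range(T):
        if lookahead:                  # smooth over the k known future incomes
            future = income[t + 1 : t + 1 + k].sum()
            c = (wealth + income[t] + future) / (k + 2)
        else:                          # myopic: consume a fixed fraction
            c = 0.5 * (wealth + income[t])
        c = min(c, wealth + income[t])           # cannot overdraw (ruin)
        wealth = wealth + income[t] - c
        utility += np.log(1e-6 + c)    # concave utility penalizes volatility
    return utility

print("no lookahead    :", round(run(False), 2))
print(f"{k}-step lookahead:", round(run(True), 2))
```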

ICLR Conference 2025 Conference Paper

Descent with Misaligned Gradients and Applications to Hidden Convexity

  • Aditya Bhaskara
  • Ashok Cutkosky
  • Ravi Kumar 0001
  • Manish Purohit

We consider the problem of minimizing a convex objective given access to an oracle that outputs "misaligned" stochastic gradients, where the expected value of the output is guaranteed to be correlated with, but not necessarily equal to the true gradient of the objective. In the case where the misalignment (or bias) of the oracle changes slowly, we obtain an optimization algorithm that achieves the optimum iteration complexity of $\tilde O(\epsilon^{-2})$; for the more general case where the changes need not be slow, we obtain an algorithm with $\tilde O(\epsilon^{-3})$ iteration complexity. As an application of our framework, we consider optimization problems with a "hidden convexity" property, and obtain an algorithm with $O(\epsilon^{-3})$ iteration complexity.
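
To make the oracle model concrete, here is a toy numpy loop with a slowly drifting bias: the oracle's expected output stays positively correlated with the true gradient without equaling it. Plain gradient descent is used only to illustrate the setting; the paper's algorithms and rates are more refined, and all constants here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
d, steps, lr = 10, 2000, 0.05
x = rng.normal(size=d)
bias = 0.3 * rng.normal(size=d)       # misalignment of the oracle

for t in range(steps):
    true_grad = x                     # gradient of f(x) = ||x||^2 / 2
    bias = 0.999 * bias + 0.01 * rng.normal(size=d)   # slow drift
    g = true_grad + bias + 0.1 * rng.normal(size=d)   # misaligned oracle
    # <E[g], true_grad> stays positive, so descent still makes progress
    x -= lr * g

print("final objective:", 0.5 * np.dot(x, x))
```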

ICML Conference 2024 Conference Paper

Convergence Guarantees for the DeepWalk Embedding on Block Models

  • Christopher Harker
  • Aditya Bhaskara

Graph embeddings have emerged as a powerful tool for understanding the structure of graphs. Unlike classical spectral methods, recent methods such as DeepWalk, Node2Vec, etc. are based on solving nonlinear optimization problems on the graph, using local information obtained by performing random walks. These techniques have empirically been shown to produce “better” embeddings than their classical counterparts. However, due to their reliance on solving a nonconvex optimization problem, obtaining theoretical guarantees on the properties of the solution has remained a challenge, even for simple classes of graphs. In this work, we show convergence properties for the DeepWalk algorithm on graphs obtained from the Stochastic Block Model (SBM). Despite being simplistic, the SBM has proved to be a classic model for analyzing the behavior of algorithms on large graphs. Our results mirror the existing ones for spectral embeddings on SBMs, showing that even in the case of one-dimensional embeddings, the output of the DeepWalk algorithm provably recovers the cluster structure with high probability.

NeurIPS Conference 2024 Conference Paper

On the Robustness of Spectral Algorithms for Semirandom Stochastic Block Models

  • Aditya Bhaskara
  • Agastya V. Jha
  • Michael Kapralov
  • Naren S. Manoj
  • Davide Mazzali
  • Weronika Wrzos-Kaminska

In a graph bisection problem, we are given a graph $G$ with two equally-sized unlabeled communities, and the goal is to recover the vertices in these communities. A popular heuristic, known as spectral clustering, is to output an estimated community assignment based on the eigenvector corresponding to the second-smallest eigenvalue of the Laplacian of $G$. Spectral algorithms can be shown to provably recover the cluster structure for graphs generated from probabilistic models, such as the Stochastic Block Model (SBM). However, spectral clustering is known to be non-robust to model mis-specification. Techniques based on semidefinite programming have been shown to be more robust, but they incur significant computational overheads. In this work, we study the robustness of spectral algorithms against semirandom adversaries. Informally, a semirandom adversary is allowed to "helpfully" change the specification of the model in a way that is consistent with the ground-truth solution. Our semirandom adversaries in particular are allowed to add edges inside clusters or increase the probability that an edge appears inside a cluster. Semirandom adversaries are a useful tool to determine the extent to which an algorithm has overfit to statistical assumptions on the input. On the positive side, we identify a wide range of semirandom adversaries under which spectral bisection using the _unnormalized_ Laplacian is strongly consistent, i.e., it exactly recovers the planted partitioning. On the negative side, we show that in many of these settings, _normalized_ spectral bisection outputs a partitioning that makes a classification mistake on a constant fraction of the vertices. Finally, we demonstrate numerical experiments that complement our theoretical findings.
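
A minimal numpy sketch of the purely random baseline: sample an SBM graph, take the eigenvector for the second-smallest eigenvalue of the unnormalized Laplacian, and read the partition off its signs. Semirandom perturbations are omitted, and the edge probabilities are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, q = 200, 0.10, 0.02                 # SBM: within/between edge probs
labels = np.repeat([0, 1], n // 2)
P = np.where(labels[:, None] == labels[None, :], p, q)
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                               # symmetric, no self-loops

L = np.diag(A.sum(1)) - A                 # unnormalized Laplacian
vals, vecs = np.linalg.eigh(L)            # eigenvalues in ascending order
guess = (vecs[:, 1] > 0).astype(int)      # signs of the Fiedler vector

agree = (guess == labels).mean()
print("accuracy:", max(agree, 1 - agree)) # up to swapping the two labels
```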

ICML Conference 2023 Conference Paper

Bandit Online Linear Optimization with Hints and Queries

  • Aditya Bhaskara
  • Ashok Cutkosky
  • Ravi Kumar 0001
  • Manish Purohit

We study variants of the online linear optimization (OLO) problem with bandit feedback, where the algorithm has access to external information about the unknown cost vector. Our motivation is the recent body of work on using such “hints” towards improving regret bounds for OLO problems in the full-information setting. Unlike in the full-information OLO setting, with bandit feedback, we first show that one cannot improve the standard regret bounds of $\tilde{O}(\sqrt{T})$ by using hints, even if they are always well-correlated with the cost vector. In contrast, if the algorithm is empowered to issue queries and if all the responses are correct, then we show $O(\log T)$ regret is achievable. We then show how to make this result more robust—when some of the query responses can be adversarial—by using a little feedback on the quality of the responses.

NeurIPS Conference 2023 Conference Paper

Tight Bounds for Volumetric Spanners and Applications

  • Aditya Bhaskara
  • Sepideh Mahabadi
  • Ali Vakilian

Given a set of points of interest, a volumetric spanner is a subset of the points using which all the points can be expressed using "small" coefficients (measured in an appropriate norm). Formally, given a set of vectors $X = [v_1, v_2, \dots, v_n]$, the goal is to find $T \subseteq [n]$ such that every $v \in X$ can be expressed as $\sum_{i\in T} \alpha_i v_i$, with $\Vert \alpha \Vert$ being small. This notion, which has also been referred to as a well-conditioned basis, has found several applications, including bandit linear optimization, determinant maximization, and matrix low rank approximation. In this paper, we give almost optimal bounds on the size of volumetric spanners for all $\ell_p$ norms, and show that they can be constructed using a simple local search procedure. We then show the applications of our result to other tasks and in particular the problem of finding coresets for the Minimum Volume Enclosing Ellipsoid (MVEE) problem.
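
Below is a small numpy sketch of a determinant-driven local search in the spirit of the paper: keep a subset and accept swaps that increase the Gram determinant, a proxy for how well-conditioned the subset is. The swap rule, budget, and check at the end are illustrative; the paper's local search and its $\ell_p$ guarantees are more specific.

```python
import numpy as np

def local_search_spanner(X, m, iters=500, seed=0):
    # Keep a subset S of m vectors; accept any random swap that increases
    # det(X_S^T X_S), i.e. the squared volume spanned by the subset.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    S = list(rng.choice(n, size=m, replace=False))
    best = np.linalg.det(X[S].T @ X[S])
    for _ in range(iters):
        i, j = rng.integers(m), rng.integers(n)
        if j in S:
            continue
        T = S.copy()
        T[i] = j
        val = np.linalg.det(X[T].T @ X[T])
        if val > best:
            S, best = T, val
    return S

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 5))
S = local_search_spanner(X, m=8)
# Express every point in terms of the selected vectors and check that the
# least-squares coefficients stay small.
coef, *_ = np.linalg.lstsq(X[S].T, X.T, rcond=None)
print("selected:", sorted(int(i) for i in S),
      "max |coefficient|:", round(float(np.abs(coef).max()), 2))
```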

ICML Conference 2021 Conference Paper

Additive Error Guarantees for Weighted Low Rank Approximation

  • Aditya Bhaskara
  • Aravinda Kanchana Ruwanpathirana
  • Maheshakya Wijewardena

Low-rank approximation is a classic tool in data analysis, where the goal is to approximate a matrix $A$ with a low-rank matrix $L$ so as to minimize the error $\|A - L\|_F^2$. However, in many applications, approximating some entries is more important than others, which leads to the weighted low rank approximation problem. Unfortunately, the addition of weights makes the low-rank approximation problem intractable. Thus many works have obtained efficient algorithms under additional structural assumptions on the weight matrix (such as low rank, and appropriate block structure). We study a natural greedy algorithm for weighted low rank approximation and develop a simple condition under which it yields bi-criteria approximation up to a small additive factor in the error. The algorithm involves iteratively computing the top singular vector of an appropriately varying matrix, and is thus easy to implement at scale. Our methods also allow us to study the problem of low rank approximation under $\ell_p$ norm error.
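
A hypothetical rendering of the greedy idea: at each round, take the top singular pair of the weighted residual and add the best rank-one multiple of it. This is a sketch under that reading of the abstract; the paper's precise update rule and its bi-criteria guarantee differ.

```python
import numpy as np

def greedy_weighted_lowrank(A, W, rank):
    # Repeatedly fit one rank-one term to the weighted residual W * (A - L),
    # using the top singular pair of the residual as the search direction.
    L = np.zeros_like(A)
    for _ in range(rank):
        R = W * (A - L)
        U, s, Vt = np.linalg.svd(R, full_matrices=False)
        uv = np.outer(U[:, 0], Vt[0])
        # Closed-form step size minimizing sum(W * (A - L - alpha*uv)^2).
        alpha = (R * uv).sum() / ((W * uv**2).sum() + 1e-12)
        L = L + alpha * uv
    return L

rng = np.random.default_rng(5)
A = rng.normal(size=(40, 30))
W = rng.random((40, 30))              # per-entry importance weights
L = greedy_weighted_lowrank(A, W, rank=5)
print("weighted error:", float((W * (A - L) ** 2).sum()))
```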

NeurIPS Conference 2021 Conference Paper

Logarithmic Regret from Sublinear Hints

  • Aditya Bhaskara
  • Ashok Cutkosky
  • Ravi Kumar
  • Manish Purohit

We consider the online linear optimization problem, where at every step the algorithm plays a point $x_t$ in the unit ball, and suffers loss $\langle c_t, x_t \rangle$ for some cost vector $c_t$ that is then revealed to the algorithm. Recent work showed that if an algorithm receives a _hint_ $h_t$ that has non-trivial correlation with $c_t$ before it plays $x_t$, then it can achieve a regret guarantee of $O(\log T)$, improving on the bound of $\Theta(\sqrt{T})$ in the standard setting. In this work, we study the question of whether an algorithm really requires a hint at _every_ time step. Somewhat surprisingly, we show that an algorithm can obtain $O(\log T)$ regret with just $O(\sqrt{T})$ hints under a natural query model; in contrast, we also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret. We give two applications of our result, to the well-studied setting of _optimistic_ regret bounds, and to the problem of online learning with abstention.
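
A toy simulation of the setting shared by this and the related hint papers in this list (not the paper's algorithm or its query model): vanilla online gradient descent versus a naive player that simply follows the negated hint, with hints that are noisy but positively correlated with the cost. All constants are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
d, T = 5, 1000
x = np.zeros(d)                       # OGD iterate, kept in the unit ball
cum_c = np.zeros(d)
ogd_loss = hint_loss = 0.0

for t in range(1, T + 1):
    c = rng.normal(size=d)
    c /= np.linalg.norm(c)            # cost vector for this round
    h = c + 0.5 * rng.normal(size=d)  # noisy hint, correlated with c
    h /= np.linalg.norm(h)
    hint_loss += np.dot(c, -h)        # naive player: move against the hint
    ogd_loss += np.dot(c, x)          # OGD plays, then sees c
    x -= c / np.sqrt(t)
    x /= max(1.0, np.linalg.norm(x))  # project back onto the unit ball
    cum_c += c

best = -np.linalg.norm(cum_c)         # best fixed action in hindsight
print("OGD regret  :", round(ogd_loss - best, 1))
print("hint regret :", round(hint_loss - best, 1))
```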

NeurIPS Conference 2020 Conference Paper

Adaptive Probing Policies for Shortest Path Routing

  • Aditya Bhaskara
  • Sreenivas Gollapudi
  • Kostas Kollias
  • Kamesh Munagala

Inspired by traffic routing applications, we consider the problem of finding the shortest path from a source $s$ to a destination $t$ in a graph, when the lengths of the edges are unknown. Instead, we are given _hints_ or predictions of the edge lengths from a collection of ML models, trained possibly on historical data and other contexts in the network. Additionally, we assume that the true length of any candidate path can be obtained by _probing_ an up-to-date snapshot of the network. However, each probe introduces a latency, and thus the goal is to minimize the number of probes while finding a near-optimal path with high probability. We formalize this problem and show assumptions under which it admits efficient approximation algorithms. We verify these assumptions and validate the performance of our algorithms on real data.

ICML Conference 2020 Conference Paper

Online Learning with Imperfect Hints

  • Aditya Bhaskara
  • Ashok Cutkosky
  • Ravi Kumar 0001
  • Manish Purohit

We consider a variant of the classical online linear optimization problem in which at every step, the online player receives a “hint” vector before choosing the action for that round. Rather surprisingly, it was shown that if the hint vector is guaranteed to have a positive correlation with the cost vector, then the online player can achieve a regret of $O(\log T)$, thus significantly improving over the $O(\sqrt{T})$ regret in the general setting. However, the result and analysis require the correlation property at _all_ time steps, thus raising the natural question: can we design online learning algorithms that are resilient to bad hints? In this paper we develop algorithms and nearly matching lower bounds for online learning with imperfect hints. Our algorithms are oblivious to the quality of the hints, and the regret bounds interpolate between the always-correlated hints case and the no-hints case. Our results also generalize, simplify, and improve upon previous results on optimistic regret bounds, which can be viewed as an additive version of hints.

NeurIPS Conference 2020 Conference Paper

Online Linear Optimization with Many Hints

  • Aditya Bhaskara
  • Ashok Cutkosky
  • Ravi Kumar
  • Manish Purohit

We study an online linear optimization (OLO) problem in which the learner is provided access to $K$ "hint" vectors in each round prior to making a decision. In this setting, we devise an algorithm that obtains logarithmic regret whenever there exists a convex combination of the $K$ hints that has positive correlation with the cost vectors. This significantly extends prior work that considered only the case $K=1$. To accomplish this, we develop a way to combine many arbitrary OLO algorithms to obtain regret that is only a logarithmic factor worse than the minimum regret of the original algorithms in hindsight; this result is of independent interest.

NeurIPS Conference 2020 Conference Paper

Online MAP Inference of Determinantal Point Processes

  • Aditya Bhaskara
  • Amin Karbasi
  • Silvio Lattanzi
  • Morteza Zadimoghaddam

In this paper, we provide an efficient approximation algorithm for finding the most likely configuration (MAP) of size $k$ for Determinantal Point Processes (DPP) in the online setting where the data points arrive in an arbitrary order and the algorithm cannot discard the selected elements from its local memory. Given an additive error tolerance $\eta$, our online algorithm achieves a $k^{O(k)}$ multiplicative approximation guarantee with an additive error $\eta$, using a memory footprint independent of the size of the data stream. We note that the exponential dependence on $k$ in the approximation factor is unavoidable even in the offline setting. Our result readily implies a streaming algorithm with an improved memory bound compared to existing results.
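
To fix ideas, here is a toy streaming heuristic for DPP MAP (not the paper's algorithm): greedily keep an arriving item if it raises the log-determinant of the selected kernel, up to a budget of $k$. The threshold `tau` and the feature-based kernel are assumptions for illustration.

```python
import numpy as np

def online_dpp_map(V, k, tau=0.0):
    # S holds indices of kept items; cur is log det of their kernel
    # L_S = B B^T, where B stacks the kept feature vectors.
    S, cur = [], 0.0                        # log det of the empty set is 0
    for i, v in enumerate(V):
        if len(S) == k:
            break
        B = np.array([V[j] for j in S] + [v])
        _, logdet = np.linalg.slogdet(B @ B.T + 1e-9 * np.eye(len(B)))
        if logdet > cur + tau:              # keep only on marginal gain
            S.append(i)
            cur = logdet
    return S

rng = np.random.default_rng(7)
V = rng.normal(size=(100, 8))               # stream of feature vectors
print("selected indices:", online_dpp_map(V, k=5))
```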

NeurIPS Conference 2019 Conference Paper

Greedy Sampling for Approximate Clustering in the Presence of Outliers

  • Aditya Bhaskara
  • Sharvaree Vadgama
  • Hong Xu

Greedy algorithms such as adaptive sampling (k-means++) and furthest point traversal are popular choices for clustering problems. On the one hand, they possess good theoretical approximation guarantees, and on the other, they are fast and easy to implement. However, one main issue with these algorithms is the sensitivity to noise/outliers in the data. In this work we show that for k-means and k-center clustering, simple modifications to the well-studied greedy algorithms result in nearly identical guarantees, while additionally being robust to outliers. For instance, in the case of k-means++, we show that a simple thresholding operation on the distances suffices to obtain an $O(\log k)$ approximation to the objective. We obtain similar results for the simpler k-center problem. Finally, we show experimentally that our algorithms are easy to implement and scale well. We also measure their ability to identify noisy points added to a dataset.
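
The thresholding modification is easy to sketch: run k-means++ seeding, but cap the squared distances used for D² sampling so far-away outliers cannot dominate the distribution. The threshold value below is hand-picked for the toy data; the paper's analysis dictates how it should be set.

```python
import numpy as np

def robust_kmeanspp(X, k, threshold, seed=0):
    # k-means++ seeding with thresholded D^2 sampling.
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        d2 = np.minimum(d2, threshold ** 2)     # the thresholding step
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

rng = np.random.default_rng(8)
true_centers = 5 * rng.normal(size=(3, 1, 2))
X = (true_centers + 0.2 * rng.normal(size=(3, 100, 2))).reshape(-1, 2)
X = np.vstack([X, 100 * rng.normal(size=(5, 2))])   # gross outliers
print(robust_kmeanspp(X, k=3, threshold=5.0).round(1))
```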

NeurIPS Conference 2019 Conference Paper

On Distributed Averaging for Stochastic k-PCA

  • Aditya Bhaskara
  • Pruthuvi Maheshakya Wijewardena

In the stochastic k-PCA problem, we are given i.i.d. samples from an unknown distribution over vectors, and the goal is to compute the top k eigenvalues and eigenvectors of the moment matrix. In the simplest distributed variant, we have $m$ machines, each of which receives $n$ samples. Each machine performs some computation and sends an $O(k)$-size summary of the local dataset to a central server. The server performs an aggregation and computes the desired eigenvalues and vectors. The goal is to achieve the same effect as the server computing using all $mn$ samples by itself. The main choices in this framework are the choice of the summary, and the method of aggregation. We consider a slight variant of the well-studied "distributed averaging" approach, and prove that this leads to significantly better bounds on the dependence between $n$ and the eigenvalue gaps. Our method can also be applied directly to a setting where the "right" value of the parameter k (i.e., one for which there is a non-trivial eigenvalue gap) is not known exactly. This is a common issue in practice which prior methods were unable to address.
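
A sketch of the plain distributed-averaging baseline (the paper analyzes a slight variant of this): each machine ships its top-k eigenpairs of the local moment matrix, and the server averages the rank-k reconstructions and re-solves. Data sizes and noise level are illustrative.

```python
import numpy as np

def local_summary(S, k):
    # Each machine sends the top-k eigenpairs of its local moment matrix.
    M = S.T @ S / len(S)
    vals, vecs = np.linalg.eigh(M)              # ascending order
    return vecs[:, -k:], vals[-k:]              # O(k)-size summary

def server_aggregate(summaries, k):
    # Average the rank-k reconstructions V diag(w) V^T, then re-solve.
    avg = np.mean([V @ np.diag(w) @ V.T for V, w in summaries], axis=0)
    vals, vecs = np.linalg.eigh(avg)
    return vals[-k:], vecs[:, -k:]

rng = np.random.default_rng(9)
d, k, m, n = 20, 3, 10, 500
basis, _ = np.linalg.qr(rng.normal(size=(d, k)))          # planted subspace
data = rng.normal(size=(m, n, k)) @ basis.T + 0.1 * rng.normal(size=(m, n, d))
summaries = [local_summary(S, k) for S in data]
vals, vecs = server_aggregate(summaries, k)
overlap = np.linalg.norm(vecs.T @ basis)        # ~ sqrt(k) when aligned
print("subspace overlap:", round(float(overlap), 2),
      "target:", round(float(np.sqrt(k)), 2))
```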

FOCS Conference 2019 Conference Paper

Residual Based Sampling for Online Low Rank Approximation

  • Aditya Bhaskara
  • Silvio Lattanzi
  • Sergei Vassilvitskii
  • Morteza Zadimoghaddam

We propose online algorithms for Column Subset Selection (CSS) and Principal Component Analysis (PCA), two methods that are widely employed for data analysis, summarization, and visualization. Given a data matrix A that is revealed one column at a time, the online CSS problem asks to keep a small set of columns, S, that best approximates the space spanned by the columns of A. As each column arrives, the algorithm must irrevocably decide whether to add it to S, or to ignore it. In the online PCA problem, the goal is to output a projection of each column to a low dimensional subspace. In other words, the algorithm must provide an embedding for each column as it arrives, which cannot be changed as new columns arrive. While both of these problems have been studied in the online setting, only additive approximations were known prior to our work. The core of our approach is an adaptive sampling technique that gives a practical and efficient algorithm for both of these problems. We prove that by sampling columns using their "residual norm" (i.e., their norm orthogonal to directions sampled so far), we end up with a significantly better dependence between the number of columns sampled, and the desired error in the approximation. We further show how to combine our algorithm "in series" with prior algorithms. In particular, using the results of Boutsidis et al. and Frieze et al. that have additive guarantees, we show how to improve the bounds on the error of our algorithm.
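
A toy rendering of residual-norm sampling: as each column arrives, keep it with probability proportional to the squared norm of its component orthogonal to the span of the columns kept so far. The probability scaling and budget are hand-tuned stand-ins; the actual algorithm sets them from the analysis.

```python
import numpy as np

def online_css(columns, budget, scale=1.0, seed=0):
    rng = np.random.default_rng(seed)
    kept, Q = [], None                    # Q: orthonormal basis of span(S)
    for i, a in enumerate(columns):
        r = a if Q is None else a - Q @ (Q.T @ a)   # residual of column a
        p = min(1.0, scale * np.dot(r, r))          # sampling probability
        if len(kept) < budget and rng.random() < p:
            kept.append(i)
            q = r / (np.linalg.norm(r) + 1e-12)
            Q = q[:, None] if Q is None else np.column_stack([Q, q])
    return kept

rng = np.random.default_rng(10)
A = rng.normal(size=(30, 200))            # columns arrive one by one
print("kept columns:", online_css(A.T, budget=10, scale=0.05))
```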

FOCS Conference 2019 Conference Paper

Smoothed Analysis in Unsupervised Learning via Decoupling

  • Aditya Bhaskara
  • Aidao Chen
  • Aidan Perreault
  • Aravindan Vijayaraghavan

Smoothed analysis is a powerful paradigm in overcoming worst-case intractability in unsupervised learning and high-dimensional data analysis. While polynomial time smoothed analysis guarantees have been obtained for worst-case intractable problems like tensor decompositions and learning mixtures of Gaussians, such guarantees have been hard to obtain for several other important problems in unsupervised learning. A core technical challenge in analyzing algorithms is obtaining lower bounds on the least singular value for random matrix ensembles with dependent entries, that are given by low-degree polynomials of a few base underlying random variables. In this work, we address this challenge by obtaining high-confidence lower bounds on the least singular value of new classes of structured random matrix ensembles of the above kind. We then use these bounds to design algorithms with polynomial time smoothed analysis guarantees for the following three important problems in unsupervised learning: (1) Robust subspace recovery, when the fraction of inliers in the d-dimensional subspace T of the n-dimensional Euclidean space is at least (d/n) t for any positive integer t. This contrasts with the known worst-case intractability when the fraction of inliers is at most d/n, and the previous smoothed analysis result (Hardt and Moitra, 2013). (2) Learning overcomplete hidden markov models, where the size of the state space is any polynomial in the dimension of the observations. This gives the first polynomial time guarantees for learning overcomplete HMMs in the smoothed analysis model. (3) Higher order tensor decompositions, where we generalize and analyze the so-called FOOBI algorithm of Cardoso to find order-t rank-one tensors in a subspace. This gives polynomially robust decomposition algorithms for order-2t tensors with rank n t.

ICML Conference 2018 Conference Paper

Distributed Clustering via LSH Based Data Partitioning

  • Aditya Bhaskara
  • Maheshakya Wijewardena

Given the importance of clustering in the analysis of large scale data, distributed algorithms for formulations such as k-means, k-median, etc. have been extensively studied. A successful approach here has been the “reduce and merge” paradigm, in which each machine reduces its input size to Õ(k), and this data reduction continues (possibly iteratively) until all the data fits on one machine, at which point the problem is solved locally. This approach has the intrinsic bottleneck that each machine must solve a problem of size $\geq k$, and needs to communicate at least $\Omega(k)$ points to the other machines. We propose a novel data partitioning idea to overcome this bottleneck, and in effect, have different machines focus on “finding different clusters”. Under the assumption that we know the optimum value of the objective up to a poly(n) factor (arbitrary polynomial), we establish worst-case approximation guarantees for our method. We see that our algorithm results in lower communication as well as a near-optimal number of ‘rounds’ of computation (in the popular MapReduce framework).
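
A cartoon of the partitioning step only, assuming hyperplane LSH modded onto machines so nearby points tend to land together; the paper's scheme and its guarantees involve considerably more care.

```python
import numpy as np

def lsh_partition(X, n_machines, n_bits=8, seed=0):
    # Random hyperplane LSH: the sign pattern of a few projections gives a
    # bucket id, and buckets are assigned to machines round-robin, so each
    # machine can focus on "finding its own clusters".
    rng = np.random.default_rng(seed)
    H = rng.normal(size=(X.shape[1], n_bits))
    bits = (X @ H > 0).astype(int)
    buckets = bits @ (1 << np.arange(n_bits))      # bucket id per point
    return buckets % n_machines                    # machine assignment

rng = np.random.default_rng(14)
X = np.vstack([c + 0.1 * rng.normal(size=(50, 3))
               for c in 5 * rng.normal(size=(4, 3))])
print(np.bincount(lsh_partition(X, n_machines=8), minlength=8))
```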

JMLR Journal 2018 Journal Article

On Binary Embedding using Circulant Matrices

  • Felix X. Yu
  • Aditya Bhaskara
  • Sanjiv Kumar
  • Yunchao Gong
  • Shih-Fu Chang

Binary embeddings provide efficient and powerful ways to perform operations on large scale data. However, binary embedding typically requires long codes in order to preserve the discriminative power of the input space. Thus, binary coding methods traditionally suffer from high computation and storage costs in such a scenario. To address this problem, we propose Circulant Binary Embedding (CBE) which generates binary codes by projecting the data with a circulant matrix. The circulant structure allows us to use Fast Fourier Transform algorithms to speed up the computation. For obtaining $k$-bit binary codes from $d$-dimensional data, our method improves the time complexity from $\mathcal{O}(dk)$ to $\mathcal{O}(d\log{d})$, and the space complexity from $\mathcal{O}(dk)$ to $\mathcal{O}(d)$. We study two settings, which differ in the way we choose the parameters of the circulant matrix. In the first, the parameters are chosen randomly and in the second, the parameters are learned using the data. For randomized CBE, we give a theoretical analysis comparing it with binary embedding using an unstructured random projection matrix. The challenge here is to show that the dependencies in the entries of the circulant matrix do not lead to a loss in performance. In the second setting, we design a novel time-frequency alternating optimization to learn data-dependent circulant projections, which alternatively minimizes the objective in original and Fourier domains. In both the settings, we show by extensive experiments that the CBE approach gives much better performance than the state-of-the-art approaches if we fix a running time, and provides much faster computation with negligible performance degradation if we fix the number of bits in the embedding.
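
The FFT trick is easy to demonstrate: multiplying by a circulant matrix is a circular convolution, so the projection costs $O(d \log d)$. A minimal randomized-CBE sketch follows (it produces all $d$ bits; for $k$-bit codes one would keep the first $k$ outputs).

```python
import numpy as np

def circulant_binary_embed(X, r, D):
    # h(x) = sign(C_r (D x)), where C_r is the circulant matrix of r.
    # Since C_r x = ifft(fft(r) * fft(x)), no d x d matrix is ever formed.
    Xd = X * D                                   # random sign flips
    proj = np.fft.ifft(np.fft.fft(r) * np.fft.fft(Xd, axis=1), axis=1).real
    return (proj >= 0).astype(np.int8)           # binary codes

rng = np.random.default_rng(11)
d = 64
r = rng.normal(size=d)                           # defines the circulant matrix
D = rng.choice([-1.0, 1.0], size=d)              # sign randomization
X = rng.normal(size=(5, d))
codes = circulant_binary_embed(X, r, D)
print(codes[0])
```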

SODA Conference 2016 Conference Paper

Expanders via Local Edge Flips

  • Zeyuan Allen-Zhu
  • Aditya Bhaskara
  • Silvio Lattanzi
  • Vahab Mirrokni
  • Lorenzo Orecchia

Designing distributed and scalable algorithms to improve network connectivity is a central topic in peer-to-peer networks. In this paper we focus on the following well-known problem: given an $n$-node $d$-regular network for $d = \Omega(\log n)$, we want to design a decentralized, local algorithm that transforms the graph into one that has good connectivity properties (low diameter, expansion, etc.) without affecting the sparsity of the graph. To this end, Mahlmann and Schindelhauer introduced the random “flip” transformation, where in each time step, a random pair of vertices that have an edge decide to ‘swap a neighbor’. They conjectured that performing $O(nd)$ such flips at random would convert any connected $d$-regular graph into a $d$-regular expander graph, with high probability. However, the best known upper bound for the number of steps is roughly $O(n^{17} d^{23})$, obtained via a delicate Markov chain comparison argument. Our main result is to prove that a natural instantiation of the random flip produces an expander in at most steps, with high probability. Our argument uses a potential-function analysis based on the matrix exponential, together with the recent beautiful results on the higher-order Cheeger inequality of graphs. We also show that our technique can be used to analyze another well-studied random process known as the ‘random switch’, and show that it produces an expander in $O(nd)$ steps with high probability.
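
A runnable sketch of the flip dynamic on a $d$-regular graph, with the second eigenvalue of the normalized adjacency as a crude expansion proxy. The exact variant analyzed in the paper may differ in details such as how the swapped neighbors are chosen.

```python
import numpy as np

def random_flips(adj, steps, seed=0):
    # One flip: pick an edge (u, v); u and v each pick another neighbor
    # (w and x) and swap them, preserving all degrees. Skip moves that
    # would create a multi-edge or a self-loop.
    rng = np.random.default_rng(seed)
    n = len(adj)
    for _ in range(steps):
        u = int(rng.integers(n))
        v = rng.choice(list(adj[u]))
        w = rng.choice(list(adj[u] - {v}))
        x = rng.choice(list(adj[v] - {u}))
        if w != x and w not in adj[v] and x not in adj[u]:
            adj[u].remove(w); adj[w].remove(u)
            adj[v].remove(x); adj[x].remove(v)
            adj[v].add(w); adj[w].add(v)
            adj[u].add(x); adj[x].add(u)
    return adj

n, d = 60, 4
adj = {i: {(i + s) % n for s in (-2, -1, 1, 2)} for i in range(n)}  # ring
adj = random_flips(adj, steps=5000)
A = np.zeros((n, n))
for u, nbrs in adj.items():
    for v in nbrs:
        A[u, v] = 1.0
lam = np.sort(np.abs(np.linalg.eigvalsh(A / d)))
print("second-largest |eigenvalue|:", round(float(lam[-2]), 3))  # small = expander
```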

ICML Conference 2016 Conference Paper

Greedy Column Subset Selection: New Bounds and Distributed Algorithms

  • Jason M. Altschuler
  • Aditya Bhaskara
  • Gang Fu
  • Vahab Mirrokni
  • Afshin Rostamizadeh
  • Morteza Zadimoghaddam

The problem of column subset selection has recently attracted a large body of research, with feature selection serving as one obvious and important application. Among the techniques that have been applied to solve this problem, the greedy algorithm has been shown to be quite effective in practice. However, theoretical guarantees on its performance have not been explored thoroughly, especially in a distributed setting. In this paper, we study the greedy algorithm for the column subset selection problem from a theoretical and empirical perspective and show its effectiveness in a distributed setting. In particular, we provide an improved approximation guarantee for the greedy algorithm which we show is tight up to a constant factor, and present the first distributed implementation with provable approximation factors. We use the idea of randomized composable core-sets, developed recently in the context of submodular maximization. Finally, we validate the effectiveness of this distributed algorithm via an empirical study.
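
A compact numpy rendering of the single-machine greedy step, pick the column whose residual direction explains the most remaining Frobenius mass of the matrix, then project it out; the distributed core-set machinery is not shown, and the test matrix is synthetic.

```python
import numpy as np

def greedy_css(A, k):
    # Greedily select k columns; R holds every column's residual after
    # projecting off the directions of the columns chosen so far.
    R = A.copy()
    chosen = []
    for _ in range(k):
        U = R / (np.linalg.norm(R, axis=0) + 1e-12)   # unit residuals
        gains = ((R.T @ U) ** 2).sum(axis=0)          # mass each captures
        j = int(np.argmax(gains))
        chosen.append(j)
        u = U[:, [j]]
        R -= u @ (u.T @ R)
    return chosen

rng = np.random.default_rng(12)
A = rng.normal(size=(50, 8)) @ rng.normal(size=(8, 120))   # ~rank 8
S = greedy_css(A, k=8)
P, *_ = np.linalg.lstsq(A[:, S], A, rcond=None)
print("chosen:", S, "relative error:",
      round(float(np.linalg.norm(A - A[:, S] @ P) / np.linalg.norm(A)), 4))
```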

NeurIPS Conference 2016 Conference Paper

Linear Relaxations for Finding Diverse Elements in Metric Spaces

  • Aditya Bhaskara
  • Mehrdad Ghadiri
  • Vahab Mirrokni
  • Ola Svensson

Choosing a diverse subset of a large collection of points in a metric space is a fundamental problem, with applications in feature selection, recommender systems, web search, data summarization, etc. Various notions of diversity have been proposed, tailored to different applications. The general algorithmic goal is to find a subset of points that maximizes diversity, while obeying a cardinality (or more generally, matroid) constraint. The goal of this paper is to develop a novel linear programming (LP) framework that allows us to design approximation algorithms for such problems. We study an objective known as _sum-min_ diversity, which is known to be effective in many applications, and give the first constant factor approximation algorithm. Our LP framework allows us to easily incorporate additional constraints, as well as secondary objectives. We also prove a hardness result for two natural diversity objectives, under the so-called _planted clique_ assumption. Finally, we study the empirical performance of our algorithm on several standard datasets. We first study the approximation quality of the algorithm by comparing with the LP objective. Then, we compare the quality of the solutions produced by our method with other popular diversity maximization algorithms.

NeurIPS Conference 2014 Conference Paper

Distributed Balanced Clustering via Mapping Coresets

  • MohammadHossein Bateni
  • Aditya Bhaskara
  • Silvio Lattanzi
  • Vahab Mirrokni

Large-scale clustering of data points in metric spaces is an important problem in mining big data sets. For many applications, we face explicit or implicit size constraints for each cluster, which leads to the problem of clustering under capacity constraints or the "balanced clustering" problem. Although the balanced clustering problem has been widely studied, developing a theoretically sound distributed algorithm remains an open problem. In the present paper we develop a general framework based on "mapping coresets" to tackle this issue. For a wide range of clustering objective functions such as k-center, k-median, and k-means, our techniques give distributed algorithms for balanced clustering that match the best known single machine approximation ratios.

ICML Conference 2014 Conference Paper

Provable Bounds for Learning Some Deep Representations

  • Sanjeev Arora
  • Aditya Bhaskara
  • Rong Ge 0001
  • Tengyu Ma 0001

We give algorithms with provable guarantees that learn a class of deep nets in the generative model view popularized by Hinton and others. Our generative model is an $n$-node multilayer neural net that has degree at most $n^{\gamma}$ for some $\gamma < 1$ and each edge has a random edge weight in $[-1, 1]$. Our algorithm learns almost all networks in this class with polynomial running time. The sample complexity is quadratic or cubic depending upon the details of the model. The algorithm uses layerwise learning. It is based upon a novel idea of observing correlations among features and using these to infer the underlying edge structure via a global graph recovery procedure. The analysis of the algorithm reveals interesting structure of neural nets with random edge weights.

SODA Conference 2013 Conference Paper

Minimum Makespan Scheduling with Low Rank Processing Times

  • Aditya Bhaskara
  • Ravishankar Krishnaswamy
  • Kunal Talwar
  • Udi Wieder

We investigate approximation algorithms for the classical minimum makespan scheduling problem, focusing on instances where the rank of the matrix describing the processing times of the jobs is bounded. A bounded rank matrix arises naturally when the processing time of a job on a machine depends upon a bounded set of resources. A bounded rank matrix also shows up when jobs have varying degrees of parallelizability and the machines have multiple cores. We are interested in studying the tractability of the problem as a function of the (positive) rank of the processing-time matrix. At one extreme is the case of unit rank, also known as related machines, which admits a PTAS [7], and at the other extreme is the full rank case (unrelated machines), which is NP-hard to approximate within a factor better than 3/2 [8]. Our main technical contribution is in showing that the approximability of the problem is not smooth with the rank of the matrix. From the inapproximability side, we show that the problem becomes APX-hard, even for rank four matrices. For rank seven matrices, we prove that it is hard to approximate to a factor 3/2, matching the inapproximability result for general unrelated machines. From the algorithmic side, we obtain a quasi-polynomial approximation scheme (i.e., a $(1 + \epsilon)$-approximation in time $n^{\mathrm{poly}(1/\epsilon, \log n)}$) for the rank two case. This implies that the problem is not APX-hard in this case, unless NP has quasi-polynomial algorithms. Our algorithm is a subtle dynamic program which runs in polynomial time in some interesting special cases. The classification of the three dimensional problem remains open.

SODA Conference 2012 Conference Paper

Polynomial integrality gaps for strong SDP relaxations of Densest k-subgraph

  • Aditya Bhaskara
  • Moses Charikar
  • Aravindan Vijayaraghavan
  • Venkatesan Guruswami
  • Yuan Zhou 0007

The Densest k-subgraph problem (i.e., find a size-$k$ subgraph with the maximum number of edges) is one of the notorious problems in approximation algorithms. There is a significant gap between known upper and lower bounds for Densest k-subgraph: the current best algorithm gives an $O(n^{1/4})$ approximation, while even showing a small constant factor hardness requires significantly stronger assumptions than P ≠ NP. In addition to interest in designing better algorithms, a number of recent results have exploited the conjectured hardness of Densest k-subgraph and its variants. Thus, understanding the approximability of Densest k-subgraph is an important challenge. In this work, we give evidence for the hardness of approximating Densest k-subgraph within polynomial factors. Specifically we expose the limitations of strong semidefinite programs from SDP hierarchies in solving Densest k-subgraph. Our results include:

  • A lower bound of $\Omega(n^{1/4}/\log^3 n)$ on the integrality gap for $\Omega(\log n / \log \log n)$ rounds of the Sherali-Adams relaxation for Densest k-subgraph. This also holds for the relaxation obtained from Sherali-Adams with an added SDP constraint. Our gap instances are in fact Erdős–Rényi random graphs.
  • For every $\epsilon > 0$, a lower bound of $n^{2/53 - \epsilon}$ on the integrality gap of $n^{\Omega(\epsilon)}$ rounds of the Lasserre SDP relaxation for Densest k-subgraph, and an $n^{\Omega_\epsilon(1)}$ gap for $n^{1-\epsilon}$ rounds. Our construction proceeds via a reduction from random instances of a certain Max-CSP over large domains.

In the absence of inapproximability results for Densest k-subgraph, our results show that beating a factor of $n^{\Omega(1)}$ is a barrier for even the most powerful SDPs, and in fact even beating the best known $n^{1/4}$ factor is a barrier for current techniques. Our results indicate that approximating Densest k-subgraph within a polynomial factor might be a harder problem than Unique Games or Small Set Expansion, since these problems were recently shown to be solvable using $n^{\epsilon^{\Omega(1)}}$ rounds of the Lasserre hierarchy, where $\epsilon$ is the completeness parameter in Unique Games and Small Set Expansion.

SODA Conference 2011 Conference Paper

Approximating Matrix p-norms

  • Aditya Bhaskara
  • Aravindan Vijayaraghavan

We consider the problem of computing the $q \to p$ norm of a matrix $A$, which is defined for $p, q \geq 1$ as $\|A\|_{q \to p} = \max_{x \neq 0} \|Ax\|_p / \|x\|_q$. This is in general a non-convex optimization problem, and is a natural generalization of the well-studied question of computing singular values (this corresponds to $p = q = 2$). Different settings of parameters give rise to a variety of known interesting problems (such as the Grothendieck problem when $p = 1$ and $q = \infty$). However, very little is understood about the approximability of the problem for different values of $p, q$. Our first result is an efficient algorithm for computing the $q \to p$ norm of matrices with non-negative entries, when $q \geq p \geq 1$. The algorithm we analyze is based on a natural fixed point iteration, which can be seen as an analog of power iteration for computing eigenvalues. We then present an application of our techniques to the problem of constructing a scheme for oblivious routing in the $\ell_p$ norm. This makes constructive a recent existential result of Englert and Räcke [ER09] on $O(\log n)$ competitive oblivious routing schemes (which they make constructive only for $p = 2$). On the other hand, when we do not have any restrictions on the entries (such as non-negativity), we prove that the problem is NP-hard to approximate to any constant factor, for $2 < p \leq q$ and $p \leq q < 2$ (these are precisely the ranges of $p, q$ with $p \leq q$ where constant factor approximations are not known). In this range, our techniques also show that if NP $\not\subseteq$ DTIME$(n^{\mathrm{polylog}(n)})$, the problem cannot be approximated to a factor $2^{(\log n)^{1-\epsilon}}$, for any constant $\epsilon > 0$.
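
The nonnegative case lends itself to a short sketch of the power-iteration analogue: alternate between applying $A$, raising to the power $p-1$, pulling back through $A^T$, and inverting the $\ell_q$ duality map. The parameter choices below are illustrative and assume $q \geq p > 1$.

```python
import numpy as np

def qp_norm_nonneg(A, q, p, iters=200):
    # Fixed-point iteration for ||A||_{q->p} = max ||Ax||_p / ||x||_q on
    # entrywise-nonnegative A; the stationarity condition is
    # x^(q-1) proportional to A^T (Ax)^(p-1).
    x = np.ones(A.shape[1])
    x /= np.linalg.norm(x, q)
    for _ in range(iters):
        y = (A @ x) ** (p - 1)
        z = A.T @ y
        x = z ** (1.0 / (q - 1))
        x /= np.linalg.norm(x, q)
    return np.linalg.norm(A @ x, p)

rng = np.random.default_rng(13)
A = rng.random((30, 30))                 # nonnegative entries
est = qp_norm_nonneg(A, q=3, p=2)
# Sanity check: the ell_2 unit ball sits inside the ell_3 ball, so the
# 3->2 norm should be at least the spectral norm (the 2->2 norm).
print("||A||_{3->2} ~", round(float(est), 3),
      " ||A||_2 =", round(float(np.linalg.norm(A, 2)), 3))
```

For $p = q = 2$ the update reduces exactly to power iteration on $A^T A$, which is the sense in which the fixed point iteration generalizes the computation of singular values.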

STOC Conference 2010 Conference Paper

Detecting high log-densities: an $O(n^{1/4})$ approximation for densest k-subgraph

  • Aditya Bhaskara
  • Moses Charikar
  • Eden Chlamtac
  • Uriel Feige
  • Aravindan Vijayaraghavan

In the Densest k-Subgraph problem, given a graph G and a parameter k, one needs to find a subgraph of G induced on k vertices that contains the largest number of edges. There is a significant gap between the best known upper and lower bounds for this problem. It is NP-hard, and does not have a PTAS unless NP has subexponential time algorithms. On the other hand, the current best known algorithm of Feige, Kortsarz and Peleg, gives an approximation ratio of $n^{1/3 - c}$ for some fixed $c > 0$ (later estimated at around $c = 1/90$). We present an algorithm that for every $\epsilon > 0$ approximates the Densest k-Subgraph problem within a ratio of $n^{1/4 + \epsilon}$ in time $n^{O(1/\epsilon)}$. If allowed to run for time $n^{O(\log n)}$, the algorithm achieves an approximation ratio of $O(n^{1/4})$. Our algorithm is inspired by studying an average-case version of the problem where the goal is to distinguish random graphs from random graphs with planted dense subgraphs -- the approximation ratio we achieve for the general case matches the "distinguishing ratio" we obtain for this planted problem. At a high level, our algorithms involve cleverly counting appropriately defined trees of constant size in G, and using these counts to identify the vertices of the dense subgraph. We say that a graph G(V,E) has log-density $\alpha$ if its average degree is $\Theta(|V|^{\alpha})$. The algorithmic core of our result is a procedure to output a k-subgraph of 'nontrivial' density whenever the log-density of the densest k-subgraph is larger than the log-density of the host graph. We outline an extension to our approximation algorithm which achieves an $O(n^{1/4 - \epsilon})$-approximation in $O(2^{n^{O(\epsilon)}})$ time. We also show that, for certain parameter ranges, eigenvalue and SDP based techniques can outperform our basic distinguishing algorithm for random instances (in polynomial time), though without improving upon the $O(n^{1/4})$ guarantee overall.