Arrow Research search

Author name cluster

Aida Mousavifar

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers
1 author row

Possible papers

6

ICML Conference 2024 Conference Paper

A Near-Linear Time Approximation Algorithm for Beyond-Worst-Case Graph Clustering

  • Vincent Cohen-Addad
  • Tommaso d'Orsi
  • Aida Mousavifar

We consider the semi-random graph model of [Makarychev, Makarychev and Vijayaraghavan, STOC’12], where, given a random bipartite graph with $\alpha$ edges and an unknown bipartition $(A, B)$ of the vertex set, an adversary can add arbitrary edges inside each community and remove arbitrary edges from the cut $(A, B)$ (i. e. all adversarial changes are monotone with respect to the bipartition). For this model, a polynomial time algorithm [MMV’12] is known to approximate the Balanced Cut problem up to value $O(\alpha)$ as long as the cut $(A, B)$ has size $\Omega(\alpha)$. However, it consists of slow subroutines requiring optimal solutions for logarithmically many semidefinite programs. We study the fine-grained complexity of the problem and present the first near-linear time algorithm that achieves similar performances to that of [MMV’12]. Our algorithm runs in time $O(|V(G)|^{1+o(1)} + |E(G)|^{1+o(1)})$ and finds a balanced cut of value $O(\alpha). $ Our approach appears easily extendible to related problem, such as Sparsest Cut, and also yields an near-linear time $O(1)$-approximation to Dagupta’s objective function for hierarchical clustering [Dasgupta, STOC’16] for the semi-random hierarchical stochastic block model inputs of [Cohen-Addad, Kanade, Mallmann-Trenn, Mathieu, JACM’19].

SODA Conference 2021 Conference Paper

Spectral Clustering Oracles in Sublinear Time

  • Grzegorz Gluch
  • Michael Kapralov
  • Silvio Lattanzi
  • Aida Mousavifar
  • Christian Sohler

Given a graph G that can be partitioned into k disjoint expanders with outer conductance upper bounded by ∊ « 1, can we efficiently construct a small space data structure that allows quickly classifying vertices of G according to the expander (cluster) they belong to? Formally, we would like an efficient local computation algorithm that misclassifies at most an O (∊) fraction of vertices in every expander. We refer to such a data structure as a spectral clustering oracle. Our main result is a spectral clustering oracle with query time O ∗( n 1/2+ O (∊) ) and preprocessing time that provides misclassification error O (∊ log k ) per cluster for any ∊ « 1/log k. More generally, query time can be reduced at the expense of increasing the preprocessing time appropriately (as long as the product is about n 1+ O (∊) ) – this in particular gives a nearly linear time spectral clustering primitive. The main technical contribution is a sublinear time oracle that provides dot product access to the spectral embedding of G by estimating distributions of short random walks from vertices in G. The distributions themselves provide a poor approximation to the spectral embedding, but we show that an appropriate linear transformation can be used to achieve high precision dot product access. We give an estimator for this linear transformation and analyze it using spectral perturbation bounds and a novel upper bound on the leverage scores of the spectral embedding matrix of a k -clusterable graph. We then show that dot product access to the spectral embedding is sufficient to design a clustering oracle. At a high level our approach amounts to hyperplane partitioning in the spectral embedding of G, but crucially operates on a nested sequence of carefully defined subspaces in the spectral embedding to achieve per cluster recovery guarantees.

ICML Conference 2018 Conference Paper

Beyond 1/2-Approximation for Submodular Maximization on Massive Data Streams

  • Ashkan Norouzi-Fard
  • Jakub Tarnawski
  • Slobodan Mitrovic
  • Amir Zandieh
  • Aida Mousavifar
  • Ola Svensson

Many tasks in machine learning and data mining, such as data diversification, non-parametric learning, kernel machines, clustering etc. , require extracting a small but representative summary from a massive dataset. Often, such problems can be posed as maximizing a submodular set function subject to a cardinality constraint. We consider this question in the streaming setting, where elements arrive over time at a fast pace and thus we need to design an efficient, low-memory algorithm. One such method, proposed by Badanidiyuru et al. (2014), always finds a 0. 5-approximate solution. Can this approximation factor be improved? We answer this question affirmatively by designing a new algorithm Salsa for streaming submodular maximization. It is the first low-memory, singlepass algorithm that improves the factor 0. 5, under the natural assumption that elements arrive in a random order. We also show that this assumption is necessary, i. e. , that there is no such algorithm with better than 0. 5-approximation when elements arrive in arbitrary order. Our experiments demonstrate that Salsa significantly outperforms the state of the art in applications related to exemplar-based clustering, social graph analysis, and recommender systems.

FOCS Conference 2018 Conference Paper

Testing Graph Clusterability: Algorithms and Lower Bounds

  • Ashish Chiplunkar
  • Michael Kapralov
  • Sanjeev Khanna
  • Aida Mousavifar
  • Yuval Peres

We consider the problem of testing graph cluster structure: given access to a graph G = (V, E), can we quickly determine whether the graph can be partitioned into a few clusters with good inner conductance, or is far from any such graph? This is a generalization of the well-studied problem of testing graph expansion, where one wants to distinguish between the graph having good expansion (i. e. being a good single cluster) and the graph having a sparse cut (i. e. being a union of at least two clusters). A recent work of Czumaj, Peng, and Sohler (STOC'15) gave an ingenious sublinear time algorithm for testing k-clusterability in time Õ(n^1/2 poly(k)). Their algorithm implicitly embeds a random sample of vertices of the graph into Euclidean space, and then clusters the samples based on estimates of Euclidean distances between the points. This yields a very efficient testing algorithm, but only works if the cluster structure is very strong: it is necessary to assume that the gap between conductances of accepted and rejected graphs is at least logarithmic in the size of the graph G. In this paper we show how one can leverage more refined geometric information, namely angles as opposed to distances, to obtain a sublinear time tester that works even when the gap is a sufficiently large constant. Our tester is based on the singular value decomposition of a natural matrix derived from random walk transition probabilities from a small sample of seed nodes. We complement our algorithm with a matching lower bound on the query complexity of testing clusterability. Our lower bound is based on a novel property testing problem, which we analyze using Fourier analytic tools. As a byproduct of our techniques, we also achieve new lower bounds for the problem of approximating MAX-CUT value in sublinear time.