Arrow Research search

Author name cluster

Mohammad Mahdian

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

23 papers
2 author rows

Possible papers

23

ICML Conference 2023 Conference Paper

Differentially Private Hierarchical Clustering with Provable Approximation Guarantees

  • Jacob Imola
  • Alessandro Epasto
  • Mohammad Mahdian
  • Vincent Cohen-Addad
  • Vahab Mirrokni

Hierarchical Clustering is a popular unsupervised machine learning method with decades of history and numerous applications. We initiate the study of differentially-private approximation algorithms for hierarchical clustering under the rigorous framework introduced by Dasgupta (2016). We show strong lower bounds for the problem: that any $\epsilon$-DP algorithm must exhibit $O(|V|^2/ \epsilon)$-additive error for an input dataset $V$. Then, we exhibit a polynomial-time approximation algorithm with $O(|V|^{2. 5}/ \epsilon)$-additive error, and an exponential-time algorithm that meets the lower bound. To overcome the lower bound, we focus on the stochastic block model, a popular model of graphs, and, with a separation assumption on the blocks, propose a private $1+o(1)$ approximation algorithm which also recovers the blocks exactly. Finally, we perform an empirical study of our algorithms and validate their performance.

SODA Conference 2022 Conference Paper

Improved Sliding Window Algorithms for Clustering and Coverage via Bucketing-Based Sketches

  • Alessandro Epasto
  • Mohammad Mahdian
  • Vahab Mirrokni
  • Peilin Zhong

Streaming computation plays an important role in large-scale data analysis. The sliding window model is a model of streaming computation which also captures the recency of the data. In this model, data arrives one item at a time, but only the latest W data items are considered for a particular problem. The goal is to output a good solution at the end of the stream by maintaining a small summary during the stream. In this work, we propose a new algorithmic framework for designing efficient sliding window algorithms via bucketing-based sketches. Based on this new framework, we develop space-efficient sliding window algorithms for k -cover, k -clustering and diversity maximization problems. For each of the above problems, our algorithm achieves (1 ± ∊ )-approximation. Compared with the previous work, it improves both the approximation ratio and the space.

SODA Conference 2022 Conference Paper

Massively Parallel and Dynamic Algorithms for Minimum Size Clustering

  • Alessandro Epasto
  • Mohammad Mahdian
  • Vahab Mirrokni
  • Peilin Zhong

Clustering of data in metric spaces is a fundamental problem and has many applications in data mining and it is often used as an unsupervised learning tool inside other machine learning systems. In many scenarios where we are concerned with the privacy implications of clustering users, clusters are required to have minimum-size constraint. A canonical example of min-size clustering is in enforcing anonymization and the protection of the privacy of user data. Our work is motivated by real-world applications (such as the Federated Learning of Cohorts project–FLoC) where a min size clustering algorithm needs to handle very large amount of data and the data may also change over time. Thus efficient parallel or dynamic algorithms are desired. In this paper, we study the r -gather problem, a natural formulation of minimum-size clustering in metric spaces. The goal of r -gather is to partition n points into clusters such that each cluster has size at least r, and the maximum radius of the clusters is minimized. This additional constraint completely changes the algorithmic nature of the problem, and many clustering techniques fail. Also previous dynamic and parallel algorithms do not achieve desirable complexity. We propose algorithms both in the Massively Parallel Computation (MPC) model and in the dynamic setting. Our MPC algorithm handles input points from the Euclidean space ℝ d. It computes an O (1)-approximate solution of r -gather in O (log ∊ n ) rounds using total space O(n 1 + γ · d ) for arbitrarily small constants ∊, γ > 0. In addition our algorithm is fully scalable, i. e. , there is no lower bound on the memory per machine. Our dynamic algorithm maintains an O (1)-approximate r -gather solution under insertions/deletions of points in a metric space with doubling dimension d. The update time is r ·2 O ( d ) ·log O (1) Λ and the query time is 2 O ( d ) · log O (1) Λ, where Λ is the ratio between the largest and the smallest distance. To obtain our results, we reveal connections between r -gather and r -nearest neighbors and provide several geometric and graph algorithmic tools including a near neighbor graph construction, and results on the maximal independent set / ruling set of the power graph in the MPC model, which might be both of independent interest. To show their generality, we extend our algorithm to solve several variants of r -gather in the MPC model, including r -gather with outliers and r -gather with total distance cost. Finally, we show effectiveness of these algorithmic techniques via a preliminary empirical study for Interest-Based Advertisement applications.

NeurIPS Conference 2020 Conference Paper

Fair Hierarchical Clustering

  • Sara Ahmadian
  • Alessandro Epasto
  • Marina Knittel
  • Ravi Kumar
  • Mohammad Mahdian
  • Benjamin Moseley
  • Philip Pham
  • Sergei Vassilvitskii

As machine learning has become more prevalent, researchers have begun to recognize the necessity of ensuring machine learning systems are fair. Recently, there has been an interest in defining a notion of fairness that mitigates over-representation in traditional clustering. In this paper we extend this notion to hierarchical clustering, where the goal is to recursively partition the data to optimize a specific objective. For various natural objectives, we obtain simple, efficient algorithms to find a provably good fair hierarchical clustering. Empirically, we show that our algorithms can find a fair hierarchical clustering, with only a negligible loss in the objective.

NeurIPS Conference 2020 Conference Paper

Optimal Approximation - Smoothness Tradeoffs for Soft-Max Functions

  • Alessandro Epasto
  • Mohammad Mahdian
  • Vahab Mirrokni
  • Emmanouil Zampetakis

A soft-max function has two main efficiency measures: (1) approximation - which corresponds to how well it approximates the maximum function, (2) smoothness - which shows how sensitive it is to changes of its input. Our goal is to identify the optimal approximation-smoothness tradeoffs for different measures of approximation and smoothness. This leads to novel soft-max functions, each of which is optimal for a different application. The most commonly used soft-max function, called exponential mechanism, has optimal tradeoff between approximation measured in terms of expected additive approximation and smoothness measured with respect to Renyi Divergence. We introduce a soft-max function, called piece-wise linear soft-max, with optimal tradeoff between approximation, measured in terms of worst-case additive approximation and smoothness, measured with respect to l_q-norm. The worst-case approximation guarantee of the piece-wise linear mechanism enforces sparsity in the output of our soft-max function, a property that is known to be important in Machine Learning applications Martins et al. 16, Laha et al. 18 and is not satisfied by the exponential mechanism. Moreover, the l_q-smoothness is suitable for applications in Mechanism Design and Game Theory where the piece-wise linear mechanism outperforms the exponential mechanism. Finally, we investigate another soft-max function, called power mechanism, with optimal tradeoff between expected multiplicative approximation and smoothness with respect to the Renyi Divergence, which provides improved theoretical and practical results in differentially private submodular optimization.

NeurIPS Conference 2020 Conference Paper

Smoothly Bounding User Contributions in Differential Privacy

  • Alessandro Epasto
  • Mohammad Mahdian
  • Jieming Mao
  • Vahab Mirrokni
  • Lijie Ren

A differentially private algorithm guarantees that the input of a single user won’t significantly change the output distribution of the algorithm. When a user contributes more data points, more information can be collected to improve the algorithm’s performance. But at the same time, more noise might need to be added to the algorithm in order to keep the algorithm differentially private and this might hurt the algorithm’s performance. Amin et al. (2019) initiates the study on bounding user contributions and proposes a very natural algorithm which limits the number of samples each user can contribute by a threshold. For a better trade-off between utility and privacy guarantee, we propose a method which smoothly bounds user contributions by setting appropriate weights on data points and apply it to estimating the mean/quantiles, linear regression, and empirical risk minimization. We show that our algorithm provably outperforms the sample limiting algorithm. We conclude with experimental evaluations which validate our theoretical results.

NeurIPS Conference 2019 Conference Paper

Contextual Bandits with Cross-Learning

  • Santiago Balseiro
  • Negin Golrezaei
  • Mohammad Mahdian
  • Vahab Mirrokni
  • Jon Schneider

In the classical contextual bandits problem, in each round $t$, a learner observes some context $c$, chooses some action $a$ to perform, and receives some reward $r_{a, t}(c)$. We consider the variant of this problem where in addition to receiving the reward $r_{a, t}(c)$, the learner also learns the values of $r_{a, t}(c')$ for all other contexts $c'$; i. e. , the rewards that would have been achieved by performing that action under different contexts. This variant arises in several strategic settings, such as learning how to bid in non-truthful repeated auctions (in this setting the context is the decision maker's private valuation for each auction). We call this problem the contextual bandits problem with cross-learning. The best algorithms for the classical contextual bandits problem achieve $\tilde{O}(\sqrt{CKT})$ regret against all stationary policies, where $C$ is the number of contexts, $K$ the number of actions, and $T$ the number of rounds. We demonstrate algorithms for the contextual bandits problem with cross-learning that remove the dependence on $C$ and achieve regret $\tilde{O}(\sqrt{KT})$ (when contexts are stochastic with known distribution), $\tilde{O}(K^{1/3}T^{2/3})$ (when contexts are stochastic with unknown distribution), and $\tilde{O}(\sqrt{KT})$ (when contexts are adversarial but rewards are stochastic). We simulate our algorithms on real auction data from an ad exchange running first-price auctions (showing that they outperform traditional contextual bandit algorithms).

TCS Journal 2019 Journal Article

Visibility testing and counting for uncertain segments

  • Mohammad Ali Abam
  • Sharareh Alipour
  • Mohammad Ghodsi
  • Mohammad Mahdian

We study two well-known planar visibility problems, namely visibility testing and visibility counting, in a model where there is uncertainty about the input data. The standard versions of these problems are defined as follows: we are given a set S of n segments in R 2, and we would like to preprocess S so that we can quickly answer queries of the form: is the given query segment s ∈ S visible from the given query point q ∈ R 2 (for visibility testing) and how many segments in S are visible from the given query point q ∈ R 2 (for visibility counting). In our model of uncertainty, each segment may or may not exist, and if it does, it is located in one of finitely many possible locations, given by a discrete probability distribution. In this setting, the probabilistic visibility testing problem (PVTP, for short) is to compute the probability that a given segment s ∈ S is visible from a given query point q and the probabilistic visibility counting problem (PVCP, for short) is to compute the expected number of segments in S that are visible from a query point q. We first show that PVTP is #P-complete. In the special case where uncertainty is only about whether segments exist and not about their location, we show that PVTP is solvable in O ( n log ⁡ n ) time. Our algorithm for PVTP combined with linearity of expectation gives an O ( n 2 log ⁡ n ) time algorithm for PVCP. Using the algorithm for PVTP, together with a few old tricks, we can show that one can preprocess S in O ( n 5 log ⁡ n ) time into a data structure of size O ( n 4 ), so that each PVTP query for a fixed segment s can be answered in O ( log ⁡ n ) time. We also give a faster 2-approximation algorithm for this problem. At the end, we improve the approximation factor of the algorithm.

SODA Conference 2018 Conference Paper

From Battlefields to Elections: Winning Strategies of Blotto and Auditing Games

  • Soheil Behnezhad
  • Avrim Blum
  • Mahsa Derakhshan
  • MohammadTaghi Hajiaghayi
  • Mohammad Mahdian
  • Christos H. Papadimitriou
  • Ronald L. Rivest
  • Saeed Seddighin

Mixed strategies are often evaluated based on the expected payoff that they guarantee. This is not always desirable. In this paper, we consider games for which maximizing the expected payoff deviates from the actual goal of the players. To address this issue, we introduce the notion of a ( u, p )-maxmin strategy which ensures receiving a minimum utility of u with probability at least p. We then give approximation algorithms for the problem of finding a ( u, p )-maxmin strategy for these games. The first game that we consider is Colonel Blotto, a well-studied game that was introduced in 1921. In the Colonel Blotto game, two colonels divide their troops among a set of battlefields. Each battlefield is won by the colonel that puts more troops in it. The payoff of each colonel is the weighted number of battlefields that she wins. We show that maximizing the expected payoff of a player does not necessarily maximize her winning probability for certain applications of Colonel Blotto. For example, in presidential elections, the players’ goal is to maximize the probability of winning more than half of the votes, rather than maximizing the expected number of votes that they get. We give an exact algorithm for a natural variant of continuous version of this game. More generally, we provide constant and logarithmic approximation algorithms for finding ( u, p )-maxmin strategies. We also introduce a security game version of Colonel Blotto which we call auditing game. It is played between two players, a defender and an attacker. The goal of the defender is to prevent the attacker from changing the outcome of an instance of Colonel Blotto. Again, maximizing the expected payoff of the defender is not necessarily optimal. Therefore we give a constant approximation for ( u, p )-maxmin strategies.

NeurIPS Conference 2016 Conference Paper

Community Detection on Evolving Graphs

  • Aris Anagnostopoulos
  • Jakub Łącki
  • Silvio Lattanzi
  • Stefano Leonardi
  • Mohammad Mahdian

Clustering is a fundamental step in many information-retrieval and data-mining applications. Detecting clusters in graphs is also a key tool for finding the community structure in social and behavioral networks. In many of these applications, the input graph evolves over time in a continual and decentralized manner, and, to maintain a good clustering, the clustering algorithm needs to repeatedly probe the graph. Furthermore, there are often limitations on the frequency of such probes, either imposed explicitly by the online platform (e. g. , in the case of crawling proprietary social networks like twitter) or implicitly because of resource limitations (e. g. , in the case of crawling the web). In this paper, we study a model of clustering on evolving graphs that captures this aspect of the problem. Our model is based on the classical stochastic block model, which has been used to assess rigorously the quality of various static clustering methods. In our model, the algorithm is supposed to reconstruct the planted clustering, given the ability to query for small pieces of local information about the graph, at a limited rate. We design and analyze clustering algorithms that work in this model, and show asymptotically tight upper and lower bounds on their accuracy. Finally, we perform simulations, which demonstrate that our main asymptotic results hold true also in practice.

ICML Conference 2016 Conference Paper

Pricing a Low-regret Seller

  • Hoda Heidari
  • Mohammad Mahdian
  • Umar Syed
  • Sergei Vassilvitskii
  • Sadra Yazdanbod

As the number of ad exchanges has grown, publishers have turned to low regret learning algorithms to decide which exchange offers the best price for their inventory. This in turn opens the following question for the exchange: how to set prices to attract as many sellers as possible and maximize revenue. In this work we formulate this precisely as a learning problem, and present algorithms showing that by simply knowing that the counterparty is using a low regret algorithm is enough for the exchange to have its own low regret learning algorithm to find the optimal price.

TCS Journal 2014 Journal Article

The complexity of LSH feasibility

  • Flavio Chierichetti
  • Ravi Kumar
  • Mohammad Mahdian

In this paper we study the complexity of the following feasibility problem: given an n × n similarity matrix S as input, is there a locality sensitive hash (LSH) for S? We show that the LSH feasibility problem is NP-hard even in the following strong promise version: either S admits an LSH or S is at ℓ 1 -distance at least n 2 − ϵ from every similarity that admits an LSH. We complement this hardness result by providing an O ˜ ( 3 n ) algorithm for the LSH feasibility problem, which improves upon the naïve n Θ ( n ) time algorithm; we prove that this running time is tight, modulo constants, under the Exponential Time Hypothesis.

AAMAS Conference 2012 Conference Paper

TrustBets: Betting over an IOU Network

  • Sharad Goel
  • Mohammad Mahdian
  • David Pennock
  • Daniel Reeves

We consider the problem of operating a gambling market where players pay with IOUs instead of cash, and where in general not everyone trusts everyone else. Players declare their degree of trust in other players—for example, Alice trusts Bob for up to ten dollars, and Bob trusts Carol up to twenty dollars. The system determines what bets are acceptable according to the trust network. For example, Carol may be able to place a bet where she is at risk of losing ten dollars to Alice, even if Alice doesn’t trust Carol directly, because the IOU can be routed through Bob. We show that if agents can bet on n events with binary outcomes, the problem of determining whether a collection of bets is acceptable is NP-hard. In the special case when the trust network is a tree, the problem can be solved in polynomial time using a maximum flow algorithm.

TCS Journal 2011 Journal Article

Sorting and selection on dynamic data

  • Aris Anagnostopoulos
  • Ravi Kumar
  • Mohammad Mahdian
  • Eli Upfal

We formulate and study a new computational model for dynamic data. In this model, the data changes gradually and the goal of an algorithm is to compute the solution to some problem on the data at each time step, under the constraint that it only has limited access to the data each time. As the data is constantly changing and the algorithm might be unaware of these changes, it cannot be expected to always output the exact right solution; we are interested in algorithms that guarantee to output an approximate solution. In particular, we focus on the fundamental problems of sorting and selection, where the true ordering of the elements changes slowly. We provide algorithms with performance close to the optimal in expectation and with high probability.

FOCS Conference 2007 Conference Paper

Balloon Popping With Applications to Ascending Auctions

  • Nicole Immorlica
  • Anna R. Karlin
  • Mohammad Mahdian
  • Kunal Talwar

We study the power of ascending auctions in a scenario in which a seller is selling a collection of identical items to anonymous unit'demand bidders. We show that even with full knowledge of the set of bidders' private valuations for the items, if the bidders are ex-ante identical, no ascending auction can extract more than a constant. times the revenue of the best fixed-price scheme. This problem is equivalent to the problem of coming up with an optimal strategy for blowing up indistinguishable balloons with known capacities in order to maximize the amount of contained, air. We show that the algorithm which simply inflates all balloons to a fixed volume is close to optimal in this setting.

FOCS Conference 2004 Conference Paper

Tolls for Heterogeneous Selfish Users in Multicommodity Networks and Generalized Congestion Games

  • Lisa Fleischer
  • Kamal Jain
  • Mohammad Mahdian

We prove the existence of tolls to induce multicommodity, heterogeneous network users that independently choose routes minimizing their own linear function of tolls versus latency to collectively form the traffic pattern of a minimum average latency flow. This generalizes both the previous known results of the existence of tolls for multicommodity, homogeneous users (Beckman et al. , 1956) and for single commodity, heterogeneous users (Cole et al. , 2003). Unlike previous proofs for single commodity users in general graphs, our proof is constructive - it does not rely on a fixed point theorem - and results in a simple polynomial-sized linear program to compute tolls when the number of different types of users is bounded by a polynomial. We show that our proof gives a complete characterization of flows that are enforceable by tolls. In particular, tolls exist to induce any traffic pattern that is the result of minimizing an arbitrary function from R/sup E(G)/ to the reals that is nondecreasing in each of its arguments. Thus, tolls exist to induce flows with minimum average weighted latency, minimum maximum latency, and other natural objectives. We give an exponential bound on tolls that is independent of the number of network users and the number of commodities. We use this to show that multicommodity tolls also exist when users are not from discrete classes, but instead define a general function that trades off latency versus toll preference. Finally, we show that our result extends to very general frameworks. In particular, we show that tolls exist to induce the Nash equilibrium of general nonatomic congestion games to be system optimal. In particular, tolls exist even when 1) latencies depend on user type; 2) latency functions are nonseparable functions of traffic on edges; 3) the latency of a set S is an arbitrary function of the latencies of the resources contained in S. Our exponential bound on size of tolls also holds in this case; and we give an example of a congestion game that shows this is tight; it requires tolls that are exponential in the size of the game.

STOC Conference 2002 Conference Paper

A new greedy approach for facility location problems

  • Kamal Jain
  • Mohammad Mahdian
  • Amin Saberi

We present a simple and natural greedy algorithm for the metric uncapacitated facility location problem achieving an approximation guarantee of 1.61. We use this algorithm to find better approximation algorithms for the capacitated facility location problem with soft capacities and for a common generalization of the k -median and facility location problems. We also prove a lower bound of 1+2/ e on the approximability of the k -median problem. At the end, we present a discussion about the techniques we have used in the analysis of our algorithm, including a computer-aided method for proving bounds on the approximation factor.