Arrow Research search

Author name cluster

Hossein Esfandiari

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

29 papers
2 author rows

Possible papers

29

NeurIPS Conference 2025 Conference Paper

Replicable Online Pricing

  • Kiarash Banihashem
  • MohammadHossein Bateni
  • Hossein Esfandiari
  • Samira Goudarzi
  • MohammadTaghi Hajiaghayi

We explore the concept of replicability, which ensures algorithmic consistency despite input data variations, for online pricing problems, specifically prophet inequalities and delegation. Given the crucial role of replicability in enhancing transparency in economic decision-making, we present a replicable and nearly optimal pricing strategy for prophet inequalities, achieving a sample complexity of $\textnormal{poly}(\log^* |\mathcal{X}|)$, where $\mathcal{X}$ is the ground set of distributions. Furthermore, we extend these findings to the delegation problem and establish a lower bound that proves the necessity of the $\log^*|\mathcal{X}|$ dependence. En route to obtaining these results, we develop a number of technical contributions of independent interest. Most notably, we propose a new algorithm for a variant of the heavy hitter problem that has a nearly linear dependence on the inverse of the heavy hitter parameter, significantly improving upon existing results, which have a cubic dependence.
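For readers unfamiliar with prophet inequalities: the classic baseline is a single-threshold rule. The sketch below (a generic textbook construction, not the paper's replicable algorithm) sets the threshold at the median of the maximum value, which guarantees half of the prophet's expected value; the samplers and their interface are hypothetical.

```python
import random

def median_of_max_threshold(dists, trials=5000, rng=None):
    """Estimate a threshold t with P(max value < t) ~ 1/2 by simulation.
    `dists` is a list of zero-argument-per-round samplers (hypothetical
    interface); the classic result says accepting the first value above
    this t earns at least half the prophet's expected maximum."""
    rng = rng or random.Random(0)
    maxima = sorted(max(d(rng) for d in dists) for _ in range(trials))
    return maxima[trials // 2]  # empirical median of the maximum

def run_prophet(dists, threshold, rng):
    """Accept the first value that clears the threshold (0 if none does)."""
    for d in dists:
        v = d(rng)
        if v >= threshold:
            return v
    return 0.0
```

On five i.i.d. uniform[0,1] boxes the threshold lands near $(1/2)^{1/5} \approx 0.87$ and the rule recovers well over half of the prophet benchmark in simulation.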

ICML Conference 2024 Conference Paper

High-Dimensional Geometric Streaming for Nearly Low Rank Data

  • Hossein Esfandiari
  • Praneeth Kacham
  • Vahab Mirrokni
  • David P. Woodruff
  • Peilin Zhong

We study streaming algorithms for the $\ell_p$ subspace approximation problem. Given points $a_1, \ldots, a_n$ as an insertion-only stream and a rank parameter $k$, the $\ell_p$ subspace approximation problem is to find a $k$-dimensional subspace $V$ such that $(\sum_{i=1}^n d(a_i, V)^p)^{1/p}$ is minimized, where $d(a, V)$ denotes the Euclidean distance between $a$ and $V$ defined as $\min_{v \in V} ||a - v||$. When $p = \infty$, we need to find a subspace $V$ that minimizes $\max_i d(a_i, V)$. For $\ell_{\infty}$ subspace approximation, we give a deterministic strong coreset construction algorithm and show that it can be used to compute a $\mathrm{poly}(k, \log n)$ approximate solution. We show that the distortion obtained by our coreset is nearly tight for any sublinear space algorithm. For $\ell_p$ subspace approximation, we show that suitably scaling the points and then using our $\ell_{\infty}$ coreset construction, we can compute a $\mathrm{poly}(k, \log n)$ approximation. Our algorithms are easy to implement and run very fast on large datasets. We also use our strong coreset construction to improve the results in a recent work of Woodruff and Yasuda (FOCS 2022) which gives streaming algorithms for high-dimensional geometric problems such as width estimation, convex hull estimation, and volume estimation.

ICLR Conference 2023 Conference Paper

Replicable Bandits

  • Hossein Esfandiari
  • Alkis Kalavasis
  • Amin Karbasi
  • Andreas Krause 0001
  • Vahab Mirrokni
  • Grigoris Velegkas

In this paper, we introduce the notion of replicable policies in the context of stochastic bandits, one of the canonical problems in interactive learning. A policy in the bandit environment is called replicable if it pulls, with high probability, the exact same sequence of arms in two different and independent executions (i.e., under independent reward realizations). We show that replicable policies not only exist, but also achieve almost the same optimal (non-replicable) regret bounds in terms of the time horizon. More specifically, in the stochastic multi-armed bandits setting, we develop a policy with an optimal problem-dependent regret bound whose dependence on the replicability parameter is also optimal. Similarly, for stochastic linear bandits (with finitely and infinitely many arms) we develop replicable policies that achieve the best-known problem-independent regret bounds with an optimal dependency on the replicability parameter. Our results show that even though randomization is crucial for the exploration-exploitation trade-off, an optimal balance can still be achieved while pulling the exact same arms in two different rounds of executions.
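A standard building block behind many replicability arguments (shown here as an illustrative sketch, not this paper's exact procedure) is rounding statistics to a randomly shifted grid using shared randomness: two executions that share the grid offset and land in the same cell output identical values, even though their samples differ.

```python
import random

def shared_grid_round(x, width, offset):
    """Round x to the grid {offset + k*width}. Two executions that share
    `offset` and produce estimates within the same grid cell return the
    identical value -- the core trick in many replicable algorithms."""
    k = round((x - offset) / width)
    return offset + k * width

# Shared randomness: both executions draw the same offset from a shared seed.
shared = random.Random(42)
offset = shared.uniform(0, 0.1)
width = 0.1
```

Because the offset is uniform over a cell, two independent estimates within distance much smaller than `width` straddle a cell boundary only with small probability.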

NeurIPS Conference 2023 Conference Paper

Replicable Clustering

  • Hossein Esfandiari
  • Amin Karbasi
  • Vahab Mirrokni
  • Grigoris Velegkas
  • Felix Zhou

We design replicable algorithms in the context of statistical clustering under the recently introduced notion of replicability from Impagliazzo et al. [2022]. According to this definition, a clustering algorithm is replicable if, with high probability, its output induces the exact same partition of the sample space after two executions on different inputs drawn from the same distribution, when its internal randomness is shared across the executions. We propose such algorithms for the statistical $k$-medians, statistical $k$-means, and statistical $k$-centers problems by utilizing approximation routines for their combinatorial counterparts in a black-box manner. In particular, we demonstrate a replicable $O(1)$-approximation algorithm for statistical Euclidean $k$-medians ($k$-means) with $\operatorname{poly}(d)$ sample complexity. We also describe an $O(1)$-approximation algorithm with an additional $O(1)$-additive error for statistical Euclidean $k$-centers, albeit with $\exp(d)$ sample complexity. In addition, we provide experiments on synthetic distributions in 2D using the $k$-means++ implementation from sklearn as a black-box that validate our theoretical results.

ICML Conference 2023 Conference Paper

Robust and private stochastic linear bandits

  • Vasileios Charisopoulos
  • Hossein Esfandiari
  • Vahab Mirrokni

In this paper, we study the stochastic linear bandit problem under the additional requirements of differential privacy, robustness and batched observations. In particular, we assume an adversary randomly chooses a constant fraction of the observed rewards in each batch, replacing them with arbitrary numbers. We present differentially private and robust variants of the arm elimination algorithm using logarithmic batch queries under two privacy models and provide regret bounds in both settings. In the first model, every reward in each round is reported by a potentially different client, which reduces to standard local differential privacy (LDP). In the second model, every action is "owned" by a different client, who may aggregate the rewards over multiple queries and privatize the aggregate response instead. To the best of our knowledge, our algorithms are the first simultaneously providing differential privacy and adversarial robustness in the stochastic linear bandits problem.

JMLR Journal 2023 Journal Article

Robust Load Balancing with Machine Learned Advice

  • Sara Ahmadian
  • Hossein Esfandiari
  • Vahab Mirrokni
  • Binghui Peng

Motivated by the exploding growth of web-based services and the importance of efficiently managing the computational resources of such systems, we introduce and study a theoretical model for load balancing of very large databases such as commercial search engines. Our model is a more realistic version of the well-received balls-into-bins model, with an additional constraint that limits the number of servers that carry each piece of the data. This additional constraint is necessary when, on one hand, the data is so large that we cannot copy all of it onto each server. On the other hand, the query response time is so limited that we cannot ignore the fact that the number of queries for each piece of the data changes over time, and hence we cannot simply split the data over different machines. In this paper, we develop an almost optimal load balancing algorithm that works given an estimate of the load of each piece of the data. Our algorithm is almost perfectly robust to wrong estimates, to the extent that even when all of the loads are adversarially chosen, the performance of our algorithm is $1-1/e$, which is provably optimal. Along the way, we develop various techniques for analyzing the balls-into-bins process under certain correlations and build a novel connection with the multiplicative weights update scheme.
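To make the constrained balls-into-bins setting concrete, here is a toy greedy baseline (an assumption-laden sketch, not the paper's algorithm): each data piece lives on a small fixed set of replica servers, and each query is routed to the currently least-loaded replica.

```python
import random

def assign_queries(queries, replicas, num_servers):
    """Greedy baseline for the constrained model: send each query for
    piece p to the currently least-loaded server among p's replicas.
    `replicas[p]` is the small fixed set of servers holding piece p."""
    load = [0] * num_servers
    for p in queries:
        s = min(replicas[p], key=lambda srv: load[srv])
        load[s] += 1
    return load

# Toy instance: 100 pieces, each replicated on 2 of 10 servers.
rng = random.Random(0)
replicas = {p: rng.sample(range(10), 2) for p in range(100)}
queries = [rng.randrange(100) for _ in range(1000)]
load = assign_queries(queries, replicas, 10)
```

Even this naive rule keeps loads far from the worst case; the paper's contribution is an almost optimal and estimate-robust policy for this setting.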

TMLR Journal 2023 Journal Article

Tackling Provably Hard Representative Selection via Graph Neural Networks

  • Mehran Kazemi
  • Anton Tsitsulin
  • Hossein Esfandiari
  • MohammadHossein Bateni
  • Deepak Ramachandran
  • Bryan Perozzi
  • Vahab Mirrokni

Representative Selection (RS) is the problem of finding a small subset of exemplars from a dataset that is representative of the dataset. In this paper, we study RS for attributed graphs, and focus on finding representative nodes that optimize the accuracy of a model trained on the selected representatives. Theoretically, we establish a new hardness result for RS (in the absence of a graph structure) by proving that a particular, highly practical variant of it (RS for Learning) is hard to approximate in polynomial time within any reasonable factor, which implies a significant potential gap between the optimum solution of widely-used surrogate functions and the actual accuracy of the model. We then study the setting where a (homophilous) graph structure is available, or can be constructed, between the data points. We show that with an appropriate modeling approach, the presence of such a structure can turn a hard RS (for learning) problem into one that can be effectively solved. To this end, we develop RS-GNN, a representation learning-based RS model based on Graph Neural Networks. Empirically, we demonstrate the effectiveness of RS-GNN on problems with predefined graph structures as well as problems with graphs induced from node feature similarities, by showing that RS-GNN achieves significant improvements over established baselines on a suite of eight benchmarks.

SODA Conference 2022 Conference Paper

Almost Tight Approximation Algorithms for Explainable Clustering

  • Hossein Esfandiari
  • Vahab Mirrokni
  • Shyam Narayanan

Recently, due to an increasing interest in transparency in artificial intelligence, several methods of explainable machine learning have been developed with the simultaneous goals of accuracy and interpretability by humans. In this paper, we study a recent framework of explainable clustering first suggested by Dasgupta et al. [11]. Specifically, we focus on the k-means and k-median problems and provide nearly tight upper and lower bounds. First, we provide an O(log k · log log k)-approximation algorithm for explainable k-median, improving on the best known algorithm of O(k) [11] and nearly matching the known Ω(log k) lower bound [11]. In addition, in low-dimensional spaces with d ≪ log k, we show that our algorithm also provides an O(d log² d)-approximate solution for explainable k-median. This improves over the best known bound of O(d log k) for low dimensions [19], and is a constant for constant-dimensional spaces. To complement this, we show a nearly matching Ω(d) lower bound. Next, we study the k-means problem in this context and provide an O(k log k)-approximation algorithm for explainable k-means, improving over the O(k²) bound of Dasgupta et al. and the O(dk log k) bound of [19]. To complement this, we provide an almost tight Ω(k) lower bound, improving over the Ω(log k) lower bound of Dasgupta et al. Given an approximate solution to the classic k-means and k-median problems, our algorithm for k-median runs in time O(kd log² k) and our algorithm for k-means runs in time O(k²d).
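In this framework, an explainable clustering is a threshold tree: axis-aligned cuts that route every point to one center. The greedy sketch below is in the spirit of the original IMM construction of Dasgupta et al. (not the improved algorithm of this paper): each node picks the coordinate/threshold that separates the centers while "cutting" the fewest points away from their nearest center.

```python
def build_threshold_tree(points, centers):
    """Greedy threshold tree for explainable clustering (IMM-style
    sketch). Internal nodes are dicts {i, t, l, r}; leaves are centers."""
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    def split(pts, cts):
        if len(cts) == 1:
            return cts[0]                      # leaf: this center's region
        best = None
        for i in range(len(cts[0])):           # axis-aligned cuts at center coords
            for theta in {c[i] for c in cts}:
                left = [c for c in cts if c[i] <= theta]
                right = [c for c in cts if c[i] > theta]
                if not left or not right:
                    continue
                # a point is "cut" if the split sends it away from its nearest center
                cut = sum((p[i] <= theta) != (min(cts, key=lambda c: d2(p, c))[i] <= theta)
                          for p in pts)
                if best is None or cut < best[0]:
                    best = (cut, i, theta, left, right)
        _, i, theta, left, right = best
        return {"i": i, "t": theta,
                "l": split([p for p in pts if p[i] <= theta], left),
                "r": split([p for p in pts if p[i] > theta], right)}

    return split(points, centers)

def classify(tree, p):
    while isinstance(tree, dict):
        tree = tree["l"] if p[tree["i"]] <= tree["t"] else tree["r"]
    return tree
```

The tree has one leaf per center, so the resulting partition is human-readable as k − 1 threshold rules.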

NeurIPS Conference 2022 Conference Paper

Anonymous Bandits for Multi-User Systems

  • Hossein Esfandiari
  • Vahab Mirrokni
  • Jon Schneider

In this work, we present and study a new framework for online learning in systems with multiple users that provide user anonymity. Specifically, we extend the notion of bandits to obey the standard $k$-anonymity constraint by requiring each observation to be an aggregation of rewards for at least $k$ users. This provides a simple yet effective framework where one can learn a clustering of users in an online fashion without observing any user's individual decision. We initiate the study of anonymous bandits and provide the first sublinear regret algorithms and lower bounds for this setting.

SODA Conference 2022 Conference Paper

Robust Load Balancing with Machine Learned Advice

  • Sara Ahmadian
  • Hossein Esfandiari
  • Vahab Mirrokni
  • Binghui Peng

Motivated by the exploding growth of web-based services and the importance of efficiently managing the computational resources of such systems, we introduce and study a theoretical model for load balancing of very large databases such as commercial search engines. Our model is a more realistic version of the well-received balls-into-bins model with an additional constraint that limits the number of servers that carry each piece of the data. This additional constraint is necessary when, on one hand, the data is so large that we can not copy the whole data on each server. On the other hand, the query response time is so limited that we can not ignore the fact that the number of queries for each piece of the data changes over time, and hence we can not simply split the data over different machines In this paper, we develop an almost optimal load balancing algorithm that works given an estimate of the load of each piece of the data. Our algorithm is almost perfectly robust to wrong estimates, to the extent that even when all of the loads are adversarially chosen the performance of our algorithm is 1–1/ e, which is provably optimal. Along the way, we develop various techniques for analyzing the balls-into-bins process under certain correlations and build a novel connection with the multiplicative weights update scheme.

ICML Conference 2022 Conference Paper

Tight and Robust Private Mean Estimation with Few Users

  • Shyam Narayanan
  • Vahab Mirrokni
  • Hossein Esfandiari

In this work, we study high-dimensional mean estimation under user-level differential privacy, and design an $(\varepsilon, \delta)$-differentially private mechanism using as few users as possible. In particular, we provide a nearly optimal trade-off between the number of users and the number of samples per user required for private mean estimation, even when the number of users is as low as $O(\frac{1}{\varepsilon}\log\frac{1}{\delta})$. Interestingly, this bound on the number of users is independent of the dimension (though the number of samples per user is allowed to depend polynomially on the dimension), unlike the previous work that requires the number of users to depend polynomially on the dimension. This resolves a problem first proposed by Amin et al. (2019). Moreover, our mechanism is robust against corruptions in up to $49\%$ of the users. Finally, our results also apply to optimal algorithms for privately learning discrete distributions with few users, answering a question of Liu et al. (2020), and a broader range of problems such as stochastic convex optimization and a variant of stochastic gradient descent via a reduction to differentially private mean estimation.
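For context, the textbook recipe for user-level private mean estimation in one dimension is: average each user's samples, clip, average across users, and add Gaussian noise calibrated to one user's influence. The sketch below shows that baseline recipe only; the paper's point is a far better, dimension-independent user bound.

```python
import math
import random

def user_level_private_mean(user_samples, clip, eps, delta, rng=None):
    """Gaussian-mechanism baseline for user-level DP mean estimation.
    Each user contributes one clipped per-user mean; noise is scaled to
    the effect of replacing a single user. Illustrative sketch only."""
    rng = rng or random.Random(0)
    n = len(user_samples)
    means = [max(-clip, min(clip, sum(s) / len(s))) for s in user_samples]
    avg = sum(means) / n
    sensitivity = 2 * clip / n   # replacing one user moves avg by at most this
    sigma = sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / eps
    return avg + rng.gauss(0, sigma)
```

With 200 users of 50 samples each, the noise scale here is already small; the hard regime the paper addresses is high dimension with very few users.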

AAAI Conference 2021 Conference Paper

Almost Linear Time Density Level Set Estimation via DBSCAN

  • Hossein Esfandiari
  • Vahab Mirrokni
  • Peilin Zhong

In this work we focus on designing a fast algorithm for λ-density level set estimation via DBSCAN clustering. Previous work (Jiang, ICML 2017; Jang and Jiang, ICML 2019) shows that under some natural assumptions, DBSCAN and its variant DBSCAN++ can be used to estimate the λ-density level set with near-optimal Hausdorff distance, i.e., with rate $\tilde{O}(n^{-1/(2\beta+D)})$. However, to achieve this near-optimal rate, the current fastest DBSCAN algorithm needs near-quadratic running time. This running time is not practical for large datasets; when working with large datasets, we usually desire linear or almost linear time algorithms. With this motivation, in this work we present a modified DBSCAN algorithm with near-optimal Hausdorff distance for density level set estimation with $\tilde{O}(n)$ running time. In our empirical study, we show that our algorithm provides significant speedup over the previous algorithms, while achieving comparable solution quality.
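The idea behind DBSCAN-based level-set estimation: a point whose ε-ball contains enough sample points witnesses density at least λ (for λ roughly min_pts over the ball volume), and the union of such "core" regions estimates the level set. The naive quadratic-time core test below illustrates this; the paper's contribution is achieving it in almost linear time.

```python
def core_points(points, eps, min_pts):
    """Naive DBSCAN-style core test: p is a core point if at least
    min_pts sample points lie within distance eps of it. The union of
    eps-balls around core points is the usual level-set estimate.
    Quadratic-time sketch for illustration only."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return [p for p in points
            if sum(dist2(p, q) <= eps * eps for q in points) >= min_pts]
```

A dense cluster passes the core test while isolated low-density points are excluded, which is exactly the level-set behavior.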

AAAI Conference 2021 Conference Paper

Extreme k-Center Clustering

  • MohammadHossein Bateni
  • Hossein Esfandiari
  • Manuela Fischer
  • Vahab Mirrokni

Metric clustering is a fundamental primitive in machine learning with several applications for mining massive datasets. An important example of metric clustering is the k-center problem. While this problem has been extensively studied in distributed settings, all previous algorithms use Ω(k) space per machine and Ω(nk) total work. In this paper, we develop the first highly scalable approximation algorithm for k-center clustering, with $\tilde{O}(n^{\varepsilon})$ space per machine and $\tilde{O}(n^{1+\varepsilon})$ total work, for an arbitrarily small constant ε > 0. It produces an O(log log log n)-approximate solution with k(1+o(1)) centers in O(log log n) rounds of computation.
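As a point of comparison, the classic sequential baseline for k-center is Gonzalez's farthest-point traversal, a 2-approximation that needs the whole dataset on one machine. Shown here purely as that baseline; the paper's algorithm instead works with sublinear space per machine.

```python
def gonzalez_k_center(points, k):
    """Farthest-point traversal (Gonzalez 1985): start anywhere, then
    repeatedly add the point farthest from the chosen centers. A classic
    sequential 2-approximation for k-center, used here as a baseline."""
    centers = [points[0]]
    while len(centers) < k:
        far = max(points,
                  key=lambda p: min(sum((a - b) ** 2 for a, b in zip(p, c))
                                    for c in centers))
        centers.append(far)
    return centers
```

On two well-separated clusters with k = 2, the traversal picks one center from each cluster.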

AAAI Conference 2021 Conference Paper

Regret Bounds for Batched Bandits

  • Hossein Esfandiari
  • Amin Karbasi
  • Abbas Mehrabian
  • Vahab Mirrokni

We present simple algorithms for batched stochastic multi-armed bandit and batched stochastic linear bandit problems. We prove bounds for their expected regrets that improve and extend the best known regret bounds of Gao, Han, Ren, and Zhou (NeurIPS 2019), for any number of batches. In particular, our algorithms in both settings achieve the optimal expected regrets by using only a logarithmic number of batches. We also study the batched adversarial multi-armed bandit problem for the first time and provide the optimal regret, up to logarithmic factors, of any algorithm with predetermined batch sizes.
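The standard template for batched bandits is successive elimination with a fixed batch schedule: within a batch, pull every surviving arm equally; between batches, drop arms whose upper confidence bound falls below the best lower bound. The sketch below uses a crude confidence radius and schedule for illustration, not the tuned schedule from the paper.

```python
import math
import random

def batched_elimination(arms, num_batches, pulls_per_batch, rng=None):
    """Batched successive elimination. `arms` are samplers taking an rng.
    The confidence radius sqrt(2 log(100) / count) is a crude fixed
    choice for illustration."""
    rng = rng or random.Random(0)
    alive = list(range(len(arms)))
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    for _ in range(num_batches):
        for a in alive:                      # pull all surviving arms equally
            for _ in range(pulls_per_batch):
                sums[a] += arms[a](rng)
                counts[a] += 1
        rad = {a: math.sqrt(2 * math.log(100) / counts[a]) for a in alive}
        mean = {a: sums[a] / counts[a] for a in alive}
        best_lcb = max(mean[a] - rad[a] for a in alive)
        alive = [a for a in alive if mean[a] + rad[a] >= best_lcb]
    return max(alive, key=lambda a: sums[a] / counts[a])
```

Only the elimination decisions depend on batch boundaries, which is why a logarithmic number of batches can suffice.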

NeurIPS Conference 2020 Conference Paper

Contextual Reserve Price Optimization in Auctions via Mixed Integer Programming

  • Joey Huchette
  • Haihao Lu
  • Hossein Esfandiari
  • Vahab Mirrokni

We study the problem of learning a linear model to set the reserve price in an auction, given contextual information, in order to maximize expected revenue from the seller side. First, we show that it is not possible to solve this problem in polynomial time unless the Exponential Time Hypothesis fails. Second, we present a strong mixed-integer programming (MIP) formulation for this problem, which is capable of exactly modeling the nonconvex and discontinuous expected reward function. Moreover, we show that this MIP formulation is ideal (i.e., the strongest possible formulation) for the revenue function of a single impression. Since it can be computationally expensive to exactly solve the MIP formulation in practice, we also study the performance of its linear programming (LP) relaxation. Though it may work well in practice, we show that, unfortunately, in the worst case the optimal objective of the LP relaxation can be O(number of samples) times larger than the optimal objective of the true problem. Finally, we present computational results, showcasing that the MIP formulation, along with its LP relaxation, are able to achieve superior in- and out-of-sample performance, as compared to state-of-the-art algorithms on both real and synthetic datasets. More broadly, we believe this work offers an indication of the strength of optimization methodologies like MIP to exactly model intrinsic discontinuities in machine learning problems.

ICML Conference 2019 Conference Paper

Categorical Feature Compression via Submodular Optimization

  • MohammadHossein Bateni
  • Lin Chen
  • Hossein Esfandiari
  • Thomas Fu
  • Vahab Mirrokni
  • Afshin Rostamizadeh

In the era of big data, learning from categorical features with very large vocabularies (e.g., 28 million for the Criteo click prediction dataset) has become a practical challenge for machine learning researchers and practitioners. We design a highly-scalable vocabulary compression algorithm that seeks to maximize the mutual information between the compressed categorical feature and the target binary labels and we furthermore show that its solution is guaranteed to be within a $1-1/e \approx 63\%$ factor of the global optimal solution. Although in some settings, entropy-based set functions are known to be submodular, this is not the case for the mutual information objective we consider (mutual information with respect to the target labels). To address this, we introduce a novel re-parametrization of the mutual information objective, which we prove is submodular, and also design a data structure to query the submodular function in amortized $O(\log n)$ time (where $n$ is the input vocabulary size). Our complete algorithm is shown to operate in $O(n \log n)$ time. Additionally, we design a distributed implementation in which the query data structure is decomposed across $O(k)$ machines such that each machine only requires $O(\frac{n}{k})$ space, while still preserving the approximation guarantee and using only logarithmic rounds of computation. We also provide analysis of simple alternative heuristic compression methods to demonstrate they cannot achieve any approximation guarantee. Using the large-scale Criteo learning task, we demonstrate better performance in retaining mutual information and also verify competitive learning performance compared to other baseline methods.
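A known structural fact for binary labels is that good mutual-information-preserving merges respect the ordering of categories by the empirical P(y = 1 | category). The sketch below uses that ordering as a simple bucketing heuristic baseline; the paper's algorithm instead greedily maximizes a submodular re-parametrization of the MI objective with approximation guarantees.

```python
from collections import Counter

def compress_vocab(pairs, k):
    """Compress a categorical feature with binary labels into k buckets:
    sort categories by empirical P(y=1 | category), then cut the sorted
    order into k groups of roughly equal total count. Heuristic baseline
    only; no approximation guarantee is claimed for this sketch."""
    pos, tot = Counter(), Counter()
    for cat, y in pairs:
        tot[cat] += 1
        pos[cat] += y
    cats = sorted(tot, key=lambda c: pos[c] / tot[c])
    target = sum(tot.values()) / k
    mapping, bucket, acc = {}, 0, 0
    for c in cats:
        mapping[c] = min(bucket, k - 1)
        acc += tot[c]
        if acc >= target * (bucket + 1):   # move to the next bucket
            bucket += 1
    return mapping
```

Categories with similar label rates end up merged, which is what preserving mutual information with a binary target requires.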

NeurIPS Conference 2019 Conference Paper

Locality-Sensitive Hashing for f-Divergences: Mutual Information Loss and Beyond

  • Lin Chen
  • Hossein Esfandiari
  • Gang Fu
  • Vahab Mirrokni

Computing approximate nearest neighbors in high dimensional spaces is a central problem in large-scale data mining with a wide range of applications in machine learning and data science. A popular and effective technique in computing nearest neighbors approximately is the locality-sensitive hashing (LSH) scheme. In this paper, we aim to develop LSH schemes for distance functions that measure the distance between two probability distributions, particularly for f-divergences as well as a generalization to capture mutual information loss. First, we provide a general framework to design LSH schemes for f-divergence distance functions and develop LSH schemes for the generalized Jensen-Shannon divergence and triangular discrimination in this framework. We show a two-sided approximation result for approximation of the generalized Jensen-Shannon divergence by the Hellinger distance, which may be of independent interest. Next, we show a general method of reducing the problem of designing an LSH scheme for a Krein kernel (which can be expressed as the difference of two positive definite kernels) to the problem of maximum inner product search. We exemplify this method by applying it to the mutual information loss, due to its several important applications such as model compression.
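One concrete route from divergences to LSH: the Hellinger distance between distributions p and q is proportional to the Euclidean distance between √p and √q, so after the square-root embedding any Euclidean/angular LSH family applies. The sketch below pairs that embedding with random-hyperplane hashing (SimHash); it illustrates the reduction idea only, not the paper's schemes for generalized Jensen-Shannon divergence.

```python
import math
import random

def hellinger_simhash(p, num_bits, seed=0):
    """Hash a probability vector via the square-root embedding followed
    by SimHash: since H(p, q) ∝ ||sqrt(p) - sqrt(q)||_2, hyperplane
    collision probability decreases with Hellinger distance."""
    rng = random.Random(seed)
    x = [math.sqrt(v) for v in p]
    bits = []
    for _ in range(num_bits):
        h = [rng.gauss(0, 1) for _ in x]          # random hyperplane
        bits.append(int(sum(a * b for a, b in zip(h, x)) >= 0))
    return tuple(bits)
```

With a shared seed, nearby distributions collide on almost all bits while distant ones disagree often.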

FOCS Conference 2019 Conference Paper

Near-Optimal Massively Parallel Graph Connectivity

  • Soheil Behnezhad
  • Laxman Dhulipala
  • Hossein Esfandiari
  • Jakub Lacki
  • Vahab Mirrokni

Identifying the connected components of a graph, apart from being a fundamental problem with countless applications, is a key primitive for many other algorithms. In this paper, we consider this problem in parallel settings. Particularly, we focus on the Massively Parallel Computations (MPC) model, which is the standard theoretical model for modern parallel frameworks such as MapReduce, Hadoop, or Spark. We consider the truly sublinear regime of MPC for graph problems, where the space per machine is $n^{\delta}$ for some desirably small constant δ ∈ (0, 1). We present an algorithm that, for graphs with diameter D in the wide range $[\log^{\varepsilon} n, n]$, takes O(log D) rounds to identify the connected components, and takes O(log log n) rounds for all other graphs. The algorithm is randomized, succeeds with high probability, does not require prior knowledge of D, and uses an optimal total space of O(m). We complement this by showing a conditional lower bound, based on the widely believed two-cycle conjecture, that Ω(log D) rounds are indeed necessary in this setting. Studying parallel connectivity algorithms received a resurgence of interest after the pioneering work of Andoni et al. [FOCS 2018], who presented an algorithm with O(log D · log log n) round complexity. Our algorithm improves this result for the whole range of values of D and almost settles the problem due to the conditional lower bound. Additionally, we show that with minimal adjustments, our algorithm can also be implemented in a variant of (CRCW) PRAM in asymptotically the same number of rounds.

AAAI Conference 2019 Conference Paper

Online Pandora’s Boxes and Bandits

  • Hossein Esfandiari
  • MohammadTaghi Hajiaghayi
  • Brendan Lucier
  • Michael Mitzenmacher

We consider online variations of the Pandora’s box problem (Weitzman 1979), a standard model for understanding issues related to the cost of acquiring information for decision-making. Our problem generalizes both the classic Pandora’s box problem and the prophet inequality framework. Boxes are presented online, each with a random value and cost drawn jointly from some known distribution. Pandora chooses online whether to open each box given its cost, and then chooses irrevocably whether to keep the revealed prize or pass on it. We aim for approximation algorithms against adversaries that can choose the largest prize over any opened box, and use optimal offline policies to decide which boxes to open (without knowledge of the value inside). We consider variations where Pandora can collect multiple prizes subject to feasibility constraints, such as cardinality, matroid, or knapsack constraints. We also consider variations related to classic multi-armed bandit problems from reinforcement learning. Our results use a reduction-based framework where we separate the issues of the cost of acquiring information from the online decision process of which prizes to keep. Our work shows that in many scenarios, Pandora can achieve a good approximation to the best possible performance.
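The offline benchmark here is Weitzman's index rule: for each box, compute the reservation value σ solving E[(V − σ)⁺] = cost, then open boxes in decreasing order of σ. The sketch below just computes that index by bisection for a discrete, nonnegative value distribution (a baseline for the offline problem; the paper studies online variants).

```python
def reservation_value(values, probs, cost, lo=0.0, hi=None, iters=60):
    """Weitzman's index for one Pandora's box: the threshold sigma
    solving E[(V - sigma)+] = cost, found by bisection. Assumes a
    discrete distribution with nonnegative values and cost <= E[V]."""
    hi = hi if hi is not None else max(values)

    def excess(s):
        return sum(p * max(v - s, 0.0) for v, p in zip(values, probs))

    for _ in range(iters):
        mid = (lo + hi) / 2
        if excess(mid) > cost:   # threshold still too low
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For a fair coin over {0, 1} and opening cost 0.1, the index solves 0.5(1 − σ) = 0.1, giving σ = 0.8.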

FOCS Conference 2018 Conference Paper

Metric Sublinear Algorithms via Linear Sampling

  • Hossein Esfandiari
  • Michael Mitzenmacher

We provide a new technique to design fast approximation algorithms for graph problems where the points of the graph lie in a metric space. Specifically, we present a sampling approach for such metric graphs that, using a sublinear number of edge weight queries, provides a linear sampling, where each edge is (roughly speaking) sampled proportionally to its weight. For several natural problems, such as densest subgraph and max cut, we show that by sparsifying the graph using this sampling process, we can run a suitable approximation algorithm on the sparsified graph and the result remains a good approximation for the original problem. Our results have several interesting implications, such as providing the first sublinear time approximation algorithm for densest subgraph in a metric space, and improving the running time of estimating the average distance.

ICML Conference 2018 Conference Paper

Parallel and Streaming Algorithms for K-Core Decomposition

  • Hossein Esfandiari
  • Silvio Lattanzi
  • Vahab Mirrokni

The k-core decomposition is a fundamental primitive in many machine learning and data mining applications. We present the first distributed and the first streaming algorithms to compute and maintain an approximate k-core decomposition with provable guarantees. Our algorithms achieve rigorous bounds on space complexity while bounding the number of passes or rounds of computation. We do so by presenting a new powerful sketching technique for k-core decomposition, and then by showing that it can be computed efficiently in both streaming and MapReduce models. Finally, we confirm the effectiveness of our sketching technique empirically on a number of publicly available graphs.
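For reference, the exact offline definition is simple: the k-core is what remains after repeatedly peeling vertices of degree below k. This peeling baseline is what the paper's streaming and distributed algorithms approximate in small space.

```python
from collections import defaultdict

def k_core(edges, k):
    """Exact peeling baseline: repeatedly delete vertices of degree < k;
    the surviving vertices form the k-core."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    changed = True
    while changed:
        changed = False
        for u in list(adj):
            if len(adj[u]) < k:
                for v in adj[u]:
                    adj[v].discard(u)
                del adj[u]
                changed = True
    return set(adj)
```

A triangle with a pendant vertex has 2-core equal to the triangle and an empty 3-core.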

STOC Conference 2017 Conference Paper

Beating 1-1/e for ordered prophets

  • Melika Abolhassani
  • Soheil Ehsani
  • Hossein Esfandiari
  • MohammadTaghi Hajiaghayi
  • Robert Kleinberg
  • Brendan Lucier

Hill and Kertz studied the prophet inequality on iid distributions [The Annals of Probability, 1982]. They proved a theoretical bound of $1 - 1/e$ on the approximation factor of their algorithm. They conjectured that the best approximation factor for arbitrarily large n is $1/(1+1/e) \approx 0.731$. This conjecture remained open for over 30 years prior to this paper. In this paper we present a threshold-based algorithm for the prophet inequality with n iid distributions. Using a nontrivial and novel approach, we show that our algorithm is a 0.738-approximation algorithm. By beating the bound of $1/(1+1/e)$, this refutes the conjecture of Hill and Kertz. Moreover, we generalize our results to non-uniform distributions and discuss their applications in mechanism design.

AAAI Conference 2017 Conference Paper

Market Pricing for Data Streams

  • Melika Abolhassani
  • Hossein Esfandiari
  • MohammadTaghi Hajiaghayi
  • Brendan Lucier
  • Hadi Yami

Internet-enabled marketplaces such as Amazon deal with huge datasets registering transactions of merchandise between many buyers and sellers. It is important that algorithms become more time- and space-efficient as the size of datasets increases. An algorithm that runs in polynomial time may not have a reasonable running time for such large datasets. Here, we study the development of pricing algorithms that are appropriate for use with massive datasets. We especially focus on the streaming setting, the common model for big data analysis. We present an envy-free mechanism for the social welfare maximization problem in the streaming setting using $O(k^2 l)$ space, where k is the number of different goods and l is the number of available items of each good. We also provide an α-approximation mechanism for revenue maximization in this setting, given that an α-approximation mechanism for the corresponding offline problem exists. Moreover, we provide mechanisms to approximate the optimum social welfare (or revenue) within a $1 - \epsilon$ factor, in space independent of l, which would be favorable in case l is large compared to k. Finally, we present hardness results showing that approximation of optimal prices that maximize social welfare (or revenue) in the streaming setting needs Ω(l) space. We achieve our results by developing a powerful sampling technique for bipartite networks. The simplicity of our sampling technique empowers us to maintain the sample over the input sequence. Indeed, one can construct this sample in the distributed setting (a.k.a. MapReduce) and get the same results in two rounds of computation, or one may simply apply this sampling technique to provide faster offline algorithms.

NeurIPS Conference 2016 Conference Paper

Bi-Objective Online Matching and Submodular Allocations

  • Hossein Esfandiari
  • Nitish Korula
  • Vahab Mirrokni

Online allocation problems have been widely studied due to their numerous practical applications (particularly to Internet advertising), as well as considerable theoretical interest. The main challenge in such problems is making assignment decisions in the face of uncertainty about future input; effective algorithms need to predict which constraints are most likely to bind, and learn the balance between short-term gain and the value of long-term resource availability. In many important applications, the algorithm designer is faced with multiple objectives to optimize. In particular, in online advertising it is fairly common to optimize multiple metrics, such as clicks, conversions, and impressions, as well as other metrics which may be largely uncorrelated such as ‘share of voice’, and ‘buyer surplus’. While there has been considerable work on multi-objective offline optimization (when the entire input is known in advance), very little is known about the online case, particularly in the case of adversarial input. In this paper, we give the first results for bi-objective online submodular optimization, providing almost matching upper and lower bounds for allocating items to agents with two submodular value functions. We also study practically relevant special cases of this problem related to Internet advertising, and obtain improved results. All our algorithms are nearly best possible, as well as being efficient and easy to implement in practice.

SODA Conference 2016 Conference Paper

Kernelization via Sampling with Applications to Finding Matchings and Related Problems in Dynamic Graph Streams

  • Rajesh Chitnis
  • Graham Cormode
  • Hossein Esfandiari
  • MohammadTaghi Hajiaghayi
  • Andrew McGregor 0001
  • Morteza Monemizadeh
  • Sofya Vorotnikova

In this paper we present a simple but powerful subgraph sampling primitive that is applicable in a variety of computational models including dynamic graph streams (where the input graph is defined by a sequence of edge/hyperedge insertions and deletions) and distributed systems such as MapReduce. In the case of dynamic graph streams, we use this primitive to prove the following results:

Matching: Our main result for matchings is that there exists an Õ(k^2) space algorithm that returns the edges of a maximum matching under the assumption that its cardinality is at most k. The best previous algorithm used Õ(kn) space, where n is the number of vertices in the graph, and we prove our result is optimal up to logarithmic factors. Our algorithm has Õ(1) update time. We also show that there exists an Õ(n^2/α^3) space algorithm that returns an α-approximation for matchings of arbitrary size. In independent work, Assadi et al. (SODA 2016) proved this approximation algorithm is optimal and provided an alternative algorithm. We generalize our exact and approximate algorithms to weighted matching. For graphs with low arboricity, such as planar graphs, the space required for constant approximation can be further reduced. While there has been a substantial amount of work on approximate matching in insert-only graph streams, these are the first nontrivial results in the dynamic setting.

Vertex Cover and Hitting Set: There exists an Õ(kd) space algorithm that solves the minimum hitting set problem, where d is the cardinality of the input sets and k is an upper bound on the size of the minimum hitting set. We prove this is optimal up to logarithmic factors. Our algorithm has Õ(1) update time. The case d = 2 corresponds to minimum vertex cover.

Finally, we consider a larger family of parameterized problems (including b-matching, disjoint paths, and vertex coloring, among others) for which our subgraph sampling primitive yields fast, small-space dynamic graph stream algorithms. We then show lower bounds for natural problems outside this family.
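The flavor of such a subgraph sampling primitive can be sketched in a simplified insertion-only form (the paper works in dynamic streams via ℓ0-samplers and chooses random representatives; the first-edge-wins rule and the color budget below are simplifying assumptions of ours): hash each vertex to one of c colors and retain at most one edge per unordered color pair, yielding a kernel of O(c^2) edges.

```python
import random

def kernelize(edge_stream, c, rng=random.Random(42)):
    """Insertion-only sketch of color-based subgraph sampling.
    Each vertex gets a random color in [0, c); we keep at most one edge
    per unordered color pair, so the kernel has at most c*(c+1)/2 edges.
    For c large enough relative to the matching size k, a maximum matching
    survives in the kernel with good probability (in the paper's dynamic
    version, which keeps a *random* edge per pair via l0-sampling)."""
    color = {}
    kernel = {}
    def col(v):
        if v not in color:
            color[v] = rng.randrange(c)
        return color[v]
    for u, v in edge_stream:
        key = (min(col(u), col(v)), max(col(u), col(v)))
        if key not in kernel:  # first edge seen for this pair wins
            kernel[key] = (u, v)
    return list(kernel.values())

edges = [(i, i + 1) for i in range(20)]
kernel = kernelize(edges, 4)
```

Any matching computed on the kernel is a matching of the original graph, and the kernel's size depends only on c, not on n, which is the source of the Õ(k^2)-type space bounds.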

SODA Conference 2015 Conference Paper

Streaming Algorithms for Estimating the Matching Size in Planar Graphs and Beyond

  • Hossein Esfandiari
  • MohammadTaghi Hajiaghayi
  • Vahid Liaghat
  • Morteza Monemizadeh
  • Krzysztof Onak

We consider the problem of estimating the size of a maximum matching when the edges are revealed in a streaming fashion. When the input graph is planar, we present a simple and elegant streaming algorithm that, with high probability, estimates the size of a maximum matching within a constant factor using o(n) space, where n is the number of vertices. The approach generalizes to the family of graphs with bounded arboricity, which includes graphs with an excluded constant-size minor. To the best of our knowledge, this is the first result for estimating the size of a maximum matching in the adversarial-order streaming model (as opposed to the random-order streaming model) in o(n) space. We circumvent the barriers inherent in the adversarial-order model by exploiting several structural properties of planar graphs and, more generally, graphs with bounded arboricity. We further reduce the required memory in three restricted settings: (i) when the input graph is a forest; (ii) when we have two passes and the input graph has bounded arboricity; and (iii) when the edges arrive in random order and the input graph has bounded arboricity. Finally, we design a reduction from the Boolean Hidden Matching Problem to show that no randomized streaming algorithm can estimate the size of the maximum matching to within a factor better than 3/2 using only o(n^{1/2}) bits of space. Using the same reduction, we show that no deterministic algorithm can compute this kind of estimate in o(n) bits of space. The lower bounds hold even for graphs that are collections of paths of constant length.
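One structural idea exploited here, shown below in simplified offline form with an illustrative degree threshold (the paper's exact constants and its sublinear-space approximation of this count differ), is that in a bounded-arboricity graph, high-degree vertices plus edges whose endpoints both have low degree together witness the maximum matching size up to a constant factor:

```python
from collections import Counter

def estimate_matching(edges, threshold):
    """Offline sketch of a degree-based matching-size proxy for graphs of
    arboricity alpha, with threshold ~ O(alpha): count vertices of degree
    above the threshold, plus 'shallow' edges whose endpoints both have
    degree at most the threshold. Illustrative; the streaming algorithm
    approximates this quantity in sublinear space."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    heavy = sum(1 for v in deg if deg[v] > threshold)
    shallow = sum(1 for u, v in edges
                  if deg[u] <= threshold and deg[v] <= threshold)
    return heavy + shallow

path = [(i, i + 1) for i in range(5)]   # matching size 3
star = [(0, i) for i in range(1, 5)]    # matching size 1
```

On the path, every edge is shallow (estimate 5 vs. true size 3); on the star, only the center is heavy (estimate 1 vs. true size 1): both estimates are within a constant factor, as the threshold-based characterization intends.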