Arrow Research search

Author name cluster

Samir Khuller

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

35 papers
2 author rows

Possible papers

35

AAMAS Conference 2024 Conference Paper

Fair Allocation of Conflicting Courses under Additive Utilities

  • Arpita Biswas
  • Yiduo Ke
  • Samir Khuller
  • Quanquan C. Liu

We investigate the problem of fair allocation of indivisible items when certain item pairs conflict, where conflicts are represented by an interval graph. In this setting, no two conflicting items may be allocated to the same agent. Our problem has practical applications specifically for course allocation where students are agents and course seats are items; courses may have conflicting schedules. We devise algorithms for finding fair, specifically envy-freeness up to one item (EF1), allocations of courses to students in the most general setting: when students have non-uniform, non-identical, additive utility functions. In this extended abstract, we provide one of the algorithms that finds a EF1 solution under identical utilities, implying that, for any course, all students have the same utility.

ICML Conference 2022 Conference Paper

Individual Preference Stability for Clustering

  • Saba Ahmadi
  • Pranjal Awasthi
  • Samir Khuller
  • Matthäus Kleindessner
  • Jamie Morgenstern
  • Pattara Sukprasert
  • Ali Vakilian

In this paper, we propose a natural notion of individual preference (IP) stability for clustering, which asks that every data point, on average, is closer to the points in its own cluster than to the points in any other cluster. Our notion can be motivated from several perspectives, including game theory and algorithmic fairness. We study several questions related to our proposed notion. We first show that deciding whether a given data set allows for an IP-stable clustering in general is NP-hard. As a result, we explore the design of efficient algorithms for finding IP-stable clusterings in some restricted metric spaces. We present a polytime algorithm to find a clustering satisfying exact IP-stability on the real line, and an efficient algorithm to find an IP-stable 2-clustering for a tree metric. We also consider relaxing the stability constraint, i. e. , every data point should not be too far from its own cluster compared to any other cluster. For this case, we provide polytime algorithms with different guarantees. We evaluate some of our algorithms and several standard clustering approaches on real data sets.

ICML Conference 2020 Conference Paper

A Pairwise Fair and Community-preserving Approach to k-Center Clustering

  • Brian Brubach
  • Darshan Chakrabarti
  • John Dickerson 0001
  • Samir Khuller
  • Aravind Srinivasan
  • Leonidas Tsepenekas

Clustering is a foundational problem in machine learning with numerous applications. As machine learning increases in ubiquity as a backend for automated systems, concerns about fairness arise. Much of the current literature on fairness deals with discrimination against protected classes in supervised learning (group fairness). We define a different notion of fair clustering wherein the probability that two points (or a community of points) become separated is bounded by an increasing function of their pairwise distance (or community diameter). We capture the situation where data points represent people who gain some benefit from being clustered together. Unfairness arises when certain points are deterministically separated, either arbitrarily or by someone who intends to harm them as in the case of gerrymandering election districts. In response, we formally define two new types of fairness in the clustering setting, pairwise fairness and community preservation. To explore the practicality of our fairness goals, we devise an approach for extending existing $k$-center algorithms to satisfy these fairness constraints. Analysis of this approach proves that reasonable approximations can be achieved while maintaining fairness. In experiments, we compare the effectiveness of our approach to classical $k$-center algorithms/heuristics and explore the tradeoff between optimal clustering and fairness.

IJCAI Conference 2020 Conference Paper

An Algorithm for Multi-Attribute Diverse Matching

  • Saba Ahmadi
  • Faez Ahmed
  • John P. Dickerson
  • Mark Fuge
  • Samir Khuller

Bipartite b-matching, where agents on one side of a market are matched to one or more agents or items on the other, is a classical model that is used in myriad application areas such as healthcare, advertising, education, and general resource allocation. Traditionally, the primary goal of such models is to maximize a linear function of the constituent matches (e. g. , linear social welfare maximization) subject to some constraints. Recent work has studied a new goal of balancing whole-match diversity and economic efficiency, where the objective is instead a monotone submodular function over the matching. Basic versions of this problem are solvable in polynomial time. In this work, we prove that the problem of simultaneously maximizing diversity along several features (e. g. , country of citizenship, gender, skills) is NP-hard. To address this problem, we develop the first combinatorial algorithm that constructs provably-optimal diverse b-matchings in pseudo-polynomial time. We also provide a Mixed-Integer Quadratic formulation for the same problem and show that our method guarantees optimal solutions and takes less computation time for a reviewer assignment application. The source code is made available at https: //github. com/faezahmed/diverse_matching.

TCS Journal 2019 Journal Article

Select and permute: An improved online framework for scheduling to minimize weighted completion time

  • Samir Khuller
  • Jingling Li
  • Pascal Sturmfels
  • Kevin Sun
  • Prayaag Venkat

In this paper, we introduce a new online scheduling framework for minimizing total weighted completion time in a general setting. The framework is inspired by the work of Hall et al. (1997) [11] and Garg et al. (2007) [9], who show how to convert an offline approximation to an online scheme. Our framework uses two offline approximation algorithms—one for the simpler problem of scheduling without release times, and another for the minimum unscheduled weight problem—to create an online algorithm with provably good competitive ratios. We illustrate multiple applications of this method that yield improved competitive ratios. Our framework gives algorithms with the best or only-known competitive ratios for the concurrent open shop, coflow, and concurrent cluster models. We also introduce a randomized variant of our framework based on the ideas of Chakrabarti et al. (1996) [3] and use it to achieve improved competitive ratios for these same problems.

SODA Conference 2014 Conference Paper

Analyzing the Optimal Neighborhood: Algorithms for Budgeted and Partial Connected Dominating Set Problems

  • Samir Khuller
  • Manish Purohit
  • Kanthi K. Sarpatwar

We study partial and budgeted versions of the well studied connected dominating set problem. In the partial connected dominating set problem (P cds ), we are given an undirected graph G = ( V, E ) and an integer n′, and the goal is to find a minimum subset of vertices that induces a connected subgraph of G and dominates at least n′ vertices. We obtain the first polynomial time algorithm with an O (lnΔ) approximation factor for this problem, thereby significantly extending the results of Guha and Khuller (Algorithmica, Vol. 20(4), Pages 374–387, 1998) for the connected dominating set problem. We note that none of the methods developed earlier can be applied directly to solve this problem. In the budgeted connected dominating set problem (B cds ), there is a budget on the number of vertices we can select, and the goal is to dominate as many vertices as possible. We obtain a approximation algorithm for this problem. Finally, we show that our techniques extend to a more general setting where the profit function associated with a subset of vertices is a “special” submodular function. This generalization captures the connected dominating set problem with capacities and/or weighted profits as special cases. This implies a O (ln q ) approximation (where q denotes the quota) and an O (1) approximation algorithms for the partial and budgeted versions of these problems. While the algorithms are simple, the results make a surprising use of the greedy set cover framework in defining a useful profit function.

FOCS Conference 2012 Conference Paper

LP Rounding for k-Centers with Non-uniform Hard Capacities

  • Marek Cygan
  • MohammadTaghi Hajiaghayi
  • Samir Khuller

In this paper we consider a generalization of the classical k-center problem with capacities. Our goal is to select k centers in a graph, and assign each node to a nearby center, so that we respect the capacity constraints on centers. The objective is to minimize the maximum distance a node has to travel to get to its assigned center. This problem is NP-hard, even when centers have no capacity restrictions and optimal factor 2 approximation algorithms are known. With capacities, when all centers have identical capacities, a 6 approximation is known with no better lower bounds than for the infinite capacity version. While many generalizations and variations of this problem have been studied extensively, no progress was made on the capacitated version for a general capacity function. We develop the first constant factor approximation algorithm for this problem. Our algorithm uses an LP rounding approach to solve this problem, and works for the case of non-uniform hard capacities, when multiple copies of a node may not be chosen and can be extended to the case when there is a hard bound on the number of copies of a node that may be selected. Finally, for non-uniform soft capacities we present a much simpler 11-approximation algorithm, which we find as one more evidence that hard capacities are much harder to deal with.

SODA Conference 2010 Conference Paper

Energy Efficient Scheduling via Partial Shutdown

  • Samir Khuller
  • Jian Li 0015
  • Barna Saha

Motivated by issues of saving energy in data centers we define a collection of new problems referred to as “machine activation” problems. The central framework we introduce considers a collection of m machines (unrelated or related) with each machine i having an activation cost of a i. There is also a collection of n jobs that need to be performed, and p i, j is the processing time of job j on machine i. Standard scheduling models assume that the set of machines is fixed and all machines are available. However, in our setting, we assume that there is an activation cost budget of A – we would like to select a subset S of the machines to activate with total cost a ( S ) ≤ A and find a schedule for the n jobs on the machines in S minimizing the makespan (or any other metric). We consider both the unrelated machines setting, as well as the setting of scheduling uniformly related parallel machines, where machine i has activation cost a i and speed s i, and the processing time of job j on machine i is, where p j is the processing requirement of job j. For the general unrelated machine activation problem, our main results are that if there is a schedule with makespan T and activation cost A then we can obtain a schedule with makespan (2 + ε) T and activation cost, for any ε > 0. We also consider assignment costs for jobs as in the generalized assignment problem, and using our framework, provide algorithms that minimize the machine activation and the assignment cost simultaneously. In addition, we present a greedy algorithm which only works for the basic version and yields a makespan of 2 T and an activation cost A (1 + ln n ). For the uniformly related parallel machine scheduling problem, we develop a polynomial time approximation scheme that outputs a schedule with the property that the activation cost of the subset of machines is at most A and the makespan is at most (1 + ε) T for any ε > 0. For the special case of m identical speed machines, the machine activation problem is trivial, since the cheapest subset of k machines is always the best choice if the optimal solution activates k machines. In addition, we consider the case when some jobs can be dropped (and are treated as outliers).

FOCS Conference 2002 Conference Paper

Dependent Rounding in Bipartite Graphs

  • Rajiv Gandhi
  • Samir Khuller
  • Srinivasan Parthasarathy 0002
  • Aravind Srinivasan

We combine the pipage rounding technique of Ageev & Sviridenko with a recent rounding method developed by Srinivasan (2001), to develop a new randomized rounding approach for fractional vectors defined on the edge-sets of bipartite graphs. We show various ways of combining this technique with other ideas, leading to the following applications: richer random-graph models for graphs with a given degree-sequence; improved approximation algorithms for: (i) throughput-maximization in broadcast scheduling, (ii) delay-minimization in broadcast scheduling, and (iii) capacitated vertex cover; fair scheduling of jobs on unrelated parallel machines. A useful feature of our method is that it lets us prove certain (probabilistic) per-user fairness properties.

FOCS Conference 1995 Conference Paper

The Loading Time Scheduling Problem (Extended Abstract)

  • Randeep Bhatia
  • Samir Khuller
  • Joseph Naor

In this paper we study precedence constrained scheduling problems, where the tasks can only be executed on a specified subset of the machines. Each machine has a loading time that is incurred only for the first task that is scheduled on the machine in a particular run. This basic scheduling problem arises in the context of machining on numerically controlled machines, query optimization in databases, and in other artificial intelligence applications. We give the first non-trivial approximation algorithm for this problem. We also prove non-trivial lower bounds on best possible approximation ratios for these problems. These improve on the non-approximability results that are implied by the non-approximability results for the shortest common supersequence problem. We use the same algorithmic technique to obtain approximation algorithms for a problem arising in the context of code generation for parallel machines, and for the weighted shortest common supersequence problem.

FOCS Conference 1992 Conference Paper

Efficient Minimum Cost Matching Using Quadrangle Inequality

  • Alok Aggarwal
  • Amotz Bar-Noy
  • Samir Khuller
  • Dina Kravets
  • Baruch Schieber

The authors present efficient algorithms for finding a minimum cost perfect matching, and for solving the transportation problem in bipartite graphs, G = (Red union Blue, Red * Blue), where mod Red mod = n, mod Blue mod = m, n >

FOCS Conference 1989 Conference Paper

Efficient Parallel Algorithms for Testing Connectivity and Finding Disjoint s-t Paths in Graphs (Extended Summary)

  • Samir Khuller
  • Baruch Schieber

An efficient parallel algorithm for testing whether a graph G is K-vertex connected, for any fixed k, is presented. The algorithm runs in O(log n) time and uses nC(n, m) processors on a concurrent-read, concurrent-write parallel random-access machine (CRCW PRAM), where n and m are the number of vertices and edges of G and C(n, m) is the number of processors required to compute the connected components of G in logarithmic time. An optimal speedup algorithm for computing connected components would induce an optimal speedup algorithm for testing k-vertex connectivity, for any k>4. To develop the algorithm, an efficient parallel algorithm is designed for the following disjoint s-t paths problem: Given a graph G and two specified vertices s and t, find k-vertex disjoint paths between s and t, if they exist. If no such paths exist, find a set of at most k-1 vertices whose removal disconnects s and t. The parallel algorithm for this problem runs in O(log n) time using C(n, m) processors. It is shown how to modify the algorithm to find k-edge disjoint paths, if they exist. This yields an efficient parallel algorithm for testing whether a graph G is k-edge connected, for any fixed k. The algorithm runs in O(log n) time and uses nC (n, n) processors on a CRCW PRAM. Again, an optimal speedup algorithm for computing connected components would induce an optimal speedup algorithm for testing k-edge connectivity. >

FOCS Conference 1989 Conference Paper

Processor Efficient Parallel Algorithms for the Two Disjoint Paths Problem, and for Finding a Kuratowski Homeomorph

  • Samir Khuller
  • Stephen G. Mitchell
  • Vijay V. Vazirani

Given a graph G and two pairs of vertices s/sub 1/, t/sub 1/ and s/sub 2/, t/sub 2/, the two disjoint paths problem asks for vertex-disjoint paths connecting s/sub i/ with t/sub i/, i=1, 2. A fast parallel (NC) algorithm is given for this problem, which has applications in certain routing situations. If G is nonplanar, an algorithm that finds a Kuratowski homeomorph in G (i. e. a subgraph homeomorphic to K/sub 3. 3/ or K/sub 5/) is given. This complements the known NC planarity algorithms, which give a planar embedding in the positive case; the algorithm provides a certificate of nonplanarity in the negative case. Both algorithms are processor efficient; in each case, the processor-time product is within a polylogarithmic factor of the best known sequential algorithm. >