Arrow Research search

Author name cluster

Diptarka Chakraborty

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers
2 author rows

Possible papers

10

AAAI Conference 2026 Conference Paper

Generalizing Fair Clustering to Multiple Groups: Algorithms and Applications

  • Diptarka Chakraborty
  • Kushagra Chatterjee
  • Debarati Das
  • Tien-Long Nguyen

Clustering is a fundamental task in machine learning and data analysis, but it frequently fails to provide fair representation for various marginalized communities defined by multiple protected attributes -- a shortcoming often caused by biases in the training data. As a result, there is a growing need to enhance the fairness of clustering outcomes, ideally by making minimal modifications, possibly as a post-processing step after conventional clustering. A recent work initiated the study of closest fair clustering, though in a restricted scenario where data points belong to only two groups. In practice, however, data points are typically characterized by many groups, reflecting diverse protected attributes such as age, ethnicity, gender, etc. In this work, we generalize the study of the closest fair clustering problem to settings with an arbitrary number (more than two) of groups. We begin by showing that the problem is NP-hard even when all groups are of equal size -- a stark contrast with the two-group case, for which an exact algorithm exists. Next, we propose near-linear time approximation algorithms that efficiently handle arbitrary-sized multiple groups. Leveraging our closest fair clustering algorithms, we further achieve improved approximation guarantees for the fair correlation clustering problem, advancing the state-of-the-art results. Additionally, we are the first to provide approximation algorithms for the fair consensus clustering problem involving multiple (more than two) groups.

IJCAI Conference 2025 Conference Paper

Improved Rank Aggregation Under Fairness Constraint

  • Diptarka Chakraborty
  • Himika Das
  • Sanjana Dey
  • Alvin Hong Yao Yan

Aggregating multiple input rankings into a consensus ranking is essential in various fields such as social choice theory, hiring, college admissions, web search, and databases. A major challenge is that the optimal consensus ranking might be biased against individual candidates or groups, especially those from marginalized communities. This concern has led to recent studies focusing on fairness in rank aggregation. The goal is to ensure that candidates from different groups are fairly represented in the top-k positions of the aggregated ranking. We study this fair rank aggregation problem by considering the Kendall tau as the underlying metric. While we know of a polynomial-time approximation scheme (PTAS) for the classical rank aggregation problem, the corresponding fair variant only possesses a quite straightforward 3-approximation algorithm due to Wei et al. , SIGMOD'22, and Chakraborty et al. , NeurIPS'22, which finds closest fair ranking for each input ranking and then simply outputs the best one. In this paper, we first provide a novel algorithm that achieves (2+ε)-approximation (for any ε > 0), significantly improving over the 3-approximation bound. Next, we provide a 2. 881-approximation fair rank aggregation algorithm that works irrespective of the fairness notion, given one can find a closest fair ranking, beating the 3-approximation bound. We complement our theoretical guarantee by performing extensive experiments on various real-world datasets to establish the effectiveness of our algorithm further by comparing it with the performance of state-of-the-art algorithms.

MFCS Conference 2023 Conference Paper

Support Size Estimation: The Power of Conditioning

  • Diptarka Chakraborty
  • Gunjan Kumar
  • Kuldeep S. Meel

We consider the problem of estimating the support size of a distribution D. Our investigations are pursued through the lens of distribution testing and seek to understand the power of conditional sampling (denoted as COND), wherein one is allowed to query the given distribution conditioned on an arbitrary subset S. The primary contribution of this work is to introduce a new approach to lower bounds for the COND model that relies on using powerful tools from information theory and communication complexity. Our approach allows us to obtain surprisingly strong lower bounds for the COND model and its extensions. - We bridge the longstanding gap between the upper bound O(log log n + 1/ε²) and the lower bound Ω(√{log log n}) for the COND model by providing a nearly matching lower bound. Surprisingly, we show that even if we get to know the actual probabilities along with COND samples, still Ω(log log n + 1/{ε² log (1/ε)}) queries are necessary. - We obtain the first non-trivial lower bound for the COND equipped with an additional oracle that reveals the actual as well as the conditional probabilities of the samples (to the best of our knowledge, this subsumes all of the models previously studied): in particular, we demonstrate that Ω(log log log n + 1/{ε² log (1/ε)}) queries are necessary.

NeurIPS Conference 2022 Conference Paper

Fair Rank Aggregation

  • Diptarka Chakraborty
  • Syamantak Das
  • Arindam Khan
  • Aditya Subramanian

Ranking algorithms find extensive usage in diverse areas such as web search, employment, college admission, voting, etc. The related rank aggregation problem deals with combining multiple rankings into a single aggregate ranking. However, algorithms for both these problems might be biased against some individuals or groups due to implicit prejudice or marginalization in the historical data. We study ranking and rank aggregation problems from a fairness or diversity perspective, where the candidates (to be ranked) may belong to different groups and each group should have a fair representation in the final ranking. We allow the designer to set the parameters that define fair representation. These parameters specify the allowed range of the number of candidates from a particular group in the top-$k$ positions of the ranking. Given any ranking, we provide a fast and exact algorithm for finding the closest fair ranking for the Kendall tau metric under {\em strong fairness}, i. e. , when the final ranking is fair for all values of $k$. We also provide an exact algorithm for finding the closest fair ranking for the Ulam metric under strong fairness when there are only $O(1)$ number of groups. Our algorithms are simple, fast, and might be extendable to other relevant metrics. We also give a novel meta-algorithm for the general rank aggregation problem under the fairness framework. Surprisingly, this meta-algorithm works for any generalized mean objective (including center and median problems) and any fairness criteria. As a byproduct, we obtain 3-approximation algorithms for both center and median problems, under both Kendall tau and Ulam metrics. Furthermore, using sophisticated techniques we obtain a $(3-\varepsilon)$-approximation algorithm, for a constant $\varepsilon>0$, for the Ulam metric under strong fairness.

SODA Conference 2021 Conference Paper

Approximating the Median under the Ulam Metric

  • Diptarka Chakraborty
  • Debarati Das 0001
  • Robert Krauthgamer

We study approximation algorithms for variants of the median string problem, which asks for a string that minimizes the sum of edit distances from a given set of m strings of length n. Only the straightforward 2-approximation is known for this NP-hard problem. This problem is motivated e. g. by computational biology, and belongs to the class of median problems (over different metric spaces), which are fundamental tasks in data analysis. Our main result is for the Ulam metric, where all strings are permutations over [ n ] and each edit operation moves a symbol (deletion plus insertion). We devise for this problem an algorithms that breaks the 2-approximation barrier, i. e. , computes a (2 – δ )-approximate median permutation for some constant δ > 0 in time Õ ( nm 2 + n 3 ). We further use these techniques to achieve a (2 – δ ) approximation for the median string problem in the special case where the median is restricted to length n and the optimal objective is large Ω( mn ). We also design an approximation algorithm for the following probabilistic model of the Ulam median: the input consists of m perturbations of an (unknown) permutation x, each generated by moving every symbol to a random position with probability (a parameter) ∊ > 0. Our algorithm computes with high probability a (1 + o (1/ ∊ ))-approximate median permutation in time O ( mn 2 + n 3 ).

FOCS Conference 2018 Conference Paper

Approximating Edit Distance within Constant Factor in Truly Sub-Quadratic Time

  • Diptarka Chakraborty
  • Debarati Das 0001
  • Elazar Goldenberg
  • Michal Koucký 0001
  • Michael E. Saks

Edit distance is a measure of similarity of two strings based on the minimum number of character insertions, deletions, and substitutions required to transform one string into the other. The edit distance can be computed exactly using a dynamic programming algorithm that runs in quadratic time. Andoni, Krauthgamer and Onak (2010) gave a nearly linear time algorithm that approximates edit distance within approximation factor poly(log n). In this paper, we provide an algorithm with running time Õ(n^2-2/7) that approximates the edit distance within a constant factor.