Arrow Research search

Author name cluster

Ravi Kumar

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

36 papers
2 author rows

Possible papers

36

ICML Conference 2025 Conference Paper

LAuReL: Learned Augmented Residual Layer

  • Gaurav Menghani
  • Ravi Kumar
  • Sanjiv Kumar

One of the core pillars of efficient deep learning methods are architectural improvements, such as residual/skip connections, which have led to significantly better model convergence and quality. Since their introduction, residual connections have become ubiquitous not only in convolutional neural networks but also in transformer-based architectures, the backbone of LLMs. In this paper, we introduce the Learned Augmented Residual Layer (LAuReL) — a novel generalization of the canonical residual connection — designed to serve as an in-situ replacement while outperforming it in both model quality and footprint metrics. Our experiments show that LAuReL can enhance quality for both vision and language models while adding fewer parameters and incurring less latency and memory overhead than naively increasing parameter count. For example, on the ImageNet-1K task, LAuReL achieves the same model quality improvements as naively adding an extra layer while using $2. 6 \times$ fewer parameters. Similarly, when pre-training 1B and 4B parameter LLMs, LAuReL improves performance on a variety of challenging downstream evaluation tasks by 2. 54% to 20. 05%, while adding only 0. 012% and 0. 1% additional parameters, respectively.

NeurIPS Conference 2025 Conference Paper

Length Generalization via Auxiliary Tasks

  • Pranjal Awasthi
  • Anupam Gupta
  • Ravi Kumar

Length generalization, the ability of sequence models to generalize to sequences longer than those encountered during training, remains a key challenge for transformers, especially in tasks requiring algorithmic reasoning. Existing theoretical understanding of length generalization is limited, often providing only asymptotic results or focusing on specific problem classes or architectural variants, while empirical approaches frequently rely on ad hoc and often fragile techniques. In this work we introduce a novel framework for analyzing and proving length generalization bounds under specified, verifiable assumptions. A key outcome of the theory is the identification of a natural set of auxiliary tasks, intricately related to the primary task structure, such that strong performance on these auxiliary tasks, alongside the primary task, provably guarantees length generalization within the framework. This motivates a multi-task training procedure that explicitly optimizes performance on both the primary and the identified auxiliary tasks. Empirical evaluations on a variety of synthetic benchmarks known to be challenging for length generalization, including sequence sorting, and reversal, demonstrate that our proposed method yields significant improvements in generalization to substantially longer sequences.

NeurIPS Conference 2025 Conference Paper

Private Hyperparameter Tuning with Ex-Post Guarantee

  • Badih Ghazi
  • Pritish Kamath
  • Alexander Knop
  • Ravi Kumar
  • Pasin Manurangsi
  • Chiyuan Zhang

The conventional approach in differential privacy (DP) literature formulates the privacy-utility tradeoff with a "privacy-first" perspective: for a predetermined level of privacy, a certain utility is achievable. However, practitioners often operate under a "utility-first" paradigm, prioritizing a desired level of utility and then determining the corresponding privacy cost. Wu et al. [2019] initiated a formal study of this ``utility-first'' perspective by introducing ex-post DP. They demonstrated that by adding correlated Laplace noise and progressively reducing it on demand, a sequence of increasingly accurate estimates of a private parameter can be generated, with the privacy cost attributed only to the least noisy iterate released. This led to a Laplace mechanism variant that achieves a specified utility with minimal privacy loss. However, their work, and similar findings by Whitehouse et al. [2023], are primarily limited to simple mechanisms based on Laplace or Gaussian noise. In this paper, we significantly generalize these results. In particular, we extend the findings of Wu et al. [2019] and Liu and Talwar [2019] to support any sequence of private estimators, incurring at most a doubling of the original privacy budget. Furthermore, we demonstrate that hyperparameter tuning for these estimators, including the selection of an optimal privacy budget, can be performed without additional privacy cost. Finally, we extend our results to ex-post R\'{e}nyi DP, further broadening the applicability of utility-first privacy mechanisms.

NeurIPS Conference 2025 Conference Paper

Quantifying Cross-Modality Memorization in Vision-Language Models

  • Yuxin Wen
  • Yangsibo Huang
  • Tom Goldstein
  • Ravi Kumar
  • Badih Ghazi
  • Chiyuan Zhang

Understanding what and how neural networks memorize during training is crucial, both from the perspective of unintentional memorization of potentially sensitive information and from the standpoint of effective knowledge acquisition for real-world, knowledge-intensive tasks. While previous studies primarily investigate memorization within a single modality, such as text memorization in large language models or image memorization in diffusion models, unified multimodal models are becoming increasingly prevalent in practical applications. In this work, we focus on the unique characteristics of cross-modality memorization and conduct a systematic study centered on vision-language models. To facilitate controlled experiments, we first introduce a synthetic persona dataset comprising diverse synthetic person images and textual descriptions. We quantify factual knowledge memorization and cross-modal transferability by training models on a single modality and evaluating their performance in the other. Our results reveal that facts learned in one modality transfer to the other, but a significant gap exists between recalling information in the source and target modalities. Furthermore, we observe that this gap exists across various scenarios, including more capable models, machine unlearning, and the multi-hop case. At the end, we propose a baseline method to mitigate this challenge. We hope our study can inspire future research on developing more robust multimodal learning techniques to enhance cross-modal transferability.

NeurIPS Conference 2025 Conference Paper

Scaling Embedding Layers in Language Models

  • Da Yu
  • Edith Cohen
  • Badih Ghazi
  • Yangsibo Huang
  • Pritish Kamath
  • Ravi Kumar
  • Daogao Liu
  • Chiyuan Zhang

We propose SCONE (**S**calable, **C**ontextualized, **O**ffloaded, **N**-gram **E**mbedding), a new method for extending input embedding layers to enhance language model performance. To avoid increased decoding costs, SCONE retains the original vocabulary while introducing embeddings for a set of frequent $n$-grams. These embeddings provide contextualized representation for each input token and are learned with a separate model during training. After training, embeddings are precomputed and stored in off-accelerator memory; during inference, querying them has minimal impact on latency due to the low complexity of embedding lookups. SCONE enables two new scaling strategies: increasing the number of $n$-gram embeddings and scaling the model used to learn them, both while maintaining fixed accelerator usage during inference (in terms of FLOPS and memory). We show that scaling both aspects enables a model with 1B accelerator-resident parameters to outperform a 1. 9B-parameter baseline across diverse corpora, while using only about half the FLOPS and accelerator memory during inference.

NeurIPS Conference 2024 Conference Paper

Differentially Private Optimization with Sparse Gradients

  • Badih Ghazi
  • Cristóbal Guzmán
  • Pritish Kamath
  • Ravi Kumar
  • Pasin Manurangsi

Motivated by applications of large embedding models, we study differentially private (DP) optimization problems under sparsity of individual gradients. We start with new near-optimal bounds for the classic mean estimation problem but with sparse data, improving upon existing algorithms particularly for the high-dimensional regime. The corresponding lower bounds are based on a novel block-diagonal construction that is combined with existing DP mean estimation lower bounds. Next, we obtain pure- and approximate-DP algorithms with almost optimal rates for stochastic convex optimization with sparse gradients; the former represents the first nearly dimension-independent rates for this problem. Furthermore, by introducing novel analyses of bias reduction in mean estimation and randomly-stopped biased SGD we obtain nearly dimension-independent rates for near-stationary points for the empirical risk in nonconvex settings under approximate-DP.

NeurIPS Conference 2024 Conference Paper

Scalable DP-SGD: Shuffling vs. Poisson Subsampling

  • Lynn Chua
  • Badih Ghazi
  • Pritish Kamath
  • Ravi Kumar
  • Pasin Manurangsi
  • Amer Sinha
  • Chiyuan Zhang

We provide new lower bounds on the privacy guarantee of multi-epoch Adaptive Batch Linear Queries (ABLQ) mechanism with shuffled batch sampling, demonstrating substantial gaps when compared to Poisson subsampling; prior analysis was limited to a single epoch. Since the privacy analysis of Differentially Private Stochastic Gradient Descent (DP-SGD) is obtained by analyzing the ABLQ mechanism, this brings into serious question the common practice of implementing Shuffling based DP-SGD, but reporting privacy parameters as if Poisson subsampling was used. To understand the impact of this gap on the utility of trained machine learning models, we introduce a novel practical approach to implement Poisson subsampling at scale using massively parallel computation, and efficiently train models with the same. We provide a comparison between the utility of models trained with Poisson subsampling based DP-SGD, and the optimistic estimates of utility when using shuffling, via our new lower bounds on the privacy guarantee of ABLQ with shuffling.

NeurIPS Conference 2024 Conference Paper

Tight Bounds for Learning RUMs from Small Slates

  • Flavio Chierichetti
  • Mirko Giacchini
  • Ravi Kumar
  • Alessandro Panconesi
  • Andrew Tomkins

A Random Utility Model (RUM) is a classical model of user behavior defined by a distribution over $\mathbb{R}^n$. A user, presented with a subset of $\\{1, \ldots, n\\}$, will select the item of the subset with the highest utility, according to a utility vector drawn from the specified distribution. In practical settings, the subset is often of small size, as in the ``ten blue links'' of web search. In this paper, we consider a learning setting with complete information on user choices from subsets of size at most $k$. We show that $k=\Theta(\sqrt{n})$ is both necessary and sufficient to predict the distribution of all user choices with an arbitrarily small, constant error. Based on the upper bound, we obtain new algorithms for approximate RUM learning and variations thereof. Furthermore, we employ our lower bound for approximate RUM learning to derive lower bounds to fractional extensions of the well-studied $k$-deck and trace reconstruction problems.

AAAI Conference 2023 Conference Paper

Differentially Private Heatmaps

  • Badih Ghazi
  • Junfeng He
  • Kai Kohlhoff
  • Ravi Kumar
  • Pasin Manurangsi
  • Vidhya Navalpakkam
  • Nachiappan Valliappan

We consider the task of producing heatmaps from users' aggregated data while protecting their privacy. We give a differentially private (DP) algorithm for this task and demonstrate its advantages over previous algorithms on real-world datasets. Our core algorithmic primitive is a DP procedure that takes in a set of distributions and produces an output that is close in Earth Mover's Distance (EMD) to the average of the inputs. We prove theoretical bounds on the error of our algorithm under a certain sparsity assumption and that these are essentially optimal.

NeurIPS Conference 2023 Conference Paper

On Computing Pairwise Statistics with Local Differential Privacy

  • Badih Ghazi
  • Pritish Kamath
  • Ravi Kumar
  • Pasin Manurangsi
  • Adam Sealfon

We study the problem of computing pairwise statistics, i. e. , ones of the form $\binom{n}{2}^{-1} \sum_{i \ne j} f(x_i, x_j)$, where $x_i$ denotes the input to the $i$th user, with differential privacy (DP) in the local model. This formulation captures important metrics such as Kendall's $\tau$ coefficient, Area Under Curve, Gini's mean difference, Gini's entropy, etc. We give several novel and generic algorithms for the problem, leveraging techniques from DP algorithms for linear queries.

NeurIPS Conference 2023 Conference Paper

On Differentially Private Sampling from Gaussian and Product Distributions

  • Badih Ghazi
  • Xiao Hu
  • Ravi Kumar
  • Pasin Manurangsi

We study the problem, where given a dataset of $n$ i. i. d. samples from an unknown distribution $P$, we seek to generate a sample from a distribution that is close to $P$ in total variation distance, under the constraint of differential privacy. We study the settings where $P$ is a multi-dimensional Gaussian distribution with different assumptions: known covariance, unknown bounded covariance, and unknown unbounded covariance. We present new differentially private sampling algorithms, and show that they achieve near-optimal sample complexity in the first two settings. Moreover, when $P$ is a product distribution on the binary hypercube, we obtain a pure-DP algorithm whereas only an approximate-DP algorithm (with slightly worse sample complexity) was previously known.

NeurIPS Conference 2023 Conference Paper

Optimal Unbiased Randomizers for Regression with Label Differential Privacy

  • Ashwinkumar Badanidiyuru Varadaraja
  • Badih Ghazi
  • Pritish Kamath
  • Ravi Kumar
  • Ethan Leeman
  • Pasin Manurangsi
  • Avinash V Varadarajan
  • Chiyuan Zhang

We propose a new family of label randomizers for training regression models under the constraint of label differential privacy (DP). In particular, we leverage the trade-offs between bias and variance to construct better label randomizers depending on a privately estimated prior distribution over the labels. We demonstrate that these randomizers achieve state-of-the-art privacy-utility trade-offs on several datasets, highlighting the importance of reducing bias when training neural networks with label DP. We also provide theoretical results shedding light on the structural properties of the optimal unbiased randomizers.

NeurIPS Conference 2023 Conference Paper

Sparsity-Preserving Differentially Private Training of Large Embedding Models

  • Badih Ghazi
  • Yangsibo Huang
  • Pritish Kamath
  • Ravi Kumar
  • Pasin Manurangsi
  • Amer Sinha
  • Chiyuan Zhang

As the use of large embedding models in recommendation systems and language applications increases, concerns over user data privacy have also risen. DP-SGD, a training algorithm that combines differential privacy with stochastic gradient descent, has been the workhorse in protecting user privacy without compromising model accuracy by much. However, applying DP-SGD naively to embedding models can destroy gradient sparsity, leading to reduced training efficiency. To address this issue, we present two new algorithms, DP-FEST and DP-AdaFEST, that preserve gradient sparsity during the private training of large embedding models. Our algorithms achieve substantial reductions ($10^6 \times$) in gradient size, while maintaining comparable levels of accuracy, on benchmark real-world datasets.

NeurIPS Conference 2023 Conference Paper

User-Level Differential Privacy With Few Examples Per User

  • Badih Ghazi
  • Pritish Kamath
  • Ravi Kumar
  • Pasin Manurangsi
  • Raghu Meka
  • Chiyuan Zhang

Previous work on user-level differential privacy (DP) [Ghazi et al. NeurIPS 2021, Bun et al. STOC 2023] obtained generic algorithms that work for various learning tasks. However, their focus was on the *example-rich* regime, where the users have so many examples that each user could themselves solve the problem. In this work we consider the *example-scarce* regime, where each user has only a few examples, and obtain the following results: * For approximate-DP, we give a generic transformation of any item-level DP algorithm to a user-level DP algorithm. Roughly speaking, the latter gives a (multiplicative) savings of $O_{\varepsilon, \delta}(\sqrt{m})$ in terms of the number of users required for achieving the same utility, where $m$ is the number of examples per user. This algorithm, while recovering most known bounds for specific problems, also gives new bounds, e. g. , for PAC learning. * For pure-DP, we present a simple technique for adapting the exponential mechanism [McSherry & Talwar, FOCS 2007] to the user-level setting. This gives new bounds for a variety of tasks, such as private PAC learning, hypothesis selection, and distribution learning. For some of these problems, we show that our bounds are near-optimal.

NeurIPS Conference 2022 Conference Paper

Anonymized Histograms in Intermediate Privacy Models

  • Badih Ghazi
  • Pritish Kamath
  • Ravi Kumar
  • Pasin Manurangsi

We study the problem of privately computing the $\mbox{\it anonymized histogram}$ (a. k. a. $\mbox{\it unattributed histogram}$), which is defined as the histogram without item labels. Previous works have provided algorithms with $\ell_1$- and $\ell_2^2$-errors of $O_\varepsilon(\sqrt{n})$ in the central model of differential privacy (DP). In this work, we provide an algorithm with a nearly matching error guarantee of $\widetilde{O}_\varepsilon(\sqrt{n})$ in the shuffle DP and pan-private models. Our algorithm is very simple: it just post-processes the discrete Laplace-noised histogram! Using this algorithm as a subroutine, we show applications in privately estimating symmetric properties of distributions such as entropy, support coverage, and support size.

TCS Journal 2022 Journal Article

On additive approximate submodularity

  • Flavio Chierichetti
  • Anirban Dasgupta
  • Ravi Kumar

A real-valued set function is (additively) approximately submodular if it satisfies the submodularity conditions with an additive error. Approximate submodularity arises in many settings, especially in machine learning, where the function evaluation might not be exact. In this paper we study how close such approximately submodular functions are to truly submodular functions. We show that an approximately submodular function defined on a ground set of n elements is O ( n 2 ) pointwise-close to a submodular function. This result also provides an algorithmic tool that can be used to adapt existing submodular optimization algorithms to approximately submodular functions. To complement, we show an Ω ( n ) lower bound on the distance to submodularity. These results stand in contrast to the case of approximate modularity, where the distance to modularity is a constant, and approximate convexity, where the distance to convexity is logarithmic.

NeurIPS Conference 2022 Conference Paper

Private Isotonic Regression

  • Badih Ghazi
  • Pritish Kamath
  • Ravi Kumar
  • Pasin Manurangsi

In this paper, we consider the problem of differentially private (DP) algorithms for isotonic regression. For the most general problem of isotonic regression over a partially ordered set (poset) $\mathcal{X}$ and for any Lipschitz loss function, we obtain a pure-DP algorithm that, given $n$ input points, has an expected excess empirical risk of roughly $\mathrm{width}(\mathcal{X}) \cdot \log|\mathcal{X}| / n$, where $\mathrm{width}(\mathcal{X})$ is the width of the poset. In contrast, we also obtain a near-matching lower bound of roughly $(\mathrm{width}(\mathcal{X}) + \log |\mathcal{X}|) / n$, that holds even for approximate-DP algorithms. Moreover, we show that the above bounds are essentially the best that can be obtained without utilizing any further structure of the poset. In the special case of a totally ordered set and for $\ell_1$ and $\ell_2^2$ losses, our algorithm can be implemented in near-linear running time; we also provide extensions of this algorithm to the problem of private isotonic regression with additional structural constraints on the output function.

AAAI Conference 2022 Conference Paper

Private Rank Aggregation in Central and Local Models

  • Daniel Alabi
  • Badih Ghazi
  • Ravi Kumar
  • Pasin Manurangsi

In social choice theory, (Kemeny) rank aggregation is a wellstudied problem where the goal is to combine rankings from multiple voters into a single ranking on the same set of items. Since rankings can reveal preferences of voters (which a voter might like to keep private), it is important to aggregate preferences in such a way to preserve privacy. In this work, we present differentially private algorithms for rank aggregation in the pure and approximate settings along with distributionindependent utility upper and lower bounds. In addition to bounds in the central model, we also present utility bounds for the local model of differential privacy.

NeurIPS Conference 2021 Conference Paper

Deep Learning with Label Differential Privacy

  • Badih Ghazi
  • Noah Golowich
  • Ravi Kumar
  • Pasin Manurangsi
  • Chiyuan Zhang

The Randomized Response (RR) algorithm is a classical technique to improve robustness in survey aggregation, and has been widely adopted in applications with differential privacy guarantees. We propose a novel algorithm, Randomized Response with Prior (RRWithPrior), which can provide more accurate results while maintaining the same level of privacy guaranteed by RR. We then apply RRWithPrior to learn neural networks with label differential privacy (LabelDP), and show that when only the label needs to be protected, the model performance can be significantly improved over the previous state-of-the-art private baselines. Moreover, we study different ways to obtain priors, which when used with RRWithPrior can additionally improve the model performance, further reducing the accuracy gap between private and non-private models. We complement the empirical results with theoretical analysis showing that LabelDP is provably easier than protecting both the inputs and labels.

NeurIPS Conference 2021 Conference Paper

Logarithmic Regret from Sublinear Hints

  • Aditya Bhaskara
  • Ashok Cutkosky
  • Ravi Kumar
  • Manish Purohit

We consider the online linear optimization problem, where at every step the algorithm plays a point $x_t$ in the unit ball, and suffers loss $\langle c_t, x_t \rangle$ for some cost vector $c_t$ that is then revealed to the algorithm. Recent work showed that if an algorithm receives a _hint_ $h_t$ that has non-trivial correlation with $c_t$ before it plays $x_t$, then it can achieve a regret guarantee of $O(\log T)$, improving on the bound of $\Theta(\sqrt{T})$ in the standard setting. In this work, we study the question of whether an algorithm really requires a hint at _every_ time step. Somewhat surprisingly, we show that an algorithm can obtain $O(\log T)$ regret with just $O(\sqrt{T})$ hints under a natural query model; in contrast, we also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret. We give two applications of our result, to the well-studied setting of {\em optimistic} regret bounds, and to the problem of online learning with abstention.

NeurIPS Conference 2021 Conference Paper

Online Knapsack with Frequency Predictions

  • Sungjin Im
  • Ravi Kumar
  • Mahshid Montazer Qaem
  • Manish Purohit

There has been recent interest in using machine-learned predictions to improve the worst-case guarantees of online algorithms. In this paper we continue this line of work by studying the online knapsack problem, but with very weak predictions: in the form of knowing an upper and lower bound for the number of items of each value. We systematically derive online algorithms that attain the best possible competitive ratio for any fixed prediction; we also extend the results to more general settings such as generalized one-way trading and two-stage online knapsack. Our work shows that even seemingly weak predictions can be utilized effectively to provably improve the performance of online algorithms.

NeurIPS Conference 2021 Conference Paper

User-Level Differentially Private Learning via Correlated Sampling

  • Badih Ghazi
  • Ravi Kumar
  • Pasin Manurangsi

Most works in learning with differential privacy (DP) have focused on the setting where each user has a single sample. In this work, we consider the setting where each user holds $m$ samples and the privacy protection is enforced at the level of each user's data. We show that, in this setting, we may learn with a much fewer number of users. Specifically, we show that, as long as each user receives sufficiently many samples, we can learn any privately learnable class via an $(\epsilon, \delta)$-DP algorithm using only $O(\log(1/\delta)/\epsilon)$ users. For $\epsilon$-DP algorithms, we show that we can learn using only $O_{\epsilon}(d)$ users even in the local model, where $d$ is the probabilistic representation dimension. In both cases, we show a nearly-matching lower bound on the number of users required. A crucial component of our results is a generalization of global stability [Bun, Livni, Moran, FOCS 2020] that allows the use of public randomness. Under this relaxed notion, we employ a correlated sampling strategy to show that the global stability can be boosted to be arbitrarily close to one, at a polynomial expense in the number of samples.

NeurIPS Conference 2020 Conference Paper

Differentially Private Clustering: Tight Approximation Ratios

  • Badih Ghazi
  • Ravi Kumar
  • Pasin Manurangsi

We study the task of differentially private clustering. For several basic clustering problems, including Euclidean DensestBall, 1-Cluster, k-means, and k-median, we give efficient differentially private algorithms that achieve essentially the same approximation ratios as those that can be obtained by any non-private algorithm, while incurring only small additive errors. This improves upon existing efficient algorithms that only achieve some large constant approximation factors. Our results also imply an improved algorithm for the Sample and Aggregate privacy framework. Furthermore, we show that one of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.

NeurIPS Conference 2020 Conference Paper

Fair Hierarchical Clustering

  • Sara Ahmadian
  • Alessandro Epasto
  • Marina Knittel
  • Ravi Kumar
  • Mohammad Mahdian
  • Benjamin Moseley
  • Philip Pham
  • Sergei Vassilvitskii

As machine learning has become more prevalent, researchers have begun to recognize the necessity of ensuring machine learning systems are fair. Recently, there has been an interest in defining a notion of fairness that mitigates over-representation in traditional clustering. In this paper we extend this notion to hierarchical clustering, where the goal is to recursively partition the data to optimize a specific objective. For various natural objectives, we obtain simple, efficient algorithms to find a provably good fair hierarchical clustering. Empirically, we show that our algorithms can find a fair hierarchical clustering, with only a negligible loss in the objective.

NeurIPS Conference 2020 Conference Paper

Online Linear Optimization with Many Hints

  • Aditya Bhaskara
  • Ashok Cutkosky
  • Ravi Kumar
  • Manish Purohit

We study an online linear optimization (OLO) problem in which the learner is provided access to $K$ ``hint'' vectors in each round prior to making a decision. In this setting, we devise an algorithm that obtains logarithmic regret whenever there exists a convex combination of the $K$ hints that has positive correlation with the cost vectors. This significantly extends prior work that considered only the case $K=1$. To accomplish this, we develop a way to combine many arbitrary OLO algorithms to obtain regret only a logarithmically worse factor than the minimum regret of the original algorithms in hindsight; this result is of independent interest.

NeurIPS Conference 2019 Conference Paper

Efficient Rematerialization for Deep Networks

  • Ravi Kumar
  • Manish Purohit
  • Zoya Svitkina
  • Erik Vee
  • Joshua Wang

When training complex neural networks, memory usage can be an important bottleneck. The question of when to rematerialize, i. e. , to recompute intermediate values rather than retaining them in memory, becomes critical to achieving the best time and space efficiency. In this work we consider the rematerialization problem and devise efficient algorithms that use structural characterizations of computation graphs---treewidth and pathwidth---to obtain provably efficient rematerialization schedules. Our experiments demonstrate the performance of these algorithms on many common deep learning models.

NeurIPS Conference 2018 Conference Paper

Improving Online Algorithms via ML Predictions

  • Manish Purohit
  • Zoya Svitkina
  • Ravi Kumar

In this work we study the problem of using machine-learned predictions to improve performance of online algorithms. We consider two classical problems, ski rental and non-clairvoyant job scheduling, and obtain new online algorithms that use predictions to make their decisions. These algorithms are oblivious to the performance of the predictor, improve with better predictions, but do not degrade much if the predictions are poor.

NeurIPS Conference 2018 Conference Paper

Mallows Models for Top-k Lists

  • Flavio Chierichetti
  • Anirban Dasgupta
  • Shahrzad Haddadan
  • Ravi Kumar
  • Silvio Lattanzi

The classic Mallows model is a widely-used tool to realize distributions on per- mutations. Motivated by common practical situations, in this paper, we generalize Mallows to model distributions on top-k lists by using a suitable distance measure between top-k lists. Unlike many earlier works, our model is both analytically tractable and computationally efficient. We demonstrate this by studying two basic problems in this model, namely, sampling and reconstruction, from both algorithmic and experimental points of view.

NeurIPS Conference 2017 Conference Paper

Fair Clustering Through Fairlets

  • Flavio Chierichetti
  • Ravi Kumar
  • Silvio Lattanzi
  • Sergei Vassilvitskii

We study the question of fair clustering under the {\em disparate impact} doctrine, where each protected class must have approximately equal representation in every cluster. We formulate the fair clustering problem under both the k-center and the k-median objectives, and show that even with two protected classes the problem is challenging, as the optimum solution can violate common conventions---for instance a point may no longer be assigned to its nearest cluster center! En route we introduce the concept of fairlets, which are minimal sets that satisfy fair representation while approximately preserving the clustering objective. We show that any fair clustering problem can be decomposed into first finding good fairlets, and then using existing machinery for traditional clustering algorithms. While finding good fairlets can be NP-hard, we proceed to obtain efficient approximation algorithms based on minimum cost flow. We empirically demonstrate the \emph{price of fairness} by quantifying the value of fair clustering on real-world datasets with sensitive attributes.

NeurIPS Conference 2016 Conference Paper

On Mixtures of Markov Chains

  • Rishi Gupta
  • Ravi Kumar
  • Sergei Vassilvitskii

We study the problem of reconstructing a mixture of Markov chains from the trajectories generated by random walks through the state space. Under mild non-degeneracy conditions, we show that we can uniquely reconstruct the underlying chains by only considering trajectories of length three, which represent triples of states. Our algorithm is spectral in nature, and is easy to implement.

AAAI Conference 2015 Conference Paper

Refer-to-as Relations as Semantic Knowledge

  • Song Feng
  • Sujith Ravi
  • Ravi Kumar
  • Polina Kuznetsova
  • Wei Liu
  • Alexander Berg
  • Tamara Berg
  • Yejin Choi

We study Refer-to-as relations as a new type of semantic knowledge. Compared to the much studied Is-a relation, which concerns factual taxonomic knowledge, Refer-to-as relations aim to address pragmatic semantic knowledge. For example, a “penguin” is a “bird” from a taxonomic point of view, but people rarely refer to a “penguin” as a “bird” in vernacular use. This observation closely relates to the entry-level categorization studied in Psychology. We posit that Refer-toas relations can be learned from data, and that both textual and visual information would be helpful in inferring the relations. By integrating existing lexical structure knowledge with language statistics and visual similarities, we formulate a collective inference approach to map all object names in an encyclopedia to commonly used names for each object. Our contributions include a new labeled data set, the collective inference and optimization approach, and the computed mappings and similarities.

TCS Journal 2014 Journal Article

The complexity of LSH feasibility

  • Flavio Chierichetti
  • Ravi Kumar
  • Mohammad Mahdian

In this paper we study the complexity of the following feasibility problem: given an n × n similarity matrix S as input, is there a locality sensitive hash (LSH) for S? We show that the LSH feasibility problem is NP-hard even in the following strong promise version: either S admits an LSH or S is at ℓ 1 -distance at least n 2 − ϵ from every similarity that admits an LSH. We complement this hardness result by providing an O ˜ ( 3 n ) algorithm for the LSH feasibility problem, which improves upon the naïve n Θ ( n ) time algorithm; we prove that this running time is tight, modulo constants, under the Exponential Time Hypothesis.

NeurIPS Conference 2012 Conference Paper

Selecting Diverse Features via Spectral Regularization

  • Abhimanyu Das
  • Anirban Dasgupta
  • Ravi Kumar

We study the problem of diverse feature selection in linear regression: selecting a small subset of diverse features that can predict a given objective. Diversity is useful for several reasons such as interpretability, robustness to noise, etc. We propose several spectral regularizers that capture a notion of diversity of features and show that these are all submodular set functions. These regularizers, when added to the objective function for linear regression, result in approximately submodular functions, which can then be maximized approximately by efficient greedy and local search algorithms, with provable guarantees. We compare our algorithms to traditional greedy and $\ell_1$-regularization schemes and show that we obtain a more diverse set of features that result in the regression problem being stable under perturbations.

TCS Journal 2011 Journal Article

Sorting and selection on dynamic data

  • Aris Anagnostopoulos
  • Ravi Kumar
  • Mohammad Mahdian
  • Eli Upfal

We formulate and study a new computational model for dynamic data. In this model, the data changes gradually and the goal of an algorithm is to compute the solution to some problem on the data at each time step, under the constraint that it only has limited access to the data each time. As the data is constantly changing and the algorithm might be unaware of these changes, it cannot be expected to always output the exact right solution; we are interested in algorithms that guarantee to output an approximate solution. In particular, we focus on the fundamental problems of sorting and selection, where the true ordering of the elements changes slowly. We provide algorithms with performance close to the optimal in expectation and with high probability.

NeurIPS Conference 2008 Conference Paper

Mortal Multi-Armed Bandits

  • Deepayan Chakrabarti
  • Ravi Kumar
  • Filip Radlinski
  • Eli Upfal

We formulate and study a new variant of the $k$-armed bandit problem, motivated by e-commerce applications. In our model, arms have (stochastic) lifetime after which they expire. In this setting an algorithm needs to continuously explore new arms, in contrast to the standard $k$-armed bandit model in which arms are available indefinitely and exploration is reduced once an optimal arm is identified with near-certainty. The main motivation for our setting is online-advertising, where ads have limited lifetime due to, for example, the nature of their content and their campaign budget. An algorithm needs to choose among a large collection of ads, more than can be fully explored within the ads' lifetime. We present an optimal algorithm for the state-aware (deterministic reward function) case, and build on this technique to obtain an algorithm for the state-oblivious (stochastic reward function) case. Empirical studies on various reward distributions, including one derived from a real-world ad serving application, show that the proposed algorithms significantly outperform the standard multi-armed bandit approaches applied to these settings.