Author name cluster

Giulia DeSalvo

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers

2 author rows

NeurIPS Conference 2025 Conference Paper

GIST: Greedy Independent Set Thresholding for Max-Min Diversification with Submodular Utility

Matthew Fahrbach
Srikumar Ramalingam
Morteza Zadimoghaddam
Sara Ahmadian
Gui Citovsky
Giulia DeSalvo

This work studies a novel subset selection problem called *max-min diversification with monotone submodular utility* (MDMS), which has a wide range of applications in machine learning, e. g. , data sampling and feature selection. Given a set of points in a metric space, the goal of MDMS is to maximize $f(S) = g(S) + \lambda \cdot \text{div}(S)$ subject to a cardinality constraint $|S| \le k$, where $g(S)$ is a monotone submodular function and $\text{div}(S) = \min_{u, v \in S: u \ne v} \text{dist}(u, v)$ is the *max-min diversity* objective. We propose the `GIST` algorithm, which gives a $\frac{1}{2}$-approximation guarantee for MDMS by approximating a series of maximum independent set problems with a bicriteria greedy algorithm. We also prove that it is NP-hard to approximate within a factor of $0. 5584$. Finally, we show in our empirical study that `GIST` outperforms state-of-the-art benchmarks for a single-shot data sampling task on ImageNet.

PDF Details

ICLR Conference 2023 Conference Paper

Leveraging Importance Weights in Subset Selection

Gui Citovsky
Giulia DeSalvo
Sanjiv Kumar
Srikumar Ramalingam
Afshin Rostamizadeh
Yunjuan Wang

We present a subset selection algorithm designed to work with arbitrary model families in a practical batch setting. In such a setting, an algorithm can sample examples one at a time but, in order to limit overhead costs, is only able to update its state (i.e. further train model weights) once a large enough batch of examples is selected. Our algorithm, IWeS, selects examples by importance sampling where the sampling probability assigned to each example is based on the entropy of models trained on previously selected batches. IWeS admits significant performance improvement compared to other subset selection algorithms for seven publicly available datasets. Additionally, it is competitive in an active learning setting, where the label information is not available at selection time. We also provide an initial theoretical analysis to support our importance weighting approach, proving generalization and sampling rate bounds.

Details

NeurIPS Conference 2021 Conference Paper

Batch Active Learning at Scale

Gui Citovsky
Giulia DeSalvo
Claudio Gentile
Lazaros Karydas
Anand Rajagopalan
Afshin Rostamizadeh
Sanjiv Kumar

The ability to train complex and highly effective models often requires an abundance of training data, which can easily become a bottleneck in cost, time, and computational resources. Batch active learning, which adaptively issues batched queries to a labeling oracle, is a common approach for addressing this problem. The practical benefits of batch sampling come with the downside of less adaptivity and the risk of sampling redundant examples within a batch -- a risk that grows with the batch size. In this work, we analyze an efficient active learning algorithm, which focuses on the large batch setting. In particular, we show that our sampling method, which combines notions of uncertainty and diversity, easily scales to batch sizes (100K-1M) several orders of magnitude larger than used in previous studies and provides significant improvements in model training efficiency compared to recent baselines. Finally, we provide an initial theoretical analysis, proving label complexity guarantees for a related sampling method, which we show is approximately equivalent to our sampling method in specific settings.

PDF Details

NeurIPS Conference 2021 Conference Paper

Learning with Labeling Induced Abstentions

Kareem Amin
Giulia DeSalvo
Afshin Rostamizadeh

Consider a setting where we wish to automate an expensive task with a machine learning algorithm using a limited labeling resource. In such settings, examples routed for labeling are often out of scope for the machine learning algorithm. For example, in a spam detection setting, human reviewers not only provide labeled data but are such high-quality detectors of spam that examples routed to them no longer require machine evaluation. As a consequence, the distribution of examples routed to the machine is intimately tied to the process generating labels. We introduce a formalization of this setting, and give an algorithm that simultaneously learns a model and decides when to request a label by leveraging ideas from both the abstention and active learning literatures. We prove an upper bound on the algorithm's label complexity and a matching lower bound for any algorithm in this setting. We conduct a thorough set of experiments including an ablation study to test different components of our algorithm. We demonstrate the effectiveness of an efficient version of our algorithm over margin sampling on a variety of datasets.

PDF Details

NeurIPS Conference 2021 Conference Paper

Online Active Learning with Surrogate Loss Functions

Giulia DeSalvo
Claudio Gentile
Tobias Sommer Thune

We derive a novel active learning algorithm in the streaming setting for binary classification tasks. The algorithm leverages weak labels to minimize the number of label requests, and trains a model to optimize a surrogate loss on a resulting set of labeled and weak-labeled points. Our algorithm jointly admits two crucial properties: theoretical guarantees in the general agnostic setting and a strong empirical performance. Our theoretical analysis shows that the algorithm attains favorable generalization and label complexity bounds, while our empirical study on 18 real-world datasets demonstrate that the algorithm outperforms standard baselines, including the Margin Algorithm, or Uncertainty Sampling, a high-performing active learning algorithm favored by practitioners.

PDF Details

ICML Conference 2020 Conference Paper

Adaptive Region-Based Active Learning

Corinna Cortes
Giulia DeSalvo
Claudio Gentile
Mehryar Mohri
Ningshan Zhang

We present a new active learning algorithm that adaptively partitions the input space into a finite number of regions, and subsequently seeks a distinct predictor for each region, while actively requesting labels. We prove theoretical guarantees for both the generalization error and the label complexity of our algorithm, and analyze the number of regions defined by the algorithm under some mild assumptions. We also report the results of an extensive suite of experiments on several real-world datasets demonstrating substantial empirical benefits over existing single-region and non-adaptive region-based active learning baselines.

Details

ICML Conference 2020 Conference Paper

Online Learning with Dependent Stochastic Feedback Graphs

Corinna Cortes
Giulia DeSalvo
Claudio Gentile
Mehryar Mohri
Ningshan Zhang

A general framework for online learning with partial information is one where feedback graphs specify which losses can be observed by the learner. We study a challenging scenario where feedback graphs vary stochastically with time and, more importantly, where graphs and losses are dependent. This scenario appears in several real-world applications that we describe where the outcome of actions are correlated. We devise a new algorithm for this setting that exploits the stochastic properties of the graphs and that benefits from favorable regret guarantees. We present a detailed theoretical analysis of this algorithm, and also report the result of a series of experiments on real-world datasets, which show that our algorithm outperforms standard baselines for online learning with feedback graphs.

Details

ICML Conference 2019 Conference Paper

Active Learning with Disagreement Graphs

Corinna Cortes
Giulia DeSalvo
Mehryar Mohri
Ningshan Zhang
Claudio Gentile

We present two novel enhancements of an online importance-weighted active learning algorithm IWAL, using the properties of disagreements among hypotheses. The first enhancement, IWALD, prunes the hypothesis set with a more aggressive strategy based on the disagreement graph. We show that IWAL-D improves the generalization performance and the label complexity of the original IWAL, and quantify the improvement in terms of the disagreement graph coefficient. The second enhancement, IZOOM, further improves IWAL-D by adaptively zooming into the current version space and thus reducing the best-in-class error. We show that IZOOM admits favorable theoretical guarantees with the changing hypothesis set. We report experimental results on multiple datasets and demonstrate that the proposed algorithms achieve better test performances than IWAL given the same amount of labeling budget.

Details

ICML Conference 2019 Conference Paper

Online Learning with Sleeping Experts and Feedback Graphs

Corinna Cortes
Giulia DeSalvo
Claudio Gentile
Mehryar Mohri
Scott Yang

We consider the scenario of online learning with sleeping experts, where not all experts are available at each round, and analyze the general framework of learning with feedback graphs, where the loss observations associated with each expert are characterized by a graph. A critical assumption in this framework is that the loss observations and the set of sleeping experts at each round are independent. We first extend the classical sleeping experts algorithm of Kleinberg et al. 2008 to the feedback graphs scenario, and prove matching upper and lower bounds for the sleeping regret of the resulting algorithm under the independence assumption. Our main contribution is then to relax this assumption, present a more general notion of sleeping regret, and derive a general algorithm with strong theoretical guarantees. We apply this new framework to the important scenario of online learning with abstention, where a learner can elect to abstain from making a prediction at the price of a certain cost. We empirically validate our algorithm against multiple online abstention algorithms on several real-world datasets, showing substantial performance improvements.

Details

JMLR Journal 2018 Journal Article

Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization

Lisha Li
Kevin Jamieson
Giulia DeSalvo
Afshin Rostamizadeh
Ameet Talwalkar

Performance of machine learning algorithms depends critically on identifying a good set of hyperparameters. While recent approaches use Bayesian optimization to adaptively select configurations, we focus on speeding up random search through adaptive resource allocation and early-stopping. We formulate hyperparameter optimization as a pure-exploration non-stochastic infinite-armed bandit problem where a predefined resource like iterations, data samples, or features is allocated to randomly sampled configurations. We introduce a novel algorithm, Ã¸uralg, for this framework and analyze its theoretical properties, providing several desirable guarantees. Furthermore, we compare Ã¸uralg with popular Bayesian optimization methods on a suite of hyperparameter optimization problems. We observe that Ã¸uralg can provide over an order-of-magnitude speedup over our competitor set on a variety of deep-learning and kernel-based learning problems. [abs] [ pdf ][ bib ] &copy JMLR 2018. ( edit, beta )

PDF Details

ICML Conference 2018 Conference Paper

Online Learning with Abstention

Corinna Cortes
Giulia DeSalvo
Claudio Gentile
Mehryar Mohri
Scott Yang

We present an extensive study of a key problem in online learning where the learner can opt to abstain from making a prediction, at a certain cost. In the adversarial setting, we show how existing online algorithms and guarantees can be adapted to this problem. In the stochastic setting, we first point out a bias problem that limits the straightforward extension of algorithms such as UCB-N to this context. Next, we give a new algorithm, UCB-GT, that exploits historical data and time-varying feedback graphs. We show that this algorithm benefits from more favorable regret guarantees than a natural extension of UCB-N. We further report the results of a series of experiments demonstrating that UCB-GT largely outperforms that extension of UCB-N, as well as other standard baselines.

Details

NeurIPS Conference 2016 Conference Paper

Boosting with Abstention

Corinna Cortes
Giulia DeSalvo
Mehryar Mohri

We present a new boosting algorithm for the key scenario of binary classification with abstention where the algorithm can abstain from predicting the label of a point, at the price of a fixed cost. At each round, our algorithm selects a pair of functions, a base predictor and a base abstention function. We define convex upper bounds for the natural loss function associated to this problem, which we prove to be calibrated with respect to the Bayes solution. Our algorithm benefits from general margin-based learning guarantees which we derive for ensembles of pairs of base predictor and abstention functions, in terms of the Rademacher complexities of the corresponding function classes. We give convergence guarantees for our algorithm along with a linear-time weak-learning algorithm for abstention stumps. We also report the results of several experiments suggesting that our algorithm provides a significant improvement in practice over two confidence-based algorithms.

PDF Details

AAAI Conference 2016 Conference Paper

Random Composite Forests

Giulia DeSalvo
Mehryar Mohri

We introduce a broad family of decision trees, Composite Trees, whose leaf classiﬁers are selected out of a hypothesis set composed of p subfamilies with different complexities. We prove new data-dependent learning guarantees for this family in the multi-class setting. These learning bounds provide a quantitative guidance for the choice of the hypotheses at each leaf. Remarkably, they depend on the Rademacher complexities of the sub-families of the predictors and the fraction of sample points correctly classiﬁed at each leaf. We further introduce random composite trees and derive learning guarantees for random composite trees which also apply to Random Forests. Using our theoretical analysis, we devise a new algorithm, RANDOMCOMPOSITEFOREST (RCF), that is based on forming an ensemble of random composite trees. We report the results of experiments demonstrating that RCF yields signiﬁcant performance improvements over both Random Forests and a variant of RCF in several tasks.

PDF Details