Arrow Research search

Author name cluster

Claudio Gentile

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

55 papers
2 author rows

Possible papers

55

ICML Conference 2025 Conference Paper

Nearly Optimal Sample Complexity for Learning with Label Proportions

  • Róbert Busa-Fekete
  • Travis Dick
  • Claudio Gentile
  • Haim Kaplan
  • Tomer Koren
  • Uri Stemmer

We investigate Learning from Label Proportions (LLP), a partial information setting where examples in a training set are grouped into bags, and only aggregate label values in each bag are available. Despite the partial observability, the goal is still to achieve small regret at the level of individual examples. We give results on the sample complexity of LLP under square loss, showing that our sample complexity is essentially optimal. From an algorithmic viewpoint, we rely on carefully designed variants of Empirical Risk Minimization, and Stochastic Gradient Descent algorithms, combined with ad hoc variance reduction techniques. On one hand, our theoretical results improve in important ways on the existing literature on LLP, specifically in the way the sample complexity depends on the bag size. On the other hand, we validate our algorithmic solutions on several datasets, demonstrating improved empirical performance (better accuracy for less samples) against recent baselines.

NeurIPS Conference 2024 Conference Paper

Auditing Privacy Mechanisms via Label Inference Attacks

  • Róbert I. Busa-Fekete
  • Travis Dick
  • Claudio Gentile
  • Andrés M. Medina
  • Adam Smith
  • Marika Swanberg

We propose reconstruction advantage measures to audit label privatization mechanisms. A reconstruction advantage measure quantifies the increase in an attacker's ability to infer the true label of an unlabeled example when provided with a private version of the labels in a dataset (e. g. , aggregate of labels from different users or noisy labels output by randomized response), compared to an attacker that only observes the feature vectors, but may have prior knowledge of the correlation between features and labels. We consider two such auditing measures: one additive, and on multiplicative. These cover previous approaches taken in the literature on empirical auditing and differential privacy. These measures allow us to place a variety of proposed privatization schemes---some differentially private, some not---on the same footing. We analyze these measures theoretically under a distributional model which, we claim, encapsulates reasonable adversarial settings. We also quantify their behavior empirically on real and simulated prediction tasks. Across a range of experimental settings, we find that differentially private schemes dominate or match the privacy-utility tradeoff of more heuristic approaches.

JMLR Journal 2024 Journal Article

Fast Rates in Pool-Based Batch Active Learning

  • Claudio Gentile
  • Zhilei Wang
  • Tong Zhang

We consider a batch active learning scenario where the learner adaptively issues batches of points to a labeling oracle. Sampling labels in batches is highly desirable in practice due to the smaller number of interactive rounds with the labeling oracle (often human beings). However, batch active learning typically pays the price of a reduced adaptivity, leading to suboptimal results. In this paper we propose a solution which requires a careful trade off between the informativeness of the queried points and their diversity. We theoretically investigate batch active learning in the practically relevant scenario where the unlabeled pool of data is available beforehand (pool-based active learning). We analyze a novel stage-wise greedy algorithm and show that, as a function of the label complexity, the excess risk of this algorithm matches the known minimax rates in a standard statistical learning setting with linear function spaces. Our results also exhibit a mild dependence on the batch size. These initial results are then extended to hold for general function spaces with similar algorithmics. These are the first theoretical results that employ careful trade offs between informativeness and diversity to rigorously quantify the statistical performance of batch active learning in the pool-based scenario. [abs] [ pdf ][ bib ] &copy JMLR 2024. ( edit, beta )

ICRA Conference 2023 Conference Paper

A Contextual Bandit Approach for Learning to Plan in Environments with Probabilistic Goal Configurations

  • Sohan Rudra
  • Saksham Goel
  • Anirban Santara
  • Claudio Gentile
  • Laurent Perron
  • Fei Xia
  • Vikas Sindhwani
  • Carolina Parada

Object-goal navigation (Object-nav) entails searching, recognizing and navigating to a target object. Object-nav has been extensively studied by the Embodied-AI community, but most solutions are often restricted to considering static objects (e. g. , television, fridge, etc.), We propose a modular framework for object-nav that is able to efficiently search indoor environments for not just static objects but also movable objects (e. g. fruits, glasses, phones, etc.) that frequently change their positions due to human intervention. Our contextual-bandit agent efficiently explores the environment by showing optimism in the face of uncertainty and learns a model of the likelihood of spotting different objects from each navigable location. The likelihoods are used as rewards in a weighted minimum latency solver to deduce a trajectory for the robot. We evaluate our algorithms in two simulated environments and a real-world setting, to demonstrate high sample efficiency and reliability.

NeurIPS Conference 2023 Conference Paper

Easy Learning from Label Proportions

  • Róbert Busa-Fekete
  • Heejin Choi
  • Travis Dick
  • Claudio Gentile
  • Andres Munoz Medina

We consider the problem of Learning from Label Proportions (LLP), a weakly supervised classification setup where instances are grouped into i. i. d. “bags”, and only the frequency of class labels at each bag is available. Albeit, the objective of the learner is to achieve low task loss at an individual instance level. Here we propose EASYLLP, a flexible and simple-to-implement debiasing approach based on aggregate labels, which operates on arbitrary loss functions. Our technique allows us to accurately estimate the expected loss of an arbitrary model at an individual level. We elucidate the differences between our method and standard methods based on label proportion matching, in terms of applicability and optimality conditions. We showcase the flexibility of our approach compared to alternatives by applying our method to popular learning frameworks, like Empirical Risk Minimization (ERM) and Stochastic Gradient Descent (SGD) with provable guarantees on instance level performance. Finally, we validate our theoretical results on multiple datasets, empirically illustrating the conditions under which our algorithm is expected to perform better or worse than previous LLP approaches

ICML Conference 2022 Conference Paper

Achieving Minimax Rates in Pool-Based Batch Active Learning

  • Claudio Gentile
  • Zhilei Wang
  • Tong Zhang 0001

We consider a batch active learning scenario where the learner adaptively issues batches of points to a labeling oracle. Sampling labels in batches is highly desirable in practice due to the smaller number of interactive rounds with the labeling oracle (often human beings). However, batch active learning typically pays the price of a reduced adaptivity, leading to suboptimal results. In this paper we propose a solution which requires a careful trade off between the informativeness of the queried points and their diversity. We theoretically investigate batch active learning in the practically relevant scenario where the unlabeled pool of data is available beforehand ( pool-based active learning). We analyze a novel stage-wise greedy algorithm and show that, as a function of the label complexity, the excess risk of this algorithm %operating in the realizable setting for which we prove matches the known minimax rates in standard statistical learning settings. Our results also exhibit a mild dependence on the batch size. These are the first theoretical results that employ careful trade offs between informativeness and diversity to rigorously quantify the statistical performance of batch active learning in the pool-based scenario.

NeurIPS Conference 2022 Conference Paper

Best of Both Worlds Model Selection

  • Aldo Pacchiano
  • Christoph Dann
  • Claudio Gentile

We study the problem of model selection in bandit scenarios in the presence of nested policy classes, with the goal of obtaining simultaneous adversarial and stochastic (``best of both worlds") high-probability regret guarantees. Our approach requires that each base learner comes with a candidate regret bound that may or may not hold, while our meta algorithm plays each base learner according to a schedule that keeps the base learner's candidate regret bounds balanced until they are detected to violate their guarantees. We develop careful mis-specification tests specifically designed to blend the above model selection criterion with the ability to leverage the (potentially benign) nature of the environment. We recover the model selection guarantees of the CORRAL algorithm for adversarial environments, but with the additional benefit of achieving high probability regret bounds. More importantly, our model selection results also hold simultaneously in stochastic environments under gap assumptions. These are the first theoretical results that achieve best-of-both world (stochastic and adversarial) guarantees while performing model selection in contextual bandit scenarios.

JMLR Journal 2022 Journal Article

Nonstochastic Bandits with Composite Anonymous Feedback

  • Nicolò Cesa-Bianchi
  • Tommaso Cesari
  • Roberto Colomboni
  • Claudio Gentile
  • Yishay Mansour

We investigate a nonstochastic bandit setting in which the loss of an action is not immediately charged to the player, but rather spread over the subsequent rounds in an adversarial way. The instantaneous loss observed by the player at the end of each round is then a sum of many loss components of previously played actions. This setting encompasses as a special case the easier task of bandits with delayed feedback, a well-studied framework where the player observes the delayed losses individually. Our first contribution is a general reduction transforming a standard bandit algorithm into one that can operate in the harder setting: We bound the regret of the transformed algorithm in terms of the stability and regret of the original algorithm. Then, we show that the transformation of a suitably tuned FTRL with Tsallis entropy has a regret of order $\sqrt{(d+1)KT}$, where $d$ is the maximum delay, $K$ is the number of arms, and $T$ is the time horizon. Finally, we show that our results cannot be improved in general by exhibiting a matching (up to a log factor) lower bound on the regret of any algorithm operating in this setting. [abs] [ pdf ][ bib ] &copy JMLR 2022. ( edit, beta )

NeurIPS Conference 2022 Conference Paper

Regret Bounds for Multilabel Classification in Sparse Label Regimes

  • Róbert Busa-Fekete
  • Heejin Choi
  • Krzysztof Dembczynski
  • Claudio Gentile
  • Henry Reeve
  • Balazs Szorenyi

Multi-label classification (MLC) has wide practical importance, but the theoretical understanding of its statistical properties is still limited. As an attempt to fill this gap, we thoroughly study upper and lower regret bounds for two canonical MLC performance measures, Hamming loss and Precision@$\kappa$. We consider two different statistical and algorithmic settings, a non-parametric setting tackled by plug-in classifiers \`a la $k$-nearest neighbors, and a parametric one tackled by empirical risk minimization operating on surrogate loss functions. For both, we analyze the interplay between a natural MLC variant of the low noise assumption, widely studied in binary classification, and the label sparsity, the latter being a natural property of large-scale MLC problems. We show that those conditions are crucial in improving the bounds, but the way they are tangled is not obvious, and also different across the two settings.

NeurIPS Conference 2021 Conference Paper

Batch Active Learning at Scale

  • Gui Citovsky
  • Giulia DeSalvo
  • Claudio Gentile
  • Lazaros Karydas
  • Anand Rajagopalan
  • Afshin Rostamizadeh
  • Sanjiv Kumar

The ability to train complex and highly effective models often requires an abundance of training data, which can easily become a bottleneck in cost, time, and computational resources. Batch active learning, which adaptively issues batched queries to a labeling oracle, is a common approach for addressing this problem. The practical benefits of batch sampling come with the downside of less adaptivity and the risk of sampling redundant examples within a batch -- a risk that grows with the batch size. In this work, we analyze an efficient active learning algorithm, which focuses on the large batch setting. In particular, we show that our sampling method, which combines notions of uncertainty and diversity, easily scales to batch sizes (100K-1M) several orders of magnitude larger than used in previous studies and provides significant improvements in model training efficiency compared to recent baselines. Finally, we provide an initial theoretical analysis, proving label complexity guarantees for a related sampling method, which we show is approximately equivalent to our sampling method in specific settings.

ICML Conference 2021 Conference Paper

Best Model Identification: A Rested Bandit Formulation

  • Leonardo Cella
  • Massimiliano Pontil
  • Claudio Gentile

We introduce and analyze a best arm identification problem in the rested bandit setting, wherein arms are themselves learning algorithms whose expected losses decrease with the number of times the arm has been played. The shape of the expected loss functions is similar across arms, and is assumed to be available up to unknown parameters that have to be learned on the fly. We define a novel notion of regret for this problem, where we compare to the policy that always plays the arm having the smallest expected loss at the end of the game. We analyze an arm elimination algorithm whose regret vanishes as the time horizon increases. The actual rate of convergence depends in a detailed way on the postulated functional form of the expected losses. We complement our analysis with lower bounds, indicating strengths and limitations of the proposed solution.

ICML Conference 2021 Conference Paper

Dynamic Balancing for Model Selection in Bandits and RL

  • Ashok Cutkosky
  • Christoph Dann
  • Abhimanyu Das
  • Claudio Gentile
  • Aldo Pacchiano
  • Manish Purohit

We propose a framework for model selection by combining base algorithms in stochastic bandits and reinforcement learning. We require a candidate regret bound for each base algorithm that may or may not hold. We select base algorithms to play in each round using a “balancing condition” on the candidate regret bounds. Our approach simultaneously recovers previous worst-case regret bounds, while also obtaining much smaller regret in natural scenarios when some base learners significantly exceed their candidate bounds. Our framework is relevant in many settings, including linear bandits and MDPs with nested function classes, linear bandits with unknown misspecification, and tuning confidence parameters of algorithms such as LinUCB. Moreover, unlike recent efforts in model selection for linear stochastic bandits, our approach can be extended to consider adversarial rather than stochastic contexts.

ICML Conference 2021 Conference Paper

Hierarchical Clustering of Data Streams: Scalable Algorithms and Approximation Guarantees

  • Anand Rajagopalan
  • Fabio Vitale
  • Danny Vainstein
  • Gui Citovsky
  • Cecilia Magdalena Procopiuc
  • Claudio Gentile

We investigate the problem of hierarchically clustering data streams containing metric data in R^d. We introduce a desirable invariance property for such algorithms, describe a general family of hyperplane-based methods enjoying this property, and analyze two scalable instances of this general family against recently popularized similarity/dissimilarity-based metrics for hierarchical clustering. We prove a number of new results related to the approximation ratios of these algorithms, improving in various ways over the literature on this subject. Finally, since our algorithms are principled but also very practical, we carry out an experimental comparison on both synthetic and real-world datasets showing competitive results against known baselines.

NeurIPS Conference 2021 Conference Paper

Neural Active Learning with Performance Guarantees

  • Zhilei Wang
  • Pranjal Awasthi
  • Christoph Dann
  • Ayush Sekhari
  • Claudio Gentile

We investigate the problem of active learning in the streaming setting in non-parametric regimes, where the labels are stochastically generated from a class of functions on which we make no assumptions whatsoever. We rely on recently proposed Neural Tangent Kernel (NTK) approximation tools to construct a suitable neural embedding that determines the feature space the algorithm operates on and the learned model computed atop. Since the shape of the label requesting threshold is tightly related to the complexity of the function to be learned, which is a-priori unknown, we also derive a version of the algorithm which is agnostic to any prior knowledge. This algorithm relies on a regret balancing scheme to solve the resulting online model selection problem, and is computationally efficient. We prove joint guarantees on the cumulative regret and number of requested labels which depend on the complexity of the labeling function at hand. In the linear case, these guarantees recover known minimax results of the generalization error as a function of the label complexity in a standard statistical learning setting.

NeurIPS Conference 2021 Conference Paper

Online Active Learning with Surrogate Loss Functions

  • Giulia DeSalvo
  • Claudio Gentile
  • Tobias Sommer Thune

We derive a novel active learning algorithm in the streaming setting for binary classification tasks. The algorithm leverages weak labels to minimize the number of label requests, and trains a model to optimize a surrogate loss on a resulting set of labeled and weak-labeled points. Our algorithm jointly admits two crucial properties: theoretical guarantees in the general agnostic setting and a strong empirical performance. Our theoretical analysis shows that the algorithm attains favorable generalization and label complexity bounds, while our empirical study on 18 real-world datasets demonstrate that the algorithm outperforms standard baselines, including the Margin Algorithm, or Uncertainty Sampling, a high-performing active learning algorithm favored by practitioners.

NeurIPS Conference 2020 Conference Paper

Adapting to Misspecification in Contextual Bandits

  • Dylan J. Foster
  • Claudio Gentile
  • Mehryar Mohri
  • Julian Zimmert

A major research direction in contextual bandits is to develop algorithms that are computationally efficient, yet support flexible, general-purpose function approximation. Algorithms based on modeling rewards have shown strong empirical performance, yet typically require a well-specified model, and can fail when this assumption does not hold. Can we design algorithms that are efficient and flexible, yet degrade gracefully in the face of model misspecification? We introduce a new family of oracle-efficient algorithms for $\varepsilon$-misspecified contextual bandits that adapt to unknown model misspecification---both for finite and infinite action settings. Given access to an \emph{online oracle} for square loss regression, our algorithm attains optimal regret and---in particular---optimal dependence on the misspecification level, with \emph{no prior knowledge}. Specializing to linear contextual bandits with infinite actions in $d$ dimensions, we obtain the first algorithm that achieves the optimal $\bigoht(d\sqrt{T} + \varepsilon\sqrt{d}T)$ regret bound for unknown $\varepsilon$. On a conceptual level, our results are enabled by a new optimization-based perspective on the regression oracle reduction framework of Foster and Rakhlin (2020), which we believe will be useful more broadly.

ICML Conference 2020 Conference Paper

Adaptive Region-Based Active Learning

  • Corinna Cortes
  • Giulia DeSalvo
  • Claudio Gentile
  • Mehryar Mohri
  • Ningshan Zhang

We present a new active learning algorithm that adaptively partitions the input space into a finite number of regions, and subsequently seeks a distinct predictor for each region, while actively requesting labels. We prove theoretical guarantees for both the generalization error and the label complexity of our algorithm, and analyze the number of regions defined by the algorithm under some mild assumptions. We also report the results of an extensive suite of experiments on several real-world datasets demonstrating substantial empirical benefits over existing single-region and non-adaptive region-based active learning baselines.

ICML Conference 2020 Conference Paper

Online Learning with Dependent Stochastic Feedback Graphs

  • Corinna Cortes
  • Giulia DeSalvo
  • Claudio Gentile
  • Mehryar Mohri
  • Ningshan Zhang

A general framework for online learning with partial information is one where feedback graphs specify which losses can be observed by the learner. We study a challenging scenario where feedback graphs vary stochastically with time and, more importantly, where graphs and losses are dependent. This scenario appears in several real-world applications that we describe where the outcome of actions are correlated. We devise a new algorithm for this setting that exploits the stochastic properties of the graphs and that benefits from favorable regret guarantees. We present a detailed theoretical analysis of this algorithm, and also report the result of a series of experiments on real-world datasets, which show that our algorithm outperforms standard baselines for online learning with feedback graphs.

ICML Conference 2019 Conference Paper

Active Learning with Disagreement Graphs

  • Corinna Cortes
  • Giulia DeSalvo
  • Mehryar Mohri
  • Ningshan Zhang
  • Claudio Gentile

We present two novel enhancements of an online importance-weighted active learning algorithm IWAL, using the properties of disagreements among hypotheses. The first enhancement, IWALD, prunes the hypothesis set with a more aggressive strategy based on the disagreement graph. We show that IWAL-D improves the generalization performance and the label complexity of the original IWAL, and quantify the improvement in terms of the disagreement graph coefficient. The second enhancement, IZOOM, further improves IWAL-D by adaptively zooming into the current version space and thus reducing the best-in-class error. We show that IZOOM admits favorable theoretical guarantees with the changing hypothesis set. We report experimental results on multiple datasets and demonstrate that the proposed algorithms achieve better test performances than IWAL given the same amount of labeling budget.

JMLR Journal 2019 Journal Article

Delay and Cooperation in Nonstochastic Bandits

  • Nicolò Cesa-Bianchi
  • Claudio Gentile
  • Yishay Mansour

We study networks of communicating learning agents that cooperate to solve a common nonstochastic bandit problem. Agents use an underlying communication network to get messages about actions selected by other agents, and drop messages that took more than $d$ hops to arrive, where $d$ is a delay parameter. We introduce Exp3-Coop, a cooperative version of the Exp3 algorithm and prove that with $K$ actions and $N$ agents the average per-agent regret after $T$ rounds is at most of order $\sqrt{\bigl(d+1 + \tfrac{K}{N}\alpha_{\le d}\bigr)(T\ln K)}$, where $\alpha_{\le d}$ is the independence number of the $d$-th power of the communication graph $G$. We then show that for any connected graph, for $d=\sqrt{K}$ the regret bound is $K^{1/4}\sqrt{T}$, strictly better than the minimax regret $\sqrt{KT}$ for noncooperating agents. More informed choices of $d$ lead to bounds which are arbitrarily close to the full information minimax regret $\sqrt{T\ln K}$ when $G$ is dense. When $G$ has sparse components, we show that a variant of Exp3-Coop, allowing agents to choose their parameters according to their centrality in $G$, strictly improves the regret. Finally, as a by-product of our analysis, we provide the first characterization of the minimax regret for bandit learning with delay. [abs] [ pdf ][ bib ] &copy JMLR 2019. ( edit, beta )

NeurIPS Conference 2019 Conference Paper

Flattening a Hierarchical Clustering through Active Learning

  • Fabio Vitale
  • Anand Rajagopalan
  • Claudio Gentile

We investigate active learning by pairwise similarity over the leaves of trees originating from hierarchical clustering procedures. In the realizable setting, we provide a full characterization of the number of queries needed to achieve perfect reconstruction of the tree cut. In the non-realizable setting, we rely on known important-sampling procedures to obtain regret and query complexity bounds. Our algorithms come with theoretical guarantees on the statistical error and, more importantly, lend themselves to {\em linear-time} implementations in the relevant parameters of the problem. We discuss such implementations, prove running time guarantees for them, and present preliminary experiments on real-world datasets showing the compelling practical performance of our algorithms as compared to both passive learning and simple active learning baselines.

ICML Conference 2019 Conference Paper

Online Learning with Sleeping Experts and Feedback Graphs

  • Corinna Cortes
  • Giulia DeSalvo
  • Claudio Gentile
  • Mehryar Mohri
  • Scott Yang

We consider the scenario of online learning with sleeping experts, where not all experts are available at each round, and analyze the general framework of learning with feedback graphs, where the loss observations associated with each expert are characterized by a graph. A critical assumption in this framework is that the loss observations and the set of sleeping experts at each round are independent. We first extend the classical sleeping experts algorithm of Kleinberg et al. 2008 to the feedback graphs scenario, and prove matching upper and lower bounds for the sleeping regret of the resulting algorithm under the independence assumption. Our main contribution is then to relax this assumption, present a more general notion of sleeping regret, and derive a general algorithm with strong theoretical guarantees. We apply this new framework to the important scenario of online learning with abstention, where a learner can elect to abstain from making a prediction at the price of a certain cost. We empirically validate our algorithm against multiple online abstention algorithms on several real-world datasets, showing substantial performance improvements.

ICML Conference 2018 Conference Paper

Online Learning with Abstention

  • Corinna Cortes
  • Giulia DeSalvo
  • Claudio Gentile
  • Mehryar Mohri
  • Scott Yang

We present an extensive study of a key problem in online learning where the learner can opt to abstain from making a prediction, at a certain cost. In the adversarial setting, we show how existing online algorithms and guarantees can be adapted to this problem. In the stochastic setting, we first point out a bias problem that limits the straightforward extension of algorithms such as UCB-N to this context. Next, we give a new algorithm, UCB-GT, that exploits historical data and time-varying feedback graphs. We show that this algorithm benefits from more favorable regret guarantees than a natural extension of UCB-N. We further report the results of a series of experiments demonstrating that UCB-GT largely outperforms that extension of UCB-N, as well as other standard baselines.

NeurIPS Conference 2018 Conference Paper

Online Reciprocal Recommendation with Theoretical Performance Guarantees

  • Fabio Vitale
  • Nikos Parotsidis
  • Claudio Gentile

A reciprocal recommendation problem is one where the goal of learning is not just to predict a user's preference towards a passive item (e. g. , a book), but to recommend the targeted user on one side another user from the other side such that a mutual interest between the two exists. The problem thus is sharply different from the more traditional items-to-users recommendation, since a good match requires meeting the preferences of both users. We initiate a rigorous theoretical investigation of the reciprocal recommendation task in a specific framework of sequential learning. We point out general limitations, formulate reasonable assumptions enabling effective learning and, under these assumptions, we design and analyze a computationally efficient algorithm that uncovers mutual likes at a pace comparable to those achieved by a clairvoyant algorithm knowing all user preferences in advance. Finally, we validate our algorithm against synthetic and real-world datasets, showing improved empirical performance over simple baselines.

NeurIPS Conference 2017 Conference Paper

Boltzmann Exploration Done Right

  • Nicolò Cesa-Bianchi
  • Claudio Gentile
  • Gabor Lugosi
  • Gergely Neu

Boltzmann exploration is a classic strategy for sequential decision-making under uncertainty, and is one of the most standard tools in Reinforcement Learning (RL). Despite its widespread use, there is virtually no theoretical understanding about the limitations or the actual benefits of this exploration scheme. Does it drive exploration in a meaningful way? Is it prone to misidentifying the optimal actions or spending too much time exploring the suboptimal ones? What is the right tuning for the learning rate? In this paper, we address several of these questions for the classic setup of stochastic multi-armed bandits. One of our main results is showing that the Boltzmann exploration strategy with any monotone learning-rate sequence will induce suboptimal behavior. As a remedy, we offer a simple non-monotone schedule that guarantees near-optimal performance, albeit only when given prior access to key problem parameters that are typically not available in practical situations (like the time horizon $T$ and the suboptimality gap $\Delta$). More importantly, we propose a novel variant that uses different learning rates for different arms, and achieves a distribution-dependent regret bound of order $\frac{K\log^2 T}{\Delta}$ and a distribution-independent bound of order $\sqrt{KT}\log K$ without requiring such prior knowledge. To demonstrate the flexibility of our technique, we also propose a variant that guarantees the same performance bounds even if the rewards are heavy-tailed.

ICML Conference 2017 Conference Paper

On Context-Dependent Clustering of Bandits

  • Claudio Gentile
  • Shuai Li 0011
  • Purushottam Kar
  • Alexandros Karatzoglou
  • Giovanni Zappella
  • Evans Etrue

We investigate a novel cluster-of-bandit algorithm CAB for collaborative recommendation tasks that implements the underlying feedback sharing mechanism by estimating user neighborhoods in a context-dependent manner. CAB makes sharp departures from the state of the art by incorporating collaborative effects into inference, as well as learning processes in a manner that seamlessly interleaves explore-exploit tradeoffs and collaborative steps. We prove regret bounds for CAB under various data-dependent assumptions which exhibit a crisp dependence on the expected number of clusters over the users, a natural measure of the statistical difficulty of the learning task. Experiments on production and real-world datasets show that CAB offers significantly increased prediction performance against a representative pool of state-of-the-art methods.

JMLR Journal 2014 Journal Article

On Multilabel Classification and Ranking with Bandit Feedback

  • Claudio Gentile
  • Francesco Orabona

We present a novel multilabel/ranking algorithm working in partial information settings. The algorithm is based on 2nd- order descent methods, and relies on upper-confidence bounds to trade-off exploration and exploitation. We analyze this algorithm in a partial adversarial setting, where covariates can be adversarial, but multilabel probabilities are ruled by (generalized) linear models. We show $O(T^{1/2}\log T)$ regret bounds, which improve in several ways on the existing results. We test the effectiveness of our upper-confidence scheme by contrasting against full-information baselines on diverse real- world multilabel data sets, often obtaining comparable performance. [abs] [ pdf ][ bib ] &copy JMLR 2014. ( edit, beta )

ICML Conference 2014 Conference Paper

Online Clustering of Bandits

  • Claudio Gentile
  • Shuai Li
  • Giovanni Zappella

We introduce a novel algorithmic approach to content recommendation based on adaptive clustering of exploration-exploitation (“bandit") strategies. We provide a sharp regret analysis of this algorithm in a standard stochastic noise setting, demonstrate its scalability properties, and prove its effectiveness on a number of artificial and real-world datasets. Our experiments show a significant increase in prediction performance over state-of-the-art methods for bandit problems.

NeurIPS Conference 2013 Conference Paper

A Gang of Bandits

  • Nicolò Cesa-Bianchi
  • Claudio Gentile
  • Giovanni Zappella

Multi-armed bandit problems are receiving a great deal of attention because they adequately formalize the exploration-exploitation trade-offs arising in several industrially relevant applications, such as online advertisement and, more generally, recommendation systems. In many cases, however, these applications have a strong social component, whose integration in the bandit algorithm could lead to a dramatic performance increase. For instance, we may want to serve content to a group of users by taking advantage of an underlying network of social relationships among them. In this paper, we introduce novel algorithmic approaches to the solution of such networked bandit problems. More specifically, we design and analyze a global strategy which allocates a bandit algorithm to each network node (user) and allows it to “share” signals (contexts and payoffs) with the neghboring nodes. We then derive two more scalable variants of this strategy based on different ways of clustering the graph nodes. We experimentally compare the algorithm and its variants to state-of-the-art methods for contextual bandits that do not use the relational information. Our experiments, carried out on synthetic and real-world datasets, show a marked increase in prediction performance obtained by exploiting the network structure.

NeurIPS Conference 2013 Conference Paper

From Bandits to Experts: A Tale of Domination and Independence

  • Noga Alon
  • Nicolò Cesa-Bianchi
  • Claudio Gentile
  • Yishay Mansour

We consider the partial observability model for multi-armed bandits, introduced by Mannor and Shamir (2011). Our main result is a characterization of regret in the directed observability model in terms of the dominating and independence numbers of the observability graph. We also show that in the undirected case, the learner can achieve optimal regret without even accessing the observability graph before selecting an action. Both results are shown using variants of the Exp3 algorithm operating on the observability graph in a time-efficient manner.

JMLR Journal 2013 Journal Article

Random Spanning Trees and the Prediction of Weighted Graphs

  • Nicolò Cesa-Bianchi
  • Claudio Gentile
  • Fabio Vitale
  • Giovanni Zappella

We investigate the problem of sequentially predicting the binary labels on the nodes of an arbitrary weighted graph. We show that, under a suitable parametrization of the problem, the optimal number of prediction mistakes can be characterized (up to logarithmic factors) by the cutsize of a random spanning tree of the graph. The cutsize is induced by the unknown adversarial labeling of the graph nodes. In deriving our characterization, we obtain a simple randomized algorithm achieving in expectation the optimal mistake bound on any polynomially connected weighted graph. Our algorithm draws a random spanning tree of the original graph and then predicts the nodes of this tree in constant expected amortized time and linear space. Experiments on real-world data sets show that our method compares well to both global (Perceptron) and local (label propagation) methods, while being generally faster in practice. [abs] [ pdf ][ bib ] &copy JMLR 2013. ( edit, beta )

SODA Conference 2013 Conference Paper

Regret Minimization for Reserve Prices in Second-Price Auctions

  • Nicolò Cesa-Bianchi
  • Claudio Gentile
  • Yishay Mansour

We show a regret minimization algorithm for setting the reserve price in second-price auctions. We make the assumption that all bidders draw their bids from the same unknown and arbitrary distribution. Our algorithm is computationally efficient, and achieves a regret of, even when the number of bidders is stochastic with a known distribution.

NeurIPS Conference 2012 Conference Paper

A Linear Time Active Learning Algorithm for Link Classification

  • Nicolò Cesa-Bianchi
  • Claudio Gentile
  • Fabio Vitale
  • Giovanni Zappella

We present very efficient active learning algorithms for link classification in signed networks. Our algorithms are motivated by a stochastic model in which edge labels are obtained through perturbations of a initial sign assignment consistent with a two-clustering of the nodes. We provide a theoretical analysis within this model, showing that we can achieve an optimal (to whithin a constant factor) number of mistakes on any graph $G = (V, E)$ such that $|E|$ is at least order of $|V|^{3/2}$ by querying at most order of $|V|^{3/2}$ edge labels. More generally, we show an algorithm that achieves optimality to within a factor of order $k$ by querying at most order of $|V| + (|V|/k)^{3/2}$ edge labels. The running time of this algorithm is at most of order $|E| + |V|\log|V|$.

NeurIPS Conference 2012 Conference Paper

On Multilabel Classification and Ranking with Partial Feedback

  • Claudio Gentile
  • Francesco Orabona

We present a novel multilabel/ranking algorithm working in partial information settings. The algorithm is based on 2nd-order descent methods, and relies on upper-confidence bounds to trade-off exploration and exploitation. We analyze this algorithm in a partial adversarial setting, where covariates can be adversarial, but multilabel probabilities are ruled by (generalized) linear models. We show $O(T^{1/2}\log T)$ regret bounds, which improve in several ways on the existing results. We test the effectiveness of our upper-confidence scheme by contrasting against full-information baselines on real-world multilabel datasets, often obtaining comparable performance.

JMLR Journal 2012 Journal Article

Selective Sampling and Active Learning from Single and Multiple Teachers

  • Ofer Dekel
  • Claudio Gentile
  • Karthik Sridharan

We present a new online learning algorithm in the selective sampling framework, where labels must be actively queried before they are revealed. We prove bounds on the regret of our algorithm and on the number of labels it queries when faced with an adaptive adversarial strategy of generating the instances. Our bounds both generalize and strictly improve over previous bounds in similar settings. Additionally, our selective sampling algorithm can be converted into an efficient statistical active learning algorithm. We extend our algorithm and analysis to the multiple-teacher setting, where the algorithm can choose which subset of teachers to query for each label. Finally, we demonstrate the effectiveness of our techniques on a real-world Internet search problem. [abs] [ pdf ][ bib ] &copy JMLR 2012. ( edit, beta )

TCS Journal 2011 Journal Article

Predicting the labels of an unknown graph via adaptive exploration

  • Nicolò Cesa-Bianchi
  • Claudio Gentile
  • Fabio Vitale

Motivated by a problem of targeted advertising in social networks, we introduce a new model of online learning on labeled graphs where the graph is initially unknown and the algorithm is free to choose which vertex to predict next. For this learning model, we define an appropriate measure of regularity of a graph labeling called the merging degree. In general, the merging degree of a graph is small when its vertices can be partitioned into a few well-separated clusters within which labels are roughly constant. For the special case of binary labeled graphs, the merging degree is a more refined measure than the cutsize. After observing that natural nonadaptive exploration/prediction strategies, like depth-first with majority vote, do not behave satisfactorily on graphs with small merging degree, we introduce an efficiently implementable adaptive strategy whose cumulative loss is controlled by the merging degree. A matching lower bound shows that in the case of binary labels our analysis cannot be improved.

NeurIPS Conference 2011 Conference Paper

See the Tree Through the Lines: The Shazoo Algorithm

  • Fabio Vitale
  • Nicolò Cesa-Bianchi
  • Claudio Gentile
  • Giovanni Zappella

Predicting the nodes of a given graph is a fascinating theoretical problem with applications in several domains. Since graph sparsification via spanning trees retains enough information while making the task much easier, trees are an important special case of this problem. Although it is known how to predict the nodes of an unweighted tree in a nearly optimal way, in the weighted case a fully satisfactory algorithm is not available yet. We fill this hole and introduce an efficient node predictor, Shazoo, which is nearly optimal on any weighted tree. Moreover, we show that Shazoo can be viewed as a common nontrivial generalization of both previous approaches for unweighted trees and weighted lines. Experiments on real-world datasets confirm that Shazoo performs well in that it fully exploits the structure of the input tree, and gets very close to (and sometimes better than) less scalable energy minimization methods.

JMLR Journal 2010 Journal Article

Linear Algorithms for Online Multitask Classification

  • Giovanni Cavallanti
  • Nicoló Cesa-Bianchi
  • Claudio Gentile

We introduce new Perceptron-based algorithms for the online multitask binary classification problem. Under suitable regularity conditions, our algorithms are shown to improve on their baselines by a factor proportional to the number of tasks. We achieve these improvements using various types of regularization that bias our algorithms towards specific notions of task relatedness. More specifically, similarity among tasks is either measured in terms of the geometric closeness of the task reference vectors or as a function of the dimension of their spanned subspace. In addition to adapting to the online setting a mix of known techniques, such as the multitask kernels of Evgeniou et al., our analysis also introduces a matrix-based multitask extension of the p -norm Perceptron, which is used to implement spectral co-regularization. Experiments on real-world data sets complement and support our theoretical findings. [abs] [ pdf ][ bib ] &copy JMLR 2010. ( edit, beta )

ICML Conference 2009 Conference Paper

Robust bounds for classification via selective sampling

  • Nicolò Cesa-Bianchi
  • Claudio Gentile
  • Francesco Orabona

We introduce a new algorithm for binary classification in the selective sampling protocol. Our algorithm uses Regularized Least Squares (RLS) as base classifier, and for this reason it can be efficiently run in any RKHS. Unlike previous margin-based semi-supervised algorithms, our sampling condition hinges on a simultaneous upper bound on bias and variance of the RLS estimate under a simple linear label noise model. This fact allows us to prove performance bounds that hold for an arbitrary sequence of instances. In particular, we show that our sampling strategy approximates the margin of the Bayes optimal classifier to any desired accuracy ε by asking Õ ( d /ε 2 ) queries (in the RKHS case d is replaced by a suitable spectral quantity). While these are the standard rates in the fully supervised i.i.d. case, the best previously known result in our harder setting was Õ ( d 3 /ε 4 ). Preliminary experiments show that some of our algorithms also exhibit a good practical performance.

NeurIPS Conference 2008 Conference Paper

Linear Classification and Selective Sampling Under Low Noise Conditions

  • Giovanni Cavallanti
  • Nicolò Cesa-Bianchi
  • Claudio Gentile

We provide a new analysis of an efficient margin-based algorithm for selective sampling in classification problems. Using the so-called Tsybakov low noise condition to parametrize the instance distribution, we show bounds on the convergence rate to the Bayes risk of both the fully supervised and the selective sampling versions of the basic algorithm. Our analysis reveals that, excluding logarithmic factors, the average risk of the selective sampler converges to the Bayes risk at rate $n^{-(1+\alpha)/(3+\alpha)}$, with labels being sampled at the same rate (here $n$ denotes the sample size, and $\alpha > 0$ is the exponent in the low noise condition). We compare this convergence rate to the rate $n^{-(1+\alpha)/(2+\alpha)}$ achieved by the fully supervised algorithm using all labels. Experiments on textual data reveal that simple variants of the proposed selective sampler perform much better than popular and similarly efficient competitors.

NeurIPS Conference 2007 Conference Paper

On higher-order perceptron algorithms

  • Claudio Gentile
  • Fabio Vitale
  • Cristian Brotto

A new algorithm for on-line learning linear-threshold functions is proposed which efficiently combines second-order statistics about the data with the logarithmic behavior" of multiplicative/dual-norm algorithms. An initial theoretical analysis is provided suggesting that our algorithm might be viewed as a standard Perceptron algorithm operating on a transformed sequence of examples with improved margin properties. We also report on experiments carried out on datasets from diverse domains, with the goal of comparing to known Perceptron algorithms (first-order, second-order, additive, multiplicative). Our learning procedure seems to generalize quite well, and converges faster than the corresponding multiplicative baseline algorithms. "

JMLR Journal 2006 Journal Article

Incremental Algorithms for Hierarchical Classification

  • Nicoló Cesa-Bianchi
  • Claudio Gentile
  • Luca Zaniboni

We study the problem of classifying data in a given taxonomy when classifications associated with multiple and/or partial paths are allowed. We introduce a new algorithm that incrementally learns a linear-threshold classifier for each node of the taxonomy. A hierarchical classification is obtained by evaluating the trained node classifiers in a top-down fashion. To evaluate classifiers in our multipath framework, we define a new hierarchical loss function, the H-loss, capturing the intuition that whenever a classification mistake is made on a node of the taxonomy, then no loss should be charged for any additional mistake occurring in the subtree of that node. Making no assumptions on the mechanism generating the data instances, and assuming a linear noise model for the labels, we bound the H-loss of our on-line algorithm in terms of the H-loss of a reference classifier knowing the true parameters of the label-generating process. We show that, in expectation, the excess cumulative H-loss grows at most logarithmically in the length of the data sequence. Furthermore, our analysis reveals the precise dependence of the rate of convergence on the eigenstructure of the data each node observes. Our theoretical results are complemented by a number of experiments on texual corpora. In these experiments we show that, after only one epoch of training, our algorithm performs much better than Perceptron-based hierarchical classifiers, and reasonably close to a hierarchical support vector machine. [abs] [ pdf ][ bib ] &copy JMLR 2006. ( edit, beta )

JMLR Journal 2006 Journal Article

Worst-Case Analysis of Selective Sampling for Linear Classification

  • Nicoló Cesa-Bianchi
  • Claudio Gentile
  • Luca Zaniboni

A selective sampling algorithm is a learning algorithm for classification that, based on the past observed data, decides whether to ask the label of each new instance to be classified. In this paper, we introduce a general technique for turning linear-threshold classification algorithms from the general additive family into randomized selective sampling algorithms. For the most popular algorithms in this family we derive mistake bounds that hold for individual sequences of examples. These bounds show that our semi-supervised algorithms can achieve, on average, the same accuracy as that of their fully supervised counterparts, but using fewer labels. Our theoretical results are corroborated by a number of experiments on real-world textual data. The outcome of these experiments is essentially predicted by our theoretical results: Our selective sampling algorithms tend to perform as well as the algorithms receiving the true label after each classification, while observing in practice substantially fewer labels. [abs] [ pdf ][ bib ] &copy JMLR 2006. ( edit, beta )

NeurIPS Conference 2005 Conference Paper

Improved risk tail bounds for on-line algorithms

  • Nicolò Cesa-Bianchi
  • Claudio Gentile

We prove the strongest known bound for the risk of hypotheses selected from the ensemble generated by running a learning algorithm incremen(cid: 173) tally on the training data. Our result is based on proof techniques that are remarkably different from the standard risk analysis based on uniform convergence arguments.

NeurIPS Conference 2004 Conference Paper

Incremental Algorithms for Hierarchical Classification

  • Nicolò Cesa-Bianchi
  • Claudio Gentile
  • Andrea Tironi
  • Luca Zaniboni

We study the problem of hierarchical classification when labels corre- sponding to partial and/or multiple paths in the underlying taxonomy are allowed. We introduce a new hierarchical loss function, the H-loss, im- plementing the simple intuition that additional mistakes in the subtree of a mistaken class should not be charged for. Based on a probabilistic data model introduced in earlier work, we derive the Bayes-optimal classifier for the H-loss. We then empirically compare two incremental approx- imations of the Bayes-optimal classifier with a flat SVM classifier and with classifiers obtained by using hierarchical versions of the Perceptron and SVM algorithms. The experiments show that our simplest incremen- tal approximation of the Bayes-optimal classifier performs, after just one training epoch, nearly as well as the hierarchical SVM classifier (which performs best). For the same incremental algorithm we also derive an H-loss bound showing, when data are generated by our probabilistic data model, exponentially fast convergence to the H-loss of the hierarchical classifier based on the true model parameters. 1 Introduction and basic definitions We study the problem of classifying data in a given taxonomy of labels, where the tax- onomy is specified as a tree forest. We assume that every data instance is labelled with a (possibly empty) set of class labels called multilabel, with the only requirement that mul- tilabels including some node i in the taxonony must also include all ancestors of i. Thus, each multilabel corresponds to the union of one or more paths in the forest, where each path must start from a root but it can terminate on an internal node (rather than a leaf). Learning algorithms for hierarchical classification have been investigated in, e. g. , [8, 9, 10, 11, 12, 14, 15, 17, 20]. However, the scenario where labelling includes multiple and partial paths has received very little attention. The analysis in [5], which is mainly theoretical, shows in the multiple and partial path case a 0/1-loss bound for a hierarchical learning algorithm based on regularized least-squares estimates. In this work we extend [5] in several ways. First, we introduce a new hierarchical loss func- tion, the H-loss, which is better suited than the 0/1-loss to analyze hierarchical classification tasks, and we derive the corresponding Bayes-optimal classifier under the parametric data model introduced in [5]. Second, considering various loss functions, including the H-loss, we empirically compare the performance of the following three incremental kernel-based This work was supported in part by the PASCAL Network of Excellence under EC grant no. 506778. This publication only reflects the authors' views. algorithms: 1) a hierarchical version of the classical Perceptron algorithm [16]; 2) an ap- proximation to the Bayes-optimal classifier; 3) a simplified variant of this approximation. Finally, we show that, assuming data are indeed generated according to the parametric model mentioned before, the H-loss of the algorithm in 3) converges to the H-loss of the classifier based on the true model parameters. Our incremental algorithms are based on training linear-threshold classifiers in each node of the taxonomy. A similar approach has been studied in [8], though their model does not consider multiple-path classifications as we do. Incremental algorithms are the main focus of this research, since we strongly believe that they are a key tool for coping with tasks where large quantities of data items are generated and the classification system needs to be frequently adjusted to keep up with new items. However, we found it useful to provide a reference point for our empirical results. Thus we have also included in our experiments the results achieved by nonincremental algorithms. In particular, we have chosen a flat and a hierarchical version of SVM [21, 7, 19], which are known to perform well on the textual datasets considered here. We assume data elements are encoded as real vectors x Rd which we call instances. A multilabel for an instance x is any subset of the set {1, .. ., N } of all labels/classes, including the empty set. We denote the multilabel associated with x by a vector y = (y1, .. ., yN ) {0, 1}N, where i belongs to the multilabel of x if and only if yi = 1. A taxonomy G is a forest whose trees are defined over the set of labels. A multilabel y {0, 1}N is said to respect a taxonomy G if and only if y is the union of one or more paths in G, where each path starts from a root but need not terminate on a leaf. See Figure 1. We assume the data-generating mechanism produces examples (x, y) such that y respects some fixed underlying taxonomy G with N nodes. The set of roots in G is denoted by root(G). We use par(i) to denote the unique parent of node i, anc(i) to denote the set of ancestors of i, and sub(i) to denote the set of nodes in the subtree rooted at i (including i). Finally, given a predicate over a set, we will use {} to denote both the subset of where is true and the indicator function of this subset.

NeurIPS Conference 2004 Conference Paper

Worst-Case Analysis of Selective Sampling for Linear-Threshold Algorithms

  • Nicolò Cesa-Bianchi
  • Claudio Gentile
  • Luca Zaniboni

We provide a worst-case analysis of selective sampling algorithms for learning linear threshold functions. The algorithms considered in this paper are Perceptron-like algorithms, i. e. , algorithms which can be effi- ciently run in any reproducing kernel Hilbert space. Our algorithms ex- ploit a simple margin-based randomized rule to decide whether to query the current label. We obtain selective sampling algorithms achieving on average the same bounds as those proven for their deterministic coun- terparts, but using much fewer labels. We complement our theoretical findings with an empirical comparison on two text categorization tasks. The outcome of these experiments is largely predicted by our theoreti- cal results: Our selective sampling algorithms tend to perform as good as the algorithms receiving the true label after each classification, while observing in practice substantially fewer labels.

NeurIPS Conference 2003 Conference Paper

Fast Feature Selection from Microarray Expression Data via Multiplicative Large Margin Algorithms

  • Claudio Gentile

New feature selection algorithms for linear threshold functions are de- scribed which combine backward elimination with an adaptive regular- ization method. This makes them particularly suitable to the classifica- tion of microarray expression data, where the goal is to obtain accurate rules depending on few genes only. Our algorithms are fast and easy to implement, since they center on an incremental (large margin) algorithm which allows us to avoid linear, quadratic or higher-order programming methods. We report on preliminary experiments with five known DNA microarray datasets. These experiments suggest that multiplicative large margin algorithms tend to outperform additive algorithms (such as SVM) on feature selection tasks.

NeurIPS Conference 2002 Conference Paper

Margin-Based Algorithms for Information Filtering

  • Nicolò Cesa-Bianchi
  • Alex Conconi
  • Claudio Gentile

In this work, we study an information filtering model where the relevance labels associated to a sequence of feature vectors are realizations of an unknown probabilistic linear function. Building on the analysis of a re- stricted version of our model, we derive a general filtering rule based on the margin of a ridge regression estimator. While our rule may observe the label of a vector only by classfying the vector as relevant, experiments on a real-world document filtering problem show that the performance of our rule is close to that of the on-line classifier which is allowed to observe all labels. These empirical results are complemented by a theo- retical analysis where we consider a randomized variant of our rule and prove that its expected number of mistakes is never much larger than that of the optimal filtering rule which knows the hidden linear model.

JMLR Journal 2001 Journal Article

A New Approximate Maximal Margin Classification Algorithm (Kernel Machines Section)

  • Claudio Gentile

A new incremental learning algorithm is described which approximates the maximal margin hyperplane w.r.t. norm p ≥ 2 for a set of linearly separable data. Our algorithm, called ALMA_ p (Approximate Large Margin algorithm w.r.t. norm p ), takes O( (p-1) / (α 2 γ 2 ) ) corrections to separate the data with p -norm margin larger than (1-α)γ, where g is the (normalized) p -norm margin of the data. ALMA_ p avoids quadratic (or higher-order) programming methods. It is very easy to implement and is as fast as on-line algorithms, such as Rosenblatt's Perceptron algorithm. We performed extensive experiments on both real-world and artificial datasets. We compared ALMA_2 (i.e., ALMA_ p with p = 2 ) to standard Support vector Machines (SVM) and to two incremental algorithms: the Perceptron algorithm and Li and Long's ROMMA. The accuracy levels achieved by ALMA_2 are superior to those achieved by the Perceptron algorithm and ROMMA, but slightly inferior to SVM's. On the other hand, ALMA_2 is quite faster and easier to implement than standard SVM training algorithms. When learning sparse target vectors, ALMA_ p with p > 2 largely outperforms Perceptron-like algorithms, such as ALMA_2.

NeurIPS Conference 2001 Conference Paper

On the Generalization Ability of On-Line Learning Algorithms

  • Nicolò Cesa-Bianchi
  • Alex Conconi
  • Claudio Gentile

In this paper we show that on-line algorithms for classification and re- gression can be naturally used to obtain hypotheses with good data- dependent tail bounds on their risk. Our results are proven without re- quiring complicated concentration-of-measure arguments and they hold for arbitrary on-line learning algorithms. Furthermore, when applied to concrete on-line algorithms, our results yield tail bounds that in many cases are comparable or better than the best known bounds.

NeurIPS Conference 2000 Conference Paper

A New Approximate Maximal Margin Classification Algorithm

  • Claudio Gentile

A new incremental learning algorithm is described which approximates the maximal margin hyperplane w. r. t. norm p ~ 2 for a set of linearly separable data. Our algorithm, called ALMAp (Approximate Large Mar- gin algorithm w. r. t. norm p), takes 0 ((P~21; ;2) corrections to sepa(cid: 173) rate the data with p-norm margin larger than (1 - 0: ), ,(, where, ,( is the p-norm margin of the data and X is a bound on the p-norm of the in(cid: 173) stances. ALMAp avoids quadratic (or higher-order) programming meth(cid: 173) ods. It is very easy to implement and is as fast as on-line algorithms, such as Rosenblatt's perceptron. We report on some experiments comparing ALMAp to two incremental algorithms: Perceptron and Li and Long's ROMMA. Our algorithm seems to perform quite better than both. The accuracy levels achieved by ALMAp are slightly inferior to those obtained by Support vector Machines (SVMs). On the other hand, ALMAp is quite faster and easier to implement than standard SVMs training algorithms.

NeurIPS Conference 1998 Conference Paper

Linear Hinge Loss and Average Margin

  • Claudio Gentile
  • Manfred K. Warmuth

We describe a unifying method for proving relative loss bounds for on(cid: 173) line linear threshold classification algorithms, such as the Perceptron and the Winnow algorithms. For classification problems the discrete loss is used, i. e. , the total number of prediction mistakes. We introduce a con(cid: 173) tinuous loss function, called the "linear hinge loss", that can be employed to derive the updates of the algorithms. We first prove bounds w. r. t. the linear hinge loss and then convert them to the discrete loss. We intro(cid: 173) duce a notion of "average margin" of a set of examples. We show how relative loss bounds based on the linear hinge loss can be converted to relative loss bounds i. t. o. the discrete loss using the average margin.