Arrow Research search

Author name cluster

Lev Reyzin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

16 papers
2 author rows

Possible papers (16)

MFCS 2024 (Conference Paper)

Applications of Littlestone Dimension to Query Learning and to Compression

  • Hunter Chase
  • James Freitag
  • Lev Reyzin

In this paper we give several applications of Littlestone dimension. The first is to the model of [Angluin and Dohrn, 2017], where we extend their results for learning by equivalence queries with random counterexamples. Second, we extend that model to infinite concept classes with an additional source of randomness. Third, we give improved results on the relationship of Littlestone dimension to classes with extended d-compression schemes, proving the analog of a conjecture of [Floyd and Warmuth, 1995] for Littlestone dimension.
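
Littlestone dimension has a direct recursive definition via mistake trees, which makes it computable for small finite classes. A minimal Python sketch of that recursion (the threshold class and domain below are illustrative, not from the paper):

```python
from functools import lru_cache

def littlestone_dim(hypotheses, domain):
    """Littlestone dimension of a finite class via the mistake-tree recursion:
    Ldim(H) = 1 + max over points x of min(Ldim(H restricted to h(x)=0),
    Ldim(H restricted to h(x)=1)), over x where both restrictions are
    nonempty; Ldim(H) = 0 when no point splits the class."""
    @lru_cache(maxsize=None)
    def ldim(H):
        best = 0
        for x in domain:
            H0 = frozenset(h for h in H if h[x] == 0)
            H1 = frozenset(h for h in H if h[x] == 1)
            if H0 and H1:
                best = max(best, 1 + min(ldim(H0), ldim(H1)))
        return best
    return ldim(frozenset(hypotheses))

# Example: threshold functions on {0,1,2,3}; h_t(x) = 1 iff x >= t.
domain = range(4)
thresholds = [tuple(1 if x >= t else 0 for x in domain) for t in range(5)]
print(littlestone_dim(thresholds, domain))  # 2 = floor(log2(5)) thresholds
```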

AAAI 2021 (Conference Paper)

Communication-Aware Collaborative Learning

  • Avrim Blum
  • Shelby Heinecke
  • Lev Reyzin

Algorithms for noiseless collaborative PAC learning have been analyzed and optimized in recent years with respect to sample complexity. In this paper, we study collaborative PAC learning with the goal of reducing communication cost at essentially no penalty to the sample complexity. We develop communication-efficient collaborative PAC learning algorithms using distributed boosting. We then consider the communication cost of collaborative learning in the presence of classification noise. As an intermediate step, we show how collaborative PAC learning algorithms can be adapted to handle classification noise. With this insight, we develop communication-efficient algorithms for collaborative PAC learning that are robust to classification noise.
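
A simplified sketch of the distributed-boosting idea the abstract refers to: each round, every player sends a small boosting-weighted sample plus one error scalar, and the center broadcasts a single weak hypothesis, so communication scales with the number of boosting rounds rather than with the full sample size. The protocol below (stump weak learners, uniform error aggregation) is an illustrative assumption, not the paper's algorithm:

```python
import math, random

random.seed(0)

def best_stump(sample):
    """Weighted 1-D threshold stump: (sign, thresh) minimizing weighted
    error on a list of (x, y, w) triples with y in {-1, +1}."""
    best, best_err = (1, 0.0), float("inf")
    for t in sorted({x for x, _, _ in sample}):
        for s in (1, -1):
            err = sum(w for x, y, w in sample if (s if x >= t else -s) != y)
            if err < best_err:
                best, best_err = (s, t), err
    return best

def h_pred(h, x):
    s, t = h
    return s if x >= t else -s

# k players, each holding data from its own distribution but sharing the
# target concept sign(x - 0.3); everything here is illustrative.
k, rounds, batch = 4, 5, 10
local = [[(x, 1 if x >= 0.3 else -1)
          for x in (random.uniform(j - 2, j + 2) for _ in range(200))]
         for j in range(k)]
weights = [[1.0 / len(S)] * len(S) for S in local]

ensemble = []  # (alpha, h) pairs broadcast by the center
for _ in range(rounds):
    # Communication up: each player sends a small boosting-weighted sample.
    msg = []
    for j in range(k):
        idx = random.choices(range(len(local[j])), weights=weights[j], k=batch)
        msg += [(local[j][i][0], local[j][i][1], 1.0) for i in idx]
    h = best_stump(msg)  # center fits one weak hypothesis on the union
    # Communication up: each player reports one scalar, its weighted error.
    errs = [sum(w for (x, y), w in zip(local[j], weights[j])
                if h_pred(h, x) != y) for j in range(k)]
    eps = min(max(sum(errs) / k, 1e-6), 0.5 - 1e-6)
    alpha = 0.5 * math.log((1 - eps) / eps)
    ensemble.append((alpha, h))  # communication down: broadcast (alpha, h)
    # Players update their boosting weights locally; no raw data moves.
    for j in range(k):
        weights[j] = [w * math.exp(-alpha * y * h_pred(h, x))
                      for (x, y), w in zip(local[j], weights[j])]
        z = sum(weights[j])
        weights[j] = [w / z for w in weights[j]]

def classify(x):
    return 1 if sum(a * h_pred(h, x) for a, h in ensemble) >= 0 else -1

# Per-player accuracy of the shared ensemble.
print([round(sum(classify(x) == y for x, y in S) / len(S), 3) for S in local])
```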

JAIR 2020 (Journal Article)

On the Complexity of Learning a Class Ratio from Unlabeled Data

  • Benjamin Fish
  • Lev Reyzin

In the problem of learning a class ratio from unlabeled data, which we call CR learning, the training data is unlabeled, and only the ratios, or proportions, of examples receiving each label are given. The goal is to learn a hypothesis that predicts the proportions of labels on the distribution underlying the sample. This model of learning is applicable to a wide variety of settings, including predicting the number of votes for candidates in political elections from polls. In this paper, we formally define this model, resolve foundational questions regarding the computational complexity of CR learning, and characterize its relationship to PAC learning. Among our results, we show, perhaps surprisingly, that for finite VC classes what can be efficiently CR learned is a strict subset of what can be efficiently PAC learned, under standard complexity assumptions. We also show that there exist classes of functions whose CR learnability is independent of ZFC, the standard set-theoretic axioms. This implies that CR learning cannot be easily characterized (as PAC learning is by VC dimension).
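
Concretely, the learner sees an unlabeled sample plus a single number, the sample's label proportion, and must output a hypothesis whose predicted proportion generalizes to the underlying distribution. A minimal sketch for a finite class of one-dimensional thresholds (the target, class, and distribution are illustrative assumptions, not from the paper):

```python
import random

random.seed(1)

# Hidden target: label 1 iff x >= 0.6; the learner never sees labels.
def draw(m):
    return [random.random() for _ in range(m)]

sample = draw(1000)
observed_ratio = sum(x >= 0.6 for x in sample) / len(sample)

# The learner's whole input is (sample, observed_ratio). Fit a threshold
# from a finite grid whose predicted positive proportion matches the ratio.
def predicted_ratio(t, xs):
    return sum(x >= t for x in xs) / len(xs)

candidates = [i / 100 for i in range(101)]
best_t = min(candidates,
             key=lambda t: abs(predicted_ratio(t, sample) - observed_ratio))

# Success is measured on proportions over the underlying distribution,
# not on per-point accuracy; check on a fresh sample.
fresh = draw(1000)
true_ratio = sum(x >= 0.6 for x in fresh) / len(fresh)
print(best_t, abs(predicted_ratio(best_t, fresh) - true_ratio))
```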

IJCAI 2017 (Conference Paper)

On the Complexity of Learning from Label Proportions

  • Benjamin Fish
  • Lev Reyzin

In the problem of learning with label proportions (also known as the problem of estimating class ratios), the training data is unlabeled, and only the proportions of examples receiving each label are given. The goal is to learn a hypothesis that predicts the proportions of labels on the distribution underlying the sample. This model of learning is useful in a wide variety of settings, including predicting the number of votes for candidates in political elections from polls. In this paper, we resolve foundational questions regarding the computational complexity of learning in this setting. We formalize a simple version of the setting, and we compare the computational complexity of learning in this model to classical PAC learning. Perhaps surprisingly, we show that what can be learned efficiently in this model is a strict subset of what may be learned efficiently in PAC, under standard complexity assumptions. We give a characterization in terms of VC dimension, and we show that there are non-trivial problems in this model that can be efficiently learned. We also give an algorithm that demonstrates the feasibility of learning under well-behaved distributions.

AAAI 2015 (Conference Paper)

Shift-Pessimistic Active Learning Using Robust Bias-Aware Prediction

  • Anqi Liu
  • Lev Reyzin
  • Brian Ziebart

Existing approaches to active learning are generally optimistic about their certainty with respect to data shift between labeled and unlabeled data. They assume that unknown datapoint labels follow the inductive biases of the active learner. As a result, the most useful datapoint labels, ones that refute current inductive biases, are rarely solicited. We propose a shift-pessimistic approach to active learning that assumes the worst case about the unknown conditional label distribution. This closely aligns model uncertainty with generalization error, enabling more useful label solicitation. We investigate the theoretical benefits of this approach and demonstrate its empirical advantages on probabilistic binary classification tasks.
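
The optimism/pessimism contrast in the first sentences can be made concrete with a single probabilistic prediction: an optimistic learner scores its uncertainty assuming the true conditional matches its own estimate, while a pessimistic one takes the worst case over all conditionals. A toy computation of that gap (only an illustration of the distinction, not the paper's robust bias-aware model):

```python
import math

def optimistic_logloss(p_hat):
    # Assumes the unknown conditional P(y=1|x) equals the learner's estimate.
    return -(p_hat * math.log(p_hat) + (1 - p_hat) * math.log(1 - p_hat))

def pessimistic_logloss(p_hat):
    # Worst case over all conditionals p in [0, 1]; attained at p = 0 or 1.
    return max(-math.log(p_hat), -math.log(1 - p_hat))

for p_hat in (0.5, 0.7, 0.9, 0.99):
    print(p_hat, round(optimistic_logloss(p_hat), 3),
          round(pessimistic_logloss(p_hat), 3))
```

Self-assessed uncertainty shrinks as predictions grow confident, while the worst-case loss explodes; an optimistic learner therefore rarely solicits exactly the confident-looking points that could refute it.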

IJCAI 2015 (Conference Paper)

Training-Time Optimization of a Budgeted Booster

  • Yi Huang
  • Brian Powers
  • Lev Reyzin

We consider the problem of feature-efficient prediction: a setting where features have costs and the learner is limited by a budget constraint on the total cost of the features it can examine at test time. We focus on solving this problem with boosting by optimizing the choice of base learners in the training phase and stopping the boosting process when the learner's budget runs out. We experimentally show that our method improves upon the boosting approach AdaBoostRS [Reyzin, 2011] and in many cases also outperforms the recent algorithm SpeedBoost [Grubb and Bagnell, 2012]. We provide a theoretical justification for our optimization method via the margin bound. We also experimentally show that our method outperforms pruned decision trees, a natural budgeted classifier.
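
One simple instantiation of the training-time idea, assuming decision-stump weak learners with per-feature costs: greedily boost the highest-edge affordable stump, pay for each feature once, and stop when the test-time budget would be exceeded. The greedy rule, costs, and data below are illustrative assumptions, not necessarily the paper's exact optimization:

```python
import math, random

random.seed(2)

n, d = 200, 6
X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
y = [1 if x[0] + x[1] - x[2] > 0 else -1 for x in X]   # illustrative target
cost = [1.0, 2.0, 2.0, 5.0, 5.0, 5.0]                  # hypothetical costs
budget = 5.0                                           # test-time budget
cand = [sorted(random.sample([x[f] for x in X], 20)) for f in range(d)]

def best_stump(w, feats):
    """Stump (feature, sign, thresh) with the largest weighted edge."""
    best, best_edge = None, 0.0
    for f in feats:
        for t in cand[f]:
            edge = sum(wi * yi * (1 if x[f] >= t else -1)
                       for x, yi, wi in zip(X, y, w))
            if abs(edge) > best_edge:
                best, best_edge = (f, 1 if edge > 0 else -1, t), abs(edge)
    return best, best_edge

w = [1.0 / n] * n
spent, used, ensemble = 0.0, set(), []
for _ in range(30):  # round cap; the real stopping rule is the budget
    feats = [f for f in range(d) if f in used or spent + cost[f] <= budget]
    pick, edge = best_stump(w, feats)
    if pick is None or edge < 1e-9:
        break
    f, s, t = pick
    spent += 0.0 if f in used else cost[f]  # each feature is paid for once
    used.add(f)
    eps = max(1e-9, (1 - edge) / 2)         # weighted error of the stump
    alpha = 0.5 * math.log((1 - eps) / eps)
    ensemble.append((alpha, f, s, t))
    w = [wi * math.exp(-alpha * yi * (s if x[f] >= t else -s))
         for x, yi, wi in zip(X, y, w)]
    z = sum(w)
    w = [wi / z for wi in w]

def classify(x):
    return 1 if sum(a * (s if x[f] >= t else -s)
                    for a, f, s, t in ensemble) >= 0 else -1

print(len(ensemble), spent, sum(classify(x) == yi for x, yi in zip(X, y)) / n)
```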

AAAI 2014 (Conference Paper)

On Boosting Sparse Parities

  • Lev Reyzin

While boosting has been extensively studied, considerably less attention has been devoted to the task of designing good weak learning algorithms. In this paper we consider the problem of designing weak learners that are especially suited to the boosting procedure and, specifically, to the AdaBoost algorithm. First we describe conditions desirable for a weak learning algorithm. We then propose using sparse parity functions, which have many of these desired properties, as weak learners in boosting. Our experimental tests show the proposed weak learners to be competitive with the most widely used ones: decision stumps and pruned decision trees.
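
The weak learner itself is easy to state: search all parities on at most k of the Boolean attributes and return the one with the largest weighted edge. A minimal sketch of that search (the data, hidden parity, and k are illustrative):

```python
import random
from itertools import combinations

random.seed(3)

n, d, k = 300, 8, 2
X = [[random.randrange(2) for _ in range(d)] for _ in range(n)]
y = [1 if (x[1] ^ x[4]) == 1 else -1 for x in X]  # hidden 2-sparse parity

def parity_weak_learner(X, y, w, k):
    """Return the subset S (|S| <= k) whose parity chi_S(x) = (-1)^(sum of
    x_i for i in S) has the largest weighted edge |sum_i w_i y_i chi_S(x_i)|,
    plus the sign to use it with."""
    best_S, best_sign, best_edge = None, 1, 0.0
    for size in range(1, k + 1):
        for S in combinations(range(len(X[0])), size):
            edge = sum(wi * yi * (-1) ** sum(x[i] for i in S)
                       for x, yi, wi in zip(X, y, w))
            if abs(edge) > best_edge:
                best_S, best_sign, best_edge = S, (1 if edge > 0 else -1), abs(edge)
    return best_S, best_sign, best_edge

w = [1.0 / n] * n  # e.g., AdaBoost's first-round weights
S, sign, edge = parity_weak_learner(X, y, w, k)
print(S, sign, edge)  # recovers the relevant coordinates (1, 4)
```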

STOC 2013 (Conference Paper)

Statistical algorithms and a lower bound for detecting planted cliques

  • Vitaly Feldman
  • Elena Grigorescu
  • Lev Reyzin
  • Santosh S. Vempala
  • Ying Xiao 0003

We introduce a framework for proving lower bounds on computational problems over distributions, based on a class of algorithms called statistical algorithms. For such algorithms, access to the input distribution is limited to obtaining an estimate of the expectation of any given function on a sample drawn randomly from the input distribution, rather than directly accessing samples. Most natural algorithms of interest in theory and in practice, e.g., moments-based methods, local search, standard iterative methods for convex optimization, MCMC, and simulated annealing, are statistical algorithms or have statistical counterparts. Our framework is inspired by and generalizes the statistical query model in learning theory [34]. Our main application is a nearly optimal lower bound on the complexity of any statistical algorithm for detecting planted bipartite clique distributions (or planted dense subgraph distributions) when the planted clique has size O(n^(1/2-δ)) for any constant δ > 0. Variants of these problems have been assumed to be hard in order to prove hardness for other problems and for cryptographic applications. Our lower bounds provide concrete evidence of hardness, thus supporting these assumptions.
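
The access model is concrete enough to simulate: a statistical algorithm never touches raw samples, only an oracle that answers expectation queries up to a tolerance. A minimal sketch of such a STAT(tau) oracle (the distribution and query are illustrative; averaging fresh samples is one standard way to realize the oracle):

```python
import random

random.seed(4)

def make_stat_oracle(sampler, tau):
    """Simulate a STAT(tau) oracle for the distribution D given by `sampler`:
    on query f (with values in [0, 1]), return an estimate of E_D[f(x)]
    within tau, here by averaging O(1/tau^2) fresh samples (Hoeffding)."""
    m = int(4 / tau**2)
    def stat(f):
        return sum(f(sampler()) for _ in range(m)) / m
    return stat

# Example: D = uniform on {0,1}^5; query the mean of coordinate 2 without
# the algorithm ever seeing a raw sample.
sampler = lambda: [random.randrange(2) for _ in range(5)]
stat = make_stat_oracle(sampler, tau=0.05)
print(stat(lambda x: x[2]))  # approximately 0.5
```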

NeurIPS 2010 (Conference Paper)

Non-Stochastic Bandit Slate Problems

  • Satyen Kale
  • Lev Reyzin
  • Robert Schapire

We consider bandit problems, motivated by applications in online advertising and news story selection, in which the learner must repeatedly select a slate, that is, a subset of size s from K possible actions, and then receives rewards for just the selected actions. The goal is to minimize the regret with respect to the total reward of the best slate computed in hindsight. We consider unordered and ordered versions of the problem, and give efficient algorithms which have regret O(sqrt(T)), where the constant depends on the specific nature of the problem. We also consider versions of the problem where we have access to a number of policies which make recommendations for slates in every round, and give algorithms with O(sqrt(T)) regret for competing with the best such policy as well. We make use of the technique of relative entropy projections combined with the usual multiplicative weight update algorithm to obtain our algorithms.
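
The relative entropy projection mentioned at the end can be made concrete for the unordered case: after a multiplicative-weights update, the distribution over actions is projected onto those placing mass at most 1/s on any single action, which amounts to capping the largest coordinates and rescaling the rest. A sketch of that capping step (a standard construction, stated here as an assumption rather than the paper's exact pseudocode):

```python
def cap_distribution(q, c):
    """Relative-entropy projection of a distribution q onto {p : p_i <= c}.
    The solution has the form p_i = min(c, lam * q_i) with lam chosen so
    the result sums to 1. Assumes q has full support and c * len(q) >= 1."""
    capped = set()
    while True:
        free_mass = sum(qi for i, qi in enumerate(q) if i not in capped)
        lam = (1 - c * len(capped)) / free_mass
        newly = {i for i, qi in enumerate(q)
                 if i not in capped and lam * qi > c}
        if not newly:
            return [c if i in capped else lam * qi for i, qi in enumerate(q)]
        capped |= newly

# Example: slates of size s = 2 over K = 4 actions, so marginals are
# capped at 1/s = 0.5 before decomposing into actual slates.
print(cap_distribution([0.7, 0.2, 0.06, 0.04], 0.5))
# -> [0.5, 0.333..., 0.1, 0.0666...]
```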

JMLR 2009 (Journal Article)

Learning Acyclic Probabilistic Circuits Using Test Paths

  • Dana Angluin
  • James Aspnes
  • Jiang Chen
  • David Eisenstat
  • Lev Reyzin

We define a model of learning probabilistic acyclic circuits using value injection queries, in which fixed values are assigned to an arbitrary subset of the wires and the value on the single output wire is observed. We adapt the approach of using test paths from the Circuit Builder algorithm (Angluin et al., 2009) to show that there is a polynomial time algorithm that uses value injection queries to learn acyclic Boolean probabilistic circuits of constant fan-in and log depth. We establish upper and lower bounds on the attenuation factor for general and transitively reduced Boolean probabilistic circuits of test paths versus general experiments. We give computational evidence that a polynomial time learning algorithm using general value injection experiments may not do much better than one using test paths. For probabilistic circuits with alphabets of size three or greater, we show that the test path lemmas (Angluin et al., 2009, 2008b) fail utterly. To overcome this obstacle, we introduce function injection queries, in which the values on a wire may be mapped to other values rather than just to themselves or constants, and prove a generalized test path lemma for this case.
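
The query model itself is simple to simulate: fix values on a chosen subset of wires, let the remaining probabilistic gates fire, and observe only the output wire. A toy illustration of the interface (the circuit below is invented for the example; this is not the Circuit Builder algorithm):

```python
import random

random.seed(5)

# A toy acyclic probabilistic circuit: wires w0, w1 are inputs that fire
# randomly; w2 = OR(w0, w1) but flips its value with probability 0.1;
# w3 (the output) = AND(w1, w2).
def value_injection_query(injected):
    """Evaluate one random firing of the circuit with `injected` (a dict
    wire -> 0/1) overriding the listed wires; only the output is observed."""
    w = {}
    w[0] = injected.get(0, random.randrange(2))
    w[1] = injected.get(1, random.randrange(2))
    noisy_or = (w[0] | w[1]) ^ (random.random() < 0.1)
    w[2] = injected.get(2, int(noisy_or))
    w[3] = injected.get(3, w[1] & w[2])
    return w[3]

# Estimate the output distribution under an experiment fixing w1 = 1 and
# leaving every other wire free, by repeated queries.
trials = 10000
print(sum(value_injection_query({1: 1}) for _ in range(trials)) / trials)
# approximately 0.9
```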

ICML 2006 (Conference Paper)

How boosting the margin can also boost classifier complexity

  • Lev Reyzin
  • Robert E. Schapire

Boosting methods are known not to usually overfit training data even as the size of the generated classifiers becomes large. Schapire et al. attempted to explain this phenomenon in terms of the margins the classifier achieves on training examples. Later, however, Breiman cast serious doubt on this explanation by introducing a boosting algorithm, arc-gv, that can generate a higher margins distribution than AdaBoost and yet performs worse. In this paper, we take a close look at Breiman's compelling but puzzling results. Although we can reproduce his main finding, we find that the poorer performance of arc-gv can be explained by the increased complexity of the base classifiers it uses, an explanation supported by our experiments and entirely consistent with the margins theory. Thus, we find that maximizing the margins is desirable, but not necessarily at the expense of other factors, especially base-classifier complexity.
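
The margins explanation being revisited is, roughly, the bound of Schapire et al.: for every margin threshold θ > 0, with high probability over a training sample S of size m,

```latex
% Margins bound (Schapire, Freund, Bartlett, and Lee, 1998), stated roughly;
% constants and log factors are suppressed in the \tilde{O}.
\Pr_{D}\bigl[\, y f(x) \le 0 \,\bigr]
  \;\le\; \Pr_{S}\bigl[\, y f(x) \le \theta \,\bigr]
  \;+\; \tilde{O}\!\left( \sqrt{ \frac{d}{m\,\theta^{2}} } \right)
```

where d measures base-classifier complexity (e.g., VC dimension). Arc-gv can improve the empirical margin term while inflating d, which is the trade-off the abstract describes.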