Arrow Research search

Author name cluster

John Langford

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

37 papers
1 author row

Possible papers (37)

NeurIPS Conference 2024 Conference Paper

Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization

  • Mucong Ding
  • Chenghao Deng
  • Jocelyn Choo
  • Zichu Wu
  • Aakriti Agrawal
  • Avi Schwarzschild
  • Tianyi Zhou
  • Tom Goldstein

Despite the abundance of datasets available for assessing large language models (LLMs), the scarcity of continuous and reliable difficulty labels for individual data points in most cases curtails their capacity to benchmark model generalization across different levels of complexity. Addressing this limitation, we present Easy2Hard, an innovative collection of 6 benchmark datasets featuring standardized difficulty labels spanning a wide range of domains, such as mathematics and programming problems, chess puzzles, and reasoning questions, providing a much-needed tool for those in need of a dataset with varying degrees of difficulty for LLM assessment. We estimate the difficulty of individual problems by leveraging the performance data of many human subjects and LLMs on prominent leaderboards. Harnessing the rich human performance data, we employ widely recognized difficulty ranking systems, including Item Response Theory (IRT) and the Glicko-2 model, to uniformly assign difficulty scores to problems. The Easy2Hard datasets distinguish themselves from previous collections by incorporating a significantly higher proportion of challenging problems, presenting a novel and demanding test for state-of-the-art LLMs. Through extensive experiments conducted with six state-of-the-art LLMs on the Easy2Hard datasets, we offer valuable insights into their performance and generalization capabilities across varying degrees of difficulty, setting the stage for future research in LLM generalization.
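As a toy illustration of rating-based difficulty estimation, here is a minimal sketch using plain Elo as a simplified stand-in for the Glicko-2 system the abstract mentions. The function names and the `k` factor are hypothetical; the idea is only that a problem acts as a "player" that wins when a solver fails, so frequently solved problems drift toward low (easy) ratings.

```python
# Hypothetical sketch: Elo as a simplified stand-in for Glicko-2.
# A problem is a "player" that wins a match when a solver fails on it.

def expected_score(r_a: float, r_b: float) -> float:
    """Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update_problem_rating(r_problem: float, r_solver: float,
                          solver_succeeded: bool, k: float = 32.0) -> float:
    """Move the problem's rating after one attempt.

    The problem 'wins' (score 1) when the solver fails, so frequently
    solved problems drift down and rarely solved ones drift up.
    """
    score = 0.0 if solver_succeeded else 1.0
    return r_problem + k * (score - expected_score(r_problem, r_solver))

# A problem solved by every solver it meets ends up rated as easy.
r = 1500.0
for _ in range(50):
    r = update_problem_rating(r, r_solver=1500.0, solver_succeeded=True)
```

The real pipeline would additionally track rating uncertainty and volatility (which is what Glicko-2 adds over Elo) and fit IRT models where item-response data is available.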

TMLR Journal 2023 Journal Article

Guaranteed Discovery of Control-Endogenous Latent States with Multi-Step Inverse Models

  • Alex Lamb
  • Riashat Islam
  • Yonathan Efroni
  • Aniket Rajiv Didolkar
  • Dipendra Misra
  • Dylan J Foster
  • Lekan P Molu
  • Rajan Chari

In many sequential decision-making tasks, the agent is not able to model the full complexity of the world, which consists of multitudes of relevant and irrelevant information. For example, a person walking along a city street who tries to model all aspects of the world would quickly be overwhelmed by a multitude of shops, cars, and people moving in and out of view, each following their own complex and inscrutable dynamics. Is it possible to turn the agent's firehose of sensory information into a minimal latent state that is both necessary and sufficient for an agent to successfully act in the world? We formulate this question concretely, and propose the Agent Control-Endogenous State Discovery algorithm (AC-State), which has theoretical guarantees and is practically demonstrated to discover the minimal control-endogenous latent state which contains all of the information necessary for controlling the agent, while fully discarding all irrelevant information. This algorithm consists of a multi-step inverse model (predicting actions from distant observations) with an information bottleneck. AC-State enables localization, exploration, and navigation without reward or demonstrations. We demonstrate the discovery of the control-endogenous latent state in three domains: localizing a robot arm with distractions (e.g., changing lighting conditions and background), exploring a maze alongside other agents, and navigating in the Matterport house simulator.

AAAI Conference 2022 Conference Paper

Better Parameter-Free Stochastic Optimization with ODE Updates for Coin-Betting

  • Keyi Chen
  • John Langford
  • Francesco Orabona

Parameter-free stochastic gradient descent (PFSGD) algorithms do not require setting learning rates while achieving optimal theoretical performance. In practical applications, however, there remains an empirical gap between tuned stochastic gradient descent (SGD) and PFSGD. In this paper, we close the empirical gap with a new parameter-free algorithm based on continuous-time Coin-Betting on truncated models. The new update is derived through the solution of an Ordinary Differential Equation (ODE) and solved in a closed form. We show empirically that this new parameter-free algorithm outperforms algorithms with the “best default” learning rates and almost matches the performance of finely tuned baselines without anything to tune.
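To make the coin-betting idea concrete, here is an illustrative sketch of the basic Krichevsky-Trofimov coin-betting optimizer for a single parameter, the parameter-free scheme this line of work builds on. This is not the paper's ODE-based truncated-model update; the function name and round count are hypothetical, and gradients are assumed bounded in [-1, 1].

```python
# Sketch of parameter-free learning via KT coin betting (one dimension).
# Not the paper's ODE update; a baseline illustration of the framework.

def kt_coin_betting(grad, rounds=4000):
    wealth, theta = 1.0, 0.0
    iterates = []
    for t in range(1, rounds + 1):
        w = (theta / t) * wealth          # bet a signed fraction of wealth
        iterates.append(w)
        g = max(-1.0, min(1.0, grad(w)))  # clip to keep |g| <= 1
        theta += -g                       # running sum of "coin outcomes"
        wealth += -g * w                  # wealth won or lost this round
    return sum(iterates) / len(iterates)  # average iterate

# Minimize f(w) = (w - 2)^2 with no learning rate to tune.
w_avg = kt_coin_betting(lambda w: 2.0 * (w - 2.0))
```

Note there is no step-size anywhere: the effective learning rate emerges from the wealth dynamics, which is exactly the gap to tuned SGD that the paper's ODE-based refinement targets.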

NeurIPS Conference 2022 Conference Paper

Interaction-Grounded Learning with Action-Inclusive Feedback

  • Tengyang Xie
  • Akanksha Saran
  • Dylan J Foster
  • Lekan Molu
  • Ida Momennejad
  • Nan Jiang
  • Paul Mineiro
  • John Langford

Consider the problem setting of Interaction-Grounded Learning (IGL), in which a learner's goal is to optimally interact with the environment with no explicit reward to ground its policies. The agent observes a context vector, takes an action, and receives a feedback vector, using this information to effectively optimize a policy with respect to a latent reward function. Previously analyzed approaches fail when the feedback vector contains the action, which significantly limits IGL's success in many potential scenarios, such as brain-computer interface (BCI) or human-computer interface (HCI) applications. We address this by creating an algorithm and analysis which allow IGL to work even when the feedback vector contains the action, encoded in any fashion. We provide theoretical guarantees and large-scale experiments based on supervised datasets to demonstrate the effectiveness of the new approach.

JMLR Journal 2021 Journal Article

A Contextual Bandit Bake-off

  • Alberto Bietti
  • Alekh Agarwal
  • John Langford

Contextual bandit algorithms are essential for solving many real-world interactive machine learning problems. Despite multiple recent successes on statistically optimal and computationally efficient methods, the practical behavior of these algorithms is still poorly understood. We leverage the availability of large numbers of supervised learning datasets to empirically evaluate contextual bandit algorithms, focusing on practical methods that learn by relying on optimization oracles from supervised learning. We find that a recent method (Foster et al., 2018) using optimism under uncertainty works the best overall. A surprisingly close second is a simple greedy baseline that only explores implicitly through the diversity of contexts, followed by a variant of Online Cover (Agarwal et al., 2014) which tends to be more conservative but robust to problem specification by design. Along the way, we also evaluate various components of contextual bandit algorithm design such as loss estimators. Overall, this is a thorough study and review of contextual bandit methodology.

JMLR Journal 2020 Journal Article

Contextual Bandits with Continuous Actions: Smoothing, Zooming, and Adapting

  • Akshay Krishnamurthy
  • John Langford
  • Aleksandrs Slivkins
  • Chicheng Zhang

We study contextual bandit learning with an abstract policy class and continuous action space. We obtain two qualitatively different regret bounds: one competes with a smoothed version of the policy class under no continuity assumptions, while the other requires standard Lipschitz assumptions. Both bounds exhibit data-dependent “zooming” behavior and, with no tuning, yield improved guarantees for benign problems. We also study adapting to unknown smoothness parameters, establishing a price-of-adaptivity and deriving optimal adaptive algorithms that require no additional information.

NeurIPS Conference 2020 Conference Paper

Efficient Contextual Bandits with Continuous Actions

  • Maryam Majzoubi
  • Chicheng Zhang
  • Rajan Chari
  • Akshay Krishnamurthy
  • John Langford
  • Aleksandrs Slivkins

We create a computationally tractable learning algorithm for contextual bandits with continuous actions having unknown structure. The new reduction-style algorithm composes with most supervised learning representations. We prove that this algorithm works in a general sense and verify the new functionality with large-scale experiments.

NeurIPS Conference 2020 Conference Paper

Empirical Likelihood for Contextual Bandits

  • Nikos Karampatziakis
  • John Langford
  • Paul Mineiro

We propose an estimator and confidence interval for computing the value of a policy from off-policy data in the contextual bandit setting. To this end we apply empirical likelihood techniques to formulate our estimator and confidence interval as simple convex optimization problems. Using the lower bound of our confidence interval, we then propose an off-policy policy optimization algorithm that searches for policies with large reward lower bound. We empirically find that both our estimator and confidence interval improve over previous proposals in finite sample regimes. Finally, the policy optimization algorithm we propose outperforms a strong baseline system for learning from off-policy data.

NeurIPS Conference 2020 Conference Paper

Learning the Linear Quadratic Regulator from Nonlinear Observations

  • Zakaria Mhammedi
  • Dylan J. Foster
  • Max Simchowitz
  • Dipendra Misra
  • Wen Sun
  • Akshay Krishnamurthy
  • Alexander Rakhlin
  • John Langford

We introduce a new problem setting for continuous control called the LQR with Rich Observations, or RichLQR. In our setting, the environment is summarized by a low-dimensional continuous latent state with linear dynamics and quadratic costs, but the agent operates on high-dimensional, nonlinear observations such as images from a camera. To enable sample-efficient learning, we assume that the learner has access to a class of decoder functions (e.g., neural networks) that is flexible enough to capture the mapping from observations to latent states. We introduce a new algorithm, RichID, which learns a near-optimal policy for the RichLQR with sample complexity scaling only with the dimension of the latent state space and the capacity of the decoder function class. RichID is oracle-efficient and accesses the decoder class only through calls to a least-squares regression oracle. To our knowledge, our results constitute the first provable sample complexity guarantee for continuous control with an unknown nonlinearity in the system model.

JMLR Journal 2019 Journal Article

Active Learning for Cost-Sensitive Classification

  • Akshay Krishnamurthy
  • Alekh Agarwal
  • Tzu-Kuo Huang
  • Hal Daumé III
  • John Langford

We design an active learning algorithm for cost-sensitive multiclass classification: problems where different errors have different costs. Our algorithm, COAL, makes predictions by regressing to each label's cost and predicting the smallest. On a new example, it uses a set of regressors that perform well on past data to estimate possible costs for each label. It queries only the labels that could be the best, ignoring the sure losers. We prove COAL can be efficiently implemented for any regression family that admits squared loss optimization; it also enjoys strong guarantees with respect to predictive performance and labeling effort. We empirically compare COAL to passive learning and several active learning baselines, showing significant improvements in labeling effort and test cost on real-world datasets.

NeurIPS Conference 2019 Conference Paper

Efficient Forward Architecture Search

  • Hanzhang Hu
  • John Langford
  • Rich Caruana
  • Saurajit Mukherjee
  • Eric Horvitz
  • Debadeepta Dey

We propose a neural architecture search (NAS) algorithm, Petridish, to iteratively add shortcut connections to existing network layers. The added shortcut connections effectively perform gradient boosting on the augmented layers. The proposed algorithm is motivated by the feature selection algorithm forward stage-wise linear regression, since we consider NAS as a generalization of feature selection for regression, where NAS selects shortcuts among layers instead of selecting features. In order to reduce the number of trials of possible connection combinations, we jointly train all possible connections at each stage of growth while leveraging feature selection techniques to choose a subset of them. We experimentally show this process to be an efficient forward architecture search algorithm that can find competitive models using only a few GPU days in both the search space of repeatable network modules (cell-search) and the space of general networks (macro-search). Petridish is particularly well-suited for warm-starting from existing models, which is crucial for lifelong-learning scenarios.

NeurIPS Conference 2018 Conference Paper

On Oracle-Efficient PAC RL with Rich Observations

  • Christoph Dann
  • Nan Jiang
  • Akshay Krishnamurthy
  • Alekh Agarwal
  • John Langford
  • Robert Schapire

We study the computational tractability of PAC reinforcement learning with rich observations. We present new provably sample-efficient algorithms for environments with deterministic hidden state dynamics and stochastic rich observations. These methods operate in an oracle model of computation -- accessing policy and value function classes exclusively through standard optimization primitives -- and therefore represent computationally efficient alternatives to prior algorithms that require enumeration. With stochastic hidden state dynamics, we prove that the only known sample-efficient algorithm, OLIVE, cannot be implemented in the oracle model. We also present several examples that illustrate fundamental challenges of tractable PAC reinforcement learning in such general settings.

RLDM Conference 2017 Conference Abstract

Contextual Decision Processes with low Bellman rank are PAC-Learnable

  • Nan Jiang
  • Akshay Krishnamurthy
  • Alekh Agarwal
  • John Langford
  • Robert Schapire

This paper studies systematic exploration for reinforcement learning (RL) with rich observations and function approximation. We introduce contextual decision processes (CDPs) that unify and generalize most prior RL settings. Our first contribution is a complexity measure, the Bellman Rank, that we show enables tractable learning of near-optimal behavior in these processes and is naturally small for many well-studied RL settings. Our second contribution is a new RL algorithm that engages in systematic exploration to learn near-optimal behavior in CDPs with low Bellman Rank. The algorithm requires a number of samples that is polynomial in all relevant parameters but independent of the number of unique contexts. Our approach uses Bellman error minimization with optimistic exploration and provides new insights into efficient exploration for RL with function approximation.

NeurIPS Conference 2017 Conference Paper

Off-policy evaluation for slate recommendation

  • Adith Swaminathan
  • Akshay Krishnamurthy
  • Alekh Agarwal
  • Miro Dudik
  • John Langford
  • Damien Jose
  • Imed Zitouni

This paper studies the evaluation of policies that recommend an ordered set of items (e.g., a ranking) based on some context---a common scenario in web search, ads, and recommendation. We build on techniques from combinatorial bandits to introduce a new practical estimator that uses logged data to estimate a policy's performance. A thorough empirical evaluation on real-world data reveals that our estimator is accurate in a variety of settings, including as a subroutine in a learning-to-rank task, where it achieves competitive performance. We derive conditions under which our estimator is unbiased---these conditions are weaker than prior heuristics for slate evaluation---and experimentally demonstrate a smaller bias than parametric approaches, even when these conditions are violated. Finally, our theory and experiments also show exponential savings in the amount of required data compared with general unbiased estimators.

NeurIPS Conference 2016 Conference Paper

A Credit Assignment Compiler for Joint Prediction

  • Kai-Wei Chang
  • He He
  • Stephane Ross
  • Hal Daume III
  • John Langford

Many machine learning applications involve jointly predicting multiple mutually dependent output variables. Learning to search is a family of methods where the complex decision problem is cast into a sequence of decisions via a search space. Although these methods have shown promise both in theory and in practice, implementing them has been burdensomely awkward. In this paper, we show the search space can be defined by an arbitrary imperative program, turning learning to search into a credit assignment compiler. Together with algorithmic improvements in the compiler, we radically reduce the complexity of programming and the running time. We demonstrate the feasibility of our approach on multiple joint prediction tasks. In all cases, we obtain accuracies as high as alternative approaches, at drastically reduced execution and programming time.

NeurIPS Conference 2016 Conference Paper

Efficient Second Order Online Learning by Sketching

  • Haipeng Luo
  • Alekh Agarwal
  • Nicolò Cesa-Bianchi
  • John Langford

We propose Sketched Online Newton (SON), an online second order learning algorithm that enjoys substantially improved regret guarantees for ill-conditioned data. SON is an enhanced version of the Online Newton Step, which, via sketching techniques, enjoys a running time linear in the dimension and sketch size. We further develop sparse forms of the sketching methods (such as Oja's rule), making the computation linear in the sparsity of features. Together, these techniques eliminate all computational obstacles in previous second order online learning approaches.

NeurIPS Conference 2016 Conference Paper

PAC Reinforcement Learning with Rich Observations

  • Akshay Krishnamurthy
  • Alekh Agarwal
  • John Langford

We propose and study a new model for reinforcement learning with rich observations, generalizing contextual bandits to sequential decision making. These models require an agent to take actions based on observations (features) with the goal of achieving long-term performance competitive with a large set of policies. To avoid barriers to sample-efficient learning associated with large observation spaces and general POMDPs, we focus on problems that can be summarized by a small number of hidden states and have long-term rewards that are predictable by a reactive function class. In this setting, we design and analyze a new reinforcement learning algorithm, Least Squares Value Elimination by Exploration. We prove that the algorithm learns near optimal behavior after a number of episodes that is polynomial in all relevant parameters, logarithmic in the number of policies, and independent of the size of the observation space. Our result provides theoretical justification for reinforcement learning with function approximation.

NeurIPS Conference 2016 Conference Paper

Search Improves Label for Active Learning

  • Alina Beygelzimer
  • Daniel Hsu
  • John Langford
  • Chicheng Zhang

We investigate active learning with access to two distinct oracles: LABEL (which is standard) and SEARCH (which is not). The SEARCH oracle models the situation where a human searches a database to seed or counterexample an existing solution. SEARCH is stronger than LABEL while being natural to implement in many situations. We show that an algorithm using both oracles can provide exponentially large problem-dependent improvements over LABEL alone.

NeurIPS Conference 2015 Conference Paper

Efficient and Parsimonious Agnostic Active Learning

  • Tzu-Kuo Huang
  • Alekh Agarwal
  • Daniel Hsu
  • John Langford
  • Robert Schapire

We develop a new active learning algorithm for the streaming setting satisfying three important properties: 1) It provably works for any classifier representation and classification problem, including those with severe noise. 2) It is efficiently implementable with an ERM oracle. 3) It is more aggressive than all previous approaches satisfying 1 and 2. To do this, we create an algorithm based on a newly defined optimization problem and analyze it. We also conduct the first experimental analysis of all efficient agnostic active learning algorithms, evaluating their strengths and weaknesses in different settings.

NeurIPS Conference 2015 Conference Paper

Logarithmic Time Online Multiclass prediction

  • Anna Choromanska
  • John Langford

We study the problem of multiclass classification with an extremely large number of classes (k), with the goal of obtaining train and test time complexity logarithmic in the number of classes. We develop top-down tree construction approaches for constructing logarithmic depth trees. On the theoretical front, we formulate a new objective function, which is optimized at each node of the tree and creates dynamic partitions of the data which are both pure (in terms of class labels) and balanced. We demonstrate that under favorable conditions, we can construct logarithmic depth trees that have leaves with low label entropy. However, the objective function at the nodes is challenging to optimize computationally. We address the empirical problem with a new online decision tree construction procedure. Experiments demonstrate that this online algorithm quickly achieves improvement in test error compared to more common logarithmic training time approaches, which makes it a plausible method in computationally constrained large-k applications.
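The logarithmic-time claim can be seen structurally. The sketch below (not the paper's tree-construction objective; the `Node` class and random routers are hypothetical stand-ins for learned node predictors) places k classes at the leaves of a balanced binary tree, so a prediction follows a single root-to-leaf path and touches O(log k) nodes regardless of k.

```python
import random

# Minimal sketch of log-time multiclass inference: classes at the leaves
# of a balanced binary tree, one learned router per internal node.
# Routers here are random vectors purely to illustrate the path length.

class Node:
    def __init__(self, classes, dim, rng):
        self.classes = classes
        if len(classes) == 1:
            self.left = self.right = None
        else:
            self.router = [rng.uniform(-1, 1) for _ in range(dim)]  # would be learned
            mid = len(classes) // 2
            self.left = Node(classes[:mid], dim, rng)
            self.right = Node(classes[mid:], dim, rng)

def predict(root, x):
    node, visited = root, 0
    while node.left is not None:
        visited += 1
        score = sum(r * xi for r, xi in zip(node.router, x))
        node = node.left if score <= 0 else node.right
    return node.classes[0], visited

rng = random.Random(0)
k, dim = 1024, 16
tree = Node(list(range(k)), dim, rng)
label, nodes_visited = predict(tree, [rng.uniform(-1, 1) for _ in range(dim)])
```

With k = 1024 the path visits exactly 10 internal nodes; the hard part the paper addresses is choosing the node objectives so the induced partitions are both pure and balanced.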

JMLR Journal 2014 Journal Article

A Reliable Effective Terascale Linear Learning System

  • Alekh Agarwal
  • Olivier Chapelle
  • Miroslav Dudík
  • John Langford

We present a system and a set of techniques for learning linear predictors with convex losses on terascale data sets with trillions of features (the number of features here refers to the number of non-zero entries in the data matrix), billions of training examples, and millions of parameters in an hour using a cluster of 1000 machines. Individually, none of the component techniques are new, but the careful synthesis required to obtain an efficient implementation is. The result is, to our knowledge, the most scalable and efficient linear learning system reported in the literature. (All the empirical evaluation reported in this work was carried out between May and October 2011.) We describe and thoroughly evaluate the components of the system, showing the importance of the various design choices.

NeurIPS Conference 2014 Conference Paper

Scalable Non-linear Learning with Adaptive Polynomial Expansions

  • Alekh Agarwal
  • Alina Beygelzimer
  • Daniel Hsu
  • John Langford
  • Matus Telgarsky

Can we effectively learn a nonlinear representation in time comparable to linear learning? We describe a new algorithm that explicitly and adaptively expands higher-order interaction features over base linear representations. The algorithm is designed for extreme computational efficiency, and an extensive experimental study shows that its computation/prediction tradeoff ability compares very favorably against strong baselines.

NeurIPS Conference 2010 Conference Paper

Agnostic Active Learning Without Constraints

  • Alina Beygelzimer
  • Daniel Hsu
  • John Langford
  • Tong Zhang

We present and analyze an agnostic active learning algorithm that works without keeping a version space. This is unlike all previous approaches where a restricted set of candidate hypotheses is maintained throughout learning, and only hypotheses from this set are ever returned. By avoiding this version space approach, our algorithm sheds the computational burden and brittleness associated with maintaining version spaces, yet still allows for substantial improvements over supervised learning for classification.

NeurIPS Conference 2010 Conference Paper

Learning from Logged Implicit Exploration Data

  • Alex Strehl
  • John Langford
  • Lihong Li
  • Sham Kakade

We provide a sound and consistent foundation for the use of nonrandom exploration data in "contextual bandit" or "partially labeled" settings where only the value of a chosen action is learned. The primary challenge in a variety of settings is that the exploration policy under which "offline" data is logged is not explicitly known. Prior solutions require either control of the actions during the learning process, recorded random exploration, or actions chosen obliviously in a repeated manner. The techniques reported here lift these restrictions, allowing the learning of a policy for choosing actions given features from historical data where no randomization occurred or was logged. We empirically verify our solution on two reasonably sized sets of real-world data obtained from an Internet advertising company.

JMLR Journal 2009 Journal Article

Hash Kernels for Structured Data

  • Qinfeng Shi
  • James Petterson
  • Gideon Dror
  • John Langford
  • Alex Smola
  • S.V.N. Vishwanathan

We propose hashing to facilitate efficient kernels. This generalizes previous work using sampling and we show a principled way to compute the kernel matrix for data streams and sparse feature spaces. Moreover, we give deviation bounds from the exact kernel matrix. This has applications to estimation on strings and graphs.
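The core trick can be sketched in a few lines. This is a hedged illustration of the hashing idea the paper builds on (the function names and dimension are hypothetical): sparse string-keyed features are mapped into a fixed-dimension vector via a signed hash, and kernels reduce to dot products in the hashed space.

```python
import hashlib

# Feature hashing sketch: sparse {name: value} maps -> fixed-size vectors.

def hash_features(features: dict, dim: int = 64) -> list:
    """Map {feature_name: value} into a dense vector of size `dim`."""
    vec = [0.0] * dim
    for name, value in features.items():
        h = int(hashlib.md5(name.encode()).hexdigest(), 16)
        index = h % dim
        sign = 1.0 if (h >> 1) % 2 == 0 else -1.0  # sign hash reduces collision bias
        vec[index] += sign * value
    return vec

def hashed_kernel(a: dict, b: dict, dim: int = 64) -> float:
    """Approximate inner product of two sparse feature maps."""
    va, vb = hash_features(a, dim), hash_features(b, dim)
    return sum(x * y for x, y in zip(va, vb))
```

For string or graph data one would enumerate substrings or subgraph features as the keys; the deviation bounds in the paper quantify how far `hashed_kernel` can drift from the exact kernel as `dim` shrinks.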

NeurIPS Conference 2009 Conference Paper

Multi-Label Prediction via Compressed Sensing

  • Daniel Hsu
  • Sham Kakade
  • John Langford
  • Tong Zhang

We consider multi-label prediction problems with large output spaces under the assumption of output sparsity – that the target (label) vectors have small support. We develop a general theory for a variant of the popular error correcting output code scheme, using ideas from compressed sensing for exploiting this sparsity. The method can be regarded as a simple reduction from multi-label regression problems to binary regression problems. We show that the number of subproblems need only be logarithmic in the total number of possible labels, making this approach radically more efficient than others. We also state and prove robustness guarantees for this method in the form of regret transform bounds (in general), and also provide a more detailed analysis for the linear prediction setting.
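The reduction can be sketched end to end. The snippet below is an illustrative toy, not the paper's exact construction: a sparse label vector is compressed through a random projection to m << d targets (which regressors would then be trained to predict), and a generic orthogonal matching pursuit decoder stands in for the reconstruction step. The dimensions and the one-active-label setup are chosen for clarity.

```python
import numpy as np

# Compressed-sensing reduction sketch: encode a k-sparse label vector in
# m << d dimensions, then decode with greedy orthogonal matching pursuit.

rng = np.random.default_rng(0)
d, m, k = 50, 15, 1             # d labels, m measurements, k active labels
A = rng.standard_normal((m, d))
A /= np.linalg.norm(A, axis=0)  # unit-norm columns

y_true = np.zeros(d)
y_true[7] = 1.0                 # one active label
z = A @ y_true                  # compressed m-dimensional target

def omp_decode(z, A, k):
    """Greedy OMP: recover a k-sparse label vector from measurements z."""
    residual, support = z.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(A.T @ residual))))
        coef, *_ = np.linalg.lstsq(A[:, support], z, rcond=None)
        residual = z - A[:, support] @ coef
    out = np.zeros(A.shape[1])
    out[support] = coef
    return out

y_hat = omp_decode(z, A, k)
```

In the actual method, `z` is not observed but predicted by m learned regressors, and the theory bounds how regression error propagates through the decoder.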

NeurIPS Conference 2009 Conference Paper

Slow Learners are Fast

  • Martin Zinkevich
  • John Langford
  • Alex Smola

Online learning algorithms have impressive convergence properties when it comes to risk minimization and convex games on very large problems. However, they are inherently sequential in their design which prevents them from taking advantage of modern multi-core architectures. In this paper we prove that online learning with delayed updates converges well, thereby facilitating parallel online learning.
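The delayed-update setting is easy to simulate. This is a toy sketch (the function and parameter names are hypothetical, not from the paper): each gradient is computed at the current parameters but applied a fixed number of steps later, mimicking workers that read slightly stale parameters, and convergence on a convex objective survives the delay.

```python
from collections import deque

# Delayed-gradient SGD sketch: gradients are applied tau steps after they
# are computed, as in asynchronous/parallel online learning.

def delayed_sgd(grad, w0=0.0, lr=0.05, tau=4, rounds=500):
    w, pending = w0, deque()
    for _ in range(rounds):
        pending.append(grad(w))          # gradient computed at the current point
        if len(pending) > tau:
            w -= lr * pending.popleft()  # ...but applied tau steps later
    return w

# Convex objective f(w) = (w - 3)^2 still converges despite the delay.
w_final = delayed_sgd(lambda w: 2.0 * (w - 3.0))
```

The price of the delay shows up as a constraint tying the usable learning rate to tau times the gradient smoothness; with `lr * 2 * tau = 0.4` here, the iteration remains stable.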

JMLR Journal 2009 Journal Article

Sparse Online Learning via Truncated Gradient

  • John Langford
  • Lihong Li
  • Tong Zhang

We propose a general method called truncated gradient to induce sparsity in the weights of online-learning algorithms with convex loss functions. This method has several essential properties: The degree of sparsity is continuous---a parameter controls the rate of sparsification from no sparsification to total sparsification. The approach is theoretically motivated, and an instance of it can be regarded as an online counterpart of the popular L1-regularization method in the batch setting. We prove that small rates of sparsification result in only small additional regret with respect to typical online-learning guarantees. The approach works well empirically. We apply it to several data sets and find that for data sets with large numbers of features, substantial sparsity is discoverable.
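The update itself is simple enough to sketch. This is a hedged one-weight illustration of the truncation operator (parameter values here are arbitrary): after each gradient step, weights lying inside a window [-theta, theta] are shrunk toward zero by a small amount, which is what gradually drives unused weights to exactly zero.

```python
# Truncated-gradient sketch: gradient step, then shrink small weights.

def truncate(w: float, alpha: float, theta: float) -> float:
    if 0.0 <= w <= theta:
        return max(0.0, w - alpha)
    if -theta <= w < 0.0:
        return min(0.0, w + alpha)
    return w  # weights beyond the window are left untouched

def truncated_gradient_step(w, g, eta=0.1, lam=0.5, theta=1.0):
    """One online update: gradient step followed by truncation by eta*lam."""
    return [truncate(wi - eta * gi, eta * lam, theta) for wi, gi in zip(w, g)]

# A small weight with no gradient signal decays to exactly zero;
# a large, useful weight outside the window survives untouched.
w = [0.3, 2.0]
for _ in range(10):
    w = truncated_gradient_step(w, [0.0, 0.0])
```

The shrink amount `eta * lam` plays the role of the continuous sparsification knob: at zero the method is plain online gradient descent, and as it grows the update approaches aggressive rounding to zero.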

NeurIPS Conference 2008 Conference Paper

Predictive Indexing for Fast Search

  • Sharad Goel
  • John Langford
  • Alexander Strehl

We tackle the computational problem of query-conditioned search. Given a machine-learned scoring rule and a query distribution, we build a predictive index by precomputing lists of potential results sorted based on an expected score of the result over future queries. The predictive index data structure supports an anytime algorithm for approximate retrieval of the top elements. The general approach is applicable to webpage ranking, internet advertisement, and approximate nearest neighbor search. It is particularly effective in settings where standard techniques (e.g., inverted indices) are intractable. We experimentally find substantial improvement over existing methods for internet advertisement and approximate nearest neighbors.
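A hypothetical miniature of the idea (all names and the additive scoring model are illustrative, not the paper's): for each query feature, precompute a list of results ordered by their contribution to the score; at query time, score only the top few candidates from the lists matching the query's features, giving an anytime trade-off between budget and retrieval quality.

```python
# Predictive-indexing sketch with an additive per-feature scoring rule.

def build_index(doc_scores):
    """doc_scores: {doc: {feature: contribution}} -> {feature: docs sorted desc}."""
    index = {}
    for doc, feats in doc_scores.items():
        for f, s in feats.items():
            index.setdefault(f, []).append((s, doc))
    return {f: [d for _, d in sorted(lst, reverse=True)] for f, lst in index.items()}

def query(index, doc_scores, features, budget):
    """Anytime retrieval: examine at most `budget` docs per feature list."""
    candidates = set()
    for f in features:
        candidates.update(index.get(f, [])[:budget])
    score = lambda d: sum(doc_scores[d].get(f, 0.0) for f in features)
    return max(candidates, key=score) if candidates else None

docs = {"d1": {"shoes": 3.0}, "d2": {"shoes": 1.0, "red": 2.5}, "d3": {"red": 0.5}}
index = build_index(docs)
best = query(index, docs, ["shoes", "red"], budget=2)
```

With a small budget the answer can miss results whose score is spread thinly across many features; sorting lists by expected score under the query distribution, as the paper proposes, is what keeps that risk low.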

NeurIPS Conference 2008 Conference Paper

Sparse Online Learning via Truncated Gradient

  • John Langford
  • Lihong Li
  • Tong Zhang

We propose a general method called truncated gradient to induce sparsity in the weights of online-learning algorithms with convex loss. This method has several essential properties. First, the degree of sparsity is continuous---a parameter controls the rate of sparsification from no sparsification to total sparsification. Second, the approach is theoretically motivated, and an instance of it can be regarded as an online counterpart of the popular $L_1$-regularization method in the batch setting. We prove that small rates of sparsification result in only small additional regret with respect to typical online-learning guarantees. Finally, the approach works well empirically. We apply it to several datasets and find that for datasets with large numbers of features, substantial sparsity is discoverable.

NeurIPS Conference 2007 Conference Paper

The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information

  • John Langford
  • Tong Zhang

We present Epoch-Greedy, an algorithm for multi-armed bandits with observable side information. Epoch-Greedy has the following properties: No knowledge of a time horizon $T$ is necessary. The regret incurred by Epoch-Greedy is controlled by a sample complexity bound for a hypothesis class. The regret scales as $O(T^{2/3} S^{1/3})$ or better (sometimes, much better). Here $S$ is the complexity term in a sample complexity bound for standard supervised learning.
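A toy simulation of the Epoch-Greedy pattern follows. This is a hypothetical simplification: the paper's algorithm learns a policy over contexts from a hypothesis class, while this sketch uses context-free arms, and the epoch lengths and noise model are illustrative. Each epoch spends one step exploring uniformly at random (only this data is used for learning), then an increasing number of steps exploiting the empirically best action, so the exploration fraction shrinks over time.

```python
import random

# Epoch-Greedy sketch: one uniform-exploration step per epoch, followed by
# a growing block of greedy exploitation using only the exploration data.

def epoch_greedy(n_actions, true_reward, epochs=200, seed=0):
    rng = random.Random(seed)
    data = []          # (action, reward) pairs from exploration steps only
    total, steps = 0.0, 0
    for epoch in range(1, epochs + 1):
        # Exploration step: uniform action gives an unbiased reward sample.
        a = rng.randrange(n_actions)
        r = true_reward(a) + rng.gauss(0, 0.1)
        data.append((a, r))
        total += r; steps += 1
        # Fit empirical means from exploration data, then exploit.
        means = [0.0] * n_actions
        counts = [0] * n_actions
        for act, rew in data:
            counts[act] += 1
            means[act] += (rew - means[act]) / counts[act]
        best = max(range(n_actions), key=lambda i: means[i] if counts[i] else -1e9)
        for _ in range(epoch):  # epoch length grows with the epoch index
            total += true_reward(best) + rng.gauss(0, 0.1)
            steps += 1
    return total / steps, best

avg_reward, best_action = epoch_greedy(3, lambda a: [0.1, 0.9, 0.3][a])
```

Balancing the exploitation block length against the exploration count is what produces the $O(T^{2/3})$-type regret trade-off described in the abstract.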

JMLR Journal 2005 Journal Article

Tutorial on Practical Prediction Theory for Classification

  • John Langford

We discuss basic prediction theory and its impact on classification success evaluation, implications for learning algorithm design, and uses in learning algorithm execution. This tutorial is meant to be a comprehensive compilation of results which are both theoretically rigorous and quantitatively useful. There are two important implications of the results presented here. The first is that common practices for reporting results in classification should change to use the test set bound. The second is that train set bounds can sometimes be used to directly motivate learning algorithms.

JMLR Journal 2004 Journal Article

Computable Shell Decomposition Bounds

  • John Langford
  • David McAllester

Haussler, Kearns, Seung and Tishby introduced the notion of a shell decomposition of the union bound as a means of understanding certain empirical phenomena in learning curves such as phase transitions. Here we use a variant of their ideas to derive an upper bound on the generalization error of a hypothesis computable from its training error and the histogram of training errors for the hypotheses in the class. In most cases this new bound is significantly tighter than traditional bounds computed from the training error and the cardinality of the class. Our results can also be viewed as providing a rigorous foundation for a model selection algorithm proposed by Scheffer and Joachims.

NeurIPS Conference 2001 Conference Paper

(Not) Bounding the True Error

  • John Langford
  • Rich Caruana

We present a new approach to bounding the true error rate of a continuous valued classifier based upon PAC-Bayes bounds. The method first constructs a distribution over classifiers by determining how sensitive each parameter in the model is to noise. The true error rate of the stochastic classifier found with the sensitivity analysis can then be tightly bounded using a PAC-Bayes bound. In this paper we demonstrate the method on artificial neural networks, with results showing an order of magnitude improvement over the best deterministic neural net bounds.

NeurIPS Conference 2001 Conference Paper

Risk Sensitive Particle Filters

  • Sebastian Thrun
  • John Langford
  • Vandi Verma

We propose a new particle filter that incorporates a model of costs when generating particles. The approach is motivated by the observation that the costs of accidentally not tracking hypotheses might be significant in some areas of state space, and next to irrelevant in others. By incorporating a cost model into particle filtering, states that are more critical to the system performance are more likely to be tracked. Automatic calculation of the cost model is implemented using an MDP value function calculation that estimates the value of tracking a particular state. Experiments in two mobile robot domains illustrate the appropriateness of the approach.
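The mechanism can be illustrated in one resampling step. This is a hedged toy, not the paper's MDP-derived cost model: particle weights are multiplied by a hand-written risk function before resampling, so high-cost regions of state space retain more particles than their posterior probability alone would warrant.

```python
import random

# Risk-sensitive resampling sketch: weight = likelihood * risk(state).

def resample(particles, weights, rng):
    total = sum(weights)
    probs = [w / total for w in weights]
    # Cumulative-sum (roulette-wheel) resampling.
    cum, c = [], 0.0
    for p in probs:
        c += p
        cum.append(c)
    out = []
    for _ in particles:
        u = rng.random()
        out.append(particles[next(i for i, cv in enumerate(cum) if cv >= u)])
    return out

rng = random.Random(1)
particles = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
likelihood = lambda x: 1.0               # flat observation model for the demo
risk = lambda x: 10.0 if x > 0 else 1.0  # failures are costlier for x > 0

weights = [likelihood(x) * risk(x) for x in particles]
resampled = resample(particles, weights, rng)
frac_high_risk = sum(1 for x in resampled if x > 0) / len(resampled)
```

In the paper, the hand-written `risk` function is replaced by an automatically computed MDP value function estimating the value of tracking each state.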