Arrow Research search

Author name cluster

Charles Isbell

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

39 papers
2 author rows

Possible papers

39

ICML Conference 2020 Conference Paper

Estimating Q(s, s') with Deep Deterministic Dynamics Gradients

  • Ashley D. Edwards
  • Himanshu Sahni
  • Rosanne Liu
  • Jane Hung
  • Ankit Jain
  • Rui Wang 0052
  • Adrien Ecoffet
  • Thomas Miconi
  • Charles Isbell
  • Jason Yosinski

In this paper, we introduce a novel form of value function, $Q(s, s')$, that expresses the utility of transitioning from a state $s$ to a neighboring state $s'$ and then acting optimally thereafter. In order to derive an optimal policy, we develop a forward dynamics model that learns to make next-state predictions that maximize this value. This formulation decouples actions from values while still learning off-policy. We highlight the benefits of this approach in terms of value function transfer, learning within redundant action spaces, and learning off-policy from state observations generated by sub-optimal or completely random policies. Code and videos are available at http://sites.google.com/view/qss-paper.
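As a concrete illustration of the value function above, here is a minimal tabular sketch (ours, not the paper's deep-network implementation): $Q(s, s')$ is backed up over state pairs on a toy chain environment, and the policy picks the best neighboring state. The environment, constants, and the final dynamics-model step are illustrative assumptions.

    # Minimal tabular sketch of Q(s, s'): the value of moving s -> s' and
    # acting optimally afterwards. Toy deterministic chain, not the paper's
    # deep deterministic dynamics gradients setup.
    import random
    from collections import defaultdict

    N_STATES, GOAL, GAMMA, ALPHA = 8, 7, 0.9, 0.1
    neighbors = lambda s: [max(s - 1, 0), min(s + 1, N_STATES - 1)]

    Q = defaultdict(float)  # keyed by (s, s_next) pairs, not (s, a)

    for episode in range(2000):
        s = random.randrange(N_STATES)
        while s != GOAL:
            s_next = random.choice(neighbors(s))  # exploratory transition
            r = 1.0 if s_next == GOAL else 0.0
            # One-step backup over states: max over successors of s_next.
            target = r + GAMMA * max(Q[(s_next, s2)] for s2 in neighbors(s_next))
            Q[(s, s_next)] += ALPHA * (target - Q[(s, s_next)])
            s = s_next

    # Acting means choosing the best reachable next state; realizing that
    # transition is delegated to a (hypothetical) learned dynamics model.
    best_next = lambda s: max(neighbors(s), key=lambda s2: Q[(s, s2)])
    print([best_next(s) for s in range(N_STATES)])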

IROS Conference 2020 Conference Paper

Supportive Actions for Manipulation in Human-Robot Coworker Teams

  • Shray Bansal
  • Rhys Newbury
  • Wesley P. Chan
  • Akansel Cosgun
  • Aimee Allen
  • Dana Kulic
  • Tom Drummond
  • Charles Isbell

The increasing presence of robots alongside humans, such as in human-robot teams in manufacturing, gives rise to research questions about the kind of behaviors people prefer in their robot counterparts. We term actions that support interaction by reducing future interference with others supportive robot actions, and investigate their utility in a co-located manipulation scenario. We compare two robot modes in a shared table pick-and-place task: (1) Task-oriented: the robot only takes actions to further its task objective, and (2) Supportive: the robot sometimes prefers supportive actions to task-oriented ones when they reduce future goal-conflicts. Our experiments in simulation, using a simplified human model, reveal that supportive actions reduce the interference between agents, especially in more difficult tasks, but also cause the robot to take longer to complete the task. We implemented these modes on a physical robot in a user study where a human and a robot perform object placement on a shared table. Our results show that a supportive robot was perceived more favorably as a coworker and also reduced interference with the human in one of two scenarios. However, it also took longer to complete the task, highlighting an interesting trade-off between task efficiency and human preference that needs to be considered before designing robot behavior for close-proximity manipulation scenarios.

AAAI Conference 2019 Conference Paper

Composable Modular Reinforcement Learning

  • Christopher Simpkins
  • Charles Isbell

Modular reinforcement learning (MRL) decomposes a monolithic multiple-goal problem into modules that solve a portion of the original problem. The modules' action preferences are arbitrated to determine the action taken by the agent. Truly modular reinforcement learning would support not only decomposition into modules, but composability of separately written modules in new modular reinforcement learning agents. However, the performance of MRL agents that arbitrate module preferences using additive reward schemes degrades when the modules have incomparable reward scales. This performance degradation means that separately written modules cannot be composed in new modular reinforcement learning agents as-is; they may need to be modified to align their reward scales. We solve this problem with a Q-learning-based command arbitration algorithm and demonstrate that it does not exhibit the same performance degradation as existing approaches to MRL, thereby supporting composability.
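A schematic contrast of the two arbitration styles discussed above (our toy construction: the per-module Q-tables and reward scales are invented to exhibit the failure mode, and only the selection step of a Q-learning-based command arbiter is shown, not its learning rule):

    # Sketch (not the paper's code) contrasting additive arbitration with
    # command arbitration under deliberately incomparable reward scales.
    import numpy as np

    rng = np.random.default_rng(0)
    n_states, n_actions, n_modules = 5, 3, 2

    # Pretend per-module Q-tables; module 1 uses a 100x larger reward scale.
    Qs = [rng.random((n_states, n_actions)),
          100 * rng.random((n_states, n_actions))]

    def additive_arbitration(s):
        # Greatest-mass arbitration: module 1's scale dominates module 0.
        return int(np.argmax(sum(Q[s] for Q in Qs)))

    # Command arbitration: a separate Q-table over "which module commands",
    # learned from the arbiter's own reward, so module scales never mix.
    Q_arbiter = rng.random((n_states, n_modules))

    def command_arbitration(s):
        module = int(np.argmax(Q_arbiter[s]))
        return int(np.argmax(Qs[module][s]))

    print(additive_arbitration(0), command_arbitration(0))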

ICML Conference 2019 Conference Paper

Imitating Latent Policies from Observation

  • Ashley D. Edwards
  • Himanshu Sahni
  • Yannick Schroecker
  • Charles Isbell

In this paper, we describe a novel approach to imitation learning that infers latent policies directly from state observations. We introduce a method that characterizes the causal effects of latent actions on observations while simultaneously predicting their likelihood. We then outline an action alignment procedure that leverages a small amount of environment interactions to determine a mapping between the latent and real-world actions. We show that this corrected labeling can be used for imitating the observed behavior, even though no expert actions are given. We evaluate our approach within classic control environments and a platform game and demonstrate that it performs better than standard approaches. Code for this work is available at https://github.com/ashedwards/ILPO.

RLDM Conference 2017 Conference Abstract

SAIL: A Temporal Difference Approach to State Aware Imitation Learning

  • Yannick Schroecker
  • Charles Isbell

Imitation learning aims at training agents to reproduce a teacher's policy based on a set of demonstrated states and actions. However, attempting to reproduce actions without learning about the environment can lead the agent to situations that are unlike the ones encountered as part of the provided demonstrations, making it more likely for the agent to make a mistake. In this work we present State Aware Imitation Learning (SAIL), an algorithm for imitation learning which augments the supervised approach of imitation learning by explicitly trying to reproduce the demonstrated states as well. The algorithm achieves this goal by maximizing the joint likelihood over states and actions at each time step. Based on existing work by Morimura et al. [6], we show that an update rule similar to online temporal difference learning can be used to learn the gradient of said joint distribution, which allows us to perform gradient ascent. The resulting policy allows the agent to remain close to states in which it knows what to do, which prevents errors from accumulating over time. Naturally, learning this gradient requires additional information about the world, which takes the form of sample roll-outs gathered in an unsupervised manner, but it does not require further input from the teacher. While the algorithm proposed in this paper can be used with any kind of function approximator, we evaluate our approach on a simple race track domain with 7425 discrete states. Using a tabular representation combined with randomness makes it impossible to train a policy in a purely supervised way such that it behaves near optimally in states that have not been encountered as part of a demonstration. We show that using unsupervised sample transitions with our approach allows the agent to learn a reasonable policy outside of the set of observed states and show that SAIL outperforms a purely supervised learning approach on this task.
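In symbols (our paraphrase in the stationary-distribution notation of Morimura et al., not taken verbatim from the abstract): SAIL ascends the gradient of the joint log-likelihood of demonstrated state-action pairs,

$\nabla_\theta \log p^{\pi_\theta}(s, a) = \nabla_\theta \log d^{\pi_\theta}(s) + \nabla_\theta \log \pi_\theta(a \mid s)$,

where $d^{\pi_\theta}$ is the stationary state distribution under the policy $\pi_\theta$. The second term is the ordinary supervised imitation gradient; the first satisfies, roughly, the backward recursion $\nabla_\theta \log d^{\pi_\theta}(s') = \mathbb{E}\big[\nabla_\theta \log d^{\pi_\theta}(s) + \nabla_\theta \log \pi_\theta(a \mid s) \,\big|\, s'\big]$, which is what licenses the temporal-difference-style estimator mentioned above.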

NeurIPS Conference 2017 Conference Paper

State Aware Imitation Learning

  • Yannick Schroecker
  • Charles Isbell

Imitation learning is the study of learning how to act given a set of demonstrations provided by a human expert. It is intuitively apparent that learning to take optimal actions is a simpler undertaking in situations that are similar to the ones shown by the teacher. However, imitation learning approaches do not tend to use this insight directly. In this paper, we introduce State Aware Imitation Learning (SAIL), an imitation learning algorithm that allows an agent to learn how to remain in states where it can confidently take the correct action and how to recover if it is led astray. Key to this algorithm is a gradient learned using a temporal difference update rule which leads the agent to prefer states similar to the demonstrated states. We show that estimating a linear approximation of this gradient yields similar theoretical guarantees to online temporal difference learning approaches and empirically show that SAIL can effectively be used for imitation learning in continuous domains with non-linear function approximators used for both the policy representation and the gradient estimate.

RLDM Conference 2017 Conference Abstract

State Space Decomposition and Subgoal Creation for Transfer in Deep Reinforcement Learning

  • Saurabh Kumar
  • Himanshu Sahni
  • Farhan Tejani
  • Yannick Schroecker
  • Charles Isbell

Typical reinforcement learning (RL) agents learn to complete tasks specified by reward functions tailored to their domain. As such, the policies they learn do not generalize even to similar domains. To address this issue, we develop a framework through which a deep RL agent learns to generalize policies from smaller, simpler domains to more complex ones using a recurrent attention mechanism. The task is presented to the agent as an image and an instruction specifying the goal. This meta-controller guides the agent towards its goal by designing a sequence of smaller sub-tasks on the part of the state space within the attention, effectively decomposing it. As a baseline, we consider a setup without attention as well. Our experiments show that the meta-controller learns to create subgoals within the attention.

IROS Conference 2016 Conference Paper

Navigation Among Movable Obstacles with learned dynamic constraints

  • Jonathan Scholz
  • Nehchal Jindal
  • Martin Levihn
  • Charles Isbell
  • Henrik I. Christensen

In this paper we present the first planner for the problem of Navigation Among Movable Obstacles (NAMO) on a real robot that can handle environments with under-specified object dynamics. This result makes use of recent progress from two threads of the Reinforcement Learning literature. The first is a hierarchical Markov Decision Process formulation of the NAMO problem designed to handle dynamics uncertainty. The second is a physics-based Reinforcement Learning framework which offers a way to ground this uncertainty in a compact model space that can be efficiently updated from data received by the robot online. Our results demonstrate the ability of a robot to adapt to unexpected object behavior in a real office scenario.

AAMAS Conference 2016 Conference Paper

Object-Focused Advice in Reinforcement Learning (Extended Abstract)

  • Samantha Krening
  • Brent Harrison
  • Karen M. Feigh
  • Charles Isbell
  • Andrea Thomaz

In order for robots and intelligent agents to interact with and learn from people with no machine-learning expertise, robots should be able to learn from natural human instruction. Many human explanations consist of simple sentences without state information, yet most machine learning techniques that incorporate human guidance cannot use nonspecific explanations. This work aims to learn policies from a few sentences that aren't state specific. The proposed Object-focused advice links an object to an action, and allows a person to generalize over an object's state space. To evaluate this technique, agents were trained using Object-focused advice collected from participants in an experiment in the Mario Bros. domain. The results show that Object-focused advice performs better than when no advice is given, the agent can learn where to apply the advice in the state space, and the agent can recover from adversarial advice. Also, including warnings of what not to do in addition to advice of what actions to take improves performance.

AAMAS Conference 2016 Conference Paper

Policy Shaping in Domains with Multiple Optimal Policies (Extended Abstract)

  • Himanshu Sahni
  • Brent Harrison
  • Kaushik Subramanian
  • Thomas Cederborg
  • Charles Isbell
  • Andrea Thomaz

In many domains, there exist multiple ways for an agent to achieve optimal performance. Feedback may be provided along one or more of them to aid learning. In this work, we investigate whether humans have a preference towards providing feedback along one optimal policy over the other in two gridworld domains. We find that for the domain with significant risk to exploration, 60% of our participants prefer to discourage the agent’s exploration along the risky portion of the state space, while 40% state that they have no preference. We also use the interactive reinforcement learning algorithm Policy Shaping to evaluate the performance of simulated oracles with a number of feedback strategies. We find that certain domain traits, such as risk during exploration and number of optimal policies play an important role in determining the best performing feedback strategy.

RLDM Conference 2015 Conference Abstract

Expressing Tasks Robustly via Multiple Discount Factors

  • Ashley Edwards
  • Michael Littman
  • Charles Isbell

Reward engineering is the problem of expressing a target task for an agent in the form of rewards for a Markov decision process. To be useful for learning, it is important that these encodings be robust to structural changes in the underlying domain; that is, the specification remain unchanged for any domain in some target class. We identify problems that are difficult to express robustly via the standard model of discounted rewards. In response, we examine the idea of decomposing a reward function into separate components, each with its own discount factor. We describe a method for finding robust parameters through the concept of task engineering, which additionally modifies the discount factors. We present a method for optimizing behavior in this setting and show that it could provide a more robust language than standard approaches.
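A tiny numeric illustration of the decomposition idea (our example; the component rewards and discount factors are invented): each reward component $r_i$ carries its own discount $\gamma_i$, and the return is $\sum_i \sum_t \gamma_i^t r_i(s_t, a_t)$.

    # Return of a trajectory under per-component discount factors
    # (illustrative constants, not the paper's task-engineering method).
    GAMMAS = [0.9, 1.0]  # step-cost component discounted, goal bonus not

    def multi_discount_return(reward_components):
        # reward_components: one (r_1, r_2) tuple per time step
        return sum(g ** t * r
                   for t, rs in enumerate(reward_components)
                   for g, r in zip(GAMMAS, rs))

    # Three discounted step costs followed by an undiscounted goal bonus:
    print(multi_discount_return([(-0.1, 0), (-0.1, 0), (-0.1, 0), (0, 1.0)]))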

ICRA Conference 2015 Conference Paper

Learning non-holonomic object models for mobile manipulation

  • Jonathan Scholz
  • Martin Levihn
  • Charles Isbell
  • Henrik I. Christensen
  • Mike Stilman

For a mobile manipulator to interact with large everyday objects, such as office tables, it is often important to have dynamic models of these objects. However, as it is infeasible to provide the robot with models for every possible object it may encounter, it is desirable that the robot can identify common object models autonomously. Existing methods for addressing this challenge are limited by being either purely kinematic, or inefficient due to a lack of physical structure. In this paper, we present a physics-based method for estimating the dynamics of common non-holonomic objects using a mobile manipulator, and demonstrate its efficiency compared to existing approaches.

RLDM Conference 2015 Conference Abstract

Reinforcement Learning as Software Engineering

  • Charles Isbell

A central tenet of reinforcement learning (RL) is that behavior is driven by a desire to maximize rewarding stimuli. In the computing context, RL can be seen as a software engineering methodology for specifying the behavior of agents in complex, uncertain environments. In this analogy, Markov Decision Processes, and especially an MDP's rewards, are programs, while learning algorithms are compilers. In general, the field has focused almost exclusively on the compilers, that is, the design of algorithms for finding reward-maximizing behavior, but not much attention has been paid to the role of the programming language and the software engineering support for helping developers build good programs. In this talk, I will describe our efforts to probe the nature of MDPs-as-programs with the goal of moving toward higher-level specifications that satisfy the software engineering goals of clear semantics, expressibility, and ease of use while still admitting the efficient compilers that the RL community has traditionally enjoyed.

ICML Conference 2014 Conference Paper

A Physics-Based Model Prior for Object-Oriented MDPs

  • Jonathan Scholz
  • Martin Levihn
  • Charles Isbell
  • David Wingate

One of the key challenges in using reinforcement learning in robotics is the need for models that capture natural world structure. There are methods that formalize multi-object dynamics using relational representations, but these methods are not sufficiently compact for real-world robotics. We present a physics-based approach that exploits modern simulation tools to efficiently parameterize physical dynamics. Our results show that this representation can result in much faster learning, by virtue of its strong but appropriate inductive bias in physical environments.

NeurIPS Conference 2013 Conference Paper

Point Based Value Iteration with Optimal Belief Compression for Dec-POMDPs

  • Liam MacDermed
  • Charles Isbell

This paper presents four major results towards solving decentralized partially observable Markov decision problems (Dec-POMDPs), culminating in an algorithm that outperforms all existing algorithms on all but one of the standard infinite-horizon benchmark problems. (1) We give an integer program that solves collaborative Bayesian games (CBGs). The program is notable because its linear relaxation is very often integral. (2) We show that a Dec-POMDP with bounded belief can be converted to a POMDP (albeit with actions exponential in the number of beliefs). These actions correspond to strategies of a CBG. (3) We present a method to transform any Dec-POMDP into a Dec-POMDP with bounded beliefs (the number of beliefs is a free parameter) using optimal (not lossless) belief compression. (4) We show that the combination of these results opens the door for new classes of Dec-POMDP algorithms based on previous POMDP algorithms. We choose one such algorithm, point-based value iteration, and modify it to produce the first tractable value iteration method for Dec-POMDPs, one which outperforms existing algorithms.

NeurIPS Conference 2013 Conference Paper

Policy Shaping: Integrating Human Feedback with Reinforcement Learning

  • Shane Griffith
  • Kaushik Subramanian
  • Jonathan Scholz
  • Charles Isbell
  • Andrea Thomaz

A long term goal of Interactive Reinforcement Learning is to incorporate non-expert human feedback to solve complex tasks. State-of-the-art methods have approached this problem by mapping human information to reward and value signals to indicate preferences and then iterating over them to compute the necessary control policy. In this paper we argue for an alternate, more effective characterization of human feedback: Policy Shaping. We introduce Advise, a Bayesian approach that attempts to maximize the information gained from human feedback by utilizing it as direct labels on the policy. We compare Advise to state-of-the-art approaches and highlight scenarios where it outperforms them and importantly is robust to infrequent and inconsistent human feedback.
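A hedged sketch of using feedback as direct policy labels in the style described above. The optimality-probability formula follows the paper's Bayesian reading of feedback with a consistency parameter $C$; the learner's action distribution, the normalizations, and the numbers are schematic assumptions of ours.

    # Advise-style use of human labels as evidence about action optimality.
    import numpy as np

    def feedback_policy(delta, C=0.8):
        # P(a optimal | feedback) from delta_a = (#"right" - #"wrong") labels:
        # C^delta / (C^delta + (1-C)^delta); normalized here for combination.
        p = C ** delta / (C ** delta + (1 - C) ** delta)
        return p / p.sum()

    def combined_policy(p_rl, delta, C=0.8):
        # Multiply the learner's action distribution with feedback evidence.
        p = p_rl * feedback_policy(delta, C)
        return p / p.sum()

    delta = np.array([3.0, -1.0, 0.0])  # net human labels per action
    p_rl = np.array([0.2, 0.5, 0.3])    # e.g. from Bayesian Q-learning
    print(combined_policy(p_rl, delta))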

RLDM Conference 2013 Conference Abstract

Policy shaping: Integrating human feedback with reinforcement learning

  • Shane Griffith
  • Kaushik Subramanian
  • Jonathan Scholz
  • Charles Isbell
  • Andrea Thomaz

A long term goal of Interactive Reinforcement Learning is to incorporate non-expert human feedback to solve complex tasks. Some state-of-the-art methods have approached this problem by mapping human information to rewards and values and iterating over them to compute better control policies. In this paper we argue for an alternate and more effective characterization of human feedback: Policy Shaping. We introduce Advise, a Bayesian approach that attempts to maximize the information gained from human feedback by utilizing it as direct policy labels. We compare Advise to state-of-the-art approaches using a series of experiments. These experiments use two classic arcade games, together with feedback from a simulated human teacher, which allows us to systematically test performance under a variety of cases of infrequent and inconsistent feedback. We show that Advise has similar performance to the state of the art, but is more robust to a noisy signal from the human and fares well with an inaccurate estimate of its single input parameter. With these advancements this paper may help to make learning from human feedback an increasingly viable option for intelligent systems.

ICML Conference 2013 Conference Paper

Tree-Independent Dual-Tree Algorithms

  • Ryan R. Curtin
  • William B. March
  • Parikshit Ram
  • David V. Anderson
  • Alexander G. Gray
  • Charles Isbell

Dual-tree algorithms are a widely used class of branch-and-bound algorithms. Unfortunately, developing dual-tree algorithms for use with different trees and problems is often complex and burdensome. We introduce a four-part logical split: the tree, the traversal, the point-to-point base case, and the pruning rule. We provide a meta-algorithm which allows development of dual-tree algorithms in a tree-independent manner and easy extension to entirely new types of trees. Representations are provided for five common algorithms; for k-nearest neighbor search, this leads to a novel, tighter pruning bound. The meta-algorithm also allows straightforward extensions to massively parallel settings.
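A compact sketch of the four-part split named above (tree, traversal, point-to-point base case, pruning rule) as we read it; the node interface and the convention that an infinite score means "prune" are our assumptions, not the paper's or mlpack's actual API.

    # Generic depth-first dual-tree traversal with pluggable hooks.
    class Node:
        def __init__(self, points, children=()):
            self._points, self._children = points, list(children)
        def is_leaf(self): return not self._children
        def points(self): return self._points
        def children(self): return self._children

    def dual_depth_first(q_node, r_node, base_case, score):
        if score(q_node, r_node) == float("inf"):  # pruning rule: skip pair
            return
        if q_node.is_leaf() and r_node.is_leaf():
            for q in q_node.points():
                for r in r_node.points():
                    base_case(q, r)                # point-to-point base case
            return
        for qc in (q_node.children() or [q_node]):
            for rc in (r_node.children() or [r_node]):
                dual_depth_first(qc, rc, base_case, score)

    # Toy run: enumerate all point pairs (the score never prunes here).
    left, right = Node([(0.0,), (0.1,)]), Node([(5.0,), (5.1,)])
    pairs = []
    dual_depth_first(Node([], [left, right]), Node([], [left, right]),
                     base_case=lambda q, r: pairs.append((q, r)),
                     score=lambda a, b: 0.0)
    print(len(pairs))  # 16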

AAAI Conference 2012 Conference Paper

Computing Optimal Strategies to Commit to in Stochastic Games

  • Joshua Letchford
  • Liam MacDermed
  • Vincent Conitzer
  • Ronald Parr
  • Charles Isbell

Significant progress has been made recently in the following two lines of research in the intersection of AI and game theory: (1) the computation of optimal strategies to commit to (Stackelberg strategies), and (2) the computation of correlated equilibria of stochastic games. In this paper, we unite these two lines of research by studying the computation of Stackelberg strategies in stochastic games. We provide theoretical results on the value of being able to commit and the value of being able to correlate, as well as complexity results about computing Stackelberg strategies in stochastic games. We then modify the QPACE algorithm (MacDermed et al. 2011) to compute Stackelberg strategies, and provide experimental results.

AAAI Conference 2011 Conference Paper

Quick Polytope Approximation of All Correlated Equilibria in Stochastic Games

  • Liam MacDermed
  • Karthik Narayan
  • Charles Isbell
  • Lora Weiss

Stochastic or Markov games serve as reasonable models for a variety of domains from biology to computer security, and are appealing due to their versatility. In this paper we address the problem of finding the complete set of correlated equilibria for general-sum stochastic games with perfect information. We present QPACE – an algorithm orders of magnitude more efficient than previous approaches while maintaining a guarantee of convergence and bounded error. Finally, we validate our claims and demonstrate the limits of our algorithm with extensive empirical tests.

AAMAS Conference 2010 Conference Paper

Using Training Regimens to Teach Expanding Function Approximators

  • Peng Zang
  • Arya Irani
  • Peng Zhou
  • Andrea Thomaz
  • Charles Isbell

In complex real-world environments, traditional (tabular) techniques for solving Reinforcement Learning (RL) do not scale. Function approximation is needed, but unfortunately, existing approaches generally have poor convergence and optimality guarantees. Additionally, for the case of human environments, it is valuable to be able to leverage human input. In this paper we introduce Expanding Value Function Approximation (EVFA), a function approximation algorithm that returns the optimal value function given sufficient rounds. To leverage human input, we introduce a new human-agent interaction scheme, training regimens, which allow humans to interact with and improve agent learning in the setting of a machine learning game. In experiments, we show EVFA compares favorably to standard value approximation approaches. We also show that training regimens enable humans to further improve EVFA performance. In our user study, we find that non-experts are able to provide effective regimens and that they found the game fun.

AIJ Journal 2009 Journal Article

A novel sequence representation for unsupervised analysis of human activities

  • Raffay Hamid
  • Siddhartha Maddi
  • Amos Johnson
  • Aaron Bobick
  • Irfan Essa
  • Charles Isbell

Formalizing computational models for everyday human activities remains an open challenge. Many previous approaches towards this end assume prior knowledge about the structure of activities, using which explicitly defined models are learned in a completely supervised manner. For a majority of everyday environments however, the structure of the in situ activities is generally not known a priori. In this paper we investigate knowledge representations and manipulation techniques that facilitate learning of human activities in a minimally supervised manner. The key contribution of this work is the idea that global structural information of human activities can be encoded using a subset of their local event subsequences, and that this encoding is sufficient for activity-class discovery and classification. In particular, we investigate modeling activity sequences in terms of their constituent subsequences that we call event n-grams. Exploiting this representation, we propose a computational framework to automatically discover the various activity-classes taking place in an environment. We model these activity-classes as maximally similar activity-cliques in a completely connected graph of activities, and describe how to discover them efficiently. Moreover, we propose methods for finding characterizations of these discovered classes from a holistic as well as a by-parts perspective. Using such characterizations, we present a method to classify a new activity to one of the discovered activity-classes, and to automatically detect whether it is anomalous with respect to the general characteristics of its membership class. Our results show the efficacy of our approach in a variety of everyday environments.
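To make the n-gram encoding concrete, a toy version (entirely our construction, including the overlap normalization): an activity is summarized by the histogram of its length-$n$ event subsequences, and two activities are compared by histogram overlap.

    # Event n-gram representation of activity sequences (illustrative only).
    from collections import Counter

    def event_ngrams(events, n=3):
        # Histogram of all contiguous length-n event subsequences.
        return Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))

    def similarity(a, b, n=3):
        ha, hb = event_ngrams(a, n), event_ngrams(b, n)
        shared = sum((ha & hb).values())       # multiset intersection size
        return shared / max(sum(ha.values()), sum(hb.values()))

    activity_1 = list("abcadce")  # each letter stands for one detected event
    activity_2 = list("abcadcf")
    print(similarity(activity_1, activity_2))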

NeurIPS Conference 2009 Conference Paper

Solving Stochastic Games

  • Liam MacDermed
  • Charles Isbell

Solving multi-agent reinforcement learning problems has proven difficult because of the lack of tractable algorithms. We provide the first approximation algorithm which solves stochastic games to within $\epsilon$ relative error of the optimal game-theoretic solution, in time polynomial in $1/\epsilon$. Our algorithm extends Murray and Gordon's (2007) modified Bellman equation, which determines the set of all possible achievable utilities; this provides us a truly general framework for multi-agent learning. Further, we empirically validate our algorithm and find the computational cost to be orders of magnitude less than what the theory predicts.

IJCAI Conference 2007 Conference Paper

  • David Minnen
  • Thad Starner
  • Irfan Essa
  • Charles Isbell

A fundamental problem for artificial intelligence is identifying perceptual primitives from raw sensory signals that are useful for higher-level reasoning. We equate these primitives with initially unknown recurring patterns called motifs. Autonomously learning the motifs is difficult because their number, location, length, and shape are all unknown. Furthermore, nonlinear temporal warping may be required to ensure the similarity of motif occurrences. In this paper, we extend a leading motif discovery algorithm by allowing it to operate on multidimensional sensor data, incorporating automatic parameter estimation, and providing for motif-specific similarity adaptation. We evaluate our algorithm on several data sets and show how our approach leads to faster real world discovery and more accurate motifs compared to other leading methods.

IJCAI Conference 2007 Conference Paper

  • Peng Zang
  • Charles Isbell

We present MBoost, a novel extension to AdaBoost that extends boosting to use multiple weak learners explicitly, and provides robustness to learning models that overfit or are poorly matched to data. We demonstrate MBoost on a variety of problems and compare it to cross validation for model selection.

IJCAI Conference 2007 Conference Paper

  • Manu Sharma
  • Michael Holmes
  • Juan Santamaria
  • Arya Irani
  • Charles Isbell
  • Ashwin Ram

The goal of transfer learning is to use the knowledge acquired in a set of source tasks to improve performance in a related but previously unseen target task. In this paper, we present a multilayered architecture named CAse-Based Reinforcement Learner (CARL). It uses a novel combination of Case-Based Reasoning (CBR) and Reinforcement Learning (RL) to achieve transfer while playing against the Game AI across a variety of scenarios in MadRTS(TM), a commercial Real Time Strategy game. Our experiments demonstrate that CARL not only performs well on individual tasks but also exhibits significant performance gains when allowed to transfer knowledge from previous tasks.

NeurIPS Conference 2007 Conference Paper

Ultrafast Monte Carlo for Statistical Summations

  • Charles Isbell
  • Michael Holmes
  • Alexander Gray

Machine learning contains many computational bottlenecks in the form of nested summations over datasets. Kernel estimators and other methods are burdened by these expensive computations. Exact evaluation is typically $O(n^2)$ or higher, which severely limits application to large datasets. We present a multi-stage stratified Monte Carlo method for approximating such summations with probabilistic relative error control. The essential idea is fast approximation by sampling in trees. This method differs from many previous scalability techniques (such as standard multi-tree methods) in that its error is stochastic, but we derive conditions for error control and demonstrate that they work. Further, we give a theoretical sample complexity for the method that is independent of dataset size, and show that this appears to hold in experiments, where speedups reach as high as $10^{14}$, many orders of magnitude beyond the previous state of the art.
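A minimal sketch of the core idea under our own simplifying assumptions: sample terms of a large sum in batches and stop once a normal-approximation error bar falls below a relative-error target. The paper's method additionally stratifies samples using trees; plain uniform sampling is shown here.

    # Monte Carlo summation with an empirical relative-error stopping rule.
    import math, random

    def mc_sum(terms, rel_err=0.01, z=2.576, batch=1000):
        n, mean, m2 = 0, 0.0, 0.0
        while True:
            for _ in range(batch):
                x = random.choice(terms)   # uniform sampling; the paper
                n += 1                     # stratifies via trees instead
                d = x - mean               # Welford running mean/variance
                mean += d / n
                m2 += d * (x - mean)
            sem = math.sqrt(m2 / (n - 1) / n)
            if z * sem <= rel_err * abs(mean):  # approximate error control
                return len(terms) * mean

    terms = [math.exp(-((i % 97) / 10.0) ** 2) for i in range(100_000)]
    print(mc_sum(terms), sum(terms))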

NeurIPS Conference 2002 Conference Paper

Real Time Voice Processing with Audiovisual Feedback: Toward Autonomous Agents with Perfect Pitch

  • Lawrence Saul
  • Daniel Lee
  • Charles Isbell
  • Yann LeCun

We have implemented a real time front end for detecting voiced speech and estimating its fundamental frequency. The front end performs the signal processing for voice-driven agents that attend to the pitch contours of human speech and provide continuous audiovisual feedback. The algorithm we use for pitch tracking has several distinguishing features: it makes no use of FFTs or autocorrelation at the pitch period; it updates the pitch incrementally on a sample-by-sample basis; it avoids peak picking and does not require interpolation in time or frequency to obtain high resolution estimates; and it works reliably over a four octave range, in real time, without the need for postprocessing to produce smooth contours. The algorithm is based on two simple ideas in neural computation: the introduction of a purposeful nonlinearity, and the error signal of a least squares fit. The pitch tracker is used in two real time multimedia applications: a voice-to-MIDI player that synthesizes electronic music from vocalized melodies, and an audiovisual Karaoke machine with multimodal feedback. Both applications run on a laptop and display the user's pitch scrolling across the screen as he or she sings into the computer.

NeurIPS Conference 2001 Conference Paper

Cobot: A Social Reinforcement Learning Agent

  • Charles Isbell
  • Christian Shelton

We report on the use of reinforcement learning with Cobot, a software agent residing in the well-known online community LambdaMOO. Our initial work on Cobot (Isbell et al. 2000) provided him with the ability to collect social statistics and report them to users. Here we describe an application of RL allowing Cobot to take proactive actions in this complex social environment, and adapt behavior from multiple sources of human reward. After 5 months of training, and 3171 reward and punishment events from 254 different LambdaMOO users, Cobot learned nontrivial preferences for a number of users, modifying his behavior based on his current state. Here we describe LambdaMOO and the state and action spaces of Cobot, and report the statistical results of the learning experiment.

NeurIPS Conference 1999 Conference Paper

The Parallel Problems Server: an Interactive Tool for Large Scale Machine Learning

  • Charles Isbell
  • Parry Husbands

Imagine that you wish to classify data consisting of tens of thousands of examples residing in a twenty thousand dimensional space. How can one apply standard machine learning algorithms? We describe the Parallel Problems Server (PPServer) and MATLAB*P. In tandem they allow users of networked computers to work transparently on large data sets from within Matlab. This work is motivated by the desire to bring the many benefits of scientific computing algorithms and computational power to machine learning researchers. We demonstrate the usefulness of the system on a number of tasks. For example, we perform independent components analysis on very large text corpora consisting of tens of thousands of documents, making minimal changes to the original Bell and Sejnowski Matlab source (Bell and Sejnowski, 1995). Applying ML techniques to data previously beyond their reach leads to interesting analyses of both data and algorithms.

NeurIPS Conference 1998 Conference Paper

Restructuring Sparse High Dimensional Data for Effective Retrieval

  • Charles Isbell
  • Paul Viola

The task in text retrieval is to find the subset of a collection of documents relevant to a user's information request, usually expressed as a set of words. Classically, documents and queries are represented as vectors of word counts. In its simplest form, relevance is defined to be the dot product between a document and a query vector, a measure of the number of common terms. A central difficulty in text retrieval is that the presence or absence of a word is not sufficient to determine relevance to a query. Linear dimensionality reduction has been proposed as a technique for extracting underlying structure from the document collection. In some domains (such as vision) dimensionality reduction reduces computational complexity. In text retrieval it is more often used to improve retrieval performance. We propose an alternative and novel technique that produces sparse representations constructed from sets of highly-related words. Documents and queries are represented by their distance to these sets, and relevance is measured by the number of common clusters. This technique significantly improves retrieval performance, is efficient to compute and shares properties with the optimal linear projection operator and the independent components of documents.

NeurIPS Conference 1996 Conference Paper

MIMIC: Finding Optima by Estimating Probability Densities

  • Jeremy De Bonet
  • Charles Isbell
  • Paul Viola

In many optimization problems, the structure of solutions reflects complex relationships between the different input parameters. For example, experience may tell us that certain parameters are closely related and should not be explored independently. Similarly, experience may establish that a subset of parameters must take on particular values. Any search of the cost landscape should take advantage of these relationships. We present MIMIC, a framework in which we analyze the global structure of the optimization landscape. A novel and efficient algorithm for the estimation of this structure is derived. We use knowledge of this structure to guide a randomized search through the solution space and, in turn, to refine our estimate of the structure. Our technique obtains significant speed gains over other randomized optimization procedures.
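A compressed sketch of the MIMIC loop on bit-strings (our rendering: the OneMax fitness, population sizes, and elite selection are invented; the dependency model is a chain of pairwise conditionals ordered greedily by conditional entropy, in the spirit of the framework described above).

    # One MIMIC-style iteration: fit a chain model to good samples, resample.
    import numpy as np

    rng = np.random.default_rng(1)
    N_BITS, POP, KEEP, ITERS = 20, 200, 50, 30
    fitness = lambda X: X.sum(axis=1)  # OneMax: count of 1-bits

    def entropy(p):
        p = np.clip(p, 1e-9, 1 - 1e-9)
        return -(p * np.log(p) + (1 - p) * np.log(1 - p))

    X = rng.integers(0, 2, (POP, N_BITS))
    for _ in range(ITERS):
        elite = X[np.argsort(-fitness(X))[:KEEP]]   # best samples
        marg = elite.mean(axis=0)
        # Greedy chain: start at the lowest-entropy bit, then repeatedly
        # append the bit with minimum conditional entropy given the last.
        order, cond = [int(np.argmin(entropy(marg)))], {}
        while len(order) < N_BITS:
            prev = order[-1]
            rest = [j for j in range(N_BITS) if j not in order]
            def cond_p(j, prev=prev):
                # P(bit j = 1 | bit prev = b) for b in {0, 1}
                return np.array([elite[elite[:, prev] == b, j].mean()
                                 if (elite[:, prev] == b).any() else marg[j]
                                 for b in (0, 1)])
            best = min(rest, key=lambda j: entropy(cond_p(j)).mean())
            cond[(prev, best)] = cond_p(best)
            order.append(best)
        # Sample a new population from the chain, root bit first.
        X = np.zeros((POP, N_BITS), dtype=int)
        X[:, order[0]] = rng.random(POP) < marg[order[0]]
        for (prev, j), p in cond.items():
            X[:, j] = rng.random(POP) < p[X[:, prev]]
    print(fitness(X).max())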