Author name cluster

Todd Hester

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers

2 author rows

ICLR Conference 2020 Conference Paper

Robust Reinforcement Learning for Continuous Control with Model Misspecification

Daniel J. Mankowitz
Nir Levine
Rae Jeong
Abbas Abdolmaleki
Jost Tobias Springenberg
Yuanyuan Shi
Jackie Kay
Todd Hester

We provide a framework for incorporating robustness to perturbations in the transition dynamics, which we refer to as model misspecification, into continuous control Reinforcement Learning (RL) algorithms. We specifically focus on incorporating robustness into a state-of-the-art continuous control RL algorithm called Maximum a-posteriori Policy Optimization (MPO). We achieve this by learning a policy that optimizes for a worst-case, entropy-regularized, expected return objective and derive a corresponding robust entropy-regularized Bellman contraction operator. In addition, we introduce a less conservative, soft-robust, entropy-regularized objective with a corresponding Bellman operator. We show that both robust and soft-robust policies outperform their non-robust counterparts in nine MuJoCo domains with environment perturbations. In addition, we show improved robust performance on a challenging, simulated, dexterous robotic hand. Finally, we present multiple investigative experiments that provide a deeper insight into the robustness framework, including an adaptation to another continuous control RL algorithm. Performance videos can be found online at https://sites.google.com/view/robust-rl.
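The difference between the robust and soft-robust Bellman operators in this abstract can be illustrated with a minimal tabular sketch. It is not the paper's MPO-based implementation. It assumes a fixed policy π, a finite uncertainty set of K candidate transition models that is (s, a)-rectangular (so the worst case is taken independently per state-action pair), an entropy bonus of the form α·(−log π), and uniform weights for the soft-robust average. All function names are hypothetical.

```python
import numpy as np


def soft_state_value(q, pi, alpha):
    """V(s) = sum_a pi(a|s) * (Q(s, a) - alpha * log pi(a|s)); pi must be strictly positive."""
    return np.sum(pi * (q - alpha * np.log(pi)), axis=1)


def robust_backup(q, pi, rewards, models, gamma=0.99, alpha=0.1):
    """Worst-case backup: r(s,a) + gamma * min_k E_{s' ~ P_k(.|s,a)}[V(s')].

    Shapes: q, pi, rewards -> [S, A]; models -> [K, S, A, S] (K candidate dynamics).
    Taking the min per (s, a) assumes an (s, a)-rectangular uncertainty set.
    """
    v = soft_state_value(q, pi, alpha)          # [S]
    next_values = models @ v                    # [K, S, A]
    return rewards + gamma * next_values.min(axis=0)


def soft_robust_backup(q, pi, rewards, models, model_weights, gamma=0.99, alpha=0.1):
    """Less conservative backup: weighted average over models (weights sum to 1)."""
    v = soft_state_value(q, pi, alpha)
    next_values = models @ v                    # [K, S, A]
    return rewards + gamma * np.tensordot(model_weights, next_values, axes=1)


# Usage: evaluate a fixed uniform policy on random dynamics by fixed-point iteration.
rng = np.random.default_rng(0)
S, A, K = 4, 2, 3
models = rng.random((K, S, A, S))
models /= models.sum(axis=-1, keepdims=True)    # each P_k(.|s,a) sums to 1
rewards = rng.random((S, A))
pi = np.full((S, A), 1.0 / A)
weights = np.full(K, 1.0 / K)

q_rob = np.zeros((S, A))
q_soft = np.zeros((S, A))
for _ in range(500):                            # both operators are gamma-contractions
    q_rob = robust_backup(q_rob, pi, rewards, models, gamma=0.9)
    q_soft = soft_robust_backup(q_soft, pi, rewards, models, weights, gamma=0.9)

print(np.all(q_rob <= q_soft + 1e-9))           # robust values never exceed soft-robust ones
```

Because the minimum over models is never larger than a weighted average, the robust fixed point lower-bounds the soft-robust one elementwise, which is one way to read "less conservative" in the abstract.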

ICRA Conference 2019 Conference Paper

A Practical Approach to Insertion with Variable Socket Position Using Deep Reinforcement Learning

Mel Vecerík
Oleg Sushkov
David Barker
Thomas Rothörl
Todd Hester
Jonathan Scholz

Insertion is a challenging haptic and visual control problem with significant practical value for manufacturing. Existing approaches in the model-based robotics community can be highly effective when task geometry is known, but are complex and cumbersome to implement, and must be tailored to each individual problem by a qualified engineer. Within the learning community there is a long history of insertion research, but existing approaches are either too sample-inefficient to run on real robots, or assume access to high-level object features, e.g., socket pose. In this paper we show that relatively minor modifications to an off-the-shelf Deep-RL algorithm (DDPG), combined with a small number of human demonstrations, allow the robot to quickly learn to solve these tasks efficiently and robustly. Our approach requires no modeling or simulation, no parameterized search or alignment behaviors, no vision system aside from raw images, and no reward shaping. We evaluate our approach on a narrow-clearance peg-insertion task and a deformable clip-insertion task, both of which include variability in the socket position. Our results show that these tasks can be solved reliably on the real robot in less than 10 minutes of interaction time, and that the resulting policies are robust to variance in the socket position and orientation.

AAAI Conference 2018 Conference Paper

Deep Q-learning From Demonstrations

Todd Hester
Matej Vecerik
Olivier Pietquin
Marc Lanctot
Tom Schaul
Bilal Piot
Dan Horgan
John Quan

Deep reinforcement learning (RL) has achieved several high-profile successes in difficult decision-making problems. However, these algorithms typically require a huge amount of data before they reach reasonable performance. In fact, their performance during learning can be extremely poor. This may be acceptable for a simulator, but it severely limits the applicability of deep RL to many real-world tasks, where the agent must learn in the real environment. In this paper we study a setting where the agent may access data from previous control of the system. We present an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages even relatively small sets of demonstration data to massively accelerate the learning process, and that automatically assesses the necessary ratio of demonstration data while learning, thanks to a prioritized replay mechanism. DQfD works by combining temporal difference updates with supervised classification of the demonstrator's actions. We show that DQfD has better initial performance than Prioritized Dueling Double Deep Q-Networks (PDD DQN): it starts with better scores on the first million steps on 41 of 42 games, and on average it takes PDD DQN 83 million steps to catch up to DQfD's performance. DQfD learns to outperform the best demonstration given in 14 of 42 games. In addition, DQfD leverages human demonstrations to achieve state-of-the-art results for 11 games. Finally, we show that DQfD performs better than three related algorithms for incorporating demonstration data into DQN.
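The abstract's "temporal difference updates combined with supervised classification of the demonstrator's actions" can be sketched as a TD loss plus a large-margin classification loss, max_a [Q(s,a) + l(a_E, a)] − Q(s, a_E), where l is 0 for the demonstrated action a_E and a positive margin otherwise. This is a simplified NumPy sketch, not the paper's implementation: it uses a one-step squared TD error only, omits any n-step term, regularization, and the prioritized replay mechanism, and the margin value of 0.8, the weight `lambda_e`, and all function names are illustrative assumptions.

```python
import numpy as np


def large_margin_loss(q_values, expert_actions, margin=0.8):
    """Mean of max_a [Q(s,a) + l(a_E, a)] - Q(s, a_E); l = margin for a != a_E, else 0."""
    rows = np.arange(len(expert_actions))
    margins = np.full(q_values.shape, margin, dtype=float)
    margins[rows, expert_actions] = 0.0
    return float(np.mean(np.max(q_values + margins, axis=1) - q_values[rows, expert_actions]))


def one_step_td_loss(q_values, actions, rewards, next_q_target, dones, gamma=0.99):
    """Squared one-step TD error against a target network's greedy next-state value."""
    rows = np.arange(len(actions))
    targets = rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)
    return float(np.mean((targets - q_values[rows, actions]) ** 2))


def combined_loss(q_values, actions, rewards, next_q_target, dones, is_demo,
                  lambda_e=1.0, gamma=0.99):
    """TD loss on every transition; margin loss only on demonstration transitions."""
    loss = one_step_td_loss(q_values, actions, rewards, next_q_target, dones, gamma)
    if is_demo.any():
        loss += lambda_e * large_margin_loss(q_values[is_demo], actions[is_demo])
    return loss


# Usage on a toy batch of two transitions with three actions.
q = np.array([[2.0, 1.0, 0.5], [1.0, 1.5, 0.0]])
actions = np.array([0, 0])                      # demonstrator chose action 0 in both states
rewards = np.array([1.0, 0.0])
next_q = np.zeros((2, 3))
dones = np.array([1.0, 1.0])

print(large_margin_loss(q, actions))            # row 0 already beats others by >= 0.8; row 1 does not
print(combined_loss(q, actions, rewards, next_q, dones, np.array([True, True])))
```

The margin term is zero once the demonstrated action's value exceeds every other action's value by at least the margin, so it pushes the greedy policy toward the demonstrator without pinning Q-values to a fixed target.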

AIJ Journal 2017 Journal Article

Intrinsically motivated model learning for developing curious robots

Todd Hester
Peter Stone

ICRA Conference 2012 Conference Paper

RTMBA: A Real-Time Model-Based Reinforcement Learning Architecture for robot control

Todd Hester
Michael J. Quinlan
Peter Stone 0001

Reinforcement Learning (RL) is a paradigm for learning decision-making tasks that could enable robots to learn and adapt to their situation on-line. For an RL algorithm to be practical for robotic control tasks, it must learn in very few samples, while continually taking actions in real-time. Existing model-based RL methods learn in relatively few samples, but typically take too much time between each action for practical on-line learning. In this paper, we present a novel parallel architecture for model-based RL that runs in real-time by 1) taking advantage of sample-based approximate planning methods and 2) parallelizing the acting, model learning, and planning processes in a novel way such that the acting process is sufficiently fast for typical robot control cycles. We demonstrate that algorithms using this architecture perform nearly as well as methods using the typical sequential architecture when both are given unlimited time, and greatly outperform these methods on tasks that require real-time actions such as controlling an autonomous vehicle.

ICRA Conference 2010 Conference Paper

Generalized model learning for Reinforcement Learning on a humanoid robot

Todd Hester
Michael J. Quinlan
Peter Stone 0001

Reinforcement learning (RL) algorithms have long been promising methods for enabling an autonomous robot to improve its behavior on sequential decision-making tasks. The obvious enticement is that the robot should be able to improve its own behavior without the need for detailed step-by-step programming. However, for RL to reach its full potential, the algorithms must be sample efficient: they must learn competent behavior from very few real-world trials. From this perspective, model-based methods, which use experiential data more efficiently than model-free approaches, are appealing. But they often require exhaustive exploration to learn an accurate model of the domain. In this paper, we present an algorithm, Reinforcement Learning with Decision Trees (RL-DT), that uses decision trees to learn the model by generalizing the relative effect of actions across states. The agent explores the environment until it believes it has a reasonable policy. The combination of the learning approach with the targeted exploration policy enables fast learning of the model. We compare RL-DT against standard model-free and model-based learning methods, and demonstrate its effectiveness on an Aldebaran Nao humanoid robot scoring goals in a penalty kick scenario.

AAMAS Conference 2009 Conference Paper

Generalized Model Learning for Reinforcement Learning in Factored Domains

Todd Hester
Peter Stone

Improving the sample efficiency of reinforcement learning algorithms to scale up to larger and more realistic domains is a current research challenge in machine learning. Model-based methods use experiential data more efficiently than model-free approaches but often require exhaustive exploration to learn an accurate model of the domain. We present an algorithm, Reinforcement Learning with Decision Trees (RL-DT), that uses supervised learning techniques to learn the model by generalizing the relative effect of actions across states. Specifically, RL-DT uses decision trees to model the relative effects of actions in the domain. The agent explores the environment exhaustively in early episodes when its model is inaccurate. Once it believes it has developed an accurate model, it exploits its model, taking the optimal action at each step. The combination of the learning approach with the targeted exploration policy enables fast learning of the model. The sample efficiency of the algorithm is evaluated empirically in comparison to five other algorithms across three domains. RL-DT consistently accrues high cumulative rewards in comparison with the other algorithms tested.
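The idea of "generalizing the relative effect of actions across states" can be illustrated with a small sketch: for each action and each state feature, a tree is trained to predict the change s'[i] − s[i] rather than the absolute next value, so an effect observed in a few states transfers to unvisited ones. This is not RL-DT itself. It assumes discrete integer features and deterministic effects, and the greedy equality-split tree with a misclassification-count criterion is a simplification chosen for brevity; `RelativeEffectModel`, `build_tree`, and `tree_predict` are hypothetical names.

```python
from collections import Counter


def build_tree(X, y, depth=0, max_depth=4):
    """Greedy classification tree over discrete features using equality splits.

    Leaves are labels (ints here, never tuples); internal nodes are
    (feature, value, yes_subtree, no_subtree). Misclassification count is the
    split criterion, a simplifying assumption.
    """
    counts = Counter(y)
    majority, majority_count = counts.most_common(1)[0]
    errors_here = len(y) - majority_count
    if errors_here == 0 or depth == max_depth:
        return majority
    best = None
    for f in range(len(X[0])):
        for v in sorted({x[f] for x in X}):
            yes = [i for i, x in enumerate(X) if x[f] == v]
            no = [i for i, x in enumerate(X) if x[f] != v]
            if not yes or not no:
                continue
            err = sum(len(idx) - Counter(y[i] for i in idx).most_common(1)[0][1]
                      for idx in (yes, no))
            if best is None or err < best[0]:
                best = (err, f, v, yes, no)
    if best is None or best[0] >= errors_here:
        return majority  # no split improves on predicting the majority label
    _, f, v, yes, no = best
    return (f, v,
            build_tree([X[i] for i in yes], [y[i] for i in yes], depth + 1, max_depth),
            build_tree([X[i] for i in no], [y[i] for i in no], depth + 1, max_depth))


def tree_predict(tree, x):
    """Walk equality splits until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, v, yes, no = tree
        tree = yes if x[f] == v else no
    return tree


class RelativeEffectModel:
    """Hypothetical model: one tree per (action, feature) predicting delta = s'[i] - s[i]."""

    def __init__(self, n_features):
        self.n_features = n_features
        self.transitions = {}  # action -> list of (state, next_state)
        self.trees = {}

    def add(self, state, action, next_state):
        self.transitions.setdefault(action, []).append((tuple(state), tuple(next_state)))

    def fit(self):
        for action, pairs in self.transitions.items():
            states = [s for s, _ in pairs]
            trees = []
            for i in range(self.n_features):
                deltas = [s_next[i] - s[i] for s, s_next in pairs]
                trees.append(build_tree(states, deltas))
            self.trees[action] = trees

    def predict(self, state, action):
        return tuple(state[i] + tree_predict(self.trees[action][i], state)
                     for i in range(self.n_features))


# Usage: a 1-D corridor with cells 0..4 where "right" moves +1 except at the wall (cell 4).
model = RelativeEffectModel(n_features=1)
for x in (0, 1, 4):                       # cells 2 and 3 are never visited
    model.add((x,), "right", (min(x + 1, 4),))
model.fit()
print(model.predict((2,), "right"))       # (3,)
```

Because the tree only has to learn "Δx = +1 unless at the wall", the prediction for the unvisited cell 2 is correct; a tree trained on absolute next positions would have no such rule to fall back on.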

ICRA Conference 2008 Conference Paper

Negative information and line observations for Monte Carlo localization

Todd Hester
Peter Stone 0001

Localization is a very important problem in robotics and is critical to many tasks performed on a mobile robot. In order to localize well in environments with few landmarks, a robot must make full use of all the information provided to it. This paper moves towards this goal by studying the effects of incorporating line observations and negative information into the localization algorithm. We extend the general Monte Carlo localization algorithm to utilize observations of lines such as carpet edges. We also make use of the information available when the robot expects to see a landmark but does not, by incorporating negative information into the algorithm. We compare our implementations of these ideas to previous similar approaches and demonstrate the effectiveness of these improvements through localization experiments performed both on a Sony AIBO ERS-7 robot and in simulation.
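The use of negative information described above (the robot expects to see a landmark but does not) can be sketched as a particle reweighting step in Monte Carlo localization. This is a minimal sketch under stated assumptions, not the paper's sensor model: it uses a detection-only model with a hypothetical field of view, maximum range, detection probability `p_detect`, and false-positive rate `p_false`, and it ignores range and bearing measurements and line observations entirely. Function names are hypothetical.

```python
import math


def landmark_in_view(pose, landmark, fov=math.pi / 2, max_range=3.0):
    """True if a landmark should be visible from pose (x, y, heading); fov/range are assumptions."""
    x, y, theta = pose
    dx, dy = landmark[0] - x, landmark[1] - y
    bearing = math.atan2(dy, dx) - theta
    bearing = math.atan2(math.sin(bearing), math.cos(bearing))  # wrap to [-pi, pi]
    return math.hypot(dx, dy) <= max_range and abs(bearing) <= fov / 2


def reweight(particles, weights, landmarks, seen_ids, p_detect=0.9, p_false=0.05):
    """Multiply each particle's weight by a detection likelihood per landmark, then normalize.

    A landmark that should be in view but was not seen (negative information)
    scales the weight by (1 - p_detect), penalizing poses that predict it.
    """
    new_weights = []
    for pose, w in zip(particles, weights):
        for lid, lm in landmarks.items():
            visible = landmark_in_view(pose, lm)
            seen = lid in seen_ids
            if visible:
                w *= p_detect if seen else (1.0 - p_detect)
            else:
                w *= p_false if seen else (1.0 - p_false)
        new_weights.append(w)
    total = sum(new_weights)
    return [w / total for w in new_weights]


# Usage: two hypotheses at the same position, facing toward vs. away from a beacon.
particles = [(0.0, 0.0, 0.0), (0.0, 0.0, math.pi)]
landmarks = {"beacon": (1.0, 0.0)}
weights = reweight(particles, [0.5, 0.5], landmarks, seen_ids=set())
print(weights)  # the particle facing away from the unseen beacon gains weight
```

Without the negative-information branch, an empty observation would leave both particles equally weighted; with it, the pose that predicts a sighting that never happened is down-weighted.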

AAMAS Conference 2008 Conference Paper

The Utility of Temporal Abstraction in Reinforcement Learning

Nicholas Jong
Todd Hester
Peter Stone

The hierarchical structure of real-world problems has motivated extensive research into temporal abstractions for reinforcement learning, but precisely how these abstractions allow agents to improve their learning performance is not well understood. This paper investigates the connection between temporal abstraction and an agent’s exploration policy, which determines how the agent’s performance improves over time. Experimental results with standard methods for incorporating temporal abstractions show that these methods benefit learning only in limited contexts. The primary contribution of this paper is a clearer understanding of how hierarchical decompositions interact with reinforcement learning algorithms, with important consequences for the manual design or automatic discovery of action hierarchies.