
IROS 2015

Learning compound multi-step controllers under unknown dynamics

Conference Paper · Accepted Paper · Artificial Intelligence · Robotics

Abstract

Applications of reinforcement learning to robotic manipulation often assume an episodic setting. However, controllers trained with reinforcement learning are frequently situated in the context of a more complex compound task, where multiple controllers might be invoked in sequence to accomplish a higher-level goal. Furthermore, training such controllers typically requires resetting the environment between episodes, a step that is usually handled manually. We describe an approach for training chains of controllers with reinforcement learning. This requires taking into account the state distributions induced by preceding controllers in the chain, as well as automatically training reset controllers that can reset the task between episodes. Because the initial state of each controller is determined by the controller that precedes it, the result is a non-stationary learning problem. We demonstrate that a recently developed method that optimizes linear-Gaussian controllers under learned local linear models can tackle this sort of non-stationary problem, and that training controllers concurrently with a corresponding reset controller only minimally increases training time. We also demonstrate this method on a complex tool use task that consists of seven stages and requires using a toy wrench to screw in a bolt. This compound task requires grasping and handling complex contact dynamics. After training, the controllers can execute the entire task quickly and efficiently. Finally, we show that this method can be combined with guided policy search to automatically train nonlinear neural network controllers for a grasping task with considerable variation in target position.

Authors

Keywords

  • Heuristic algorithms
  • Training
  • Learning (artificial intelligence)
  • Robots
  • Compounds
  • Trajectory
  • Neural networks
  • Unknown Dynamics
  • Neural Network
  • Variable Positions
  • Robot Manipulator
  • Nonlinear Network
  • Neural Network Control
  • Application Of Reinforcement Learning
  • Policy Search
  • Nonlinear Neural Networks
  • Learning Algorithms
  • Dynamic Model
  • Cost Function
  • Optimal Control
  • Brownian Motion
  • Variety Of Tasks
  • Control Sequence
  • Gaussian Mixture Model
  • Manipulation Tasks
  • Single Control
  • Model-based Reinforcement Learning
  • Point In Frame
  • Constrained Problem
  • Learning Control
  • Problem For Equation
  • Constrained Optimization
  • Distribution Of Trajectories
  • Time-varying Dynamics
  • Linear Dynamics
  • Time-varying Control

Context

Venue
IEEE/RSJ International Conference on Intelligent Robots and Systems
Archive span
1988-2025
Indexed papers
26578
Paper id
1119547750969647331