RLDM 2013
Dirichlet Process Reinforcement Learning
Abstract
We consider the problem of model-based reinforcement learning in continuous state-action spaces. The key ingredients of our algorithm are: (i) To model the (initially unknown) dynamics, our algorithm uses a Dirichlet process mixture of linear models. We present a novel method for on-line inference with this model, which enables continuous learning at high frequency. (ii) To address the exploration-exploitation trade-off, we describe how to adapt BOLT [1], an established method for discrete systems, to make it practical for continuous systems. (iii) Efficient control is made possible by relying on recent advances in sequential quadratic programming (SQP). Our algorithm is highly automated, requiring only two scalar parameters, and it is designed for parallel computation. Experiments show that it can solve the classical cartpole and under-actuated pendulum swing-up tasks, as well as a new helicopter task (a 180-degree flip followed by inverted hover), with minimal re-tuning of these two parameters for each new system.
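The abstract does not spell out the model's equations. As a rough illustration only of the two generic building blocks behind a Dirichlet process mixture of linear models, the sketch below computes the Chinese restaurant process prior over component assignments and fits one local linear dynamics model by least squares. The function names, the concentration parameter `alpha`, and the affine form y ≈ A x + B u + c are assumptions made for this example; this is not the paper's on-line inference method.

```python
import numpy as np


def crp_assignment_probs(counts, alpha):
    """Chinese restaurant process prior over component assignment.

    counts: number of points already assigned to each existing component.
    alpha:  concentration parameter (hypothetical, user-chosen).
    Returns probabilities for each existing component, followed by the
    probability of opening a new component.
    """
    counts = np.asarray(counts, dtype=float)
    weights = np.append(counts, alpha)
    return weights / weights.sum()


def fit_linear_model(X, U, Y):
    """Least-squares fit of y ~ A x + B u + c for one mixture component.

    X: (n, dim_x) states, U: (n, dim_u) actions, Y: (n, dim_y) next states.
    Returns W of shape (dim_x + dim_u + 1, dim_y); the last row is the offset c.
    """
    # Stack states, actions, and a constant column into one design matrix.
    Z = np.hstack([X, U, np.ones((X.shape[0], 1))])
    W, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return W
```

For example, with two existing components holding 3 and 1 points and `alpha = 1.0`, `crp_assignment_probs([3, 1], 1.0)` assigns prior probabilities 0.6 and 0.2 to the existing components and 0.2 to a new one. A full mixture would combine this prior with each component's predictive likelihood for the new transition.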
Authors
Context
- Venue: Multidisciplinary Conference on Reinforcement Learning and Decision Making
- Archive span: 2013-2025
- Indexed papers: 1004
- Paper id: 1092451110698461019