RLDM 2015 Conference Abstract
- Halit Suay
- Sonia Chernova
- Tim Brys
- Vrije Universiteit Brussel
- Matthew Taylor
Potential-based reward shaping is a theoretically sound way of incorporating prior knowledge into a reinforcement learning setting. While providing flexibility in the choice of potential function, this method guarantees convergence to the same optimal policy, regardless of the properties of the potential function. However, this flexibility of choice may cause confusion when making a design decision for a specific domain, as the number of possible candidates for a potential function can be overwhelming. Moreover, the potential function can either be manually designed, to bias the behavior of the learner, or be recovered from prior knowledge, e.g., from human demonstrations. In this paper we investigate the efficacy of two different ways of using a potential function recovered from human demonstrations. The first approach uses a mixture of Gaussian distributions generated from samples collected during demonstrations (Gaussian-Shaping), and the second uses a reward function recovered from demonstrations with Relative Entropy Inverse Reinforcement Learning (RE-IRL-Shaping). We present our findings in the Cart-Pole, Mountain Car, and Puddle World domains. Our results show that Gaussian-Shaping can provide an efficient reward heuristic, accelerating learning through its ability to capture local information, and that RE-IRL-Shaping can be more resilient to bad demonstrations. We report a brief analysis of our findings and aim to provide a future reference for reinforcement learning agent designers who consider using reward shaping from human demonstrations.
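To make the mechanism concrete, the following is a minimal sketch (not the authors' implementation) of a demonstration-derived potential in the spirit of Gaussian-Shaping: the potential Phi(s) sums Gaussian kernels centered on demonstrated states, and the shaping term added to the environment reward is the standard potential-based form F(s, s') = gamma * Phi(s') - Phi(s). The kernel width `sigma` and the use of isotropic kernels are illustrative assumptions.

```python
import numpy as np

def gaussian_potential(state, demo_states, sigma=0.5):
    """Potential Phi(s): sum of isotropic Gaussian kernels centered on
    demonstrated states (a simple stand-in for a fitted Gaussian mixture)."""
    diffs = demo_states - state                 # shape (N, d)
    sq_dists = np.sum(diffs ** 2, axis=1)       # squared distance to each demo state
    return float(np.sum(np.exp(-sq_dists / (2.0 * sigma ** 2))))

def shaping_reward(state, next_state, demo_states, gamma=0.99, sigma=0.5):
    """Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s).
    Added to the environment reward, it leaves the optimal policy unchanged."""
    return (gamma * gaussian_potential(next_state, demo_states, sigma)
            - gaussian_potential(state, demo_states, sigma))
```

With this potential, transitions that move the agent toward regions visited in the demonstrations yield a positive shaping term, while transitions moving away yield a negative one, which is the local information the abstract credits for accelerated learning.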