RLDM 2017
Algorithm selection of reinforcement learning algorithms
Abstract
Dialogue systems rely on a careful reinforcement learning (RL) design: the learning algorithm and its state space representation. Lacking more rigorous knowledge, the designer resorts to practical experience to choose the best option. To automate this process and improve its performance, this article tackles the problem of online RL algorithm selection. A meta-algorithm takes as input a portfolio of several off-policy RL algorithms. At the beginning of each new trajectory, it determines which algorithm in the portfolio controls the behaviour during that trajectory, in order to maximise the return. The article presents a novel meta-algorithm, called Epochal Stochastic Bandit Algorithm Selection (ESBAS). Its principle is to freeze the policy updates at each epoch and to leave a rebooted stochastic bandit in charge of the algorithm selection. The algorithm comes with theoretical guarantees and proves to be practically efficient on a simulated dialogue task, even outperforming the best algorithm in the portfolio in most settings.
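The selection loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions that the abstract does not state: epoch lengths that double from one epoch to the next, UCB1 as the stochastic bandit, and the hypothetical `ToyAlgorithm` class with its `update_policy` and `run_trajectory` methods, which stands in for a real off-policy RL algorithm. The sketch only shows the two ideas named above: policies are updated once per epoch and then frozen, and a fresh bandit picks the controlling algorithm for each trajectory.

```python
import math
import random


class UCB1:
    """Stochastic bandit over k arms (UCB1 is an assumed choice)."""

    def __init__(self, k):
        self.counts = [0] * k
        self.sums = [0.0] * k

    def select(self):
        # Play every arm once before relying on the confidence bound.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        total = sum(self.counts)
        return max(
            range(len(self.counts)),
            key=lambda a: self.sums[a] / self.counts[a]
            + math.sqrt(2.0 * math.log(total) / self.counts[a]),
        )

    def update(self, arm, ret):
        self.counts[arm] += 1
        self.sums[arm] += ret


class ToyAlgorithm:
    """Hypothetical stand-in for an off-policy RL algorithm."""

    def __init__(self, skill, rng):
        self.skill = skill   # quality the policy converges towards
        self.quality = 0.0   # quality of the currently frozen policy
        self.rng = rng

    def update_policy(self, trajectories):
        # Off-policy: learn from all trajectories, whoever generated them.
        n = len(trajectories)
        self.quality = self.skill * n / (n + 10.0)

    def run_trajectory(self):
        # Noisy return in [0, 1] centred on the frozen policy's quality.
        return min(1.0, max(0.0, self.rng.gauss(self.quality, 0.1)))


def esbas(portfolio, n_epochs):
    """Return the index of the algorithm chosen for each trajectory."""
    data = []     # trajectory set shared by the whole portfolio
    choices = []
    for epoch in range(n_epochs):
        # Update every policy once, then keep it frozen for the epoch.
        for algo in portfolio:
            algo.update_policy(data)
        # Reboot the bandit: statistics from earlier epochs are discarded.
        bandit = UCB1(len(portfolio))
        for _ in range(2 ** epoch):  # assumed doubling epoch length
            arm = bandit.select()
            ret = portfolio[arm].run_trajectory()
            bandit.update(arm, ret)
            data.append((arm, ret))
            choices.append(arm)
    return choices
```

Freezing the policies within an epoch keeps each algorithm's return distribution stationary while the bandit is sampling it, which is the setting a stochastic bandit expects. Rebooting the bandit at each epoch lets the selection follow whichever algorithm is best after the latest policy update.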
Context
- Venue: Multidisciplinary Conference on Reinforcement Learning and Decision Making
- Archive span: 2013-2025
- Indexed papers: 1004
- Paper id: 303569301897255491