
RLDM 2019

Multi-batch Reinforcement Learning

Conference abstract (accepted) · Artificial Intelligence · Decision Making · Machine Learning · Reinforcement Learning

Abstract

We consider the problem of Reinforcement Learning (RL) in a multi-batch setting, also sometimes called the growing-batch setting. It consists of successive rounds: at each round, a batch of data is collected with a fixed policy, after which the policy may be updated for the next round. In comparison with the more classical online setting, one cannot afford to train and use a bad policy, and therefore exploration must be carefully controlled. This is even more dramatic when the batch size is indexed on the past policies' performance. In comparison with the mono-batch setting, also called the offline setting, one should not be too conservative and should retain some form of exploration, because excessive conservatism may compromise the asymptotic convergence to an optimal policy. In this article, we investigate the desired properties of RL algorithms in the multi-batch setting. Under some minimal assumptions, we show that the population of subjects either depletes or grows geometrically over time. This allows us to characterize conditions under which a safe policy update is preferred, and those conditions may be assessed in between batches. We conclude the paper by advocating the benefits of using a portfolio of policies, to better control the desired amount of risk.
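The round structure described above (collect a batch with a fixed policy, then update offline while retaining some exploration) can be sketched as follows. This is a minimal illustration on a hypothetical two-armed bandit, not the paper's algorithm: the environment, the epsilon-greedy update, and all parameter values are assumptions for the sake of the example.

```python
import random

def collect_batch(policy, batch_size):
    """Roll out the fixed policy to gather one batch of (action, reward) pairs.
    Toy two-armed bandit: arm 1 pays 1 with prob 0.7, arm 0 with prob 0.3
    (assumed rewards, for illustration only)."""
    batch = []
    for _ in range(batch_size):
        action = policy()
        reward = 1 if random.random() < (0.7 if action == 1 else 0.3) else 0
        batch.append((action, reward))
    return batch

def update_policy(batches, epsilon=0.1):
    """Offline update from all data gathered so far, keeping epsilon-greedy
    exploration so that later batches still cover both actions."""
    totals = {0: [0, 0], 1: [0, 0]}  # action -> [reward sum, pull count]
    for batch in batches:
        for a, r in batch:
            totals[a][0] += r
            totals[a][1] += 1
    means = {a: (s / n if n else 0.0) for a, (s, n) in totals.items()}
    greedy = max(means, key=means.get)

    def policy():
        if random.random() < epsilon:  # retained exploration between rounds
            return random.choice([0, 1])
        return greedy
    return policy

def multi_batch_rl(n_rounds=5, batch_size=100):
    """Successive rounds: fixed policy within a round, update between rounds."""
    random.seed(0)
    policy = lambda: random.choice([0, 1])  # initial uniform policy
    batches = []
    for _ in range(n_rounds):
        batches.append(collect_batch(policy, batch_size))
        policy = update_policy(batches)
    return policy
```

Note that within each round the policy is frozen, matching the setting's constraint that data collection uses a fixed policy; the trade-off the abstract discusses lives in `epsilon`, which governs how much exploration each deployed policy is allowed.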

Authors

Keywords

No keywords are indexed for this paper.

Context

Venue
Multidisciplinary Conference on Reinforcement Learning and Decision Making
Archive span
2013-2025
Indexed papers
1004
Paper id
210489178531142713