Batch Reinforcement Learning Through Continuation Method

Yijie Guo; Shengyu Feng; Nicolas Le Roux; Ed H. Chi; Honglak Lee; Minmin Chen

ICLR 2021

Conference Paper · Poster Presentations · Artificial Intelligence · Machine Learning

Details

Abstract

Many real-world applications of reinforcement learning (RL) require the agent to learn from a fixed set of trajectories, without collecting new interactions. Policy optimization in this setting is extremely challenging because 1) the geometry of the objective function makes it hard to optimize efficiently, and 2) the shift between data distributions introduces high noise into value estimation. In this work, we propose a simple yet effective policy iteration approach to batch RL using a global optimization technique known as continuation. By constraining the difference between the learned policy and the behavior policy that generated the fixed trajectories, and then gradually relaxing the constraint, our method 1) helps the agent escape local optima and 2) reduces the error in policy evaluation during optimization. We present results on a variety of control tasks, game environments, and a recommendation task to empirically demonstrate the efficacy of the proposed method.
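
The idea of constraining the learned policy to the behavior policy and then relaxing that constraint can be illustrated with a toy single-state example. The sketch below is not the authors' implementation. It assumes a KL penalty as the constraint, a known reward vector (in a real batch setting the values would be estimated from the fixed trajectories), and a geometric decay schedule for the penalty weight `tau`. The function names and all numeric values are hypothetical.

```python
import numpy as np


def kl_regularized_policy(rewards, behavior, tau):
    """Closed-form maximizer of E_pi[r] - tau * KL(pi || behavior) for one state.

    The solution is pi(a) proportional to behavior(a) * exp(r(a) / tau).
    """
    logits = np.log(behavior) + rewards / tau
    logits -= logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


def continuation_path(rewards, behavior, tau_start=10.0, tau_end=0.01, steps=8):
    """Solve a sequence of problems with a progressively weaker KL penalty.

    Assumption: tau decays geometrically from tau_start to tau_end.
    """
    taus = np.geomspace(tau_start, tau_end, steps)
    return [(tau, kl_regularized_policy(rewards, behavior, tau)) for tau in taus]


# Hypothetical three-action example: the behavior policy prefers the worst action.
rewards = np.array([1.0, 0.5, 0.0])
behavior = np.array([0.1, 0.3, 0.6])

path = continuation_path(rewards, behavior)
for tau, policy in path:
    print(f"tau={tau:8.3f}  policy={np.round(policy, 3)}  "
          f"expected reward={policy @ rewards:.3f}")
```

With a large `tau` the solution stays close to the behavior policy, and as `tau` shrinks the policy moves toward the greedy action. In this closed-form toy case there are no local optima to escape, so the sketch only shows the relaxation schedule, not the optimization benefit described in the abstract.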

Keywords

batch reinforcement learning
continuation method
relaxed regularization

Context

Venue: International Conference on Learning Representations
Archive span: 2013-2025
Indexed papers: 10294
Paper id: 1086645516085573344