Sequential Generative Exploration Model for Partially Observable Reinforcement Learning

Haiyan Yin; Jianda Chen; Sinno Jialin Pan; Sebastian Tschiatschek

AAAI 2021

Conference Paper AAAI Technical Track on Machine Learning V Artificial Intelligence

Abstract

Many challenging partially observable reinforcement learning problems have sparse rewards, and most existing model-free algorithms struggle with such reward sparsity. In this paper, we propose a novel reward shaping approach that infers intrinsic rewards for the agent from a sequential generative model. Specifically, the sequential generative model processes a sequence of partial observations and actions from the agent’s historical transitions to compile a belief state for forward dynamics prediction. We then use the error of the dynamics prediction task to infer the intrinsic rewards. Because the inference procedure is sequential, the derived intrinsic rewards can better reflect the agent’s surprise or curiosity over its ground-truth state. Furthermore, we formulate the inference procedure for dynamics prediction as a multi-step forward prediction task, in which the incorporated temporal abstraction helps increase the expressiveness of the intrinsic reward signals. To evaluate our method, we conduct extensive experiments on challenging 3D navigation tasks in ViZDoom and DeepMind Lab. Empirical results show that our proposed exploration method leads to significantly faster convergence than various state-of-the-art exploration approaches in the tested navigation domains.
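
The pipeline described in the abstract (history to belief state, multi-step forward prediction, prediction error as intrinsic reward) can be illustrated with a minimal NumPy sketch. This is not the authors' model: the abstract gives no architecture details, so the tanh recurrent encoder, the linear decoder, the mean squared error, the fixed random weights standing in for a trained model, and every name and dimension below are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; none of these values come from the paper.
OBS_DIM, ACT_DIM, BELIEF_DIM, HORIZON = 8, 3, 16, 4

# Fixed random weights standing in for a trained generative model (assumption).
W_in = rng.normal(scale=0.1, size=(BELIEF_DIM, OBS_DIM + ACT_DIM))
W_rec = rng.normal(scale=0.1, size=(BELIEF_DIM, BELIEF_DIM))
W_dyn = rng.normal(scale=0.1, size=(BELIEF_DIM, BELIEF_DIM + ACT_DIM))
W_out = rng.normal(scale=0.1, size=(OBS_DIM, BELIEF_DIM))


def encode_belief(observations, actions):
    """Fold a history of (partial observation, action) pairs into a belief vector."""
    belief = np.zeros(BELIEF_DIM)
    for obs, act in zip(observations, actions):
        belief = np.tanh(W_in @ np.concatenate([obs, act]) + W_rec @ belief)
    return belief


def predict_future(belief, future_actions):
    """Roll the belief forward one step per future action and decode observations."""
    predictions = []
    for act in future_actions:
        belief = np.tanh(W_dyn @ np.concatenate([belief, act]))
        predictions.append(W_out @ belief)
    return np.stack(predictions)


def intrinsic_reward(observations, actions, future_actions, future_observations):
    """Mean squared multi-step prediction error, used here as the intrinsic reward."""
    belief = encode_belief(observations, actions)
    predicted = predict_future(belief, future_actions)
    return float(np.mean((predicted - future_observations) ** 2))


# Toy usage with random data: 5 past steps, HORIZON future steps, one-hot actions.
history_obs = rng.normal(size=(5, OBS_DIM))
history_act = np.eye(ACT_DIM)[rng.integers(ACT_DIM, size=5)]
future_act = np.eye(ACT_DIM)[rng.integers(ACT_DIM, size=HORIZON)]
future_obs = rng.normal(size=(HORIZON, OBS_DIM))

reward = intrinsic_reward(history_obs, history_act, future_act, future_obs)
```

In this sketch the reward is larger when the observed future deviates more from what the belief state predicts, which is one simple way to read "surprise". Averaging the error over several future steps instead of a single step is the only sense in which the sketch reflects the multi-step formulation.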