OFFER: Off-Environment Reinforcement Learning

Kamil Ciosek; Shimon Whiteson

AAAI 2017

Conference Paper Machine Learning Methods Artificial Intelligence

Abstract

Policy gradient methods have been widely applied in reinforcement learning. For reasons of safety and cost, learning is often conducted using a simulator. However, learning in simulation does not traditionally utilise the opportunity to improve learning by adjusting certain environment variables – state features that are randomly determined by the environment in a physical setting but controllable in a simulator. Exploiting environment variables is crucial in domains containing significant rare events (SREs), e.g., unusual wind conditions that can crash a helicopter, which are rarely observed under random sampling but have a considerable impact on expected return. We propose off-environment reinforcement learning (OFFER), which addresses such cases by simultaneously optimising the policy and a proposal distribution over environment variables. We prove that OFFER converges to a locally optimal policy and show experimentally that it learns better and faster than a policy gradient baseline.
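
To make the role of a proposal distribution concrete, the sketch below estimates expected return in a toy setting with a single rare event, using ordinary importance sampling over one environment variable. It is an illustration only and not the OFFER algorithm: the proposal here is fixed by hand rather than optimised, there is no policy, and every name and number (`episode_return`, `P_TRUE`, `Q_PROPOSAL`, the reward values) is a hypothetical choice made for this example.

```python
import random

P_TRUE = 0.001      # assumed true probability of the rare wind gust
Q_PROPOSAL = 0.1    # assumed hand-picked proposal probability used in simulation


def episode_return(wind_gust: bool) -> float:
    """Hypothetical toy return: a gust crashes the helicopter."""
    return -100.0 if wind_gust else 1.0


def estimate_return(p_true: float, q: float, n: int, rng: random.Random) -> float:
    """Importance-sampling estimate of expected return under p_true,
    with the environment variable drawn from the proposal q."""
    total = 0.0
    for _ in range(n):
        gust = rng.random() < q
        # Importance weight p(w) / q(w) corrects for sampling from q.
        weight = p_true / q if gust else (1.0 - p_true) / (1.0 - q)
        total += weight * episode_return(gust)
    return total / n


# Exact expected return under the true environment distribution.
exact = P_TRUE * episode_return(True) + (1.0 - P_TRUE) * episode_return(False)

rng = random.Random(0)
naive = estimate_return(P_TRUE, P_TRUE, 10_000, rng)         # plain random sampling
weighted = estimate_return(P_TRUE, Q_PROPOSAL, 10_000, rng)  # sampling from the proposal
```

With `q` equal to `p_true` every weight is 1, so the first call reduces to plain Monte Carlo and sees only a handful of gusts. Sampling from the proposal observes gusts in about one episode in ten, and the weights keep the estimate unbiased for the true distribution.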
