RLDM 2019 Conference Abstract
A Bayesian Approach to Robust Reinforcement Learning
- Esther Derman
- Daniel Mankowitz
- Timothy Arthur Mann
In sequential decision-making problems, Robust Markov Decision Processes (RMDPs) aim to ensure robustness to changing or adversarial system behavior. In this framework, transitions are modeled as arbitrary elements of a known and properly structured uncertainty set, and a robust optimal policy can be derived under the worst-case scenario. In practice, however, the uncertainty set is unknown and must be constructed from available data. Most existing approaches to robust reinforcement learning (RL) build the uncertainty set from a fixed batch of data before solving the resulting planning problem. Since the agent never updates its uncertainty set as new observations arrive, it may be overly conservative, failing to take advantage of more favorable scenarios. Another drawback of these approaches is that constructing the uncertainty set is computationally expensive, which prevents online learning of robust policies from scaling up. In this study, we address the issue of learning in RMDPs using a Bayesian approach. We introduce the Uncertainty Robust Bellman Equation (URBE), which encourages exploration so as to adapt the uncertainty set to new observations while preserving robustness. We then propose a URBE-based algorithm, DQN-URBE, that scales the method to higher-dimensional domains. Our experiments show that the resulting URBE-based strategy achieves a better trade-off between avoiding overly conservative solutions and remaining robust in the presence of model misspecification. In addition, we show that DQN-URBE adapts significantly faster to changing dynamics online than existing robust techniques with fixed uncertainty sets.
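To make the fixed-uncertainty-set planning problem described above concrete, the sketch below illustrates how a robust policy is derived under the worst-case scenario. This is not the paper's URBE or DQN-URBE method; it is a minimal NumPy illustration of robust value iteration under the simplifying assumptions of a finite uncertainty set of candidate transition models and an (s, a)-rectangular worst case. The function and variable names (robust_value_iteration, uncertainty_set) are illustrative, not drawn from the paper.

```python
import numpy as np

def robust_value_iteration(uncertainty_set, rewards, gamma=0.9, tol=1e-8):
    """Worst-case value iteration over a finite set of transition models.

    uncertainty_set : list of arrays, each of shape (S, A, S), where
                      entry [s, a, s'] is a candidate P(s' | s, a).
    rewards         : array of shape (S, A).
    Returns the robust value function (S,) and a greedy robust policy (S,).
    """
    n_states, _ = rewards.shape
    v = np.zeros(n_states)
    while True:
        # Q-values under each candidate model: r(s, a) + gamma * E_P[v(s')].
        q_per_model = np.stack([rewards + gamma * P @ v for P in uncertainty_set])
        # Rectangular worst case: the adversary picks the minimizing model per (s, a).
        q_robust = q_per_model.min(axis=0)
        # The agent acts greedily against that worst case.
        v_new = q_robust.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q_robust.argmax(axis=1)
        v = v_new
```

For example, calling this with uncertainty_set = [P_nominal, P_perturbed] for two (S, A, S) arrays returns the value function and policy that are optimal against whichever model is worse at each state-action pair. Because this set is fixed in advance, the resulting policy exhibits exactly the conservatism the abstract discusses, which is what URBE's adaptive uncertainty set is designed to relax.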