EWRL 2016
Robust Kalman Temporal Difference
Abstract
We propose an on-line algorithm for policy evaluation in large-scale Robust Markov Decision Processes (RMDPs) with uncertainty in the transition probabilities. Our approach is based on the Kalman Temporal Difference (KTD) formulation, supports linear and non-linear approximations, and imposes minimal conditions on the uncertain transition probabilities. Previous work handles robustness with dynamic programming (DP) and approximate dynamic programming (ADP) methods for both small and large state spaces. These methods, however, apply only in an off-line setting that requires full trajectory information, and in large state spaces their convergence proofs rest on a restrictive assumption about the uncertainty set and consider only linear value function approximation. Our approach overcomes these limitations by using the Kalman filter framework for on-line estimation and treating the robust Bellman equation as an observation function. We present the Robust-KTD algorithm, analyze its convergence, and examine its performance.
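The abstract's central idea is to cast the robust Bellman equation as the observation function of a Kalman filter over the value-function parameters. The sketch below illustrates that idea under simplifying assumptions of our own, not the paper's: a linear value function V(s) = θᵀφ(s), random-walk parameter dynamics, and a finite set of candidate next-state features standing in for the uncertainty set, with the worst case approximated by the minimizing candidate under the current estimate followed by a standard linear Kalman update. All names (`RobustKTDSketch`, `candidate_phi_next`, the noise parameters) are illustrative; the actual Robust-KTD algorithm, in particular its treatment of non-linear approximation, may differ.

```python
import numpy as np

class RobustKTDSketch:
    """Hedged sketch of a robust KTD-style update for policy evaluation
    with a linear value function V(s) = theta @ phi(s).

    theta is the hidden state of a Kalman filter with random-walk
    dynamics; the observed reward is modeled through a robust Bellman
    equation, where the worst case over the uncertainty set is
    approximated by a finite set of candidate next states (assumption)."""

    def __init__(self, n_features, gamma=0.95, process_noise=1e-3, obs_noise=1.0):
        self.theta = np.zeros(n_features)            # value-function weights
        self.P = np.eye(n_features)                  # parameter covariance
        self.gamma = gamma
        self.Q = process_noise * np.eye(n_features)  # random-walk noise
        self.R = obs_noise                           # observation noise variance

    def update(self, phi_s, reward, candidate_phi_next):
        """One filter step.

        candidate_phi_next: array (k, n_features) of features for plausible
        next states; the minimizer under the current theta stands in for
        the worst-case transition in the uncertainty set."""
        # Prediction step: random-walk evolution of the parameters.
        P_pred = self.P + self.Q

        # Robust Bellman observation: pick the lowest-value candidate next
        # state under the current estimate, then treat the observation as
        # locally linear in theta (a simplification on our part).
        values_next = candidate_phi_next @ self.theta
        phi_worst = candidate_phi_next[np.argmin(values_next)]
        H = phi_s - self.gamma * phi_worst           # observation vector

        # Kalman correction step.
        innovation = reward - H @ self.theta
        S = H @ P_pred @ H + self.R                  # innovation variance
        K = P_pred @ H / S                           # Kalman gain
        self.theta = self.theta + K * innovation
        self.P = P_pred - np.outer(K, H @ P_pred)
        return self.theta

# Illustrative usage with random features (synthetic data, not the paper's):
rng = np.random.default_rng(0)
ktd = RobustKTDSketch(n_features=4)
for _ in range(100):
    phi = rng.normal(size=4)
    nexts = rng.normal(size=(3, 4))  # three candidate next states
    ktd.update(phi, reward=rng.normal(), candidate_phi_next=nexts)
```

Treating the min over candidates as a locally linear observation is the simplest possible choice; a KTD-style treatment of genuinely non-linear observations would instead rely on derivative-free machinery such as the unscented transform.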
Authors
Keywords
No keywords are indexed for this paper.
Context
- Venue: European Workshop on Reinforcement Learning
- Archive span: 2008-2025
- Indexed papers: 649
- Paper id: 1146723843975062260