
EWRL 2016

Robust Kalman Temporal Difference

Workshop Paper · Accepted Paper · Artificial Intelligence · Machine Learning · Reinforcement Learning

Abstract

We propose an on-line algorithm for policy evaluation in large-scale Robust Markov Decision Processes (RMDPs) with uncertainty in the transition probabilities. Our approach builds on the Kalman Temporal Difference (KTD) formulation, supports linear and non-linear value-function approximation, and places only minimal conditions on the uncertain transition probabilities. Previous work addresses robustness with dynamic programming (DP) and approximate dynamic programming (ADP) methods for both small and large state spaces. These methods apply only in an off-line setting that requires full trajectory information, and in large state spaces their convergence proofs rely on a restrictive assumption on the uncertainty set and cover only linear value-function approximation. Our approach overcomes these limitations by using the Kalman filter framework for on-line estimation, with the robust Bellman equation serving as the observation function. We present the Robust-KTD algorithm, analyze its convergence, and examine its performance.
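To make the KTD idea concrete, here is a minimal sketch of nominal (non-robust) linear KTD policy evaluation, where the Bellman equation is treated as the Kalman observation model, i.e. the observed reward satisfies r_t ≈ (φ(s_t) − γφ(s_{t+1}))ᵀθ. This is an illustrative assumption-laden example, not the authors' Robust-KTD: all function and variable names are hypothetical, and the robust variant would further replace the nominal next-state term with a worst-case value over the transition-probability uncertainty set.

```python
# Sketch of linear Kalman Temporal Difference (KTD) policy evaluation.
# Hypothetical example code; the paper's Robust-KTD additionally takes a
# worst case over an uncertainty set in the observation function.
import numpy as np

def ktd_evaluate(transitions, n_features, gamma=0.9,
                 q_noise=1e-4, r_noise=1e-2, n_sweeps=200):
    theta = np.zeros(n_features)   # value-function weights, V(s) = phi(s) @ theta
    P = np.eye(n_features)         # parameter covariance
    for _ in range(n_sweeps):
        for phi_s, phi_next, r in transitions:
            H = phi_s - gamma * phi_next            # observation vector
            P = P + q_noise * np.eye(n_features)    # predict: add process noise
            S = H @ P @ H + r_noise                 # innovation variance
            K = P @ H / S                           # Kalman gain
            theta = theta + K * (r - H @ theta)     # correct on Bellman residual
            P = P - np.outer(K, H @ P)              # covariance update
    return theta

# Toy 2-state deterministic chain: s0 -(r=0)-> s1 -(r=1)-> terminal.
e = np.eye(2)
transitions = [(e[0], e[1], 0.0), (e[1], np.zeros(2), 1.0)]
theta = ktd_evaluate(transitions, n_features=2)
# Expected fixed point: V(s1) = 1 and V(s0) = gamma * V(s1) = 0.9.
```

With one-hot features the Kalman recursion drives the Bellman residual r − Hᵀθ to zero, so θ approaches the exact values of the evaluated policy on this deterministic chain.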

Authors

Keywords

No keywords are indexed for this paper.

Context

Venue
European Workshop on Reinforcement Learning
Archive span
2008-2025
Indexed papers
649