(Non-Parametric) Bayesian Linear Value Function Approximation

Andras Kupcsik; Gerhard Neumann

RLDM 2015

Conference Abstract Accepted abstract Artificial Intelligence · Decision Making · Machine Learning · Reinforcement Learning

Abstract

Least Squares Temporal Difference (LSTD) learning is a popular approach for evaluating the value function of a given policy. The goal of LSTD and related methods is to find a linear approximation of the (action-)value function in feature space that is consistent with the Bellman equation. While standard temporal difference learning accounts for stochasticity in the dynamics and in the policy, the (action-)values themselves are mostly treated as deterministic quantities. In many scenarios, however, we also want the variances of the values, which may depend on the stochasticity of the model, the stochasticity of the reward function, and the uncertainty in the value function parameters that arises from a limited set of sampled transitions. In this paper, we propose a novel Bayesian approach to approximating action-value functions for policy evaluation in Reinforcement Learning (RL) tasks. First, we show how projectors can be used to estimate the conditional expectation in the Bellman equation from sample features, and how to find the regularized optimal solution that minimizes the mean squared Bellman error. We then turn to the Bayesian treatment of this approach and show that the solution exists in closed form. We also present a kernelized version of our approach, the Value Function Process, which leverages the idea of Gaussian Process regression. Both approaches yield the variances of the values by marginalizing over the parameters of the value function. We show that existing LSTD variants are special cases of our new formulation and present preliminary simulation results on a state chain navigation task that show the superior performance of our approach.
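
For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal sketch of standard regularized LSTD for state values. It is not the Bayesian method or the Value Function Process proposed in the paper. The four-state chain, the one-hot features, the reward of 1 on leaving the last state, and the names `lstd`, `solve`, and `one_hot` are all assumptions chosen for illustration. The estimate is the textbook solution theta = (sum_t phi_t (phi_t - gamma phi'_t)^T + ridge I)^(-1) sum_t phi_t r_t.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def lstd(transitions, features, dim, gamma=0.9, ridge=0.0):
    """Standard LSTD: accumulate A and b over samples, then solve A theta = b."""
    A = [[ridge if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    b = [0.0] * dim
    for s, r, s_next in transitions:
        phi = features(s)
        phi_next = features(s_next)
        for i in range(dim):
            b[i] += phi[i] * r
            for j in range(dim):
                A[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return solve(A, b)


# Hypothetical chain: states 0..3, the policy always moves right,
# and leaving state 3 ends the episode with reward 1.
N = 4


def one_hot(s):
    # The terminal state is encoded as None and maps to the zero vector.
    return [1.0 if s == i else 0.0 for i in range(N)]


transitions = [(0, 0.0, 1), (1, 0.0, 2), (2, 0.0, 3), (3, 1.0, None)]
theta = lstd(transitions, one_hot, N, gamma=0.9)
print(theta)
```

With one-hot features, each entry of `theta` is the estimated value of the corresponding state. Setting `ridge` above zero shrinks the estimates toward zero, which is the usual regularized form. The result is a single point estimate with no variance, which is the limitation the abstract sets out to address.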

Context

Venue: Multidisciplinary Conference on Reinforcement Learning and Decision Making
Paper id: 93057269779729539