Arrow Research search
Back to IJCAI

IJCAI 2007

Conference Paper Uncertainty Artificial Intelligence

Abstract

A key problem in reinforcement learning is finding a good balance between the need to explore the environment and the need to gain rewards by exploiting existing knowledge. Much research has been devoted to this topic, and many of the proposed methods are aimed simply at ensuring that enough samples are gathered to estimate the value function well. In contrast, in 1959 Bellman proposed constructing a representation in which the states of the original system are paired with knowledge about the current model. Hence, knowledge about the possible Markov models of the environment is represented and maintained explicitly. Unfortunately, this approach is intractable except for bandit problems (where it gives rise to Gittins indices, an optimal exploration method). In this paper, we explore ideas for making this method computationally tractable. We maintain a model of the environment as a Markov Decision Process. We sample finite-length trajectories from the infinite tree using ideas based on sparse sampling. Finding the values of the nodes of this sparse subtree can then be expressed as an optimization problem, which we solve using Linear Programming. We illustrate this approach on a few domains and compare it with other exploration algorithms.

Authors

Keywords

No keywords are indexed for this paper.

Context

Venue
International Joint Conference on Artificial Intelligence
Archive span
1969-2025
Indexed papers
14525
Paper id
383676754368897262