NeurIPS 1994

Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems

Conference Paper · Artificial Intelligence · Machine Learning

Tommi Jaakkola, Satinder P. Singh, Michael I. Jordan

Abstract

Increasing attention has been paid to reinforcement learning algorithms in recent years, partly due to successes in the theoretical analysis of their behavior in Markov environments. If the Markov assumption is removed, however, neither the algorithms nor the analyses generally remain usable. We propose and analyze a new learning algorithm to solve a certain class of non-Markov decision problems. Our algorithm applies to problems in which the environment is Markov but the learner has restricted access to state information. The algorithm combines a Monte-Carlo policy evaluation with a policy improvement method similar to that used for Markov decision problems, and it is guaranteed to converge to a local maximum. The algorithm operates in the space of stochastic policies, a space that can yield a policy performing considerably better than any deterministic policy. Although the space of stochastic policies is continuous, even for a discrete action space, our algorithm is computationally tractable.
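The evaluate-then-improve loop and the claim that a stochastic policy can beat every deterministic one can be illustrated on a toy problem. The sketch below is a hypothetical example, not the authors' algorithm or benchmark. It assumes a two-state corridor in which both hidden states emit the same observation, so a memoryless policy must use one action distribution for both; each deterministic choice then loops forever, while a stochastic one reaches the goal. The environment, the discount factor `GAMMA`, the step size `epsilon`, and the helper names (`step`, `exact_value`, `mc_observation_q`, `improve`) are all assumptions for illustration. The improvement rule is a simple mixture step toward the greedy action, in the spirit of, but not necessarily identical to, the update analyzed in the paper.

```python
import random

GAMMA = 0.9  # discount factor (assumed for this toy problem)
GOAL = 2     # terminal state; hidden states 0 and 1 emit the same observation


def step(state, action):
    """Hypothetical aliased corridor.

    State 0: 'R' moves to state 1, 'L' stays in state 0.
    State 1: 'L' reaches the goal (reward 1), 'R' stays in state 1.
    """
    if state == 0:
        return (1, 0.0) if action == "R" else (0, 0.0)
    return (GOAL, 1.0) if action == "L" else (1, 0.0)


def exact_value(p_right):
    """Closed-form discounted value of start state 0 when P(R | obs) = p_right."""
    p, g = p_right, GAMMA
    return g * p * (1 - p) / ((1 - g * p) * (1 - g + g * p))


def mc_observation_q(p_right, episodes, rng, max_steps=200):
    """Monte-Carlo estimate of Q(obs, a) under the stochastic memoryless policy.

    Each visit's return is weighted by gamma**t, so the estimate averages
    Q(s, a) over the hidden states s in proportion to their discounted
    occupancy; the learner itself only ever sees the shared observation.
    """
    num = {"L": 0.0, "R": 0.0}
    den = {"L": 0.0, "R": 0.0}
    for _ in range(episodes):
        state, trajectory = 0, []
        for _ in range(max_steps):
            action = "R" if rng.random() < p_right else "L"
            state, reward = step(state, action)
            trajectory.append((action, reward))
            if state == GOAL:
                break
        # Backward pass: discounted return G_t from every time step t.
        g_return, returns = 0.0, []
        for action, reward in reversed(trajectory):
            g_return = reward + GAMMA * g_return
            returns.append((action, g_return))
        returns.reverse()
        for t, (action, g_t) in enumerate(returns):
            weight = GAMMA ** t
            num[action] += weight * g_t
            den[action] += weight
    return {a: num[a] / den[a] if den[a] > 0 else 0.0 for a in num}


def improve(p_right, episodes=500, iterations=30, epsilon=0.1, seed=0):
    """Alternate MC evaluation with a small step toward the greedy action.

    pi_new = (1 - epsilon) * pi + epsilon * pi_greedy, so the policy
    stays stochastic instead of collapsing onto one action.
    """
    rng = random.Random(seed)
    for _ in range(iterations):
        q = mc_observation_q(p_right, episodes, rng)
        target = 1.0 if q["R"] > q["L"] else 0.0
        p_right = (1 - epsilon) * p_right + epsilon * target
    return p_right


if __name__ == "__main__":
    # Both deterministic memoryless policies never reach the goal.
    print(exact_value(0.0), exact_value(1.0))  # 0.0 0.0
    p = improve(0.9)
    print(round(p, 2), round(exact_value(p), 3))
```

Mixing toward the greedy action rather than jumping to it keeps the policy inside the continuous space of stochastic policies. In this toy problem the exact value is symmetric in `p_right` and peaks at 0.5, so with a fixed `epsilon` the last iterate typically hovers near that point.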

Context

Venue
Annual Conference on Neural Information Processing Systems
Paper id
526434724322442194