Abstract

Inverse reinforcement learning (IRL) is the problem of recovering the underlying reward function from the behaviour of an expert. Most existing IRL algorithms assume that the expert’s environment is modeled as a Markov decision process (MDP), even though handling partially observable settings would widen their applicability to more realistic scenarios. In this paper, we present an extension of the classical IRL algorithm by Ng and Russell to partially observable environments. We discuss the technical issues and challenges involved, and present experimental results on some benchmark partially observable domains.
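
For orientation, the fully observable starting point can be illustrated with a small sketch. It is not the partially observable extension described in this abstract. It only checks the well-known finite-MDP condition from Ng and Russell's formulation: a reward R is consistent with an expert policy when, in every state, the expert's action is at least as good as any alternative under the value function induced by R. The two-state environment, the function names, and the discount factor 0.9 are assumptions made up for illustration.

```python
# Hypothetical toy MDP: 2 states, 2 actions (0 = stay, 1 = switch).
# P[a][s][t] is the probability of moving from state s to state t under action a.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay in the current state
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch to the other state
]
# Assumed expert policy: switch out of state 0, then stay in state 1.
expert = [1, 0]


def evaluate_policy(P_pi, R, gamma, sweeps=2000):
    """Approximate V = R + gamma * P_pi V by repeated sweeps for a fixed policy."""
    n = len(R)
    V = [0.0] * n
    for _ in range(sweeps):
        V = [R[s] + gamma * sum(P_pi[s][t] * V[t] for t in range(n))
             for s in range(n)]
    return V


def reward_is_consistent(P, expert_action, R, gamma=0.9, tol=1e-9):
    """Return True if the expert policy is optimal for reward R in this MDP.

    Checks, for every state s and action a, that the expected next-state
    value under the expert's action is no smaller than under a, where the
    values come from evaluating the expert policy with reward R.
    """
    n = len(R)
    # Transition rows actually followed by the expert in each state.
    P_pi = [P[expert_action[s]][s] for s in range(n)]
    V = evaluate_policy(P_pi, R, gamma)
    for s in range(n):
        expert_value = sum(P_pi[s][t] * V[t] for t in range(n))
        for a in range(len(P)):
            other_value = sum(P[a][s][t] * V[t] for t in range(n))
            if expert_value < other_value - tol:
                return False
    return True


print(reward_is_consistent(P, expert, [0.0, 1.0]))  # reward only in state 1
print(reward_is_consistent(P, expert, [1.0, 0.0]))  # reward only in state 0
print(reward_is_consistent(P, expert, [0.0, 0.0]))  # all-zero reward
```

In this toy setting the first and third rewards pass the check and the second does not. The all-zero reward passing is an instance of the ambiguity inherent in IRL: many reward functions, including degenerate ones, can explain the same behaviour.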

Context

Venue: International Joint Conference on Artificial Intelligence
Archive span: 1969-2025
Indexed papers: 14525
Paper id: 466777747120320844