Pessimistic Iterative Planning for Robust POMDPs

Maris F. L. Galesloot; Marnix Suilen; Thiago D. Simão; Steven Carr; Matthijs T. J. Spaan; Ufuk Topcu; Nils Jansen

EWRL 2024

Workshop Paper · EWRL17 · Artificial Intelligence · Machine Learning · Reinforcement Learning

Abstract

Robust partially observable Markov decision processes (robust POMDPs) extend classical POMDPs to handle additional uncertainty in the transition and observation probabilities via so-called uncertainty sets. Policies for robust POMDPs must not only be memory-based to account for partial observability but also robust against model uncertainty to account for the worst-case instances from the uncertainty sets. We propose the pessimistic iterative planning (PIP) framework, which finds robust memory-based policies for robust POMDPs. PIP alternates between two main steps: (1) selecting an adversarial (non-robust) POMDP via worst-case probability instances from the uncertainty sets; and (2) computing a finite-state controller (FSC) for this adversarial POMDP. We evaluate the performance of this FSC on the original robust POMDP and use this evaluation in step (1) to select the next adversarial POMDP. Within PIP, we propose the rFSCNet algorithm. In each iteration, rFSCNet finds an FSC through a recurrent neural network by using supervision policies optimized for the adversarial POMDP. The empirical evaluation in four benchmark environments showcases improved robustness over several baseline methods and competitive performance compared to a state-of-the-art robust POMDP solver.
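
To make step (1) more concrete, the following is a minimal sketch of how a worst-case probability instance can be chosen for a single transition distribution. It is not the paper's procedure. It assumes, without confirmation from the abstract, that the uncertainty set is a set of per-successor probability intervals and that the policy maximizes value, so the adversary minimizes the expected value of the next state. The function name `worst_case_distribution` and its arguments are hypothetical. Under these assumptions the minimization has a well-known greedy solution: start from the lower bounds and hand the remaining probability mass to the lowest-valued successors first.

```python
from typing import List, Sequence


def worst_case_distribution(values: Sequence[float],
                            lower: Sequence[float],
                            upper: Sequence[float]) -> List[float]:
    """Return p with lower[i] <= p[i] <= upper[i] and sum(p) == 1
    that minimizes sum(p[i] * values[i]).

    Assumption (not from the source): interval uncertainty sets and a
    value-maximizing agent, so the adversary pushes mass to low values.
    """
    if sum(lower) > 1.0 + 1e-12 or sum(upper) < 1.0 - 1e-12:
        raise ValueError("intervals do not contain a valid distribution")

    # Every successor must receive at least its lower bound.
    p = list(lower)
    # Probability mass that the adversary is still free to place.
    budget = max(0.0, 1.0 - sum(lower))

    # Visit successors from lowest to highest value and fill each one
    # up to its upper bound until the free mass is used up.
    for i in sorted(range(len(values)), key=lambda j: values[j]):
        extra = min(upper[i] - lower[i], budget)
        p[i] += extra
        budget -= extra
    return p


if __name__ == "__main__":
    # Hypothetical values of three successor states under the current FSC.
    values = [10.0, 0.0, 5.0]
    lower = [0.2, 0.1, 0.1]
    upper = [0.7, 0.5, 0.6]
    p = worst_case_distribution(values, lower, upper)
    print(p, sum(pi * vi for pi, vi in zip(p, values)))
```

In a PIP-style loop, the values passed in would come from evaluating the current FSC on the robust POMDP, and the selected distributions would define the adversarial POMDP used in step (2). How the paper performs that evaluation and selection is not specified in the abstract.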

Keywords

Finite-state controllers
Model-Based RL
Planning
Recurrent neural networks
Robust POMDPs

Context

Venue: European Workshop on Reinforcement Learning
Archive span: 2008-2025
Indexed papers: 649
Paper id: 939686179981482848