Highlights 2024
Learning Explainable and Better Performing Representations of POMDP Strategies
Abstract
Partially Observable Markov Decision Processes (POMDPs), which combine non-determinism, probability and partial observability, have gained popularity in various applications as a model of planning in an unsafe and only partially observable environment. Unfortunately, typical objectives of interest such as reachability or total reward already result in undecidable problems: a strategy cannot be computed algorithmically while guaranteeing optimality w.r.t. the objective. Consequently, heuristics that synthesize practically well-performing strategies have become of significant interest. We design a highly scalable postprocessing step, which improves both the quality and the representation of a given strategy. Our procedure learns a compact representation of the strategy as a Mealy machine using automata-learning techniques. First, by learning the complete strategy, we obtain its automaton representation, which is fully equivalent and thus achieves the same value. Second, we provide heuristics that learn small modifications of the strategy, specifically where the strategy is undefined, and also where it explicitly states that it is unsure about its choice (such as at the cut-off points). The talk is based on the same-titled paper at TACAS 2024. This is joint work with Alexander Bork, Kush Grover, Jan Křetínský and Stefanie Mohr.
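To make the central object concrete: a Mealy machine representing a POMDP strategy keeps a finite memory state and, on each observation, emits an action and updates its memory. The following is a minimal illustrative sketch, not the paper's implementation; all names (`MealyStrategy`, the toy observations and actions) are hypothetical.

```python
# Illustrative sketch (hypothetical, not the paper's code): a Mealy machine
# as a compact POMDP strategy. Memory states replace the full belief/history;
# each (state, observation) pair determines an action and a successor state.

class MealyStrategy:
    def __init__(self, initial, transitions):
        # transitions: dict mapping (state, observation) -> (action, next_state)
        self.state = initial
        self.transitions = transitions

    def act(self, observation):
        """Emit the action for the current observation and update memory."""
        action, next_state = self.transitions[(self.state, observation)]
        self.state = next_state
        return action


# Toy 2-state strategy: keep moving while the way is clear, stop at the goal.
strategy = MealyStrategy(
    initial="searching",
    transitions={
        ("searching", "clear"): ("move", "searching"),
        ("searching", "goal"): ("stop", "done"),
        ("done", "goal"): ("stop", "done"),
    },
)

actions = [strategy.act(obs) for obs in ["clear", "clear", "goal"]]
# actions == ["move", "move", "stop"]
```

In the paper's setting, such a machine would be produced by an automata-learning algorithm querying the given strategy, rather than written by hand; the point of the sketch is only the interface a learned strategy automaton exposes.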
Authors
Keywords
No keywords are indexed for this paper.
Context
- Venue
- Highlights of Logic, Games and Automata
- Archive span
- 2013–2025
- Indexed papers
- 1236
- Paper id
- 1055979224751769874