Highlights 2024
Learning Explainable and Better Performing Representations of POMDP Strategies
Abstract
Partially Observable Markov Decision Processes (POMDPs), which combine non-determinism, probability and partial observability, have gained popularity in various applications as a model of planning in an unsafe and only partially observable environment. Unfortunately, typical objectives of interest such as reachability or total reward already result in undecidable problems: a strategy cannot be computed algorithmically while guaranteeing optimality w.r.t. the objective. Consequently, heuristics that synthesize practically well-performing strategies have become of significant interest. We design a highly scalable postprocessing step, which improves both the quality and the representation of a given strategy. Our procedure learns a compact representation of the strategy as a Mealy machine using automata-learning techniques. First, by learning the complete strategy, we obtain its automaton representation, which is fully equivalent and thus achieves the same value. Second, we provide heuristics that learn small modifications of the strategy, specifically where the strategy is undefined, and also where it explicitly states that it is unsure about its choice (such as at the cut-off points). The talk is based on the same-titled paper at TACAS 2024. This is joint work with Alexander Bork, Kush Grover, Jan Křetínský and Stefanie Mohr.
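To make the central object concrete: a Mealy machine representing a POMDP strategy keeps a finite memory state and, on each observation, emits an action and updates its memory. The following is a minimal illustrative sketch, not the paper's implementation; all names (`MealyStrategy`, the toy observations and actions) are hypothetical.

```python
# Illustrative sketch (hypothetical, not the paper's code): a Mealy machine
# as a compact POMDP strategy. Memory states replace the full belief/history;
# each (state, observation) pair determines an action and a successor state.

class MealyStrategy:
    def __init__(self, initial, transitions):
        # transitions: dict mapping (state, observation) -> (action, next_state)
        self.state = initial
        self.transitions = transitions

    def act(self, observation):
        """Emit the action for the current observation and update memory."""
        action, next_state = self.transitions[(self.state, observation)]
        self.state = next_state
        return action


# Toy 2-state strategy: keep moving while the way is clear, stop at the goal.
strategy = MealyStrategy(
    initial="searching",
    transitions={
        ("searching", "clear"): ("move", "searching"),
        ("searching", "goal"): ("stop", "done"),
        ("done", "goal"): ("stop", "done"),
    },
)

actions = [strategy.act(obs) for obs in ["clear", "clear", "goal"]]
# actions == ["move", "move", "stop"]
```

In the paper's setting, such a machine would be produced by an automata-learning algorithm querying the given strategy, rather than written by hand; the point of the sketch is only the interface a learned strategy automaton exposes.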
Authors
Keywords
No keywords are indexed for this paper.
Context
- Venue
- Highlights of Logic, Games and Automata
- Archive span
- 2013–2025
- Indexed papers
- 1236
- Paper id
- 1055979224751769874