
Highlights 2022

Tradeoffs Between Expectation and Variance in Weighted Markov Decision Processes

Conference Abstract · Logic in Computer Science · Theoretical Computer Science

Abstract

Many verification problems for probabilistic systems address optimal expected values of incurred costs or received rewards. In Markov decision processes (MDPs), a standard operational model combining non-deterministic and probabilistic transitions, these verification questions frequently result in stochastic shortest path problems, where the task is to find a scheduler that optimizes the expected amount of weight accumulated before reaching a target state. Aspects of the resulting probability distribution of accumulated weights other than its expectation are completely disregarded. In many application areas, however, the uncertainty coming with the probabilistic behavior cannot be neglected. In traffic control systems or energy grids, for example, large variability in throughput comes at a high cost due to the risk of traffic jams or the difficulty of storing surplus energy. Likewise, a probabilistic program deployed in a complex environment might be more useful with a higher expected termination time in exchange for a lower chance of extreme termination times.

A standard measure to quantify probabilistic uncertainty is the variance. This talk will address problems arising in finding a good tradeoff between variance and expectation in the stochastic shortest path problem. One way to achieve different tradeoffs is to consider variance-penalized expectations, where a multiple of the variance is incurred as a penalty on the obtained expectation; varying the penalty factor leads to different tradeoffs. We show that variance-penalized expectations in MDPs with non-negative weights can be computed in exponential space and that the corresponding threshold problem, asking whether the optimum exceeds a given rational, is in NEXPTIME and EXPTIME-hard. In addition, optimal schedulers can be chosen to be deterministic finite-memory schedulers.
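The variance-penalized objective can be illustrated on a toy instance (the weights and penalty factors below are hypothetical, not from the talk): under a fixed scheduler, the weight accumulated before reaching the target is a finite random variable, and the objective E − λ·Var shifts from favoring a high-expectation, high-variance scheduler to a low-variance one as the penalty factor λ grows.

```python
from fractions import Fraction

def moments(dist):
    """Expectation and variance of a finite weight distribution.

    dist: list of (weight, probability) pairs for the total weight
    accumulated before reaching the target under a fixed scheduler.
    """
    e = sum(p * w for w, p in dist)
    var = sum(p * (w - e) ** 2 for w, p in dist)
    return e, var

def vpe(dist, lam):
    """Variance-penalized expectation E - lam * Var."""
    e, var = moments(dist)
    return e - lam * var

# Two hypothetical deterministic schedulers for the same MDP:
# 'safe' always accumulates weight 10 (variance 0); 'risky'
# accumulates 0 or 30 with probability 1/2 each (expectation 15,
# variance 225).
safe = [(10, Fraction(1))]
risky = [(0, Fraction(1, 2)), (30, Fraction(1, 2))]

for lam in [Fraction(0), Fraction(1, 100), Fraction(1, 10)]:
    scores = {"safe": vpe(safe, lam), "risky": vpe(risky, lam)}
    print(f"lambda = {lam}: optimal scheduler is {max(scores, key=scores.get)}")
```

In this instance the risky scheduler is optimal for small λ and the safe one for large λ, with the crossover at λ = 1/45, where 15 − 225λ = 10.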
Furthermore, the talk will indicate difficulties arising in the question of whether there is a scheduler that achieves an expectation of at least e while having variance at most v, for given rationals e and v: in general, schedulers using both memory and randomization are necessary here. In addition, when plotting the combinations of expectation and variance achievable by schedulers in an MDP, the resulting feasible region is not convex and its border is composed of parabolic line segments. This indicates a significant contrast to many other problems on weighted MDPs. The talk is based on ongoing joint work with Ocan Sankur and Christel Baier.
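The parabolic border can be made concrete with the same kind of toy distributions (hypothetical numbers, not from the talk): if a scheduler randomizes once initially, following scheduler 1 with probability p and scheduler 2 otherwise, the mixture's expectation is linear in p, while by the law of total variance its variance picks up a quadratic cross-term p(1 − p)(E₁ − E₂)², so the achievable (expectation, variance) pairs along the mixture trace a parabola rather than a line segment.

```python
from fractions import Fraction

def moments(dist):
    """Expectation and variance of a finite (weight, probability) list."""
    e = sum(p * w for w, p in dist)
    return e, sum(p * (w - e) ** 2 for w, p in dist)

def mix(d1, d2, p):
    """Randomize once initially: follow d1 with probability p, else d2."""
    return [(w, p * q) for w, q in d1] + [(w, (1 - p) * q) for w, q in d2]

d1 = [(10, Fraction(1))]                          # expectation 10, variance 0
d2 = [(0, Fraction(1, 2)), (30, Fraction(1, 2))]  # expectation 15, variance 225

e1, v1 = moments(d1)
e2, v2 = moments(d2)
for p in [Fraction(k, 4) for k in range(5)]:
    e, v = moments(mix(d1, d2, p))
    # Law-of-total-variance decomposition: quadratic (parabolic) in p.
    assert v == p * v1 + (1 - p) * v2 + p * (1 - p) * (e1 - e2) ** 2
    print(f"p = {p}: expectation {e}, variance {v}")
```

Because the cross-term vanishes only when E₁ = E₂, mixing two deterministic schedulers with different expectations never yields a straight edge in the expectation-variance plane, matching the non-convex, parabolically bordered feasible region described above.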

Authors

Keywords

No keywords are indexed for this paper.

Context

Venue
Highlights of Logic, Games and Automata
Paper id
269733187797954675