Arrow Research search

Author name cluster

Krishnendu Chatterjee

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

82 papers
2 author rows

Possible papers

82

AAAI Conference 2026 Conference Paper

Qualitative Analysis of ω-Regular Objectives on Robust MDPs

  • Ali Asadi
  • Krishnendu Chatterjee
  • Ehsan Kafshdar Goharshady
  • Mehrdad Karrabi
  • Ali Shafiee

Robust Markov Decision Processes (RMDPs) generalize classical MDPs that consider uncertainties in transition probabilities by defining a set of possible transition functions. An objective is a set of runs (or infinite trajectories) of the RMDP, and the value for an objective is the maximal probability that the agent can guarantee against the adversarial environment. We consider (a) reachability objectives, where given a target set of states, the goal is to eventually arrive at one of them; and (b) parity objectives, which are a canonical representation for ω-regular objectives. The qualitative analysis problem asks whether the objective can be ensured with probability 1. In this work, we study the qualitative problem for reachability and parity objectives on RMDPs without making any assumption over the structures of the RMDPs, e.g., unichain or aperiodic. Our contributions are twofold. We first present efficient algorithms with oracle access to uncertainty sets that solve qualitative problems of reachability and parity objectives. We then report experimental results demonstrating the effectiveness of our oracle-based approach on classical RMDP examples from the literature scaling up to thousands of states.

AAAI Conference 2026 Conference Paper

Revealing POMDPs: Qualitative and Quantitative Analysis for Parity Objectives

  • Ali Asadi
  • Krishnendu Chatterjee
  • David Lurie
  • Raimundo Saona

Partially observable Markov decision processes (POMDPs) are a central model for uncertainty in sequential decision making. The most basic objective is the reachability objective, where a target set must be eventually visited, and the more general parity objectives can model all omega-regular specifications. For such objectives, the computational analysis problems are the following: (a) qualitative analysis that asks whether the objective can be satisfied with probability 1 (almost-sure winning) or probability arbitrarily close to 1 (limit-sure winning); and (b) quantitative analysis that asks for the approximation of the optimal probability of satisfying the objective. For general POMDPs, almost-sure analysis for reachability objectives is EXPTIME-complete, but limit-sure and quantitative analyses for reachability objectives are undecidable; almost-sure, limit-sure, and quantitative analyses for parity objectives are all undecidable. A special class of POMDPs, called revealing POMDPs, has been studied recently in several works, and for this subclass the almost-sure analysis for parity objectives was shown to be EXPTIME-complete. In this work, we show that for revealing POMDPs the limit-sure analysis for parity objectives is EXPTIME-complete, and even the quantitative analysis for parity objectives can be achieved in EXPTIME.

UAI Conference 2025 Conference Paper

Limit-sure Reachability for Small Memory Policies in POMDPs is NP-complete

  • Ali Asadi
  • Krishnendu Chatterjee
  • Raimundo Saona
  • Ali Shafiee

A standard model that arises in several applications in sequential decision-making is partially observable Markov decision processes (POMDPs) where a decision-making agent interacts with an uncertain environment. A basic objective in POMDPs is the reachability objective, where given a target set of states, the goal is to eventually arrive at one of them. The limit-sure problem asks whether reachability can be ensured with probability arbitrarily close to 1. In general, the limit-sure reachability problem for POMDPs is undecidable. However, in many practical cases, the most relevant question is the existence of policies with a small amount of memory. In this work, we study the limit-sure reachability problem for POMDPs with a fixed amount of memory. We establish that the computational complexity of the problem is NP-complete.

AAAI Conference 2025 Conference Paper

Linear Equations with Min and Max Operators: Computational Complexity

  • Krishnendu Chatterjee
  • Ruichen Luo
  • Raimundo Saona
  • Jakub Svoboda

We consider a class of optimization problems defined by a system of linear equations with min and max operators. This class of optimization problems has been studied under restrictive conditions, such as, (C1) the halting or stability condition; (C2) the non-negative coefficients condition; (C3) the sum upto 1 condition; and (C4) the only min or only max operator condition. Several seminal results in the literature focus on special cases. For example, turn-based stochastic games correspond to conditions C2 and C3; and Markov decision process to conditions C2, C3, and C4. However, the systematic computational complexity study of all the cases has not been explored, which we address in this work. Some highlights of our results are: with conditions C2 and C4, and with conditions C3 and C4, the problem is NP-complete, whereas with condition C1 only, the problem is in UP intersects coUP. Finally, we establish the computational complexity of the decision problem of checking the respective conditions.

UAI Conference 2025 Conference Paper

Lower Bound on Howard Policy Iteration for Deterministic Markov Decision Processes

  • Ali Asadi
  • Krishnendu Chatterjee
  • Jakob de Raaij

Deterministic Markov Decision Processes (DMDPs) are a mathematical framework for decision-making where the outcomes and future possible actions are deterministically determined by the current action taken. DMDPs can be viewed as a finite directed weighted graph, where in each step, the controller chooses an outgoing edge. An objective is a measurable function on runs (or infinite trajectories) of the DMDP, and the value for an objective is the maximal cumulative reward (or weight) that the controller can guarantee. We consider the classical mean-payoff (aka limit-average) objective, which is a basic and fundamental objective. Howard’s policy iteration algorithm is a popular method for solving DMDPs with mean-payoff objectives. Although Howard’s algorithm performs well in practice, as experimental studies suggested, the best known upper bound is exponential and the current known lower bound is as follows: For the input size $I$, the algorithm requires $\widetilde{\Omega}(\sqrt{I})$ iterations, where $\widetilde{\Omega}$ hides the poly-logarithmic factors, i. e. , the current lower bound on iterations is sub-linear with respect to the input size. Our main result is an improved lower bound for this fundamental algorithm where we show that for the input size $I$, the algorithm requires $\widetilde{\Omega}(I)$ iterations.

AAAI Conference 2025 Conference Paper

Quantified Linear and Polynomial Arithmetic Satisfiability via Template-based Skolemization

  • Krishnendu Chatterjee
  • Ehsan Kafshdar Goharshady
  • Mehrdad Karrabi
  • Harshit J Motwani
  • Maximilian Seeliger
  • Đorđe Žikelić

The problem of checking satisfiability of linear real arithmetic (LRA) and non-linear real arithmetic (NRA) formulas has broad applications, in particular, they are at the heart of logic-related applications such as logic for artificial intelligence, program analysis, etc. While there has been much work on checking satisfiability of unquantified LRA and NRA formulas, the problem of checking satisfiability of quantified LRA and NRA formulas remains a significant challenge. The main bottleneck in the existing methods is a computationally expensive quantifier elimination step. In this work, we propose a novel method for efficient quantifier elimination in quantified LRA and NRA formulas. We propose a template-based Skolemization approach, where we automatically synthesize linear/polynomial Skolem functions in order to eliminate quantifiers in the formula. The key technical ingredient in our approach are Positivstellensätze theorems from algebraic geometry, which allow for an efficient manipulation of polynomial inequalities. Our method offers a range of appealing theoretical properties combined with a strong practical performance. On the theory side, our method is sound, semi-complete, and runs in subexponential time and polynomial space, as opposed to existing sound and complete quantifier elimination methods that run in doubly-exponential time and at least exponential space. On the practical side, our experiments show superior performance compared to state of the art SMT solvers in terms of the number of solved instances and runtime, both on LRA and on NRA benchmarks.

IJCAI Conference 2024 Conference Paper

Certified Policy Verification and Synthesis for MDPs under Distributional Reach-Avoidance Properties

  • S. Akshay
  • Krishnendu Chatterjee
  • Tobias Meggendorfer
  • Đorđe Žikelić

Markov Decision Processes (MDPs) are a classical model for decision making in the presence of uncertainty. Often they are viewed as state transformers with planning objectives defined with respect to paths over MDP states. An increasingly popular alternative is to view them as distribution transformers, giving rise to a sequence of probability distributions over MDP states. For instance, reachability and safety properties in modeling robot swarms or chemical reaction networks are naturally defined in terms of probability distributions over states. Verifying such distributional properties is known to be hard and often beyond the reach of classical state-based verification techniques. In this work, we consider the problems of certified policy (i. e. controller) verification and synthesis in MDPs under distributional reach-avoidance specifications. By certified we mean that, along with a policy, we also aim to synthesize a (checkable) certificate ensuring that the MDP indeed satisfies the property. Thus, given the target set of distributions and an unsafe set of distributions over MDP states, our goal is to either synthesize a certificate for a given policy or synthesize a policy along with a certificate, proving that the target distribution can be reached while avoiding unsafe distributions. To solve this problem, we introduce the novel notion of distributional reach-avoid certificates and present automated procedures for (1) synthesizing a certificate for a given policy, and (2) synthesizing a policy together with the certificate, both providing formal guarantees on certificate correctness. Our experimental evaluation demonstrates the ability of our method to solve several non-trivial examples, including a multi-agent robot-swarm model, to synthesize certified policies and to certify existing policies.

ICML Conference 2024 Conference Paper

Reinforcement Learning from Reachability Specifications: PAC Guarantees with Expected Conditional Distance

  • Jakub Svoboda
  • Suguman Bansal
  • Krishnendu Chatterjee

Reinforcement Learning (RL) from temporal logical specifications is a fundamental problem in sequential decision making. One of the basic and core such specification is the reachability specification that requires a target set to be eventually visited. Despite strong empirical results for RL from such specifications, the theoretical guarantees are bleak, including the impossibility of Probably Approximately Correct (PAC) guarantee for reachability specifications. Given the impossibility result, in this work we consider the problem of RL from reachability specifications along with the information of expected conditional distance (ECD). We present (a) lower bound results which establish the necessity of ECD information for PAC guarantees and (b) an algorithm that establishes PAC-guarantees given the ECD information. To the best of our knowledge, this is the first RL from reachability specifications that does not make any assumptions on the underlying environment to learn policies.

IJCAI Conference 2024 Conference Paper

Solving Long-run Average Reward Robust MDPs via Stochastic Games

  • Krishnendu Chatterjee
  • Ehsan Kafshdar Goharshady
  • Mehrdad Karrabi
  • Petr Novotný
  • Đorđe Žikelić

Markov decision processes (MDPs) provide a standard framework for sequential decision making under uncertainty. However, MDPs do not take uncertainty in transition probabilities into account. Robust Markov decision processes (RMDPs) address this shortcoming of MDPs by assigning to each transition an uncertainty set rather than a single probability value. In this work, we consider polytopic RMDPs in which all uncertainty sets are polytopes and study the problem of solving long-run average reward polytopic RMDPs. We present a novel perspective on this problem and show that it can be reduced to solving long-run average reward turn-based stochastic games with finite state and action spaces. This reduction allows us to derive several important consequences that were hitherto not known to hold for polytopic RMDPs. First, we derive new computational complexity bounds for solving long-run average reward polytopic RMDPs, showing for the first time that the threshold decision problem for them is in NP and coNP and that they admit a randomized algorithm with sub-exponential expected runtime. Second, we present Robust Polytopic Policy Iteration (RPPI), a novel policy iteration algorithm for solving long-run average reward polytopic RMDPs. Our experimental evaluation shows that RPPI is much more efficient in solving long-run average reward polytopic RMDPs compared to state-of-the-art methods based on value iteration.

NeurIPS Conference 2023 Conference Paper

Compositional Policy Learning in Stochastic Control Systems with Formal Guarantees

  • Đorđe Žikelić
  • Mathias Lechner
  • Abhinav Verma
  • Krishnendu Chatterjee
  • Thomas Henzinger

Reinforcement learning has shown promising results in learning neural network policies for complicated control tasks. However, the lack of formal guarantees about the behavior of such policies remains an impediment to their deployment. We propose a novel method for learning a composition of neural network policies in stochastic environments, along with a formal certificate which guarantees that a specification over the policy's behavior is satisfied with the desired probability. Unlike prior work on verifiable RL, our approach leverages the compositional nature of logical specifications provided in SpectRL, to learn over graphs of probabilistic reach-avoid specifications. The formal guarantees are provided by learning neural network policies together with reach-avoid supermartingales (RASM) for the graph’s sub-tasks and then composing them into a global policy. We also derive a tighter lower bound compared to previous work on the probability of reach-avoidance implied by a RASM, which is required to find a compositional policy with an acceptable probabilistic threshold for complex tasks with multiple edge policies. We implement a prototype of our approach and evaluate it on a Stochastic Nine Rooms environment.

MFCS Conference 2023 Conference Paper

Entropic Risk for Turn-Based Stochastic Games

  • Christel Baier
  • Krishnendu Chatterjee
  • Tobias Meggendorfer
  • Jakob Piribauer

Entropic risk (ERisk) is an established risk measure in finance, quantifying risk by an exponential re-weighting of rewards. We study ERisk for the first time in the context of turn-based stochastic games with the total reward objective. This gives rise to an objective function that demands the control of systems in a risk-averse manner. We show that the resulting games are determined and, in particular, admit optimal memoryless deterministic strategies. This contrasts risk measures that previously have been considered in the special case of Markov decision processes and that require randomization and/or memory. We provide several results on the decidability and the computational complexity of the threshold problem, i. e. whether the optimal value of ERisk exceeds a given threshold. In the most general case, the problem is decidable subject to Shanuel’s conjecture. If all inputs are rational, the resulting threshold problem can be solved using algebraic numbers, leading to decidability via a polynomial-time reduction to the existential theory of the reals. Further restrictions on the encoding of the input allow the solution of the threshold problem in NP∩coNP. Finally, an approximation algorithm for the optimal value of ERisk is provided.

SODA Conference 2023 Conference Paper

Faster Algorithm for Turn-based Stochastic Games with Bounded Treewidth

  • Krishnendu Chatterjee
  • Tobias Meggendorfer
  • Raimundo Saona
  • Jakub Svoboda

Turn-based stochastic games (aka simple stochastic games) are two-player zero-sum games played on directed graphs with probabilistic transitions. The goal of player-max is to maximize the probability to reach a target state against the adversarial player-min. These games lie in NP ∩ coNP and are among the rare combinatorial problems that belong to this complexity class for which the existence of polynomial-time algorithm is a major open question. While randomized sub-exponential time algorithm exists, all known deterministic algorithms require exponential time in the worst-case. An important open question has been whether faster algorithms can be obtained parametrized by the treewidth of the game graph. Even deterministic sub-exponential time algorithm for constant treewidth turn-based stochastic games has remain elusive. In this work our main result is a deterministic algorithm to solve turn-based stochastic games that, given a game with n states, treewidth at most t, and the bit-complexity of the probabilistic transition function log D, has running time O (( tn 2 log D ) t log n ). In particular, our algorithm is quasi-polynomial time for games with constant or poly-logarithmic treewidth.

AAAI Conference 2023 Conference Paper

Learning Control Policies for Stochastic Systems with Reach-Avoid Guarantees

  • Đorđe Žikelić
  • Mathias Lechner
  • Thomas A. Henzinger
  • Krishnendu Chatterjee

We study the problem of learning controllers for discrete-time non-linear stochastic dynamical systems with formal reach-avoid guarantees. This work presents the first method for providing formal reach-avoid guarantees, which combine and generalize stability and safety guarantees, with a tolerable probability threshold p in [0,1] over the infinite time horizon. Our method leverages advances in machine learning literature and it represents formal certificates as neural networks. In particular, we learn a certificate in the form of a reach-avoid supermartingale (RASM), a novel notion that we introduce in this work. Our RASMs provide reachability and avoidance guarantees by imposing constraints on what can be viewed as a stochastic extension of level sets of Lyapunov functions for deterministic systems. Our approach solves several important problems -- it can be used to learn a control policy from scratch, to verify a reach-avoid specification for a fixed control policy, or to fine-tune a pre-trained policy if it does not satisfy the reach-avoid specification. We validate our approach on 3 stochastic non-linear reinforcement learning tasks.

AAAI Conference 2023 Conference Paper

Quantization-Aware Interval Bound Propagation for Training Certifiably Robust Quantized Neural Networks

  • Mathias Lechner
  • Đorđe Žikelić
  • Krishnendu Chatterjee
  • Thomas A. Henzinger
  • Daniela Rus

We study the problem of training and certifying adversarially robust quantized neural networks (QNNs). Quantization is a technique for making neural networks more efficient by running them using low-bit integer arithmetic and is therefore commonly adopted in industry. Recent work has shown that floating-point neural networks that have been verified to be robust can become vulnerable to adversarial attacks after quantization, and certification of the quantized representation is necessary to guarantee robustness. In this work, we present quantization-aware interval bound propagation (QA-IBP), a novel method for training robust QNNs. Inspired by advances in robust learning of non-quantized networks, our training algorithm computes the gradient of an abstract representation of the actual network. Unlike existing approaches, our method can handle the discrete semantics of QNNs. Based on QA-IBP, we also develop a complete verification procedure for verifying the adversarial robustness of QNNs, which is guaranteed to terminate and produce a correct answer. Compared to existing approaches, the key advantage of our verification procedure is that it runs entirely on GPU or other accelerator devices. We demonstrate experimentally that our approach significantly outperforms existing methods and establish the new state-of-the-art for training and certifying the robustness of QNNs.

AAAI Conference 2022 Conference Paper

Stability Verification in Stochastic Control Systems via Neural Network Supermartingales

  • Mathias Lechner
  • Đorđe Žikelić
  • Krishnendu Chatterjee
  • Thomas A. Henzinger

We consider the problem of formally verifying almost-sure (a. s.) asymptotic stability in discrete-time nonlinear stochastic control systems. While verifying stability in deterministic control systems is extensively studied in the literature, verifying stability in stochastic control systems is an open problem. The few existing works on this topic either consider only specialized forms of stochasticity or make restrictive assumptions on the system, rendering them inapplicable to learning algorithms with neural network policies. In this work, we present an approach for general nonlinear stochastic control problems with two novel aspects: (a) instead of classical stochastic extensions of Lyapunov functions, we use ranking supermartingales (RSMs) to certify a. s. asymptotic stability, and (b) we present a method for learning neural network RSMs. We prove that our approach guarantees a. s. asymptotic stability of the system and provides the first method to obtain bounds on the stabilization time, which stochastic Lyapunov functions do not. Finally, we validate our approach experimentally on a set of nonlinear stochastic reinforcement learning environments with neural network policies.

NeurIPS Conference 2021 Conference Paper

Infinite Time Horizon Safety of Bayesian Neural Networks

  • Mathias Lechner
  • Đorđe Žikelić
  • Krishnendu Chatterjee
  • Thomas Henzinger

Bayesian neural networks (BNNs) place distributions over the weights of a neural network to model uncertainty in the data and the network's prediction. We consider the problem of verifying safety when running a Bayesian neural network policy in a feedback loop with infinite time horizon systems. Compared to the existing sampling-based approaches, which are inapplicable to the infinite time horizon setting, we train a separate deterministic neural network that serves as an infinite time horizon safety certificate. In particular, we show that the certificate network guarantees the safety of the system over a subset of the BNN weight posterior's support. Our method first computes a safe weight set and then alters the BNN's weight posterior to reject samples outside this set. Moreover, we show how to extend our approach to a safe-exploration reinforcement learning setting, in order to avoid unsafe trajectories during the training of the policy. We evaluate our approach on a series of reinforcement learning benchmarks, including non-Lyapunovian safety specifications.

IJCAI Conference 2021 Conference Paper

Solving Partially Observable Stochastic Shortest-Path Games

  • Petr Tomášek
  • Karel Horák
  • Aditya Aradhye
  • Branislav Bošanský
  • Krishnendu Chatterjee

We study the two-player zero-sum extension of the partially observable stochastic shortest-path problem where one agent has only partial information about the environment. We formulate this problem as a partially observable stochastic game (POSG): given a set of target states and negative rewards for each transition, the player with imperfect information maximizes the expected undiscounted total reward until a target state is reached. The second player with the perfect information aims for the opposite. We base our formalism on POSGs with one-sided observability (OS-POSGs) and give the following contributions: (1) we introduce a novel heuristic search value iteration algorithm that iteratively solves depth-limited variants of the game, (2) we derive the bound on the depth guaranteeing an arbitrary precision, (3) we propose a novel upper-bound estimation that allows early terminations, and (4) we experimentally evaluate the algorithm on a pursuit-evasion game.

ICAPS Conference 2020 Conference Paper

Multiple-Environment Markov Decision Processes: Efficient Analysis and Applications

  • Krishnendu Chatterjee
  • Martin Chmelik
  • Deep Karkhanis
  • Petr Novotný 0001
  • Amélie Royer

Multiple-environment Markov decision processes (MEMDPs) are MDPs equipped with not one, but multiple probabilistic transition functions, which represent the various possible unknown environments. While the previous research on MEMDPs focused on theoretical properties for long-run average payoff, we study them with discounted-sum payoff and focus on their practical advantages and applications. MEMDPs can be viewed as a special case of Partially observable and Mixed observability MDPs: the state of the system is perfectly observable, but not the environment. We show that the specific structure of MEMDPs allows for more efficient algorithmic analysis, in particular for faster belief updates. We demonstrate the applicability of MEMDPs in several domains. In particular, we formalize the sequential decision-making approach to contextual recommendation systems as MEMDPs and substantially improve over the previous MDP approach.

AAAI Conference 2020 Conference Paper

Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes

  • Tomáš Brázdil
  • Krishnendu Chatterjee
  • Petr Novotný
  • Jiří Vahala

Markov decision processes (MDPs) are the defacto framework for sequential decision making in the presence of stochastic uncertainty. A classical optimization criterion for MDPs is to maximize the expected discounted-sum payoff, which ignores low probability catastrophic events with highly negative impact on the system. On the other hand, risk-averse policies require the probability of undesirable events to be below a given threshold, but they do not account for optimization of the expected payoff. We consider MDPs with discounted-sum payoff with failure states which represent catastrophic outcomes. The objective of risk-constrained planning is to maximize the expected discounted-sum payoff among risk-averse policies that ensure the probability to encounter a failure state is below a desired threshold. Our main contribution is an efficient risk-constrained planning algorithm that combines UCT-like search with a predictor learned through interaction with the MDP (in the style of AlphaZero) and with a risk-constrained action selection via linear programming. We demonstrate the effectiveness of our approach with experiments on classical MDPs from the literature, including benchmarks with an order of 106 states.

PRL Workshop 2020 Workshop Paper

Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes (Extended Abstract)

  • Tomas Brazdil
  • Krishnendu Chatterjee
  • Petr Novotný
  • Jiří Vahala

Markov decision processes (MDPs) are the standard model of sequential decision making under stochastic uncertainty. A classical optimization criterion for MDPs is to maximize the expected discounted-sum payoff, which ignores low probability catastrophic events with highly negative impact on the system. On the other hand, risk-averse policies require the probability of undesirable events to be below a given threshold, but they do not account for optimization of the expected payoff. We consider MDPs with discounted-sum payoff and with failure states which represent catastrophic outcomes. The objective of risk-constrained planning is to maximize the expected discounted-sum payoff among risk-averse policies that ensure the probability to encounter a failure state is below a desired threshold. Our main contribution is an efficient risk-constrained planning algorithm that combines UCT-like search with a predictor learned through interaction with the MDP (in the style of AlphaZero) and with a risk-constrained action selection via linear programming. We demonstrate the effectiveness of our approach with experiments on classical MDPs from the literature, including benchmarks with an order of 106 states. This extended abstract summarizes results presented in the paper Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes published at AAAI’20.

MFCS Conference 2020 Conference Paper

Simplified Game of Life: Algorithms and Complexity

  • Krishnendu Chatterjee
  • Rasmus Ibsen-Jensen
  • Ismaël Jecker
  • Jakub Svoboda

Game of Life is a simple and elegant model to study dynamical system over networks. The model consists of a graph where every vertex has one of two types, namely, dead or alive. A configuration is a mapping of the vertices to the types. An update rule describes how the type of a vertex is updated given the types of its neighbors. In every round, all vertices are updated synchronously, which leads to a configuration update. While in general, Game of Life allows a broad range of update rules, we focus on two simple families of update rules, namely, underpopulation and overpopulation, that model several interesting dynamics studied in the literature. In both settings, a dead vertex requires at least a desired number of live neighbors to become alive. For underpopulation (resp. , overpopulation), a live vertex requires at least (resp. at most) a desired number of live neighbors to remain alive. We study the basic computation problems, e. g. , configuration reachability, for these two families of rules. For underpopulation rules, we show that these problems can be solved in polynomial time, whereas for overpopulation rules they are PSPACE-complete.

ICAPS Conference 2018 Conference Paper

Algorithms and Conditional Lower Bounds for Planning Problems

  • Krishnendu Chatterjee
  • Wolfgang Dvorák
  • Monika Henzinger
  • Alexander Svozil

We consider planning problems for graphs, Markov decision processes (MDPs), and games on graphs. While graphs represent the most basic planning model, MDPs represent interaction with nature and games on graphs represent interaction with an adversarial environment. We consider two planning problems where there are k different target sets, and the problems are as follows: (a) the coverage problem asks whether there is a plan for each individual target set, and (b) the sequential target reachability problem asks whether the targets can be reached in sequence. For the coverage problem, we present a linear-time algorithm for graphs, and quadratic conditional lower bound for MDPs and games on graphs. For the sequential target problem, we present a linear-time algorithm for graphs, a sub-quadratic algorithm for MDPs, and a quadratic conditional lower bound for games on graphs. Our results with conditional lower bounds establish (i) model-separation results showing that for the coverage problem MDPs and games on graphs are harder than graphs and for the sequential reachability problem games on graphs are harder than MDPs and graphs; and (ii) objective-separation results showing that for MDPs the coverage problem is harder than the sequential target problem.

IJCAI Conference 2018 Conference Paper

Computational Approaches for Stochastic Shortest Path on Succinct MDPs

  • Krishnendu Chatterjee
  • Hongfei Fu
  • Amir Goharshady
  • Nastaran Okati

We consider the stochastic shortest path (SSP) problem for succinct Markov decision processes (MDPs), where the MDP consists of a set of variables, and a set of nondeterministic rules that update the variables. First, we show that several examples from the AI literature can be modeled as succinct MDPs. Then we present computational approaches for upper and lower bounds for the SSP problem: (a) for computing upper bounds, our method is polynomial-time in the implicit description of the MDP; (b) for lower bounds, we present a polynomial-time (in the size of the implicit description) reduction to quadratic programming. Our approach is applicable even to infinite-state MDPs. Finally, we present experimental results to demonstrate the effectiveness of our approach on several classical examples from the AI literature.

IJCAI Conference 2018 Conference Paper

Expectation Optimization with Probabilistic Guarantees in POMDPs with Discounted-Sum Objectives

  • Krishnendu Chatterjee
  • Adrián Elgyütt
  • Petr Novotný
  • Owen Rouillé

Partially-observable Markov decision processes (POMDPs) with discounted-sum payoff are a standard framework to model a wide range of problems related to decision making under uncertainty. Traditionally, the goal has been to obtain policies that optimize the expectation of the discounted-sum payoff. A key drawback of the expectation measure is that even low probability events with extreme payoff can significantly affect the expectation, and thus the obtained policies are not necessarily risk averse. An alternate approach is to optimize the probability that the payoff is above a certain threshold, which allows to obtain risk-averse policies, but ignore optimization of the expectation. We consider the expectation optimization with probabilistic guarantee (EOPG) problem where the goal is to optimize the expectation ensuring that the payoff is above a given threshold with at least a specified probability. We present several results on the EOPG problem, including the first algorithm to solve it.

IJCAI Conference 2018 Conference Paper

Goal-HSVI: Heuristic Search Value Iteration for Goal POMDPs

  • Karel Horák
  • Branislav Bošanský
  • Krishnendu Chatterjee

Partially observable Markov decision processes (POMDPs) are the standard models for planning under uncertainty with both finite and infinite horizon. Besides the well-known discounted-sum objective, indefinite-horizon objective (aka Goal-POMDPs) is another classical objective for POMDPs. In this case, given a set of target states and a positive cost for each transition, the optimization objective is to minimize the expected total cost until a target state is reached. In the literature, RTDP-Bel or heuristic search value iteration (HSVI) have been used for solving Goal-POMDPs. Neither of these algorithms has theoretical convergence guarantees, and HSVI may even fail to terminate its trials. We give the following contributions: (1) We discuss the challenges introduced in Goal-POMDPs and illustrate how they prevent the original HSVI from converging. (2) We present a novel algorithm inspired by HSVI, termed Goal-HSVI, and show that our algorithm has convergence guarantees. (3) We show that Goal-HSVI outperforms RTDP-Bel on a set of well-known examples.

LPAR Conference 2018 Conference Paper

Quasipolynomial Set-Based Symbolic Algorithms for Parity Games

  • Krishnendu Chatterjee
  • Wolfgang Dvorák
  • Monika Henzinger
  • Alexander Svozil

Solving parity games, which are equivalent to modal μ-calculus model checking, is a central algorithmic problem in formal methods, with applications in reactive synthesis, program repair, verification of branching-time properties, etc. Besides the standard compu- tation model with the explicit representation of games, another important theoretical model of computation is that of set-based symbolic algorithms. Set-based symbolic algorithms use basic set operations and one-step predecessor operations on the implicit description of games, rather than the explicit representation. The significance of symbolic algorithms is that they provide scalable algorithms for large finite-state systems, as well as for infinite-state systems with finite quotient. Consider parity games on graphs with n vertices and parity conditions with d priorities. While there is a rich literature of explicit algorithms for parity games, the main results for set-based symbolic algorithms are as follows: (a) the basic algorithm that requires O(nd) symbolic operations and O(d) symbolic space; and (b) an improved algorithm that requires O(nd/3+1) symbolic operations and O(n) symbolic space. In this work, our contributions are as follows: (1) We present a black-box set-based symbolic algorithm based on the explicit progress measure algorithm. Two important consequences of our algorithm are as follows: (a) a set-based symbolic algorithm for parity games that requires quasi-polynomially many symbolic operations and O(n) symbolic space; and (b) any future improvement in progress measure based explicit algorithms immediately imply an efficiency improvement in our set-based symbolic algorithm for parity games. (2) We present a set-based symbolic algorithm that requires quasi-polynomially many symbolic operations and O(d · log n) symbolic space. Moreover, for the important special case of d ≤ log n, our algorithm requires only polynomially many symbolic operations and poly-logarithmic symbolic space.

ICAPS Conference 2018 Conference Paper

Sensor Synthesis for POMDPs with Reachability Objectives

  • Krishnendu Chatterjee
  • Martin Chmelik
  • Ufuk Topcu

Partially observable Markov decision processes (POMDPs) are widely used in probabilistic planning problems in which an agent interacts with an environment using noisy and imprecise sensors. We study a setting in which the sensors are only partially defined and the goal is to synthesize “weakest” additional sensors, such that in the resulting POMDP, there is a small-memory policy for the agent that almost-surely (with probability 1) satisfies a reachability objective. We show that the problem is NP-complete, and present a symbolic algorithm by encoding the problem into SAT instances. We illustrate trade-offs between the amount of memory of the policy and the number of additional sensors on a simple example. We have implemented our approach and consider three classical POMDP examples from the literature, and show that in all the examples the number of sensors can be significantly decreased (as compared to the existing solutions in the literature) without increasing the complexity of the policies.

MFCS Conference 2017 Conference Paper

Faster Algorithms for Mean-Payoff Parity Games

  • Krishnendu Chatterjee
  • Monika Henzinger
  • Alexander Svozil

Graph games provide the foundation for modeling and synthesis of reactive processes. Such games are played over graphs where the vertices are controlled by two adversarial players. We consider graph games where the objective of the first player is the conjunction of a qualitative objective (specified as a parity condition) and a quantitative objective (specified as a mean-payoff condition). There are two variants of the problem, namely, the threshold problem where the quantitative goal is to ensure that the mean-payoff value is above a threshold, and the value problem where the quantitative goal is to ensure the optimal mean-payoff value; in both cases ensuring the qualitative parity objective. The previous best-known algorithms for game graphs with n vertices, m edges, parity objectives with d priorities, and maximal absolute reward value W for mean-payoff objectives, are as follows: O(n^(d+1)·m·W) for the threshold problem, and O(n^(d+2)·m·W) for the value problem. Our main contributions are faster algorithms, and the running times of our algorithms are as follows: O(n^(d-1)·m·W) for the threshold problem, and O(n^d·m·W·log(n·W)) for the value problem. For mean-payoff parity objectives with two priorities, our algorithms match the best-known bounds of the algorithms for mean-payoff games (without conjunction with parity objectives). Our results are relevant in synthesis of reactive systems with both functional requirement (given as a qualitative objective) and performance requirement (given as a quantitative objective).

MFCS Conference 2017 Conference Paper

Faster Monte-Carlo Algorithms for Fixation Probability of the Moran Process on Undirected Graphs

  • Krishnendu Chatterjee
  • Rasmus Ibsen-Jensen
  • Martin A. Nowak

Evolutionary graph theory studies the evolutionary dynamics in a population structure given as a connected graph. Each node of the graph represents an individual of the population, and edges determine how offspring are placed. We consider the classical birth-death Moran process where there are two types of individuals, namely, the residents with fitness 1 and mutants with fitness r. The fitness indicates the reproductive strength. The evolutionary dynamics happens as follows: in the initial step, in a population of all resident individuals a mutant is introduced, and then at each step, an individual is chosen proportional to the fitness of its type to reproduce, and the offspring replaces a neighbor uniformly at random. The process stops when all individuals are either residents or mutants. The probability that all individuals in the end are mutants is called the fixation probability, which is a key factor in the rate of evolution. We consider the problem of approximating the fixation probability. The class of algorithms that is extremely relevant for approximation of the fixation probabilities is the Monte-Carlo simulation of the process. Previous results present a polynomial-time Monte-Carlo algorithm for undirected graphs when $r$ is given in unary. First, we present a simple modification: instead of simulating each step, we discard ineffective steps, where no node changes type (i. e. , either residents replace residents, or mutants replace mutants). Using the above simple modification and our result that the number of effective steps is concentrated around the expected number of effective steps, we present faster polynomial-time Monte-Carlo algorithms for undirected graphs. Our algorithms are always at least a factor O(n^2/log n) faster as compared to the previous algorithms, where n is the number of nodes, and is polynomial even if r is given in binary. We also present lower bounds showing that the upper bound on the expected number of effective steps we present is asymptotically tight for undirected graphs.

CSL Conference 2017 Conference Paper

Improved Set-Based Symbolic Algorithms for Parity Games

  • Krishnendu Chatterjee
  • Wolfgang Dvorák
  • Monika Henzinger
  • Veronika Loitzenbauer

Graph games with omega-regular winning conditions provide a mathematical framework to analyze a wide range of problems in the analysis of reactive systems and programs (such as the synthesis of reactive systems, program repair, and the verification of branching time properties). Parity conditions are canonical forms to specify omega-regular winning conditions. Graph games with parity conditions are equivalent to mu-calculus model checking, and thus a very important algorithmic problem. Symbolic algorithms are of great significance because they provide scalable algorithms for the analysis of large finite-state systems, as well as algorithms for the analysis of infinite-state systems with finite quotient. A set-based symbolic algorithm uses the basic set operations and the one-step predecessor operators. We consider graph games with n vertices and parity conditions with c priorities (equivalently, a mu-calculus formula with c alternations of least and greatest fixed points). While many explicit algorithms exist for graph games with parity conditions, for set-based symbolic algorithms there are only two algorithms (notice that we use space to refer to the number of sets stored by a symbolic algorithm): (a) the basic algorithm that requires O(n^c) symbolic operations and linear space; and (b) an improved algorithm that requires O(n^{c/2+1}) symbolic operations but also O(n^{c/2+1}) space (i. e. , exponential space). In this work we present two set-based symbolic algorithms for parity games: (a) our first algorithm requires O(n^{c/2+1}) symbolic operations and only requires linear space; and (b) developing on our first algorithm, we present an algorithm that requires O(n^{c/3+1}) symbolic operations and only linear space. We also present the first linear space set-based symbolic algorithm for parity games that requires at most a sub-exponential number of symbolic operations.

AAAI Conference 2017 Conference Paper

Optimizing Expectation with Guarantees in POMDPs

  • Krishnendu Chatterjee
  • Petr Novotn_
  • Guillermo PŽrez
  • Jean-Franois Raskin
  • _or_e _ikeli_

A standard objective in partially-observable Markov decision processes (POMDPs) is to find a policy that maximizes the expected discounted-sum payoff. However, such policies may still permit unlikely but highly undesirable outcomes, which is problematic especially in safety-critical applications. Recently, there has been a surge of interest in POMDPs where the goal is to maximize the probability to ensure that the payoff is at least a given threshold, but these approaches do not consider any optimization beyond satisfying this threshold constraint. In this work we go beyond both the “expectation” and “threshold” approaches and consider a “guaranteed payoff optimization (GPO)” problem for POMDPs, where we are given a threshold t and the objective is to find a policy σ such that a) each possible outcome of σ yields a discounted-sum payoff of at least t, and b) the expected discounted-sum payoff of σ is optimal (or near-optimal) among all policies satisfying a). We present a practical approach to tackle the GPO problem and evaluate it on standard POMDP benchmarks.

MFCS Conference 2017 Conference Paper

Strategy Complexity of Concurrent Safety Games

  • Krishnendu Chatterjee
  • Kristoffer Arnsfelt Hansen
  • Rasmus Ibsen-Jensen

We consider two player, zero-sum, finite-state concurrent reachability games, played for an infinite number of rounds, where in every round, each player simultaneously and independently of the other players chooses an action, whereafter the successor state is determined by a probability distribution given by the current state and the chosen actions. Player 1 wins iff a designated goal state is eventually visited. We are interested in the complexity of stationary strategies measured by their patience, which is defined as the inverse of the smallest non-zero probability employed. Our main results are as follows: We show that: (i) the optimal bound on the patience of optimal and epsilon-optimal strategies, for both players is doubly exponential; and (ii) even in games with a single non-absorbing state exponential (in the number of actions) patience is necessary.

AAAI Conference 2016 Conference Paper

A Symbolic SAT-Based Algorithm for Almost-Sure Reachability with Small Strategies in POMDPs

  • Krishnendu Chatterjee
  • Martin Chmelík
  • Jessica Davies

POMDPs are standard models for probabilistic planning problems, where an agent interacts with an uncertain environment. We study the problem of almost-sure reachability, where given a set of target states, the question is to decide whether there is a policy to ensure that the target set is reached with probability 1 (almost-surely). While in general the problem is EXPTIMEcomplete, in many practical cases policies with a small amount of memory suffice. Moreover, the existing solution to the problem is explicit, which first requires to construct explicitly an exponential reduction to a belief-support MDP. In this work, we first study the existence of observation-stationary strategies, which is NP-complete, and then small-memory strategies. We present a symbolic algorithm by an efficient encoding to SAT and using a SAT solver for the problem. We report experimental results demonstrating the scalability of our symbolic (SAT-based) approach.

Highlights Conference 2016 Conference Abstract

Computation Tree Logic for Synchronization Properties

  • Krishnendu Chatterjee
  • Laurent Doyen

We present a logic that extends CTL (Computation Tree Logic) with operators that express synchronization properties. A property is synchronized in a system if it holds in all paths of a certain length. The new logic is obtained by using the same path quantifiers and temporal operators as in CTL, but allowing a different order of the quantifiers. This small syntactic variation induces a logic that can express non-regular properties for which known extensions of MSO with equality of path length are undecidable. The talk is based on a joint work with Krishnendu Chatterjee (accepted at ICALP 2016)

MFCS Conference 2016 Conference Paper

Conditionally Optimal Algorithms for Generalized Büchi Games

  • Krishnendu Chatterjee
  • Wolfgang Dvorák
  • Monika Henzinger
  • Veronika Loitzenbauer

Games on graphs provide the appropriate framework to study several central problems in computer science, such as verification and synthesis of reactive systems. One of the most basic objectives for games on graphs is the liveness (or Büchi) objective that given a target set of vertices requires that some vertex in the target set is visited infinitely often. We study generalized Büchi objectives (i. e. , conjunction of liveness objectives), and implications between two generalized Büchi objectives (known as GR(1) objectives), that arise in numerous applications in computer-aided verification. We present improved algorithms and conditional super-linear lower bounds based on widely believed assumptions about the complexity of (A1) combinatorial Boolean matrix multiplication and (A2) CNF-SAT. We consider graph games with n vertices, m edges, and generalized Büchi objectives with k conjunctions. First, we present an algorithm with running time O(k*n^2), improving the previously known O(k*n*m) and O(k^2*n^2) worst-case bounds. Our algorithm is optimal for dense graphs under (A1). Second, we show that the basic algorithm for the problem is optimal for sparse graphs when the target sets have constant size under (A2). Finally, we consider GR(1) objectives, with k_1 conjunctions in the antecedent and k_2 conjunctions in the consequent, and present an O(k_1 k_2 n^{2. 5})-time algorithm, improving the previously known O(k_1*k_2*n*m)-time algorithm for m > n^{1. 5}.

ICAPS Conference 2016 Conference Paper

Indefinite-Horizon Reachability in Goal-DEC-POMDPs

  • Krishnendu Chatterjee
  • Martin Chmelik

DEC-POMDPs extend POMDPs to a multi-agent setting, where several agents operate in an uncertain environment independently to achieve a joint objective. DEC-POMDPs have been studied with finite-horizon and infinite-horizon discounted-sum objectives, and there exist solvers both for exact and approximate solutions. In this work we consider Goal-DEC-POMDPs, where given a set of target states, the objective is to ensure that the target set is reached with minimal cost. We consider the indefinite-horizon (infinite-horizon with either discounted-sum, or undiscounted-sum, where absorbing goal states have zero-cost) problem. We present a new and novel method to solve the problem that extends methods for finite-horizon DEC-POMDPs and the RTDP-Bel approach for POMDPs. We present experimental results on several examples, and show that our approach presents promising results.

MFCS Conference 2016 Conference Paper

Nested Weighted Limit-Average Automata of Bounded Width

  • Krishnendu Chatterjee
  • Thomas A. Henzinger
  • Jan Otop

While weighted automata provide a natural framework to express quantitative properties, many basic properties like average response time cannot be expressed with weighted automata. Nested weighted automata extend weighted automata and consist of a master automaton and a set of slave automata that are invoked by the master automaton. Nested weighted automata are strictly more expressive than weighted automata (e. g. , average response time can be expressed with nested weighted automata), but the basic decision questions have higher complexity (e. g. , for deterministic automata, the emptiness question for nested weighted automata is PSPACE-hard, whereas the corresponding complexity for weighted automata is PTIME). We consider a natural subclass of nested weighted automata where at any point at most a bounded number k of slave automata can be active. We focus on automata whose master value function is the limit average. We show that these nested weighted automata with bounded width are strictly more expressive than weighted automata (e. g. , average response time with no overlapping requests can be expressed with bound k=1, but not with non-nested weighted automata). We show that the complexity of the basic decision problems (i. e. , emptiness and universality) for the subclass with k constant matches the complexity for weighted automata. Moreover, when k is part of the input given in unary we establish PSPACE-completeness.

IJCAI Conference 2016 Conference Paper

Robust Draws in Balanced Knockout Tournaments

  • Krishnendu Chatterjee
  • Rasmus Ibsen-Jensen
  • Josef Tkadlec

Balanced knockout tournaments are ubiquitous in sports competitions and are also used in decision-making and elections. The traditional computational question, that asks to compute a draw (optimal draw) that maximizes the winning probability for a distinguished player, has received a lot of attention. Previous works consider the problem where the pairwise winning probabilities are known precisely, while we study how robust is the winning probability with respect to small errors in the pairwise winning probabilities. First, we present several illuminating examples to establish: (a) there exist deterministic tournaments (where the pairwise winning probabilities are 0 or 1) where one optimal draw is much more robust than the other; and (b) in general, there exist tournaments with slightly suboptimal draws that are more robust than all the optimal draws. The above examples motivate the study of the computational problem of robust draws that guarantee a specified winning probability. Second, we present a polynomial-time algorithm for approximating the robustness of a draw for sufficiently small errors in pairwise winning probabilities, and obtain that the stated computational problem is NP-complete. We also show that two natural cases of deterministic tournaments where the optimal draw could be computed in polynomial time also admit polynomial-time algorithms to compute robust optimal draws.

AAMAS Conference 2016 Conference Paper

Stochastic Shortest Path with Energy Constraints in Pomdps (Extended Abstract)

  • Tomáš Brázdil
  • Krishnendu Chatterjee
  • Martin Chmelík
  • Anchit Gupta
  • Petr Novotný

We extend the traditional framework of POMDPs to model resource consumption inducing a hard constraint on the behaviour of the model. Resource levels increase and decrease with transitions, and the hard constraint requires that the level remains positive in all steps. We present an algorithm for solving POMDPs with resource levels, developing on existing POMDP solvers. Our second contribution is related to policy representation. For larger POMDPs the policies computed by existing solvers are too large to be understandable, an issue particularly pronounced in POMDPs with resource levels. We present a procedure based on machine learning techniques that extracts important decisions of a policy and outputs its readable representation.

ECAI Conference 2016 Conference Paper

The Complexity of Deciding Legality of a Single Step of Magic: The Gathering

  • Krishnendu Chatterjee
  • Rasmus Ibsen-Jensen

Magic: the Gathering is a game about magical combat for any number of players. Formally it is a zero-sum, imperfect information stochastic game that consists of a potentially unbounded number of steps. We consider the problem of deciding if a move is legal in a given single step of Magic. We show that the problem is (a) coNP-complete in general; and (b) in P if either of two small sets of cards are not used. Our lower bound holds even for single-player Magic games. The significant aspects of our results are as follows: First, in most real-life game problems, the task of deciding whether a given move is legal in a single step is trivial, and the computationally hard task is to find the best sequence of legal moves in the presence of multiple players. In contrast, quite uniquely our hardness result holds for single step and with only one-player. Second, we establish efficient algorithms for important special cases of Magic.

AAAI Conference 2015 Conference Paper

Automatic Generation of Alternative Starting Positions for Simple Traditional Board Games

  • Umair Ahmed
  • Krishnendu Chatterjee
  • Sumit Gulwani

Simple board games, like Tic-Tac-Toe and CONNECT-4, play an important role not only in the development of mathematical and logical skills, but also in the emotional and social development. In this paper, we address the problem of generating targeted starting positions for such games. This can facilitate new approaches for bringing novice players to mastery, and also leads to discovery of interesting game variants. We present an approach that generates starting states of varying hardness levels for player 1 in a two-player board game, given rules of the board game, the desired number of steps required for player 1 to win, and the expertise levels of the two players. Our approach leverages symbolic methods and iterative simulation to efficiently search the extremely large state space. We present experimental results that include discovery of states of varying hardness levels for several simple gridbased board games. The presence of such states for standard game variants like 4 × 4 Tic-Tac-Toe opens up new games to be played that have never been played as the default start state is heavily biased.

AAAI Conference 2015 Conference Paper

Optimal Cost Almost-Sure Reachability in POMDPs

  • Krishnendu Chatterjee
  • Martin Chmelik
  • Raghav Gupta
  • Ayush Kanodia

We consider partially observable Markov decision processes (POMDPs) with a set of target states and every transition is associated with an integer cost. The optimization objective we study asks to minimize the expected total cost till the target set is reached, while ensuring that the target set is reached almost-surely (with probability 1). We show that for integer costs approximating the optimal cost is undecidable. For positive costs, our results are as follows: (i) we establish matching lower and upper bounds for the optimal cost and the bound is double exponential; (ii) we show that the problem of approximating the optimal cost is decidable and present approximation algorithms developing on the existing algorithms for POMDPs with finite-horizon objectives. While the worstcase running time of our algorithm is double exponential, we present efficient stopping criteria for the algorithm and show experimentally that it performs well in many examples.

ICRA Conference 2015 Conference Paper

Qualitative analysis of POMDPs with temporal logic specifications for robotics applications

  • Krishnendu Chatterjee
  • Martin Chmelik
  • Raghav Gupta
  • Ayush Kanodia

We consider partially observable Markov decision processes (POMDPs), that are a standard framework for robotics applications to model uncertainties present in the real world, with temporal logic specifications. All temporal logic specifications in linear-time temporal logic (LTL) can be expressed as parity objectives. We study the qualitative analysis problem for POMDPs with parity objectives that asks whether there is a controller (policy) to ensure that the objective holds with probability 1 (almost-surely). While the qualitative analysis of POMDPs with parity objectives is undecidable, recent results show that when restricted to finite-memory policies the problem is EXPTIME-complete. While the problem is intractable in theory, we present a practical approach to solve the qualitative analysis problem. We designed several heuristics to deal with the exponential complexity, and have used our implementation on a number of well-known POMDP examples for robotics applications. Our results provide the first practical approach to solve the qualitative analysis of robot motion planning with LTL properties in the presence of uncertainty.

Highlights Conference 2015 Conference Abstract

Strategy Complexity of Concurrent Stochastic Games with Safety and Reachability Objectives

  • Krishnendu Chatterjee

We consider finite-state concurrent stochastic games, played by k>=2 players for an infinite number of rounds, where in every round, each player simultaneously and independently of the other players chooses an action, whereafter the successor state is determined by a probability distribution given by the current state and the chosen actions. We consider reachability objectives that given a target set of states require that some state in the target is visited, and the dual safety objectives that given a target set require that only states in the target set are visited. We are interested in the complexity of stationary strategies measured by their patience, which is defined as the inverse of the smallest nonzero probability employed. Our main results are as follows: We show that in two-player zero-sum concurrent stochastic games (with reachability objective for one player and the complementary safety objective for the other player): (i) the optimal bound on the patience of optimal and epsilon-optimal strategies, for both players is doubly exponential; and (ii) even in games with a single nonabsorbing state exponential (in the number of actions) patience is necessary. In general we study the class of non-zero-sum games admitting stationary epsilon-Nash equilibria. We show that if there is at least one player with reachability objective, then doubly-exponential patience may be needed for epsilon-Nash equilibrium strategies, whereas in contrast if all players have safety objectives, the optimal bound on patience for epsilon-Nash equilibrium strategies is only exponential. This is a joint work with Kristoffer Arnsfelt Hansen and Rasmus Ibsen-Jensen. The full version of the manuscript is available in Arxiv. The link is as follows: http: //arxiv. org/abs/1506. 02434

SODA Conference 2015 Conference Paper

The Value 1 Problem Under Finite-memory Strategies for Concurrent Mean-payoff Games

  • Krishnendu Chatterjee
  • Rasmus Ibsen-Jensen

We consider concurrent mean-payoff games, a very well-studied class of two-player (player 1 vs player 2) zero-sum games on finite-state graphs where every transition is assigned a reward between 0 and 1, and the payoff function is the long-run average of the rewards. The value is the maximal expected payoff that player 1 can guarantee against all strategies of player 2. We consider the computation of the set of states with value 1 under finite-memory strategies for player 1, and our main results for the problem are as follows: (1) we present a polynomial-time algorithm; (2) we show that whenever there is a finite-memory strategy, there is a stationary strategy that does not need memory at all; and (3) we present an optimal bound (which is double exponential) on the patience of stationary strategies (where patience of a distribution is the inverse of the smallest positive probability and represents a complexity measure of a stationary strategy).

GandALF Workshop 2013 Workshop Paper

Approximating the minimum cycle mean

  • Krishnendu Chatterjee
  • Monika Henzinger
  • Sebastian Krinninger
  • Veronika Loitzenbauer

We consider directed graphs where each edge is labeled with an integer weight and study the fundamental algorithmic question of computing the value of a cycle with minimum mean weight. Our contributions are twofold: (1) First we show that the algorithmic question is reducible in O(n^2) time to the problem of a logarithmic number of min-plus matrix multiplications of n-by-n matrices, where n is the number of vertices of the graph. (2) Second, when the weights are nonnegative, we present the first (1 + ε)-approximation algorithm for the problem and the running time of our algorithm is \tilde(O)(n^ω log^3(nW/ε) / ε), where O(n^ω) is the time required for the classic n-by-n matrix multiplication and W is the maximum value of the weights.

Highlights Conference 2013 Conference Abstract

Boundedness games

  • Nathanaël Fijalkow
  • Krishnendu Chatterjee
  • Thomas Colcombet
  • Florian Horn
  • Denis Kuperberg
  • Michał Skrzypczak
  • Martin Zimmermann

In this talk, I will discuss some recent developments about boundedness games. They are two-player games played over graphs, featuring counters. The first player's objective is to satisfy a parity condition and that the counter values are bounded. A lot of questions are open: what is the algorithmic complexity of deciding the winner, and does there always exist finite-memory winning strategies? In particular, as shown by Thomas Colcombet, proving the existence of finite-memory winning strategies for (some) boundedness games is the last missing item to prove the decidability of cost-MSO over infinite trees. I will give an overview of where we stand now and what are the next challenges. 13: 00 14: 30 Lunch

CSL Conference 2013 Conference Paper

Infinite-state games with finitary conditions

  • Krishnendu Chatterjee
  • Nathanaël Fijalkow

We study two-player zero-sum games over infinite-state graphs equipped with omega-B and finitary conditions. Our first contribution is about the strategy complexity, i. e the memory required for winning strategies: we prove that over general infinite-state graphs, memoryless strategies are sufficient for finitary Büchi, and finite-memory suffices for finitary parity games. We then study pushdown games with boundedness conditions, with two contributions. First we prove a collapse result for pushdown games with omega-B-conditions, implying the decidability of solving these games. Second we consider pushdown games with finitary parity along with stack boundedness conditions, and show that solving these games is EXPTIME-complete.

Highlights Conference 2013 Conference Abstract

Looking at mean-payoff and total-payoff through windows

  • Krishnendu Chatterjee
  • Laurent Doyen
  • Mickael Randour
  • Jean-François Raskin

We consider two-player games with mean-payoff (MP) and total-payoff (TP) objectives. In single dimension, their complexities coincide. In multi dimensions, MP games are coNP-complete. We show that TP games are undecidable. We introduce conservative approximations, considering the payoff over a local finite window sliding along a play. For single dimension, we show that (i) if the window is polynomial, deciding the winner is in P, and (ii) the existence of a bounded window can be decided in NP $\cap$ coNP, and is at least as hard as solving MP games. For multi dimensions, we show that (i) the fixed window size problem is EXP-complete, and (ii) there is no primitive-recursive algorithm to decide the existence of a bounded window.

LPAR Conference 2013 Conference Paper

Multi-objective Discounted Reward Verification in Graphs and MDPs

  • Krishnendu Chatterjee
  • Vojtech Forejt
  • Dominik Wojtczak

Abstract We study the problem of achieving a given value in Markov decision processes (MDPs) with several independent discounted reward objectives. We consider a generalised version of discounted reward objectives, in which the amount of discounting depends on the states visited and on the objective. This definition extends the usual definition of discounted reward, and allows to capture the systems in which the value of different commodities diminish at different and variable rates. We establish results for two prominent subclasses of the problem, namely state-discount models where the discount factors are only dependent on the state of the MDP (and independent of the objective), and reward-discount models where they are only dependent on the objective (but not on the state of the MDP). For the state-discount models we use a straightforward reduction to expected total reward and show that the problem whether a value is achievable can be solved in polynomial time. For the reward-discount model we show that memory and randomisation of the strategies are required, but nevertheless that the problem is decidable and it is sufficient to consider strategies which after a certain number of steps behave in a memoryless way. For the general case, we show that when restricted to graphs (i. e. MDPs with no randomisation), pure strategies and discount factors of the form 1/ n where n is an integer, the problem is in PSPACE and finite memory suffices for achieving a given value. We also show that when the discount factors are not of the form 1/ n, the memory required by a strategy can be infinite.

UAI Conference 2013 Conference Paper

POMDPs under Probabilistic Semantics

  • Krishnendu Chatterjee
  • Martin Chmelik

We consider partially observable Markov decision processes (POMDPs) with limitaverage payoff, where a reward value in the interval [0, 1] is associated to every transition, and the payoff of an infinite path is the long-run average of the rewards. We consider two types of path constraints: (i) quantitative constraint defines the set of paths where the payoff is at least a given threshold λ1 ∈ (0, 1]; and (ii) qualitative constraint which is a special case of quantitative constraint with λ1 = 1. We consider the computation of the almost-sure winning set, where the controller needs to ensure that the path constraint is satisfied with probability 1. Our main results for qualitative path constraint are as follows: (i) the problem of deciding the existence of a finite-memory controller is EXPTIME-complete; and (ii) the problem of deciding the existence of an infinite-memory controller is undecidable. For quantitative path constraint we show that the problem of deciding the existence of a finite-memory controller is undecidable.

Highlights Conference 2013 Conference Abstract

Survey of recent results for stochastic games

  • Krishnendu Chatterjee

Stochastic games provide a powerful framework to model fundamental problems in the design and analysis of stochastic reactive systems. While this connection is well understood, and several results have enriched the framework, there is a wealth of research questions that are open, including some long-standing ones (such as the complexity of perfect-information stochastic reachability games). In this talk, I will survey some recent results for general model of stochastic games and related models. The talk is based on several papers, and presents an overview of recent results based on joint works with several co-authors, as well as results from other researchers.

Highlights Conference 2013 Conference Abstract

Trading performance for stability in Markov decision processes

  • Tomáš Brázdil
  • Antonin Kučera
  • Vojtĕch Forejt
  • Krishnendu Chatterjee

We study the central controller synthesis problems for finite-state Markov decision processes, where the objective is to optimize both the expected mean-payoff performance of the system and its stability. We argue that the basic theoretical notion of expressing the stability in terms of the variance of the mean-payoff (called global variance in our paper) is not always sufficient, since it ignores possible instabilities on respective runs. For this reason we propose alernative definitions of stability, which we call local and hybrid variance, and which express how rewards on each run deviate from the run's own mean-payoff and from the expected mean-payoff, respectively. We show that a strategy ensuring both the expected mean-payoff and the variance below given bounds requires randomization and memory, under all the above semantics of variance. We then look at complexity of the problem of determining whether there is a such a strategy.

CSL Conference 2013 Conference Paper

What is Decidable about Partially Observable Markov Decision Processes with omega-Regular Objectives

  • Krishnendu Chatterjee
  • Martin Chmelik
  • Mathieu Tracol

We consider partially observable Markov decision processes (POMDPs) with omega-regular conditions specified as parity objectives. The qualitative analysis problem given a POMDP and a parity objective asks whether there is a strategy to ensure that the objective is satisfied with probability 1 (resp. positive probability). While the qualitative analysis problems are known to be undecidable even for very special cases of parity objectives, we establish decidability (with optimal EXPTIME-complete complexity) of the qualitative analysis problems for POMDPs with all parity objectives under finite-memory strategies. We also establish optimal (exponential) memory bounds.

CSL Conference 2012 Conference Paper

Faster Algorithms for Alternating Refinement Relations

  • Krishnendu Chatterjee
  • Siddhesh Chaubal
  • Pritish Kamath

One central issue in the formal design and analysis of reactive systems is the notion of refinement that asks whether all behaviors of the implementation is allowed by the specification. The local interpretation of behavior leads to the notion of simulation. Alternating transition systems (ATSs) provide a general model for composite reactive systems, and the simulation relation for ATSs is known as alternating simulation. The simulation relation for fair transition systems is called fair simulation. In this work our main contributions are as follows: (1) We present an improved algorithm for fair simulation with Büchi fairness constraints; our algorithm requires O(n^3 * m) time as compared to the previous known O(n^6)-time algorithm, where n is the number of states and m is the number of transitions. (2) We present a game based algorithm for alternating simulation that requires O(m^2)-time as compared to the previous known O((n*m)^2)-time algorithm, where n is the number of states and m is the size of transition relation. (3) We present an iterative algorithm for alternating simulation that matches the time complexity of the game based algorithm, but is more space efficient than the game based algorithm.

GandALF Workshop 2011 Workshop Paper

A reduction from parity games to simple stochastic games

  • Krishnendu Chatterjee
  • Nathanaël Fijalkow

Games on graphs provide a natural model for reactive non-terminating systems. In such games, the interaction of two players on an arena results in an infinite path that describes a run of the system. Different settings are used to model various open systems in computer science, as for instance turn-based or concurrent moves, and deterministic or stochastic transitions. In this paper, we are interested in turn-based games, and specifically in deterministic parity games and stochastic reachability games (also known as simple stochastic games). We present a simple, direct and efficient reduction from deterministic parity games to simple stochastic games: it yields an arena whose size is linear up to a logarithmic factor in size of the original arena.

MFCS Conference 2011 Conference Paper

Energy and Mean-Payoff Parity Markov Decision Processes

  • Krishnendu Chatterjee
  • Laurent Doyen 0001

Abstract We consider Markov Decision Processes (MDPs) with mean-payoff parity and energy parity objectives. In system design, the parity objective is used to encode ω -regular specifications, while the mean-payoff and energy objectives can be used to model quantitative resource constraints. The energy condition requires that the resource level never drops below 0, and the mean-payoff condition requires that the limit-average value of the resource consumption is within a threshold. While these two (energy and mean-payoff) classical conditions are equivalent for two-player games, we show that they differ for MDPs. We show that the problem of deciding whether a state is almost-sure winning (i. e. , winning with probability 1) in energy parity MDPs is in NP ∩ coNP, while for mean-payoff parity MDPs, the problem is solvable in polynomial time.

GandALF Workshop 2010 Workshop Paper

Discounting in Games across Time Scales

  • Krishnendu Chatterjee
  • Rupak Majumdar

We introduce two-level discounted games played by two players on a perfect-information stochastic game graph. The upper level game is a discounted game and the lower level game is an undiscounted reachability game. Two-level games model hierarchical and sequential decision making under uncertainty across different time scales. We show the existence of pure memoryless optimal strategies for both players and an ordered field property for such games. We show that if there is only one player (Markov decision processes), then the values can be computed in polynomial time. It follows that whether the value of a player is equal to a given rational constant in two-level discounted games can be decided in NP intersected coNP. We also give an alternate strategy improvement algorithm to compute the value.

MFCS Conference 2010 Conference Paper

Qualitative Analysis of Partially-Observable Markov Decision Processes

  • Krishnendu Chatterjee
  • Laurent Doyen 0001
  • Thomas A. Henzinger

Abstract We study observation-based strategies for partially-observable Markov decision processes ( POMDP s) with parity objectives. An observation-based strategy relies on partial information about the history of a play, namely, on the past sequence of observations. We consider qualitative analysis problems: given a POMDP with a parity objective, decide whether there exists an observation-based strategy to achieve the objective with probability 1 (almost-sure winning), or with positive probability (positive winning). Our main results are twofold. First, we present a complete picture of the computational complexity of the qualitative analysis problem for POMDP s with parity objectives and its subclasses: safety, reachability, Büchi, and coBüchi objectives. We establish several upper and lower bounds that were not known in the literature. Second, we give optimal bounds (matching upper and lower bounds) for the memory required by pure and randomized observation-based strategies for each class of objectives.

MFCS Conference 2010 Conference Paper

Randomness for Free

  • Krishnendu Chatterjee
  • Laurent Doyen 0001
  • Hugo Gimbert
  • Thomas A. Henzinger

Abstract We consider two-player zero-sum games on graphs. These games can be classified on the basis of the information of the players and on the mode of interaction between them. On the basis of information the classification is as follows: (a) partial-observation (both players have partial view of the game); (b) one-sided complete-observation (one player has complete observation); and (c) complete-observation (both players have complete view of the game). On the basis of mode of interaction we have the following classification: (a) concurrent (players interact simultaneously); and (b) turn-based (players interact in turn). The two sources of randomness in these games are randomness in transition function and randomness in strategies. In general, randomized strategies are more powerful than deterministic strategies, and randomness in transitions gives more general classes of games. We present a complete characterization for the classes of games where randomness is not helpful in: (a) the transition function (probabilistic transition can be simulated by deterministic transition); and (b) strategies (pure strategies are as powerful as randomized strategies). As consequence of our characterization we obtain new undecidability results for these games.

MFCS Conference 2009 Invited Paper

Stochastic Games with Finitary Objectives

  • Krishnendu Chatterjee
  • Thomas A. Henzinger
  • Florian Horn 0001

Abstract The synthesis of a reactive system with respect to an ω -regular specification requires the solution of a graph game. Such games have been extended in two natural ways. First, a game graph can be equipped with probabilistic choices between alternative transitions, thus allowing the modeling of uncertain behavior. These are called stochastic games. Second, a liveness specification can be strengthened to require satisfaction within an unknown but bounded amount of time. These are called finitary objectives. We study, for the first time, the combination of stochastic games and finitary objectives. We characterize the requirements on optimal strategies and provide algorithms for computing the maximal achievable probability of winning stochastic games with finitary parity or Streett objectives. Most notably, the set of states from which a player can win with probability 1 for a finitary parity objective can be computed in polynomial time, even though no polynomial-time algorithm is known in the nonfinitary case.

SODA Conference 2009 Conference Paper

Termination criteria for solving concurrent safety and reachability games

  • Krishnendu Chatterjee
  • Luca de Alfaro
  • Thomas A. Henzinger

We consider concurrent games played on graphs. At every round of a game, each player simultaneously and independently selects a move; the moves jointly determine the transition to a successor state. Two basic objectives are the safety objective to stay forever in a given set of states, and its dual, the reachability objective to reach a given set of states. We present in this paper a strategy improvement algorithm for computing the value of a concurrent safety game, that is, the maximal probability with which player 1 can enforce the safety objective. The algorithm yields a sequence of player-1 strategies which ensure probabilities of winning that converge monotonically to the value of the safety game. Our result is significant because the strategy improvement algorithm provides, for the first time, a way to approximate the value of a concurrent safety game from below. Since a value iteration algorithm, or a strategy improvement algorithm for reachability games, can be used to approximate the same value from above, the combination of both algorithms yields a method for computing a converging sequence of upper and lower bounds for the values of concurrent reachability and safety games. Previous methods could approximate the values of these games only from one direction, and as no rates of convergence are known, they did not provide a practical way to solve these games.

CSL Conference 2008 Conference Paper

Quantitative Languages

  • Krishnendu Chatterjee
  • Laurent Doyen 0001
  • Thomas A. Henzinger

Abstract Quantitative generalizations of classical languages, which assign to each word a real number instead of a boolean value, have applications in modeling resource-constrained computation. We use weighted automata (finite automata with transition weights) to define several natural classes of quantitative languages over finite and infinite words; in particular, the real value of an infinite run is computed as the maximum, limsup, liminf, limit average, or discounted sum of the transition weights. We define the classical decision problems of automata theory (emptiness, universality, language inclusion, and language equivalence) in the quantitative setting and study their computational complexity. As the decidability of language inclusion remains open for some classes of weighted automata, we introduce a notion of quantitative simulation that is decidable and implies language inclusion. We also give a complete characterization of the expressive power of the various classes of weighted automata. In particular, we show that most classes of weighted automata cannot be determinized.

CSL Conference 2006 Conference Paper

Algorithms for Omega-Regular Games with Imperfect Information,

  • Krishnendu Chatterjee
  • Laurent Doyen 0001
  • Thomas A. Henzinger
  • Jean-François Raskin

Abstract We study observation-based strategies for two-player turn-based games on graphs with omega-regular objectives. An observation-based strategy relies on imperfect information about the history of a play, namely, on the past sequence of observations. Such games occur in the synthesis of a controller that does not see the private state of the plant. Our main results are twofold. First, we give a fixed-point algorithm for computing the set of states from which a player can win with a deterministic observation-based strategy for any omega-regular objective. The fixed point is computed in the lattice of antichains of state sets. This algorithm has the advantages of being directed by the objective and of avoiding an explicit subset construction on the game graph. Second, we give an algorithm for computing the set of states from which a player can win with probability 1 with a randomized observation-based strategy for a Büchi objective. This set is of interest because in the absence of perfect information, randomized strategies are more powerful than deterministic ones. We show that our algorithms are optimal by proving matching lower bounds.

CSL Conference 2006 Conference Paper

Concurrent Games with Tail Objectives

  • Krishnendu Chatterjee

Abstract We study infinite stochastic games played by two-players over a finite state space, with objectives specified by sets of infinite traces. The games are concurrent (players make moves simultaneously and independently), stochastic (the next state is determined by a probability distribution that depends on the current state and chosen moves of the players) and infinite (proceeds for infinite number of rounds). The analysis of concurrent stochastic games can be classified into: quantitative analysis, analyzing the optimum value of the game; and qualitative analysis, analyzing the set of states with optimum value 1. We consider concurrent games with tail objectives, i. e. , objectives that are independent of the finite-prefix of traces, and show that the class of tail objectives are strictly richer than the ω -regular objectives. We develop new proof techniques to extend several properties of concurrent games with ω -regular objectives to concurrent games with tail objectives. We prove the positive limit-one property for tail objectives, that states for all concurrent games if the optimum value for a player is positive for a tail objective Φ at some state, then there is a state where the optimum value is 1 for Φ, for the player. We also show that the optimum values of zero-sum (strictly conflicting objectives) games with tail objectives can be related to equilibrium values of nonzero-sum (not strictly conflicting objectives) games with simpler reachability objectives. A consequence of our analysis presents a polynomial time reduction of the quantitative analysis of tail objectives to the qualitative analysis for the sub-class of one-player stochastic games (Markov decision processes).

CSL Conference 2006 Conference Paper

Nash Equilibrium for Upward-Closed Objectives

  • Krishnendu Chatterjee

Abstract We study infinite stochastic games played by n -players on a finite graph with goals specified by sets of infinite traces. The games are concurrent (each player simultaneously and independently chooses an action at each round), stochastic (the next state is determined by a probability distribution depending on the current state and the chosen actions), infinite (the game continues for an infinite number of rounds), nonzero-sum (the players’ goals are not necessarily conflicting), and undiscounted. We show that if each player has an upward-closed objective, then there exists an ε -Nash equilibrium in memoryless strategies, for every ε >0; and exact Nash equilibria need not exist. Upward-closure of an objective means that if a set Z of infinitely repeating states is winning, then all supersets of Z of infinitely repeating states are also winning. Memoryless strategies are strategies that are independent of history of plays and depend only on the current state. We also study the complexity of finding values (payoff profile) of an ε -Nash equilibrium. We show that the values of an ε -Nash equilibrium in nonzero-sum concurrent games with upward-closed objectives for all players can be computed by computing ε -Nash equilibrium values of nonzero-sum concurrent games with reachability objectives for all players and a polynomial procedure. As a consequence we establish that values of an ε -Nash equilibrium can be computed in TFNP (total functional NP), and hence in EXPTIME.

UAI Conference 2005 Conference Paper

Counterexample-guided Planning

  • Krishnendu Chatterjee
  • Thomas A. Henzinger
  • Ranjit Jhala
  • Rupak Majumdar

Planning in adversarial and uncertain environments can be modeled as the problem of devising strategies in stochastic perfect information games. These games are generalizations of Markov decision processes (MDPs): there are two (adversarial) players, and a source of randomness. The main practical obstacle to computing winning strategies in such games is the size of the state space. In practice therefore, one typically works with abstractions of the model. The diffculty is to come up with an abstraction that is neither too coarse to remove all winning strategies (plans), nor too fine to be intractable. In verification, the paradigm of counterexample-guided abstraction refinement has been successful to construct useful but parsimonious abstractions automatically. We extend this paradigm to probabilistic models (namely, perfect information games and, as a special case, MDPs). This allows us to apply the counterexample-guided abstraction paradigm to the AI planning problem. As special cases, we get planning algorithms for MDPs and deterministic systems that automatically construct system abstractions.

CSL Conference 2004 Conference Paper

On Nash Equilibria in Stochastic Games

  • Krishnendu Chatterjee
  • Rupak Majumdar
  • Marcin Jurdzinski

Abstract We study infinite stochastic games played by n -players on a finite graph with goals given by sets of infinite traces. The games are stochastic (each player simultaneously and independently chooses an action at each round, and the next state is determined by a probability distribution depending on the current state and the chosen actions), infinite (the game continues for an infinite number of rounds), nonzero sum (the players’ goals are not necessarily conflicting), and undiscounted. We show that if each player has a reachability objective, that is, if the goal for each player i is to visit some subset R i of the states, then there exists an ε -Nash equilibrium in memoryless strategies, for every ε >0. However, exact Nash equilibria need not exist. We study the complexity of finding such Nash equilibria, and show that the payoff of some ε -Nash equilibrium in memoryless strategies can be ε -approximated in NP. We study the important subclass of n -player turn-based probabilistic games, where at each state at most one player has a nontrivial choice of moves. For turn-based probabilistic games, we show the existence of ε -Nash equilibria in pure strategies for games where the objective of player i is a Borel set B i of infinite traces. However, exact Nash equilibria may not exist. For the special case of ω -regular objectives, we show exact Nash equilibria exist, and can be computed in NP when the ω -regular objectives are expressed as parity objectives.

CSL Conference 2003 Conference Paper

Simple Stochastic Parity Games

  • Krishnendu Chatterjee
  • Marcin Jurdzinski
  • Thomas A. Henzinger

Abstract Many verification, planning, and control problems can be modeled as games played on state-transition graphs by one or two players whose conflicting goals are to form a path in the graph. The focus here is on simple stochastic parity games, that is, two-player games with turn-based probabilistic transitions and ω -regular objectives formalized as parity (Rabin chain) winning conditions. An efficient translation from simple stochastic parity games to nonstochastic parity games is given. As many algorithms are known for solving the latter, the translation yields efficient algorithms for computing the states of a simple stochastic parity game from which a player can win with probability 1. An important special case of simple stochastic parity games are the Markov decision processes with Büchi objectives. For this special case a first provably subquadratic algorithm is given for computing the states from which the single player has a strategy to achieve a Büchi objective with probability 1. For game graphs with m edges the algorithm works in time \(O(m \sqrt{m})\). Interestingly, a similar technique sheds light on the question of the computational complexity of solving simple Büchi games and yields the first provably subquadratic algorithm, with a running time of O ( n 2 / log n ) for game graphs with n vertices and O ( n ) edges.