Arrow Research search

Author name cluster

Đorđe Žikelić

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
1 author row

Possible papers (13)

IJCAI 2025 · Conference Paper

LP-Based Weighted Model Integration over Non-Linear Real Arithmetic

  • S. Akshay
  • Supratik Chakraborty
  • Soroush Farokhnia
  • Amir Goharshady
  • Harshit Jitendra Motwani
  • Đorđe Žikelić

Weighted model integration (WMI) is a relatively recent formalism that has received significant interest as a technique for solving probabilistic inference tasks with complicated weight functions. Existing methods and tools are mostly focused on linear and polynomial functions and provide limited support for WMI of rational or radical functions, which naturally arise in several applications. In this work, we present a novel method for approximate WMI, which provides more effective support for the wide class of semi-algebraic functions that includes rational and radical functions, with literals defined over non-linear real arithmetic. Our algorithm leverages Farkas’ lemma and Handelman’s theorem from real algebraic geometry to reduce WMI to solving a number of linear programming (LP) instances. The algorithm provides formal guarantees on the error bound of the obtained approximation and can reduce it below any user-defined value ε. Furthermore, our approach is perfectly parallelizable. Finally, we present extensive experimental results, demonstrating the superior performance of our method on a range of WMI tasks for rational and radical functions when compared to state-of-the-art tools for WMI, in terms of both applicability and tightness.
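
The reduction to LP instances via Farkas’ lemma and Handelman’s theorem is too involved for a short sketch, but the ε-guarantee itself can be illustrated in miniature: bracket the integral of a rational weight between lower and upper sums and refine until the gap drops below a user-defined ε. The weight function, interval, and refinement scheme below are illustrative stand-ins, not the paper's algorithm.

```python
import math

def integral_bounds(w, a, b, eps):
    """Bracket the integral of a monotone-decreasing weight w over [a, b]
    between lower and upper Riemann sums, doubling the number of
    subintervals until the gap is at most eps."""
    n = 1
    while True:
        h = (b - a) / n
        xs = [a + i * h for i in range(n + 1)]
        lower = sum(w(xs[i + 1]) * h for i in range(n))  # right endpoints
        upper = sum(w(xs[i]) * h for i in range(n))      # left endpoints
        if upper - lower <= eps:
            return lower, upper
        n *= 2

# Rational weight 1/(1 + x^2); its true integral over [0, 1] is pi/4.
lo, hi = integral_bounds(lambda x: 1.0 / (1.0 + x * x), 0.0, 1.0, 1e-3)
```

Like the paper's method, the bracket is guaranteed to contain the true value and the gap can be driven below any requested ε; the real algorithm obtains far tighter polynomial bounds from LP instances rather than from interval refinement.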

AAAI 2025 · Conference Paper

Neural Control and Certificate Repair via Runtime Monitoring

  • Emily Yu
  • Đorđe Žikelić
  • Thomas A. Henzinger

Learning-based methods provide a promising approach to solving highly non-linear control tasks that are often challenging for classical control methods. To ensure the satisfaction of a safety property, learning-based methods jointly learn a control policy together with a certificate function for the property. Popular examples include barrier functions for safety and Lyapunov functions for asymptotic stability. While there has been significant progress on learning-based control with certificate functions in the white-box setting, where the correctness of the certificate function can be formally verified, there has been little work on ensuring their reliability in the black-box setting where the system dynamics are unknown. In this work, we consider the problems of certifying and repairing neural network control policies and certificate functions in the black-box setting. We propose a novel framework that utilizes runtime monitoring to detect system behaviors that violate the property of interest under some initially trained neural network policy and certificate. These violating behaviors are used to extract new training data, which is used to re-train the neural network policy and the certificate function and ultimately to repair them. We demonstrate the effectiveness of our approach empirically by using it to repair and to boost the safety rate of neural network policies learned by a state-of-the-art method for learning-based control on two autonomous system control tasks.
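
As a toy illustration of the monitoring idea (not the paper's framework), the sketch below runs a 1-D linear system under a hypothetical linear policy, monitors the safety property |x| < 2 at runtime, and collects violating states of the kind that would be fed back as retraining data. The dynamics, gains, and noise level are all invented for the example.

```python
import random

def simulate_and_monitor(gain, steps=40, trials=20, seed=0):
    """Run a toy 1-D system x' = 1.1*x - gain*x + noise under a linear
    'policy', monitor the safety property |x| < 2 at runtime, and collect
    every state at which the monitored property is violated."""
    rng = random.Random(seed)
    violations = []
    for _ in range(trials):
        x = 1.0
        for _ in range(steps):
            x = 1.1 * x - gain * x + rng.gauss(0.0, 0.05)
            if abs(x) >= 2.0:          # runtime monitor flags a violation
                violations.append(x)
                break
    return violations

bad = simulate_and_monitor(gain=0.0)   # under-actuated policy: unsafe
good = simulate_and_monitor(gain=0.6)  # "repaired" higher-gain policy
```

The under-actuated policy produces a pool of violating states (the retraining data of the paper), while the repaired policy's monitor stays silent.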

AAAI 2025 · Conference Paper

Quantified Linear and Polynomial Arithmetic Satisfiability via Template-based Skolemization

  • Krishnendu Chatterjee
  • Ehsan Kafshdar Goharshady
  • Mehrdad Karrabi
  • Harshit J Motwani
  • Maximilian Seeliger
  • Đorđe Žikelić

The problem of checking satisfiability of linear real arithmetic (LRA) and non-linear real arithmetic (NRA) formulas has broad applications; in particular, such formulas are at the heart of logic-related applications such as logic for artificial intelligence and program analysis. While there has been much work on checking satisfiability of unquantified LRA and NRA formulas, checking satisfiability of quantified LRA and NRA formulas remains a significant challenge. The main bottleneck in existing methods is a computationally expensive quantifier elimination step. In this work, we propose a novel method for efficient quantifier elimination in quantified LRA and NRA formulas. We propose a template-based Skolemization approach, in which we automatically synthesize linear/polynomial Skolem functions in order to eliminate quantifiers in the formula. The key technical ingredients in our approach are Positivstellensätze from algebraic geometry, which allow for efficient manipulation of polynomial inequalities. Our method offers a range of appealing theoretical properties combined with strong practical performance. On the theory side, our method is sound, semi-complete, and runs in subexponential time and polynomial space, as opposed to existing sound and complete quantifier elimination methods that run in doubly-exponential time and at least exponential space. On the practical side, our experiments show superior performance compared to state-of-the-art SMT solvers in terms of the number of solved instances and runtime, on both LRA and NRA benchmarks.
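
A heavily simplified sketch of template-based Skolemization, with a toy formula and a grid search standing in for the paper's Positivstellensatz/LP-based synthesis: posit a linear Skolem template y = a·x + b, and check the resulting quantifier-free constraints, which for linear constraints over a box reduce to checks at the box's vertices.

```python
def find_linear_skolem():
    """For the quantified formula  forall x in [0,1] exists y.
    (y >= x) and (y <= x + 1),  search a linear Skolem template
    y = a*x + b.  Both constraints are linear in x, so nonnegativity
    on [0, 1] reduces to checking the interval's endpoints."""
    def valid(a, b):
        for x in (0.0, 1.0):                  # vertices of the box [0, 1]
            y = a * x + b
            if not (y >= x and y <= x + 1.0):
                return False
        return True
    for a in (-1.0, 0.0, 0.5, 1.0, 2.0):      # tiny illustrative grid
        for b in (0.0, 0.25, 0.5, 1.0):
            if valid(a, b):
                return (a, b)
    return None

skolem = find_linear_skolem()
```

Here the constant Skolem function y = 1 already witnesses the formula; the paper synthesizes such witnesses via LP and Positivstellensatz reasoning rather than enumeration.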

IJCAI 2024 · Conference Paper

Certified Policy Verification and Synthesis for MDPs under Distributional Reach-Avoidance Properties

  • S. Akshay
  • Krishnendu Chatterjee
  • Tobias Meggendorfer
  • Đorđe Žikelić

Markov Decision Processes (MDPs) are a classical model for decision making in the presence of uncertainty. Often they are viewed as state transformers with planning objectives defined with respect to paths over MDP states. An increasingly popular alternative is to view them as distribution transformers, giving rise to a sequence of probability distributions over MDP states. For instance, reachability and safety properties in modeling robot swarms or chemical reaction networks are naturally defined in terms of probability distributions over states. Verifying such distributional properties is known to be hard and often beyond the reach of classical state-based verification techniques. In this work, we consider the problems of certified policy (i.e., controller) verification and synthesis in MDPs under distributional reach-avoidance specifications. By certified we mean that, along with a policy, we also aim to synthesize a (checkable) certificate ensuring that the MDP indeed satisfies the property. Thus, given the target set of distributions and an unsafe set of distributions over MDP states, our goal is to either synthesize a certificate for a given policy or synthesize a policy along with a certificate, proving that the target distribution can be reached while avoiding unsafe distributions. To solve this problem, we introduce the novel notion of distributional reach-avoid certificates and present automated procedures for (1) synthesizing a certificate for a given policy, and (2) synthesizing a policy together with the certificate, both providing formal guarantees on certificate correctness. Our experimental evaluation demonstrates the ability of our method to solve several non-trivial examples, including a multi-agent robot-swarm model, to synthesize certified policies and to certify existing policies.
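
A minimal sketch of the distribution-transformer view (the chain, sets, and horizon are invented for illustration; the paper synthesizes checkable certificates rather than unrolling distributions): fix a policy, which turns the MDP into a Markov chain, iterate μ_{t+1} = μ_t·P, and test a reach-avoid property over the resulting distribution sequence.

```python
def distributional_reach_avoid(P, mu0, target, unsafe, horizon=50):
    """View a Markov chain (an MDP under a fixed policy) as a distribution
    transformer mu_{t+1} = mu_t P and check a reach-avoid property over the
    induced sequence of distributions over states."""
    mu = mu0[:]
    for t in range(horizon + 1):
        if unsafe(mu):
            return ("unsafe", t)
        if target(mu):
            return ("reached", t)
        mu = [sum(mu[s] * P[s][s2] for s in range(len(mu)))
              for s2 in range(len(mu))]
    return ("timeout", horizon)

P = [[0.5, 0.2, 0.3],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]
status, t = distributional_reach_avoid(
    P, [1.0, 0.0, 0.0],
    target=lambda mu: mu[2] >= 0.9,   # target set of distributions
    unsafe=lambda mu: mu[1] > 0.5)    # unsafe set of distributions
```

On this toy chain the distribution mass funnels into state 2 and hits the target set after five steps without entering the unsafe set.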

IJCAI 2024 · Conference Paper

Solving Long-run Average Reward Robust MDPs via Stochastic Games

  • Krishnendu Chatterjee
  • Ehsan Kafshdar Goharshady
  • Mehrdad Karrabi
  • Petr Novotný
  • Đorđe Žikelić

Markov decision processes (MDPs) provide a standard framework for sequential decision making under uncertainty. However, MDPs do not take uncertainty in transition probabilities into account. Robust Markov decision processes (RMDPs) address this shortcoming of MDPs by assigning to each transition an uncertainty set rather than a single probability value. In this work, we consider polytopic RMDPs in which all uncertainty sets are polytopes and study the problem of solving long-run average reward polytopic RMDPs. We present a novel perspective on this problem and show that it can be reduced to solving long-run average reward turn-based stochastic games with finite state and action spaces. This reduction allows us to derive several important consequences that were hitherto not known to hold for polytopic RMDPs. First, we derive new computational complexity bounds for solving long-run average reward polytopic RMDPs, showing for the first time that the threshold decision problem for them is in NP and coNP and that they admit a randomized algorithm with sub-exponential expected runtime. Second, we present Robust Polytopic Policy Iteration (RPPI), a novel policy iteration algorithm for solving long-run average reward polytopic RMDPs. Our experimental evaluation shows that RPPI is much more efficient in solving long-run average reward polytopic RMDPs compared to state-of-the-art methods based on value iteration.
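
The paper targets the long-run average reward objective via a reduction to stochastic games; as a simplified stand-in, the sketch below runs discounted robust value iteration on a tiny polytopic RMDP, where the adversary picks the worst vertex of each uncertainty polytope in every Bellman backup. States, actions, vertices, and rewards are all invented for the example.

```python
def robust_value_iteration(vertices, rewards, gamma=0.9, iters=200):
    """Discounted robust value iteration on a toy polytopic RMDP: for each
    state-action pair, the transition distribution is chosen adversarially
    from the vertices of that pair's uncertainty polytope."""
    n = len(rewards)
    V = [0.0] * n
    for _ in range(iters):
        V = [max(rewards[s][a] + gamma * min(
                     sum(p[s2] * V[s2] for s2 in range(n)) for p in verts)
                 for a, verts in enumerate(vertices[s]))
             for s in range(n)]
    return V

# 2 states, 2 actions; each action's uncertainty set has two vertices.
vertices = [
    [[(0.8, 0.2), (0.6, 0.4)], [(0.5, 0.5), (0.3, 0.7)]],  # state 0
    [[(0.1, 0.9), (0.2, 0.8)], [(0.9, 0.1), (0.7, 0.3)]],  # state 1
]
rewards = [[1.0, 0.0], [0.0, 2.0]]
V = robust_value_iteration(vertices, rewards)
```

The inner `min` over polytope vertices is exactly the game view: the environment plays against the controller by picking the worst realizable transition distribution.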

AAAI 2023 · Conference Paper

Bidding Graph Games with Partially-Observable Budgets

  • Guy Avni
  • Ismael Jecker
  • Đorđe Žikelić

Two-player zero-sum "graph games" are central in logic, verification, and multi-agent systems. The game proceeds by placing a token on a vertex of a graph, and allowing the players to move it to produce an infinite path, which determines the winner or payoff of the game. Traditionally, the players alternate turns in moving the token. In "bidding games", however, the players have budgets and in each turn, an auction (bidding) determines which player moves the token. So far, bidding games have only been studied as full-information games. In this work we initiate the study of partial-information bidding games: we study bidding games in which a player's initial budget is drawn from a known probability distribution. We show that while for some bidding mechanisms and objectives, it is straightforward to adapt the results from the full-information setting to the partial-information setting, for others, the analysis is significantly more challenging, requires new techniques, and gives rise to interesting results. Specifically, we study games with "mean-payoff" objectives in combination with "poorman" bidding. We construct optimal strategies for a partially-informed player who plays against a fully-informed adversary. We show that, somewhat surprisingly, the "value" under pure strategies does not necessarily exist in such games.

NeurIPS 2023 · Conference Paper

Compositional Policy Learning in Stochastic Control Systems with Formal Guarantees

  • Đorđe Žikelić
  • Mathias Lechner
  • Abhinav Verma
  • Krishnendu Chatterjee
  • Thomas Henzinger

Reinforcement learning has shown promising results in learning neural network policies for complicated control tasks. However, the lack of formal guarantees about the behavior of such policies remains an impediment to their deployment. We propose a novel method for learning a composition of neural network policies in stochastic environments, along with a formal certificate which guarantees that a specification over the policy's behavior is satisfied with the desired probability. Unlike prior work on verifiable RL, our approach leverages the compositional nature of logical specifications provided in SpectRL, to learn over graphs of probabilistic reach-avoid specifications. The formal guarantees are provided by learning neural network policies together with reach-avoid supermartingales (RASMs) for the graph’s sub-tasks and then composing them into a global policy. We also derive a tighter lower bound compared to previous work on the probability of reach-avoidance implied by a RASM, which is required to find a compositional policy with an acceptable probabilistic threshold for complex tasks with multiple edge policies. We implement a prototype of our approach and evaluate it on a Stochastic Nine Rooms environment.
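
A toy sketch of the compositional idea, assuming per-edge lower bounds on reach-avoid success probabilities are already known: the naive composed bound along a path is the product of its edge bounds (the paper derives a tighter bound), and the path with the best bound can be found with Dijkstra's algorithm over negative log-probabilities. The graph and probabilities are invented for illustration.

```python
import math
import heapq

def best_guaranteed_path(edges, source, target):
    """Given per-edge lower bounds on sub-task success probabilities in a
    spec graph, the probability of completing a path is at least the
    product of its edge bounds; pick the path maximising that product via
    Dijkstra's algorithm on -log probabilities."""
    best = {source: 1.0}
    pq = [(0.0, source, [source])]
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node == target:
            return math.exp(-cost), path
        for nxt, p in edges.get(node, []):
            new = cost - math.log(p)
            if math.exp(-new) > best.get(nxt, 0.0):
                best[nxt] = math.exp(-new)
                heapq.heappush(pq, (new, nxt, path + [nxt]))
    return 0.0, []

edges = {"start": [("mid", 0.9), ("detour", 0.99)],
         "detour": [("mid", 0.95)],
         "mid": [("goal", 0.9)]}
prob, path = best_guaranteed_path(edges, "start", "goal")
```

Note the less direct route wins here: 0.99 · 0.95 · 0.9 beats the direct 0.9 · 0.9, which is why planning over the spec graph matters.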

AAAI 2023 · Conference Paper

Learning Control Policies for Stochastic Systems with Reach-Avoid Guarantees

  • Đorđe Žikelić
  • Mathias Lechner
  • Thomas A. Henzinger
  • Krishnendu Chatterjee

We study the problem of learning controllers for discrete-time non-linear stochastic dynamical systems with formal reach-avoid guarantees. This work presents the first method for providing formal reach-avoid guarantees, which combine and generalize stability and safety guarantees, with a tolerable probability threshold p in [0,1] over the infinite time horizon. Our method leverages advances in machine learning literature and it represents formal certificates as neural networks. In particular, we learn a certificate in the form of a reach-avoid supermartingale (RASM), a novel notion that we introduce in this work. Our RASMs provide reachability and avoidance guarantees by imposing constraints on what can be viewed as a stochastic extension of level sets of Lyapunov functions for deterministic systems. Our approach solves several important problems -- it can be used to learn a control policy from scratch, to verify a reach-avoid specification for a fixed control policy, or to fine-tune a pre-trained policy if it does not satisfy the reach-avoid specification. We validate our approach on 3 stochastic non-linear reinforcement learning tasks.
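
A minimal, statistical sketch of the supermartingale-style expected-decrease condition E[V(x′)] ≤ V(x) − ε behind RASMs. The system, candidate certificate, sample states, and ε below are invented, and sampling only gives statistical evidence; the paper verifies such conditions formally rather than by Monte Carlo.

```python
import random

def check_expected_decrease(V, step, states, eps, samples=10000, seed=0):
    """Monte-Carlo check of the supermartingale condition
    E[V(x')] <= V(x) - eps at a set of sample states, for a toy stochastic
    system x' = step(x, rng).  Returns (True, None) if every sampled state
    passes, else (False, offending_state)."""
    rng = random.Random(seed)
    for x in states:
        est = sum(V(step(x, rng)) for _ in range(samples)) / samples
        if est > V(x) - eps:
            return False, x
    return True, None

V = lambda x: x * x                                # candidate certificate
step = lambda x, rng: 0.5 * x + rng.gauss(0, 0.1)  # toy dynamics
ok, bad = check_expected_decrease(
    V, step, states=[0.2, 0.5, 1.0, -0.7], eps=0.01)
```

For this contraction with small noise, E[V(x′)] = 0.25x² + 0.01, so the decrease condition holds at every sampled state outside a small neighborhood of the origin.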

Highlights 2023 · Conference Abstract

MDPs as Distribution Transformers: Affine Invariant Synthesis for Safety Objectives

  • Đorđe Žikelić

Markov decision processes can be viewed as transformers of probability distributions. While this view is useful from a practical standpoint to reason about trajectories of distributions, basic reachability and safety problems are known to be computationally intractable (i.e., Skolem-hard) to solve in such models. Further, we show that even for simple examples of MDPs, strategies for safety objectives over distributions can require infinite memory and randomization. In light of this, we present a novel overapproximation approach to synthesize strategies in an MDP, such that a safety objective over the distributions is met. More precisely, we develop a new framework for template-based synthesis of certificates as affine distributional and inductive invariants for safety objectives in MDPs. We provide two algorithms within this framework. One can only synthesize memoryless strategies, but has relative completeness guarantees, while the other can synthesize general strategies. The runtime complexity of both algorithms is in PSPACE. We implement these algorithms and show that they can solve several non-trivial examples. This is joint work with S. Akshay, Krishnendu Chatterjee and Tobias Meggendorfer that will be presented at CAV 2023. Contributed talk given by Đorđe Žikelić.

AAAI 2023 · Conference Paper

Quantization-Aware Interval Bound Propagation for Training Certifiably Robust Quantized Neural Networks

  • Mathias Lechner
  • Đorđe Žikelić
  • Krishnendu Chatterjee
  • Thomas A. Henzinger
  • Daniela Rus

We study the problem of training and certifying adversarially robust quantized neural networks (QNNs). Quantization is a technique for making neural networks more efficient by running them using low-bit integer arithmetic and is therefore commonly adopted in industry. Recent work has shown that floating-point neural networks that have been verified to be robust can become vulnerable to adversarial attacks after quantization, and certification of the quantized representation is necessary to guarantee robustness. In this work, we present quantization-aware interval bound propagation (QA-IBP), a novel method for training robust QNNs. Inspired by advances in robust learning of non-quantized networks, our training algorithm computes the gradient of an abstract representation of the actual network. Unlike existing approaches, our method can handle the discrete semantics of QNNs. Based on QA-IBP, we also develop a complete verification procedure for verifying the adversarial robustness of QNNs, which is guaranteed to terminate and produce a correct answer. Compared to existing approaches, the key advantage of our verification procedure is that it runs entirely on GPU or other accelerator devices. We demonstrate experimentally that our approach significantly outperforms existing methods and establish the new state-of-the-art for training and certifying the robustness of QNNs.
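
For contrast with the quantized setting, plain interval bound propagation through one ReLU(Wx + b) layer looks as follows. This is the standard float-arithmetic IBP step only; QA-IBP's handling of discrete quantized semantics is deliberately omitted from the sketch, and the weights and box are invented.

```python
def ibp_linear_relu(W, b, lower, upper):
    """Propagate an input box [lower, upper] through ReLU(W x + b) with
    interval bound propagation: split the box into center and radius,
    multiply the radius by |W|, then clamp both bounds at zero for ReLU."""
    n_out, n_in = len(W), len(lower)
    lo, hi = [], []
    for i in range(n_out):
        c = sum(W[i][j] * (lower[j] + upper[j]) / 2 for j in range(n_in)) + b[i]
        r = sum(abs(W[i][j]) * (upper[j] - lower[j]) / 2 for j in range(n_in))
        lo.append(max(0.0, c - r))
        hi.append(max(0.0, c + r))
    return lo, hi

W = [[1.0, -2.0], [0.5, 0.5]]
b = [0.0, -0.25]
lo, hi = ibp_linear_relu(W, b, [-0.1, -0.1], [0.1, 0.1])
```

The output box is sound by construction: every input in the box maps into [lo, hi], which is what lets IBP-style training compute gradients of an abstract representation of the network.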

AAAI 2022 · Conference Paper

Stability Verification in Stochastic Control Systems via Neural Network Supermartingales

  • Mathias Lechner
  • Đorđe Žikelić
  • Krishnendu Chatterjee
  • Thomas A. Henzinger

We consider the problem of formally verifying almost-sure (a.s.) asymptotic stability in discrete-time nonlinear stochastic control systems. While verifying stability in deterministic control systems is extensively studied in the literature, verifying stability in stochastic control systems is an open problem. The few existing works on this topic either consider only specialized forms of stochasticity or make restrictive assumptions on the system, rendering them inapplicable to learning algorithms with neural network policies. In this work, we present an approach for general nonlinear stochastic control problems with two novel aspects: (a) instead of classical stochastic extensions of Lyapunov functions, we use ranking supermartingales (RSMs) to certify a.s. asymptotic stability, and (b) we present a method for learning neural network RSMs. We prove that our approach guarantees a.s. asymptotic stability of the system and provides the first method to obtain bounds on the stabilization time, which stochastic Lyapunov functions do not. Finally, we validate our approach experimentally on a set of nonlinear stochastic reinforcement learning environments with neural network policies.

NeurIPS 2021 · Conference Paper

Infinite Time Horizon Safety of Bayesian Neural Networks

  • Mathias Lechner
  • Đorđe Žikelić
  • Krishnendu Chatterjee
  • Thomas Henzinger

Bayesian neural networks (BNNs) place distributions over the weights of a neural network to model uncertainty in the data and the network's prediction. We consider the problem of verifying safety when running a Bayesian neural network policy in a feedback loop with infinite time horizon systems. Compared to the existing sampling-based approaches, which are inapplicable to the infinite time horizon setting, we train a separate deterministic neural network that serves as an infinite time horizon safety certificate. In particular, we show that the certificate network guarantees the safety of the system over a subset of the BNN weight posterior's support. Our method first computes a safe weight set and then alters the BNN's weight posterior to reject samples outside this set. Moreover, we show how to extend our approach to a safe-exploration reinforcement learning setting, in order to avoid unsafe trajectories during the training of the policy. We evaluate our approach on a series of reinforcement learning benchmarks, including non-Lyapunovian safety specifications.
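
A toy sketch of the posterior-rejection step, with a one-dimensional Gaussian standing in for the weight posterior and an interval standing in for the certified safe weight set (in the paper that set is a subset of the posterior's support certified via a separately trained deterministic network; both stand-ins here are invented).

```python
import random

def sample_safe_weights(n, safe_lo, safe_hi, mu=0.0, sigma=1.0, seed=0):
    """Rejection-sample weights from a Gaussian 'posterior', keeping only
    those inside a pre-computed safe weight set (here a simple interval).
    Rejecting samples outside the set effectively alters the posterior to
    one supported only on safe weights."""
    rng = random.Random(seed)
    kept = []
    while len(kept) < n:
        w = rng.gauss(mu, sigma)
        if safe_lo <= w <= safe_hi:   # reject samples outside the safe set
            kept.append(w)
    return kept

weights = sample_safe_weights(100, safe_lo=-1.5, safe_hi=1.5)
```

Every weight actually used at inference time then lies in the certified set, which is what makes the infinite-horizon safety certificate apply to the deployed policy.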

AAAI 2021 · Conference Paper

Scalable Verification of Quantized Neural Networks

  • Thomas A. Henzinger
  • Mathias Lechner
  • Đorđe Žikelić

Formal verification of neural networks is an active topic of research, and recent advances have significantly increased the size of the networks that verification tools can handle. However, most methods are designed for verification of an idealized model of the actual network which works over real arithmetic and ignores rounding imprecisions. This idealization is in stark contrast to network quantization, which is a technique that trades numerical precision for computational efficiency and is, therefore, often applied in practice. Neglecting rounding errors of such low-bit quantized neural networks has been shown to lead to wrong conclusions about the network’s correctness. Thus, the desired approach for verifying quantized neural networks would be one that takes these rounding errors into account. In this paper, we show that verifying the bit-exact implementation of quantized neural networks with bit-vector specifications is PSPACE-hard, even though verifying idealized real-valued networks and satisfiability of bit-vector specifications alone are each in NP. Furthermore, we explore several practical heuristics toward closing the complexity gap between idealized and bit-exact verification. In particular, we propose three techniques for making SMT-based verification of quantized neural networks more scalable. Our experiments demonstrate that our proposed methods allow a speedup of up to three orders of magnitude over existing approaches.
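
The gap between idealized real-valued semantics and bit-exact quantized semantics can be seen already in a single dot product. The int8-style grid, scale, and inputs below are invented for illustration; they show how per-step rounding accumulates into a result an idealized verifier would never see.

```python
def quantize(v, scale):
    """Round a float to the nearest representable value on an int8 grid."""
    q = max(-128, min(127, round(v / scale)))
    return q * scale

def dot_float_vs_quantized(w, x, scale=1.0 / 127):
    """Compare an idealized real-valued dot product with a quantized
    counterpart in which weights, inputs, and the running accumulator are
    all rounded to the int8 grid after every step."""
    exact = sum(wi * xi for wi, xi in zip(w, x))
    acc = 0.0
    for wi, xi in zip(w, x):
        acc = quantize(acc + quantize(wi, scale) * quantize(xi, scale), scale)
    return exact, acc

# Many small weights: each rounds up to a full grid step, and the error
# compounds across the accumulation.
w = [0.004] * 50
x = [1.0] * 50
exact, quantized = dot_float_vs_quantized(w, x)
```

The idealized result is 0.2, while the quantized accumulation drifts far above it; conclusions proved about the real-valued model need not transfer to the bit-exact network.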