Author name cluster

Heinz Koeppl

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

36 papers

2 author rows

AAAI Conference 2025 Conference Paper

Bounded Rationality Equilibrium Learning in Mean Field Games

Yannick Eich
Christian Fabian
Kai Cui
Heinz Koeppl

Mean field games (MFGs) tractably model behavior in large agent populations. The literature on learning MFG equilibria typically focuses on finding Nash equilibria (NE), which assume perfectly rational agents and are hence implausible in many realistic situations. To overcome these limitations, we incorporate bounded rationality into MFGs by leveraging the well-known concept of quantal response equilibria (QRE). Two novel types of MFG QRE enable the modeling of large agent populations where individuals only noisily estimate the true objective. We also introduce a second source of bounded rationality to MFGs by restricting the agents' planning horizon. The resulting novel receding horizon (RH) MFGs are combined with QRE and existing approaches to model different aspects of bounded rationality in MFGs. We formally define MFG QRE and RH MFGs and compare them to existing equilibrium concepts such as entropy-regularized NE. Subsequently, we design generalized fixed point iteration and fictitious play algorithms to learn QRE and RH equilibria. After a theoretical analysis, we give different examples to evaluate the capabilities of our learning algorithms and outline practical differences between the equilibrium concepts.

PDF Details DOI

ICML Conference 2025 Conference Paper

Learning Mean Field Control on Sparse Graphs

Christian Fabian 0001
Kai Cui 0001
Heinz Koeppl

Large agent networks are abundant in applications and nature and pose difficult challenges in the field of multi-agent reinforcement learning (MARL) due to their computational and theoretical complexity. While graphon mean field games and their extensions provide efficient learning algorithms for dense and moderately sparse agent networks, the case of realistic sparser graphs remains largely unsolved. Thus, we propose a novel mean field control model inspired by local weak convergence to include sparse graphs such as power law networks with coefficients above two. Besides a theoretical analysis, we design scalable learning algorithms which apply to the challenging class of graph sequences with finite first moment. We compare our model and algorithms for various examples on synthetic and real world networks with mean field algorithms based on Lp graphons and graphexes. As it turns out, our approach outperforms existing methods in many examples and on various networks due to the special design aiming at an important, but so far hard to solve class of MARL problems.

Details

TMLR Journal 2025 Journal Article

Mean-Field RL for Large-Scale Unit-Capacity Pickup-and-Delivery Problems

Kai Cui
Sharif Azem
Christian Fabian
Kirill Kuroptev
Ramin Khalili
Osama Abboud
Florian Steinke
Heinz Koeppl

Solving large-scale vehicle routing problems (VRPs) is NP-hard and poses a computational challenge in numerous applications such as logistics. Meanwhile, mean-field control (MFC) provides a tractable and rigorous approach to controlling many agents. We provide a solution to pickup-and-delivery VRPs via scalable MFC. In combination with reinforcement learning (RL) and clustering, our MFC approach efficiently scales to large-scale VRPs. We perform a theoretical analysis of our MFC-based approximation, giving convergence results for large VRP instances and error bounds for clustering-based approximations. We verify our algorithms on different datasets and compare them against solutions such as OR-Tools, PyVRP and heuristics, showing scalability in terms of speed for mean-field methods, for the first time in discrete optimization. Overall, our work establishes a novel synthesis of MFC-based RL techniques, vehicle routing problems and clustering approximations, to solve a hard discrete optimization problem of practical use in a scalable way.

PDF Details

NeurIPS Conference 2025 Conference Paper

What do you know? Bayesian knowledge inference for navigating agents

Matthias Schultheis
Jana-Sophie Schönfeld
Constantin Rothkopf
Heinz Koeppl

Human behavior is characterized by continuous learning to reduce uncertainties about the world in pursuit of goals. When trying to understand such behavior from observations, it is essential to account for this adaptive nature and reason about the uncertainties that may have led to seemingly suboptimal decisions. Nevertheless, most inverse approaches to sequential decision-making focus on inferring cost functions underlying stationary behavior or are limited to low-dimensional tasks. In this paper, we address this gap by considering the problem of inferring an agent's knowledge or awareness about the environment based on a given trajectory. We assume that the agent aims to reach a goal in an environment they only partially know, and integrates new information into their plan as they act. We propose a Bayesian approach to infer their latent knowledge state, leveraging an approximate navigation model that optimistically incorporates partial information while accounting for uncertainty. By combining sample-based Bayesian inference with dynamic graph algorithms, we achieve an efficient method for computing posterior beliefs about the agent's knowledge. Empirical validation using simulated behavioral data and human data from an online experiment demonstrates that our model effectively captures human navigation under uncertainty and reveals interpretable insights into their environmental knowledge.

PDF Details

ICRA Conference 2024 Conference Paper

A Modular Aerial System Based on Homogeneous Quadrotors with Fault-Tolerant Control

Mengguang Li
Kai Cui 0001
Heinz Koeppl

The standard quadrotor is one of the most popular and widely used aerial vehicle of recent decades, offering great maneuverability with mechanical simplicity. However, the under-actuation characteristic limits its applications, especially when it comes to generating desired wrench with six degrees of freedom (DOF). Therefore, existing work often compromises between mechanical complexity and the controllable DOF of the aerial system. To take advantage of the mechanical simplicity of a standard quadrotor, we propose a modular aerial system, IdentiQuad, that combines only homogeneous quadrotor-based modules. Each IdentiQuad can be operated alone like a standard quadrotor, but at the same time allows task-specific assembly, increasing the controllable DOF of the system. Each module is interchangeable within its assembly. We also propose a general controller for different configurations of assemblies, capable of tolerating rotor failures and balancing the energy consumption of each module. The functionality and robustness of the system and its controller are validated using physics-based simulations for different assembly configurations.

Details

NeurIPS Conference 2024 Conference Paper

Graph Structure Inference with BAM: Neural Dependency Processing via Bilinear Attention

Philipp Froehlich
Heinz Koeppl

Detecting dependencies among variables is a fundamental task across scientific disciplines. We propose a novel neural network model for graph structure inference, which aims to learn a mapping from observational data to the corresponding underlying dependence structures. The model is trained with variably shaped and coupled simulated input data and requires only a single forward pass through the trained network for inference. Central to our approach is a novel bilinear attention mechanism (BAM) operating on covariance matrices of transformed data while respecting the geometry of the manifold of symmetric positive definite (SPD) matrices. Inspired by graphical lasso methods, our model optimizes over continuous graph representations in the SPD space, where inverse covariance matrices encode conditional independence relations. Empirical evaluations demonstrate the robustness of our method in detecting diverse dependencies, excelling in undirected graph estimation and showing competitive performance in completed partially directed acyclic graph estimation via a novel two-step approach. The trained model effectively detects causal relationships and generalizes well across different functional forms of nonlinear dependencies.

PDF Details DOI

ICLR Conference 2024 Conference Paper

Learning Decentralized Partially Observable Mean Field Control for Artificial Collective Behavior

Kai Cui 0001
Sascha Hauck
Christian Fabian 0001
Heinz Koeppl

Recent reinforcement learning (RL) methods have achieved success in various domains. However, multi-agent RL (MARL) remains a challenge in terms of decentralization, partial observability and scalability to many agents. Meanwhile, collective behavior requires resolution of the aforementioned challenges, and remains of importance to many state-of-the-art applications such as active matter physics, self-organizing systems, opinion dynamics, and biological or robotic swarms. Here, MARL via mean field control (MFC) offers a potential solution to scalability, but fails to consider decentralized and partially observable systems. In this paper, we enable decentralized behavior of agents under partial information by proposing novel models for decentralized partially observable MFC (Dec-POMFC), a broad class of problems with permutation-invariant agents allowing for reduction to tractable single-agent Markov decision processes (MDP) with single-agent RL solution. We provide rigorous theoretical results, including a dynamic programming principle, together with optimality guarantees for Dec-POMFC solutions applied to finite swarms of interest. Algorithmically, we propose Dec-POMFC-based policy gradient methods for MARL via centralized training and decentralized execution, together with policy gradient approximation guarantees. In addition, we improve upon state-of-the-art histogram-based MFC by kernel methods, which is of separate interest also for fully observable MFC. We evaluate numerically on representative collective behavior tasks such as adapted Kuramoto and Vicsek swarming models, being on par with state-of-the-art MARL. Overall, our framework takes a step towards RL-based engineering of artificial collective behavior via MFC.

Details

AAAI Conference 2024 Conference Paper

Learning Discrete-Time Major-Minor Mean Field Games

Kai Cui
Gökçe Dayanıklı
Mathieu Laurière
Matthieu Geist
Olivier Pietquin
Heinz Koeppl

Recent techniques based on Mean Field Games (MFGs) allow the scalable analysis of multi-player games with many similar, rational agents. However, standard MFGs remain limited to homogeneous players that weakly influence each other, and cannot model major players that strongly influence other players, severely limiting the class of problems that can be handled. We propose a novel discrete time version of major-minor MFGs (M3FGs), along with a learning algorithm based on fictitious play and partitioning the probability simplex. Importantly, M3FGs generalize MFGs with common noise and can handle not only random exogeneous environment states but also major players. A key challenge is that the mean field is stochastic and not deterministic as in standard MFGs. Our theoretical investigation verifies both the M3FG model and its algorithmic solution, showing firstly the well-posedness of the M3FG model starting from a finite game of interest, and secondly convergence and approximation guarantees of the fictitious play algorithm. Then, we empirically verify the obtained theoretical results, ablating some of the theoretical assumptions made, and show successful equilibrium learning in three example problems. Overall, we establish a learning framework for a novel and broad class of tractable games.

PDF Details DOI

ICLR Conference 2024 Conference Paper

Learning Mean Field Games on Sparse Graphs: A Hybrid Graphex Approach

Christian Fabian 0001
Kai Cui 0001
Heinz Koeppl

Learning the behavior of large agent populations is an important task for numerous research areas. Although the field of multi-agent reinforcement learning (MARL) has made significant progress towards solving these systems, solutions for many agents often remain computationally infeasible and lack theoretical guarantees. Mean Field Games (MFGs) address both of these issues and can be extended to Graphon MFGs (GMFGs) to include network structures between agents. Despite their merits, the real world applicability of GMFGs is limited by the fact that graphons only capture dense graphs. Since most empirically observed networks show some degree of sparsity, such as power law graphs, the GMFG framework is insufficient for capturing these network topologies. Thus, we introduce the novel concept of Graphex MFGs (GXMFGs) which builds on the graph theoretical concept of graphexes. Graphexes are the limiting objects to sparse graph sequences that also have other desirable features such as the small world property. Learning equilibria in these games is challenging due to the rich and sparse structure of the underlying graphs. To tackle these challenges, we design a new learning algorithm tailored to the GXMFG setup. This hybrid graphex learning approach leverages that the system mainly consists of a highly connected core and a sparse periphery. After defining the system and providing a theoretical analysis, we state our learning approach and demonstrate its learning capabilities on both synthetic graphs and real-world networks. This comparison shows that our GXMFG learning algorithm successfully extends MFGs to a highly relevant class of hard, realistic learning problems that are not accurately addressed by current MARL and MFG methods.

Details

ICML Conference 2024 Conference Paper

Major-Minor Mean Field Multi-Agent Reinforcement Learning

Kai Cui 0001
Christian Fabian 0001
Anam Tahir
Heinz Koeppl

Multi-agent reinforcement learning (MARL) remains difficult to scale to many agents. Recent MARL using Mean Field Control (MFC) provides a tractable and rigorous approach to otherwise difficult cooperative MARL. However, the strict MFC assumption of many independent, weakly-interacting agents is too inflexible in practice. We generalize MFC to instead simultaneously model many similar and few complex agents – as Major-Minor Mean Field Control (M3FC). Theoretically, we give approximation results for finite agent control, and verify the sufficiency of stationary policies for optimality together with a dynamic programming principle. Algorithmically, we propose Major-Minor Mean Field MARL (M3FMARL) for finite agent systems instead of the limiting system. The algorithm is shown to approximate the policy gradient of the underlying M3FC MDP. Finally, we demonstrate its capabilities experimentally in various scenarios. We observe a strong performance in comparison to state-of-the-art policy gradient MARL methods.

Details

IJCAI Conference 2024 Conference Paper

Negative-Binomial Randomized Gamma Dynamical Systems for Heterogeneous Overdispersed Count Time Sequences

Rui Huang
Sikun Yang
Heinz Koeppl

Modeling count-valued time sequences has been receiving growing interests because count time sequences naturally arise in physical and social domains. Poisson gamma dynamical systems (PGDSs) are newly-developed methods, which can well capture the expressive latent transition structure and bursty dynamics behind count sequences. In particular, PGDSs demonstrate superior performance in terms of data imputation and prediction, compared with canonical linear dynamical system (LDS) based methods. Despite these advantages, PGDS cannot capture the heterogeneous overdispersed behaviours of the underlying dynamic processes. To mitigate this defect, we propose a negative-binomial-randomized gamma Markov process, which not only significantly improves the predictive performance of the proposed dynamical system, but also facilitates the fast convergence of the inference algorithm. Moreover, we develop methods to estimate both factor-structured and graph-structured transition dynamics, which enable us to infer more explainable latent structure, compared with PGDSs. Finally, we demonstrate the explainable latent structure learned by the proposed method, and show its superior performance in imputing missing data and forecasting future observations, compared with the related models.

PDF Details DOI

ICRA Conference 2024 Conference Paper

Optimal Collaborative Transportation for Under-Capacitated Vehicle Routing Problems using Aerial Drone Swarms

Akash Kopparam Sreedhara
Deepesh Padala
Shashank Mahesh
Kai Cui 0001
Mengguang Li
Heinz Koeppl

Swarms of aerial drones have recently been considered for last-mile deliveries in urban logistics or automated construction. At the same time, collaborative transportation of payloads by multiple drones is another important area of recent research. However, efficient coordination algorithms for collaborative transportation of many payloads by many drones remain to be considered. In this work, we formulate the collaborative transportation of payloads by a swarm of drones as a novel, under-capacitated generalization of vehicle routing problems (VRP), which may also be of separate interest. In contrast to standard VRP and capacitated VRP, we must additionally consider waiting times for payloads lifted cooperatively by multiple drones, and the corresponding coordination. Algorithmically, we provide a solution encoding that avoids deadlocks and formulate an appropriate alternating minimization scheme to solve the problem. On the hardware side, we integrate our algorithms with collision avoidance and drone controllers. The approach and the impact of the system integration are successfully verified empirically, both on a swarm of real nano-quadcopters and for large swarms in simulation. Overall, we provide a framework for collaborative transportation with aerial drone swarms, that uses only as many drones as necessary for the transportation of any single payload.

Details

NeurIPS Conference 2023 Conference Paper

Probabilistic inverse optimal control for non-linear partially observable systems disentangles perceptual uncertainty and behavioral costs

Dominik Straub
Matthias Schultheis
Heinz Koeppl
Constantin A. Rothkopf

Inverse optimal control can be used to characterize behavior in sequential decision-making tasks. Most existing work, however, is limited to fully observable or linear systems, or requires the action signals to be known. Here, we introduce a probabilistic approach to inverse optimal control for partially observable stochastic non-linear systems with unobserved action signals, which unifies previous approaches to inverse optimal control with maximum causal entropy formulations. Using an explicit model of the noise characteristics of the sensory and motor systems of the agent in conjunction with local linearization techniques, we derive an approximate likelihood function for the model parameters, which can be computed within a single forward pass. We present quantitative evaluations on stochastic and partially observable versions of two classic control tasks and two human behavioral tasks. Importantly, we show that our method can disentangle perceptual factors and behavioral costs despite the fact that epistemic and pragmatic actions are intertwined in sequential decision-making under uncertainty, such as in active sensing and active learning. The proposed method has broad applicability, ranging from imitation learning to sensorimotor neuroscience.

PDF Details

ICRA Conference 2023 Conference Paper

Scalable Task-Driven Robotic Swarm Control via Collision Avoidance and Learning Mean-Field Control

Kai Cui 0001
Mengguang Li
Christian Fabian 0001
Heinz Koeppl

In recent years, reinforcement learning and its multi-agent analogue have achieved great success in solving various complex control problems. However, multi-agent rein-forcement learning remains challenging both in its theoretical analysis and empirical design of algorithms, especially for large swarms of embodied robotic agents where a definitive toolchain remains part of active research. We use emerging state-of-the-art mean-field control techniques in order to convert many-agent swarm control into more classical single-agent control of distributions. This allows profiting from advances in single-agent reinforcement learning at the cost of assuming weak interaction between agents. However, the mean-field model is violated by the nature of real systems with embodied, physically colliding agents. Thus, we combine collision avoidance and learning of mean-field control into a unified framework for tractably designing intelligent robotic swarm behavior. On the theoretical side, we provide novel approximation guarantees for general mean-field control both in continuous spaces and with collision avoidance. On the practical side, we show that our approach outperforms multi-agent reinforcement learning and allows for decentralized open-loop application while avoiding collisions, both in simulation and real UAV swarms. Overall, we propose a framework for the design of swarm behavior that is both mathematically well-founded and practically useful, enabling the solution of otherwise intractable swarm problems.

Details

NeurIPS Conference 2022 Conference Paper

Forward-Backward Latent State Inference for Hidden Continuous-Time semi-Markov Chains

Nicolai Engelmann
Heinz Koeppl

Hidden semi-Markov Models (HSMM's) - while broadly in use - are restricted to a discrete and uniform time grid. They are thus not well suited to explain often irregularly spaced discrete event data from continuous-time phenomena. We show that non-sampling-based latent state inference used in HSMM's can be generalized to latent Continuous-Time semi-Markov Chains (CTSMC's). We formulate integro-differential forward and backward equations adjusted to the observation likelihood and introduce an exact integral equation for the Bayesian posterior marginals and a scalable Viterbi-type algorithm for posterior path estimates. The presented equations can be efficiently solved using well-known numerical methods. As a practical tool, variable-step HSMM's are introduced. We evaluate our approaches in latent state inference scenarios in comparison to classical HSMM's.

PDF Details

ICLR Conference 2022 Conference Paper

Learning Graphon Mean Field Games and Approximate Nash Equilibria

Kai Cui 0001
Heinz Koeppl

Recent advances at the intersection of dense large graph limits and mean field games have begun to enable the scalable analysis of a broad class of dynamical sequential games with large numbers of agents. So far, results have been largely limited to graphon mean field systems with continuous-time diffusive or jump dynamics, typically without control and with little focus on computational methods. We propose a novel discrete-time formulation for graphon mean field games as the limit of non-linear dense graph Markov games with weak interaction. On the theoretical side, we give extensive and rigorous existence and approximation properties of the graphon mean field solution in sufficiently large systems. On the practical side we provide general learning schemes for graphon mean field equilibria by either introducing agent equivalence classes or reformulating the graphon mean field system as a classical mean field system. By repeatedly finding a regularized optimal control solution and its generated mean field, we successfully obtain plausible approximate Nash equilibria in otherwise infeasible large dense graph games with many agents. Empirically, we are able to demonstrate on a number of examples that the finite-agent behavior comes increasingly close to the mean field behavior for our computed equilibria as the graph or system size grows, verifying our theory. More generally, we successfully apply policy gradient reinforcement learning in conjunction with sequential Monte Carlo methods.

Details

ICML Conference 2022 Conference Paper

Markov Chain Monte Carlo for Continuous-Time Switching Dynamical Systems

Lukas Köhs
Bastian Alt
Heinz Koeppl

Switching dynamical systems are an expressive model class for the analysis of time-series data. As in many fields within the natural and engineering sciences, the systems under study typically evolve continuously in time, it is natural to consider continuous-time model formulations consisting of switching stochastic differential equations governed by an underlying Markov jump process. Inference in these types of models is however notoriously difficult, and tractable computational schemes are rare. In this work, we propose a novel inference algorithm utilizing a Markov Chain Monte Carlo approach. The presented Gibbs sampler allows to efficiently obtain samples from the exact continuous-time posterior processes. Our framework naturally enables Bayesian parameter estimation, and we also include an estimate for the diffusion covariance, which is oftentimes assumed fixed in stochastic differential equations models. We evaluate our framework under the modeling assumption and compare it against an existing variational inference approach.

Details

ICRA Conference 2022 Conference Paper

Nearest-Neighbor-based Collision Avoidance for Quadrotors via Reinforcement Learning

Ramzi Ourari
Kai Cui 0001
Ahmed Elshamanhory
Heinz Koeppl

Collision avoidance algorithms are of central interest to many drone applications. In particular, decentralized approaches may be the key to enabling robust drone swarm solutions in cases where centralized communication becomes computationally prohibitive. In this work, we draw biological inspiration from flocks of starlings (Sturnus vulgaris) and apply the insight to end-to-end learned decentralized collision avoidance. More specifically, we propose a new, scalable observation model following a biomimetic nearest-neighbor information constraint that leads to fast learning and good collision avoidance behavior. By proposing a general reinforcement learning approach, we obtain an end-to-end learning-based approach to integrating collision avoidance with arbitrary tasks such as package collection and formation change. To validate the generality of this approach, we successfully apply our methodology through motion models of medium complexity, modeling momentum and nonetheless allowing direct application to real world quadrotors in conjunction with a standard PID controller. In contrast to prior works, we find that in our sufficiently rich motion model, nearest-neighbor information is indeed enough to learn effective collision avoidance behavior. Our learned policies are tested in simulation and subsequently transferred to real-world drones to validate their real-world applicability.

Details

NeurIPS Conference 2022 Conference Paper

Reinforcement Learning with Non-Exponential Discounting

Matthias Schultheis
Constantin A. Rothkopf
Heinz Koeppl

Commonly in reinforcement learning (RL), rewards are discounted over time using an exponential function to model time preference, thereby bounding the expected long-term reward. In contrast, in economics and psychology, it has been shown that humans often adopt a hyperbolic discounting scheme, which is optimal when a specific task termination time distribution is assumed. In this work, we propose a theory for continuous-time model-based reinforcement learning generalized to arbitrary discount functions. This formulation covers the case in which there is a non-exponential random termination time. We derive a Hamilton–Jacobi–Bellman (HJB) equation characterizing the optimal policy and describe how it can be solved using a collocation method, which uses deep learning for function approximation. Further, we show how the inverse RL problem can be approached, in which one tries to recover properties of the discount function given decision data. We validate the applicability of our proposed approach on two simulated problems. Our approach opens the way for the analysis of human discounting in sequential decision-making tasks.

PDF Details

ICML Conference 2021 Conference Paper

Active Learning of Continuous-time Bayesian Networks through Interventions

Dominik Linzner
Heinz Koeppl

We consider the problem of learning structures and parameters of Continuous-time Bayesian Networks (CTBNs) from time-course data under minimal experimental resources. In practice, the cost of generating experimental data poses a bottleneck, especially in the natural and social sciences. A popular approach to overcome this is Bayesian optimal experimental design (BOED). However, BOED becomes infeasible in high-dimensional settings, as it involves integration over all possible experimental outcomes. We propose a novel criterion for experimental design based on a variational approximation of the expected information gain. We show that for CTBNs, a semi-analytical expression for this criterion can be calculated for structure and parameter learning. By doing so, we can replace sampling over experimental outcomes by solving the CTBNs master-equation, for which scalable approximations exist. This alleviates the computational burden of sampling possible experimental outcomes in high-dimensions. We employ this framework to recommend interventional sequences. In this context, we extend the CTBN model to conditional CTBNs to incorporate interventions. We demonstrate the performance of our criterion on synthetic and real-world data.

Details

NeurIPS Conference 2021 Conference Paper

Variational Inference for Continuous-Time Switching Dynamical Systems

Lukas Köhs
Bastian Alt
Heinz Koeppl

Switching dynamical systems provide a powerful, interpretable modeling framework for inference in time-series data in, e. g. , the natural sciences or engineering applications. Since many areas, such as biology or discrete-event systems, are naturally described in continuous time, we present a model based on a Markov jump process modulating a subordinated diffusion process. We provide the exact evolution equations for the prior and posterior marginal densities, the direct solutions of which are however computationally intractable. Therefore, we develop a new continuous-time variational inference algorithm, combining a Gaussian process approximation on the diffusion level with posterior inference for Markov jump processes. By minimizing the path-wise Kullback-Leibler divergence we obtain (i) Bayesian latent state estimates for arbitrary points on the real axis and (ii) point estimates of unknown system parameters, utilizing variational expectation maximization. We extensively evaluate our algorithm under the model assumption and for real-world examples.

PDF Details

AAAI Conference 2020 Conference Paper

A Variational Perturbative Approach to Planning in Graph-Based Markov Decision Processes

Dominik Linzner
Heinz Koeppl

Coordinating multiple interacting agents to achieve a common goal is a difﬁcult task with huge applicability. This problem remains hard to solve, even when limiting interactions to be mediated via a static interaction-graph. We present a novel approximate solution method for multi-agent Markov decision problems on graphs, based on variational perturbation theory. We adopt the strategy of planning via inference, which has been explored in various prior works. We employ a non-trivial extension of a novel high-order variational method that allows for approximate inference in large networks and has been shown to surpass the accuracy of existing variational methods. To compare our method to two state-of-the-art methods for multi-agent planning on graphs, we apply the method different standard GMDP problems. We show that in cases, where the goal is encoded as a non-local cost function, our method performs well, while state-of-the-art methods approach the performance of random guess. In a ﬁnal experiment, we demonstrate that our method brings signiﬁcant improvement for synchronization tasks.

PDF Details

ICML Conference 2020 Conference Paper

Continuous Time Bayesian Networks with Clocks

Nicolai Engelmann
Dominik Linzner
Heinz Koeppl

Structured stochastic processes evolving in continuous time present a widely adopted framework to model phenomena occurring in nature and engineering. However, such models are often chosen to satisfy the Markov property to maintain tractability. One of the more popular of such memoryless models are Continuous Time Bayesian Networks (CTBNs). In this work, we lift its restriction to exponential survival times to arbitrary distributions. Current extensions achieve this via auxiliary states, which hinder tractability. To avoid that, we introduce a set of node-wise clocks to construct a collection of graph-coupled semi-Markov chains. We provide algorithms for parameter and structure inference, which make use of local dependencies and conduct experiments on synthetic data and a data-set generated through a benchmark tool for gene regulatory networks. In doing so, we point out advantages compared to current CTBN extensions.

Details

NeurIPS Conference 2020 Conference Paper

POMDPs in Continuous Time and Discrete Spaces

Bastian Alt
Matthias Schultheis
Heinz Koeppl

Many processes, such as discrete event systems in engineering or population dynamics in biology, evolve in discrete space and continuous time. We consider the problem of optimal decision making in such discrete state and action space systems under partial observability. This places our work at the intersection of optimal filtering and optimal control. At the current state of research, a mathematical description for simultaneous decision making and filtering in continuous time with finite state and action spaces is still missing. In this paper, we give a mathematical description of a continuous-time partial observable Markov decision process (POMDP). By leveraging optimal filtering theory we derive a Hamilton-Jacobi-Bellman (HJB) type equation that characterizes the optimal solution. Using techniques from deep learning we approximately solve the resulting partial integro-differential equation. We present (i) an approach solving the decision problem offline by learning an approximation of the value function and (ii) an online algorithm which provides a solution in belief space using deep reinforcement learning. We show the applicability on a set of toy examples which pave the way for future methods providing solutions for high dimensional problems.

PDF Details

UAI Conference 2020 Conference Paper

The Hawkes Edge Partition Model for Continuous-time Event-based Temporal Networks

Sikun Yang
Heinz Koeppl

We propose a novel probabilistic framework to model continuously generated interaction events data. Our goal is to infer the \emph{implicit} community structure underlying the temporal interactions among entities, and also to exploit how the latent structure influence their interaction dynamics. To this end, we model the reciprocating interactions between individuals using mutually-exciting Hawkes processes. The base rate of the Hawkes process for each pair of individuals is built upon the latent representations inferred using the hierarchical gamma process edge partition model (HGaP-EPM). In particular, our model allows the interaction dynamics between each pair of individuals to be modulated by their respective affiliated communities. Moreover, our model can flexibly incorporate the auxiliary individuals’ attributes, or covariates associated with interaction events. Efficient Gibbs sampling and Expectation-Maximization algorithms are developed to perform inference via Pólya-Gamma data augmentation strategy. Experimental results on real-world datasets demonstrate that our model not only achieves competitive performance compared with state-of-the-art methods, but also discovers interpretable latent structure behind the observed temporal interactions.

Details

NeurIPS Conference 2019 Conference Paper

Correlation Priors for Reinforcement Learning

Bastian Alt
Adrian Šošić
Heinz Koeppl

Many decision-making problems naturally exhibit pronounced structures inherited from the characteristics of the underlying environment. In a Markov decision process model, for example, two distinct states can have inherently related semantics or encode resembling physical state configurations. This often implies locally correlated transition dynamics among the states. In order to complete a certain task in such environments, the operating agent usually needs to execute a series of temporally and spatially correlated actions. Though there exists a variety of approaches to capture these correlations in continuous state-action domains, a principled solution for discrete environments is missing. In this work, we present a Bayesian learning framework based on Pólya-Gamma augmentation that enables an analogous reasoning in such cases. We demonstrate the framework on a number of common decision-making related problems, such as imitation learning, subgoal extraction, system identification and Bayesian reinforcement learning. By explicitly modeling the underlying correlation structures of these problems, the proposed approach yields superior predictive performance compared to correlation-agnostic models, even when trained on data sets that are an order of magnitude smaller in size.

PDF Details

ICML Conference 2019 Conference Paper

Moment-Based Variational Inference for Markov Jump Processes

Christian Wildner
Heinz Koeppl

We propose moment-based variational inference as a flexible framework for approximate smoothing of latent Markov jump processes. The main ingredient of our approach is to partition the set of all transitions of the latent process into classes. This allows to express the Kullback-Leibler divergence from the approximate to the posterior process in terms of a set of moment functions that arise naturally from the chosen partition. To illustrate possible choices of the partition, we consider special classes of jump processes that frequently occur in applications. We then extend the results to latent parameter inference and demonstrate the method on several examples.

Details

NeurIPS Conference 2019 Conference Paper

Scalable Structure Learning of Continuous-Time Bayesian Networks from Incomplete Data

Dominik Linzner
Michael Schmidt
Heinz Koeppl

Continuous-time Bayesian Networks (CTBNs) represent a compact yet powerful framework for understanding multivariate time-series data. Given complete data, parameters and structure can be estimated efficiently in closed-form. However, if data is incomplete, the latent states of the CTBN have to be estimated by laboriously simulating the intractable dynamics of the assumed CTBN. This is a problem, especially for structure learning tasks, where this has to be done for each element of a super-exponentially growing set of possible structures. In order to circumvent this notorious bottleneck, we develop a novel gradient-based approach to structure learning. Instead of sampling and scoring all possible structures individually, we assume the generator of the CTBN to be composed as a mixture of generators stemming from different structures. In this framework, structure learning can be performed via a gradient-based optimization of mixture weights. We combine this approach with a new variational method that allows for a closed-form calculation of this mixture marginal likelihood. We show the scalability of our method by learning structures of previously inaccessible sizes from synthetic and real-world data.

PDF Details

AAAI Conference 2018 Conference Paper

A Poisson Gamma Probabilistic Model for Latent Node-Group Memberships in Dynamic Networks

Sikun Yang
Heinz Koeppl

We present a probabilistic model for learning from dynamic relational data, wherein the observed interactions among networked nodes are modeled via the Bernoulli Poisson link function, and the underlying network structure are characterized by nonnegative latent node-group memberships, which are assumed to be gamma distributed. The latent memberships evolve according to Markov processes. The optimal number of latent groups can be determined by data itself. The computational complexity of our method scales with the number of non-zero links, which makes it scalable to large sparse dynamic relational data. We present batch and online Gibbs sampling algorithms to perform model inference. Finally, we demonstrate the model’s performance on both synthetic and real-world datasets compared to state-of-the-art methods.

PDF Details

NeurIPS Conference 2018 Conference Paper

Cluster Variational Approximations for Structure Learning of Continuous-Time Bayesian Networks from Incomplete Data

Dominik Linzner
Heinz Koeppl

Continuous-time Bayesian networks (CTBNs) constitute a general and powerful framework for modeling continuous-time stochastic processes on networks. This makes them particularly attractive for learning the directed structures among interacting entities. However, if the available data is incomplete, one needs to simulate the prohibitively complex CTBN dynamics. Existing approximation techniques, such as sampling and low-order variational methods, either scale unfavorably in system size, or are unsatisfactory in terms of accuracy. Inspired by recent advances in statistical physics, we present a new approximation scheme based on cluster-variational methods that significantly improves upon existing variational approximations. We can analytically marginalize the parameters of the approximate CTBN, as these are of secondary importance for structure learning. This recovers a scalable scheme for direct structure learning from incomplete and noisy time-series data. Our approach outperforms existing methods in terms of scalability.

PDF Details

ICML Conference 2018 Conference Paper

Dependent Relational Gamma Process Models for Longitudinal Networks

Sikun Yang
Heinz Koeppl

A probabilistic framework based on the covariate-dependent relational gamma process is developed to analyze relational data arising from longitudinal networks. The proposed framework characterizes networked nodes by nonnegative node-group memberships, which allow each node to belong to multiple latent groups simultaneously, and encodes edge probabilities between each pair of nodes using a Bernoulli Poisson link to the embedded latent space. Within the latent space, our framework models the birth and death dynamics of individual groups via a thinning function. Our framework also captures the evolution of individual node-group memberships over time using gamma Markov processes. Exploiting the recent advances in data augmentation and marginalization techniques, a simple and efficient Gibbs sampler is proposed for posterior computation. Experimental results on a simulation study and three real-world temporal network data sets demonstrate the model’s capability, competitive performance and scalability compared to state-of-the-art methods.

Details

JMLR Journal 2018 Journal Article

Inverse Reinforcement Learning via Nonparametric Spatio-Temporal Subgoal Modeling

Adrian Šošić
Elmar Rueckert
Jan Peters
Abdelhak M. Zoubir
Heinz Koeppl

Advances in the field of inverse reinforcement learning (IRL) have led to sophisticated inference frameworks that relax the original modeling assumption of observing an agent behavior that reflects only a single intention. Instead of learning a global behavioral model, recent IRL methods divide the demonstration data into parts, to account for the fact that different trajectories may correspond to different intentions, e.g., because they were generated by different domain experts. In this work, we go one step further: using the intuitive concept of subgoals, we build upon the premise that even a single trajectory can be explained more efficiently locally within a certain context than globally, enabling a more compact representation of the observed behavior. Based on this assumption, we build an implicit intentional model of the agent's goals to forecast its behavior in unobserved situations. The result is an integrated Bayesian prediction framework that significantly outperforms existing IRL solutions and provides smooth policy estimates consistent with the expert's plan. Most notably, our framework naturally handles situations where the intentions of the agent change over time and classical IRL algorithms fail. In addition, due to its probabilistic nature, the model can be straightforwardly applied in active learning scenarios to guide the demonstration process of the expert. [abs] [ pdf ][ bib ] &copy JMLR 2018. ( edit, beta )

PDF Details

AAMAS Conference 2017 Conference Paper

Inverse Reinforcement Learning in Swarm Systems

Adrian &Scaron; o&scaron; ic
Wasiur R. KhudaBukhsh
Abdelhak M. Zoubir
Heinz Koeppl

Inverse reinforcement learning (IRL) has become a useful tool for learning behavioral models from demonstration data. However, IRL remains mostly unexplored for multi-agent systems. In this paper, we show how the principle of IRL can be extended to homogeneous large-scale problems, inspired by the collective swarming behavior of natural systems. In particular, we make the following contributions to the field: 1) We introduce the swarMDP framework, a subclass of decentralized partially observable Markov decision processes endowed with a swarm characterization. 2) Exploiting the inherent homogeneity of this framework, we reduce the resulting multi-agent IRL problem to a single-agent one by proving that the agent-specific value functions in this model coincide. 3) To solve the corresponding control problem, we propose a novel heterogeneous learning scheme that is particularly tailored to the swarm setting. Results on two example systems demonstrate that our framework is able to produce meaningful local reward models from which we can replicate the observed global system dynamics.

PDF

JMLR Journal 2016 Journal Article

A Variational Approach to Path Estimation and Parameter Inference of Hidden Diffusion Processes

Tobias Sutter
Arnab Ganguly
Heinz Koeppl

We consider a hidden Markov model, where the signal process, given by a diffusion, is only indirectly observed through some noisy measurements. The article develops a variational method for approximating the hidden states of the signal process given the full set of observations. This, in particular, leads to systematic approximations of the smoothing densities of the signal process. The paper then demonstrates how an efficient inference scheme, based on this variational approach to the approximation of the hidden states, can be designed to estimate the unknown parameters of stochastic differential equations. Two examples at the end illustrate the efficacy and the accuracy of the presented method. [abs] [ pdf ][ bib ] &copy JMLR 2016. ( edit, beta )

PDF Details

AAAI Conference 2016 Conference Paper

Marginalized Continuous Time Bayesian Networks for Network Reconstruction from Incomplete Observations

Lukas Studer
Loic Paulevé
Christoph Zechner
Matthias Reumann
María Rodríguez Martínez
Heinz Koeppl

Continuous Time Bayesian Networks (CTBNs) provide a powerful means to model complex network dynamics. However, their inference is computationally demanding — especially if one considers incomplete and noisy time-series data. The latter gives rise to a joint state- and parameter estimation problem, which can only be solved numerically. Yet, ﬁnding the exact parameterization of the CTBN has often only secondary importance in practical scenarios. We therefore focus on the structure learning problem and present a way to analytically marginalize the Markov chain underlying the CTBN model with respect its parameters. Since the resulting stochastic process is parameter-free, its inference reduces to an optimal ﬁltering problem. We solve the latter using an efﬁcient parallel implementation of a sequential Monte Carlo scheme. Our framework enables CTBN inference to be applied to incomplete noisy time-series data frequently found in molecular biology and other disciplines.

PDF Details

TCS Journal 2012 Journal Article

Lumpability abstractions of rule-based systems

Jerome Feret
Thomas Henzinger
Heinz Koeppl
Tatjana Petrov

Details DOI