Arrow Research search

Author name cluster

Frans Oliehoek

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers
1 author row

Possible papers (18)

NeurIPS Conference 2022 Conference Paper

Distributed Influence-Augmented Local Simulators for Parallel MARL in Large Networked Systems

  • Miguel Suau
  • Jinke He
  • Mustafa Mert Çelikok
  • Matthijs Spaan
  • Frans Oliehoek

Due to its high sample complexity, simulation is, as of today, critical for the successful application of reinforcement learning. Many real-world problems, however, exhibit overly complex dynamics, making their full-scale simulation computationally slow. In this paper, we show how to factorize large networked systems of many agents into multiple local regions such that we can build separate simulators that run independently and in parallel. To monitor the influence that the different local regions exert on one another, each of these simulators is equipped with a learned model that is periodically trained on real trajectories. Our empirical results reveal that distributing the simulation among different processes not only makes it possible to train large multi-agent systems in just a few hours but also helps mitigate the negative effects of simultaneous learning.

NeurIPS Conference 2020 Conference Paper

Influence-Augmented Online Planning for Complex Environments

  • Jinke He
  • Miguel Suau de Castro
  • Frans Oliehoek

How can we plan efficiently in real time to control an agent in a complex environment that may involve many other agents? While existing sample-based planners have enjoyed empirical success in large POMDPs, their performance heavily relies on a fast simulator. However, real-world scenarios are complex in nature and their simulators are often computationally demanding, which severely limits the performance of online planners. In this work, we propose influence-augmented online planning, a principled method to transform a factored simulator of the entire environment into a local simulator that samples only the state variables that are most relevant to the observation and reward of the planning agent and captures the incoming influence from the rest of the environment using machine learning methods. Our main experimental results show that planning on this less accurate but much faster local simulator with POMCP leads to higher real-time planning performance than planning on the simulator that models the entire environment.

NeurIPS Conference 2020 Conference Paper

MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning

  • Elise van der Pol
  • Daniel Worrall
  • Herke van Hoof
  • Frans Oliehoek
  • Max Welling

This paper introduces MDP homomorphic networks for deep reinforcement learning. MDP homomorphic networks are neural networks that are equivariant under symmetries in the joint state-action space of an MDP. Current approaches to deep reinforcement learning do not usually exploit knowledge about such structure. By building this prior knowledge into policy and value networks using an equivariance constraint, we can reduce the size of the solution space. We specifically focus on group-structured symmetries (invertible transformations). Additionally, we introduce an easy method for constructing equivariant network layers numerically, so the system designer need not solve the constraints by hand, as is typically done. We construct MDP homomorphic MLPs and CNNs that are equivariant under either a group of reflections or rotations. We show that such networks converge faster than unstructured baselines on CartPole, a grid world and Pong.
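The numerical construction mentioned in the abstract — solving the equivariance constraint W ρ_in(g) = ρ_out(g) W instead of deriving weight tying by hand — can be sketched with plain linear algebra: vectorize the constraint for each group element, stack the blocks, and take the null space. This is an illustrative toy, not the paper's implementation; the group and representations below (a single CartPole-style reflection that negates a 4-dimensional state and swaps 2 action logits) are assumptions chosen to keep the example small.

```python
import numpy as np

def equivariant_basis(reps_in, reps_out):
    """Numerically solve W @ rho_in(g) == rho_out(g) @ W for all g,
    returning a basis of equivariant weight matrices via the null
    space of the stacked, vectorized constraints."""
    m, n = reps_out[0].shape[0], reps_in[0].shape[0]
    # For column-major vec: vec(W @ A) = (A.T kron I_m) vec(W) and
    # vec(B @ W) = (I_n kron B) vec(W), so each group element g
    # contributes the block C_g = A.T kron I_m - I_n kron B.
    C = np.vstack([np.kron(A.T, np.eye(m)) - np.kron(np.eye(n), B)
                   for A, B in zip(reps_in, reps_out)])
    _, s, Vt = np.linalg.svd(C)
    null = Vt[np.sum(s > 1e-10):]      # right-singular vectors with sigma ~ 0
    return [v.reshape((m, n), order="F") for v in null]

# Toy reflection group {e, g} (hypothetical CartPole-style symmetry):
# negating the 4-dim state should swap the 2 action logits.
rho_in = [np.eye(4), -np.eye(4)]
rho_out = [np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])]
basis = equivariant_basis(rho_in, rho_out)   # 4 free weight directions
```

Any linear layer built from this basis satisfies the symmetry by construction, which is the sense in which the equivariance constraint shrinks the solution space.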

NeurIPS Conference 2020 Conference Paper

Multi-agent active perception with prediction rewards

  • Mikko Lauri
  • Frans Oliehoek

Multi-agent active perception is a task where a team of agents cooperatively gathers observations to compute a joint estimate of a hidden variable. The task is decentralized and the joint estimate can only be computed after the task ends by fusing observations of all agents. The objective is to maximize the accuracy of the estimate. The accuracy is quantified by a centralized prediction reward determined by a centralized decision-maker who perceives the observations gathered by all agents after the task ends. In this paper, we model multi-agent active perception as a decentralized partially observable Markov decision process (Dec-POMDP) with a convex centralized prediction reward. We prove that by introducing individual prediction actions for each agent, the problem is converted into a standard Dec-POMDP with a decentralized prediction reward. The loss due to decentralization is bounded, and we give a sufficient condition for when it is zero. Our results allow application of any Dec-POMDP solution algorithm to multi-agent active perception problems, and enable planning to reduce uncertainty without explicit computation of joint estimates. We demonstrate the empirical usefulness of our results by applying a standard Dec-POMDP algorithm to multi-agent active perception problems, showing increased scalability in the planning horizon.

AAMAS Conference 2017 Conference Paper

Decentralised Online Planning for Multi-Robot Warehouse Commissioning

  • Daniel Claes
  • Frans Oliehoek
  • Hendrik Baier
  • Karl Tuyls

Warehouse commissioning is a complex task in which a team of robots needs to gather and deliver items as fast and efficiently as possible while adhering to the capacity constraints of the robots. Typical centralised control approaches can quickly become infeasible when dealing with many robots. Instead, we tackle this spatial task allocation problem via distributed planning on each robot in the system. State-of-the-art distributed planning approaches suffer from a number of limiting assumptions and ad-hoc approximations. This paper demonstrates how to use Monte Carlo Tree Search (MCTS) to overcome these limitations and provide scalability in a more principled manner. Our simulation-based evaluation demonstrates that this translates to higher task performance, especially when tasks get more complex. Moreover, this higher performance does not come at the cost of scalability: in fact, the proposed approach scales better than the previous best approach, demonstrating excellent performance on an 8-robot team servicing a warehouse comprising over 200 locations.

AAAI Conference 2017 Conference Paper

Maximizing the Probability of Arriving on Time: A Practical Q-Learning Method

  • Zhiguang Cao
  • Hongliang Guo
  • Jie Zhang
  • Frans Oliehoek
  • Ulrich Fastenrath

The stochastic shortest path problem is of crucial importance for the development of sustainable transportation systems. Existing methods based on the probability tail model seek the path that maximizes the probability of arriving at the destination before a deadline. However, they suffer from low accuracy and/or high computational cost. We design a novel Q-learning method in which the converged Q-values have a practical meaning: they are the actual probabilities of arriving on time, which improves accuracy. By further adopting dynamic neural networks to learn the value function, our method can scale well to large road networks with arbitrary deadlines. Experimental results on real road networks demonstrate the significant advantages of our method over other counterparts.
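The core idea — Q-values that converge to on-time arrival probabilities rather than expected costs — can be sketched with a tabular toy. The road network, travel-time distributions, and deadline below are invented for illustration (the paper's method uses dynamic neural networks on real road networks, not a lookup table): states are (node, remaining time) pairs, the bootstrapped target is the downstream on-time probability, and terminal reward is 1 exactly when the destination is reached within the deadline.

```python
import random
from collections import defaultdict

# Hypothetical toy network: from A, a risky road to B takes 1 or 3 time
# units with equal probability; a safe direct road to C always takes 5;
# B -> C takes 2.  With deadline 4, only A -> B -> C can arrive on time,
# and only when the risky road is fast, so the true probability is 0.5.
GRAPH = {"A": {"toB": ("B", [(1, 0.5), (3, 0.5)]),
               "toC": ("C", [(5, 1.0)])},
         "B": {"toC": ("C", [(2, 1.0)])}}
DEST, DEADLINE = "C", 4

Q = defaultdict(float)   # Q[(node, time_left, action)] -> P(arrive on time)
N = defaultdict(int)     # visit counts, for 1/N (sample-average) step sizes

def best_value(node, time_left):
    if node == DEST:
        return 1.0 if time_left >= 0 else 0.0   # reached goal in time?
    if time_left <= 0:
        return 0.0
    return max(Q[(node, time_left, a)] for a in GRAPH[node])

def run_episode(eps=0.2):
    node, t = "A", DEADLINE
    while node != DEST and t > 0:
        acts = list(GRAPH[node])
        if random.random() < eps:
            a = random.choice(acts)
        else:
            a = max(acts, key=lambda x: Q[(node, t, x)])
        nxt, dist = GRAPH[node][a]
        times, probs = zip(*dist)
        tau = random.choices(times, probs)[0]
        # The bootstrapped target is the downstream on-time probability,
        # so the converged Q-value is itself such a probability.
        key = (node, t, a)
        N[key] += 1
        Q[key] += (best_value(nxt, t - tau) - Q[key]) / N[key]
        node, t = nxt, t - tau

random.seed(0)
for _ in range(20000):
    run_episode()
```

After training, Q[("A", 4, "toB")] approaches 0.5 and Q[("A", 4, "toC")] stays 0, matching the true on-time probabilities of the two routes.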

AAAI Conference 2016 Conference Paper

Energy- and Cost-Efficient Pumping Station Control

  • Timon Kanters
  • Frans Oliehoek
  • Michael Kaisers
  • Stan van den Bosch
  • Joep Grispen
  • Jeroen Hermans

With renewable energy becoming more common, energy prices fluctuate more depending on environmental factors such as the weather. Consuming energy without taking volatile prices into consideration can not only become expensive, but may also increase the peak load, which requires energy providers to generate additional energy using less environment-friendly methods. In the Netherlands, pumping stations that maintain the water levels of polder canals are large energy consumers, but the controller software currently used in the industry does not take real-time energy availability into account. We investigate if existing AI planning techniques have the potential to improve upon the current solutions. In particular, we propose a lightweight but realistic simulator and investigate if an online planning method (UCT) can utilise this simulator to improve the cost-efficiency of pumping station control policies. An empirical comparison with the current control algorithms indicates that substantial cost, and thus peak load, reduction can be attained.

AAAI Conference 2016 Conference Paper

Exploiting Anonymity in Approximate Linear Programming: Scaling to Large Multiagent MDPs

  • Philipp Robbel
  • Frans Oliehoek
  • Mykel Kochenderfer

Many solution methods for Markov Decision Processes (MDPs) exploit structure in the problem and are based on value function factorization. Especially multiagent settings, however, are known to suffer from an exponential increase in value component sizes as interactions become denser, restricting problem sizes and types that can be handled. We present an approach to mitigate this limitation for certain types of multiagent systems, exploiting a property that can be thought of as “anonymous influence” in the factored MDP. We show how representational benefits from anonymity translate into computational efficiencies, both for variable elimination in a factor graph and for the approximate linear programming solution to factored MDPs. Our methods scale to factored MDPs that were previously unsolvable, such as the control of a stochastic disease process over densely connected graphs with 50 nodes and 25 agents.
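The "anonymous influence" property — a factor over many binary neighbour variables whose value depends only on how many of them are active, not on which ones — is what shrinks the representation. A minimal sketch, with an invented payoff function and i.i.d. activation probabilities (the paper's setting is a stochastic disease process on a graph, not this toy):

```python
from math import comb

n = 20  # number of binary neighbour variables in the factor

def f(count):
    """Anonymous payoff: depends only on the number of active
    neighbours (hypothetical quadratic choice for illustration)."""
    return count ** 2

# Naive factor table: one entry per joint assignment -> 2**n = 1,048,576.
# Anonymous (count-based) table: one entry per count -> n + 1 = 21.
count_table = {c: f(c) for c in range(n + 1)}

# Expected value under i.i.d. activation probability p, computed over
# counts with binomial weights instead of enumerating 2**n assignments.
p = 0.3
expected = sum(comb(n, c) * p ** c * (1 - p) ** (n - c) * count_table[c]
               for c in range(n + 1))
# Sanity check: for f(c) = c**2 this is Var + mean**2
#             = n*p*(1-p) + (n*p)**2 = 4.2 + 36 = 40.2
```

The same collapse from joint assignments to counts is what makes variable elimination and the approximate linear program tractable at the scales the abstract reports.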

AAAI Conference 2016 Conference Paper

Solving Transition-Independent Multi-Agent MDPs with Sparse Interactions

  • Joris Scharpff
  • Diederik Roijers
  • Frans Oliehoek
  • Matthijs Spaan
  • Mathijs de Weerdt

In cooperative multi-agent sequential decision making under uncertainty, agents must coordinate to find an optimal joint policy that maximises joint value. Typical algorithms exploit additive structure in the value function, but in the fully observable multi-agent MDP (MMDP) setting such structure is not present. We propose a new optimal solver for transition-independent MMDPs, in which agents can only affect their own state but their reward depends on joint transitions. We represent these dependencies compactly in conditional return graphs (CRGs). Using CRGs the value of a joint policy and the bounds on partially specified joint policies can be efficiently computed. We propose CoRe, a novel branch-and-bound policy search algorithm building on CRGs. CoRe typically requires less runtime than available alternatives and finds solutions to previously unsolvable problems.

AAAI Conference 2015 Conference Paper

Exploiting Submodular Value Functions for Faster Dynamic Sensor Selection

  • Yash Satsangi
  • Shimon Whiteson
  • Frans Oliehoek

A key challenge in the design of multi-sensor systems is the efficient allocation of scarce resources such as bandwidth, CPU cycles, and energy, leading to the dynamic sensor selection problem in which a subset of the available sensors must be selected at each timestep. While partially observable Markov decision processes (POMDPs) provide a natural decision-theoretic model for this problem, the computational cost of POMDP planning grows exponentially in the number of sensors, making it feasible only for small problems. We propose a new POMDP planning method that uses greedy maximization to greatly improve scalability in the number of sensors. We show that, under certain conditions, the value function of a dynamic sensor selection POMDP is submodular and use this result to bound the error introduced by performing greedy maximization. Experimental results on a real-world dataset from a multi-camera tracking system in a shopping mall show it achieves similar performance to existing methods but incurs only a fraction of the computational cost, leading to much better scalability in the number of cameras.
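The greedy maximization step at the heart of the abstract is standard submodular optimization: repeatedly add the sensor with the largest marginal gain. A minimal sketch using set coverage as a stand-in objective (the paper's objective is the POMDP value function; the camera names and coverage sets here are invented):

```python
# Hypothetical coverage example: each sensor observes a set of cells,
# and the value of a sensor subset is the number of distinct cells
# covered -- a monotone submodular function.
SENSORS = {
    "cam1": {1, 2, 3},
    "cam2": {3, 4},
    "cam3": {4, 5, 6, 7},
    "cam4": {1, 7},
}

def coverage(selected):
    """Number of distinct cells covered by the selected sensors."""
    return len(set().union(*(SENSORS[s] for s in selected))) if selected else 0

def greedy_select(k):
    """Greedy maximization: pick the sensor with the largest marginal
    gain at each step.  For monotone submodular objectives the result
    is within a (1 - 1/e) factor of the best size-k subset."""
    chosen = []
    for _ in range(k):
        gains = {s: coverage(chosen + [s]) - coverage(chosen)
                 for s in SENSORS if s not in chosen}
        best, gain = max(gains.items(), key=lambda kv: kv[1])
        if gain == 0:
            break                      # nothing left to gain
        chosen.append(best)
    return chosen

picked = greedy_select(2)              # ["cam3", "cam1"], covering all 7 cells
```

Each greedy step costs one pass over the remaining sensors, versus enumerating all size-k subsets, which is where the scalability in the number of cameras comes from.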

AAAI Conference 2015 Conference Paper

Scalable Planning and Learning for Multiagent POMDPs

  • Christopher Amato
  • Frans Oliehoek

Online, sample-based planning algorithms for POMDPs have shown great promise in scaling to problems with large state spaces, but they become intractable for large action and observation spaces. This is particularly problematic in multiagent POMDPs where the action and observation space grows exponentially with the number of agents. To combat this intractability, we propose a novel scalable approach based on sample-based planning and factored value functions that exploits structure present in many multiagent settings. This approach applies not only in the planning case, but also in the Bayesian reinforcement learning setting. Experimental results show that we are able to provide high quality solutions to large multiagent planning and learning problems.

RLDM Conference 2013 Conference Abstract

Scalable Bayesian Reinforcement Learning for Multiagent POMDPs

  • Christopher Amato
  • Frans Oliehoek
  • Eric Shyu

Bayesian methods for reinforcement learning (RL) allow model uncertainty to be considered explicitly and offer a principled way of dealing with the exploration/exploitation tradeoff. However, for multiagent systems there have been few such approaches, and none of them apply to problems with state uncertainty. In this paper, we fill this gap by proposing a Bayesian RL framework for multiagent partially observable Markov decision processes that is able to take advantage of structure present in many problems. In this framework, a team of agents operates in a centralized fashion, but has uncertainty about the model of the environment. Fitting many real-world situations, we consider the case where agents learn the appropriate models while acting in an online fashion. Because it can quickly become intractable to choose the optimal action in naive versions of this online learning problem, we propose a more scalable approach based on sample-based search and factored value functions for the set of agents. Experimental results show that we are able to provide high quality solutions to large problems even with a large amount of initial model uncertainty.

AAMAS Conference 2012 Conference Paper

Heuristic Search of Multiagent Influence Space

  • Stefan Witwicki
  • Frans Oliehoek
  • Leslie Kaelbling

Multiagent planning under uncertainty has seen important progress in recent years. Two techniques, in particular, have substantially advanced efficiency and scalability of planning. Multiagent heuristic search gains traction by pruning large portions of the joint policy space deemed suboptimal by heuristic bounds. Alternatively, influence-based abstraction reformulates the search space of joint policies into a smaller space of influences, which represent the probabilistic effects that agents' policies may exert on one another. These techniques have been used independently, but never together, to solve larger problems (for Dec-POMDPs and subclasses) than previously possible. In this paper, we take the logical albeit nontrivial next step of combining multiagent A* search and influence-based abstraction into a single algorithm. The mathematical foundation that we provide, such as partially-specified influence evaluation and admissible heuristic definition, enables an initial investigation into whether the two techniques bring complementary gains. Our empirical results indicate that A* can provide significant computational savings on top of those already afforded by influence-space search, thereby bringing a significant contribution to the field of multiagent planning under uncertainty.

AAAI Conference 2012 Conference Paper

Influence-Based Abstraction for Multiagent Systems

  • Frans Oliehoek
  • Stefan Witwicki
  • Leslie Kaelbling

This paper presents a theoretical advance by which factored POSGs can be decomposed into local models. We formalize the interface between such local models as the influence agents can exert on one another; and we prove that this interface is sufficient for decoupling them. The resulting influence-based abstraction substantially generalizes previous work on exploiting weakly-coupled agent interaction structures. Therein lie several important contributions. First, our general formulation sheds new light on the theoretical relationships among previous approaches, and promotes future empirical comparisons that could come by extending them beyond the more specific problem contexts for which they were developed. More importantly, the influence-based approaches that we generalize have shown promising improvements in the scalability of planning for more restrictive models. Thus, our theoretical result here serves as the foundation for practical algorithms that we anticipate will bring similar improvements to more general planning contexts, and also into other domains such as approximate planning, decision-making in adversarial domains, and online learning.

AAMAS Conference 2012 Conference Paper

Tree-based Pruning for Multiagent POMDPs with Delayed Communication

  • Frans Oliehoek
  • Matthijs Spaan

Multiagent POMDPs provide a powerful framework for optimal decision making under the assumption of instantaneous communication. We focus on a delayed communication setting (MPOMDP-DC), in which broadcast information is delayed by at most one time step. Such an assumption is in fact more appropriate for applications in which response time is critical. However, naive application of incremental pruning, the core of many state-of-the-art POMDP techniques, is intractable for MPOMDP-DCs. We overcome this problem by introducing a tree-based pruning technique. Experiments show that the method outperforms naive incremental pruning by orders of magnitude, allowing for the solution of larger problems.

AAAI Conference 2012 Conference Paper

Tree-Based Solution Methods for Multiagent POMDPs with Delayed Communication

  • Frans Oliehoek
  • Matthijs Spaan

Multiagent Partially Observable Markov Decision Processes (MPOMDPs) provide a powerful framework for optimal decision making under the assumption of instantaneous communication. We focus on a delayed communication setting (MPOMDP-DC), in which broadcasted information is delayed by at most one time step. This model allows agents to act on their most recent (private) observation. Such an assumption is a strict generalization over having agents wait until the global information is available and is more appropriate for applications in which response time is critical. In this setting, however, value function backups are significantly more costly, and naive application of incremental pruning, the core of many state-of-the-art optimal POMDP techniques, is intractable. In this paper, we overcome this problem by demonstrating that computation of the MPOMDP-DC backup can be structured as a tree and by introducing two novel tree-based pruning techniques that exploit this structure in an effective way. We experimentally show that these methods have the potential to outperform naive incremental pruning by orders of magnitude, allowing for the solution of larger problems.

AAMAS Conference 2010 Conference Paper

Heuristic Search for Identical Payoff Bayesian Games

  • Frans Oliehoek
  • Matthijs Spaan
  • Jilles Dibangoye
  • Christopher Amato

Bayesian games can be used to model single-shot decision problems in which agents only possess incomplete information about other agents, and hence are important for multiagent coordination under uncertainty. Moreover, they can be used to represent different stages of sequential multiagent decision problems, such as POSGs and DEC-POMDPs, and appear as an operation in many methods for multiagent planning under uncertainty. In this paper we are interested in coordinating teams of cooperative agents. While many such problems can be formulated as Bayesian games with identical payoffs, little work has been done to improve solution methods. To help address this situation, we provide a branch and bound algorithm that optimally solves identical payoff Bayesian games. Our results show a marked improvement over previous methods, obtaining speedups of up to 3 orders of magnitude for synthetic random games, and reaching 10 orders of magnitude speedups for games in a DEC-POMDP context. This not only allows Bayesian games to be solved more efficiently, but can also improve multiagent planning techniques such as top-down and bottom-up algorithms for decentralized POMDPs.
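The branch-and-bound idea can be illustrated on a tiny identical-payoff Bayesian game: assign actions to (agent, type) slots one at a time, and prune a partial joint policy whenever an admissible upper bound (best-case payoff for every joint type not yet fully determined) cannot beat the incumbent. Everything below is a made-up toy (payoff numbers, uniform type distribution, the depth-first ordering), not the paper's algorithm or heuristics:

```python
import itertools

TYPES, ACTIONS = (0, 1), (0, 1)
PROB = {tt: 0.25 for tt in itertools.product(TYPES, TYPES)}  # uniform types
# Shared payoff table U[(t1, t2, a1, a2)] with hypothetical numbers.
U = {(t1, t2, a1, a2): (2 if a1 == a2 == t1 else 0) + (1 if a2 == t2 else 0)
     for t1 in TYPES for t2 in TYPES for a1 in ACTIONS for a2 in ACTIONS}

SLOTS = [(0, t) for t in TYPES] + [(1, t) for t in TYPES]  # (agent, type)

def value(pi):
    """Expected identical payoff of a full joint policy pi[agent, type]."""
    return sum(PROB[t1, t2] * U[t1, t2, pi[0, t1], pi[1, t2]]
               for (t1, t2) in PROB)

def upper_bound(partial):
    """Admissible bound: best-case payoff for each joint type, with
    free maximization over any action not yet fixed by `partial`."""
    ub = 0.0
    for (t1, t2) in PROB:
        a1s = [partial[0, t1]] if (0, t1) in partial else list(ACTIONS)
        a2s = [partial[1, t2]] if (1, t2) in partial else list(ACTIONS)
        ub += PROB[t1, t2] * max(U[t1, t2, a1, a2] for a1 in a1s for a2 in a2s)
    return ub

def branch_and_bound():
    best = {"val": float("-inf"), "pi": None}
    def dfs(i, partial):
        if upper_bound(partial) <= best["val"]:
            return                       # prune: cannot beat incumbent
        if i == len(SLOTS):
            best["val"], best["pi"] = value(partial), dict(partial)
            return
        for a in ACTIONS:
            partial[SLOTS[i]] = a
            dfs(i + 1, partial)
            del partial[SLOTS[i]]
    dfs(0, {})
    return best["val"], best["pi"]
```

Brute force over all joint policies confirms the same optimum here; the payoff of branch and bound is that pruning avoids most of that enumeration as the number of types and actions grows.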

AAMAS Conference 2008 Conference Paper

Exploiting Locality of Interaction in Factored Dec-POMDPs

  • Frans Oliehoek
  • Matthijs Spaan
  • Shimon Whiteson
  • Nikos Vlassis

Decentralized partially observable Markov decision processes (Dec-POMDPs) constitute an expressive framework for multiagent planning under uncertainty, but solving them is provably intractable. We demonstrate how their scalability can be improved by exploiting locality of interaction between agents in a factored representation. Factored Dec-POMDP representations have been proposed before, but only for Dec-POMDPs whose transition and observation models are fully independent. Such strong assumptions simplify the planning problem, but result in models with limited applicability. By contrast, we consider general factored Dec-POMDPs for which we analyze the model dependencies over space (locality of interaction) and time (horizon of the problem). We also present a formulation of decomposable value functions. Together, our results allow us to exploit the problem structure as well as heuristics in a single framework that is based on collaborative graphical Bayesian games (CGBGs). A preliminary experiment shows a speedup of two orders of magnitude.