Author name cluster

Peter Vrancx

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

20 papers
2 author rows

Possible papers (20)

TMLR Journal 2024 Journal Article

Depth Scaling in Graph Neural Networks: Understanding the Flat Curve Behavior

  • Diana Gomes
  • Kyriakos Efthymiadis
  • Ann Nowe
  • Peter Vrancx

Training deep Graph Neural Networks (GNNs) has proved to be a challenging task. A key goal of many new GNN architectures is to enable the depth scaling seen in other types of deep learning models. However, unlike deep learning methods in other domains, deep GNNs do not show significant performance boosts when compared to their shallow counterparts (resulting in a flat curve of performance over depth). In this work, we investigate some of the reasons why this goal of depth still eludes GNN researchers. We also question the effectiveness of current methods to train deep GNNs and show evidence of different types of pathological behavior in these networks. Our results suggest that current approaches hide the problems with deep GNNs rather than solve them, as current deep GNNs are only as discriminative as their respective shallow versions.
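
The flat-curve claim is straightforward to probe empirically. Below is a minimal sketch of that kind of depth sweep, assuming PyTorch Geometric and the Cora citation benchmark as stand-ins; the paper's actual architectures, datasets, and training protocol may differ. On setups like this, test accuracy typically plateaus or degrades beyond a few layers, which is the flat-curve behavior the abstract refers to.

    import torch
    import torch.nn.functional as F
    from torch_geometric.datasets import Planetoid
    from torch_geometric.nn import GCNConv

    class GCN(torch.nn.Module):
        def __init__(self, in_dim, hidden, out_dim, depth):
            super().__init__()
            dims = [in_dim] + [hidden] * (depth - 1) + [out_dim]
            self.convs = torch.nn.ModuleList(
                GCNConv(d_in, d_out) for d_in, d_out in zip(dims, dims[1:]))

        def forward(self, x, edge_index):
            for conv in self.convs[:-1]:
                x = F.relu(conv(x, edge_index))
            return self.convs[-1](x, edge_index)

    dataset = Planetoid(root='data', name='Cora')
    data = dataset[0]

    for depth in (2, 4, 8, 16):  # sweep depth and watch the curve stay flat
        model = GCN(dataset.num_features, 64, dataset.num_classes, depth)
        opt = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
        for _ in range(200):
            model.train()
            opt.zero_grad()
            out = model(data.x, data.edge_index)
            F.cross_entropy(out[data.train_mask], data.y[data.train_mask]).backward()
            opt.step()
        model.eval()
        pred = model(data.x, data.edge_index).argmax(dim=-1)
        acc = (pred[data.test_mask] == data.y[data.test_mask]).float().mean()
        print(f'depth={depth:2d}  test acc={acc:.3f}')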

ICML Conference 2020 Conference Paper

Batch Reinforcement Learning with Hyperparameter Gradients

  • Byung-Jun Lee 0001
  • Jongmin Lee 0004
  • Peter Vrancx
  • Dongho Kim
  • Kee-Eung Kim

We consider the batch reinforcement learning problem where the agent needs to learn only from a fixed batch of data, without further interaction with the environment. In such a scenario, we want to prevent the optimized policy from deviating too much from the data collection policy, since the estimation otherwise becomes highly unstable due to the off-policy nature of the problem. However, imposing this requirement too strongly will result in a policy that merely follows the data collection policy. Unlike prior work where this trade-off is controlled by hand-tuned hyperparameters, we propose a novel batch reinforcement learning approach, batch optimization of policy and hyperparameter (BOPAH), that uses a gradient-based optimization of the hyperparameter using held-out data. We show that BOPAH outperforms other batch reinforcement learning algorithms in tabular and continuous control tasks by striking a good balance between adhering to the data collection policy and pursuing possible policy improvement.
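
The distinguishing mechanism is the gradient step on the trade-off hyperparameter itself. The toy below sketches that bilevel idea on a two-armed bandit batch, assuming PyTorch: an inner loop improves a policy under a KL penalty of weight alpha, and an outer loop differentiates a held-out value estimate through those inner updates to tune alpha. This illustrates the idea only; it is not the paper's BOPAH algorithm, and the setup and names are ours.

    import torch

    # Toy bandit batch: actions drawn from a fixed behavior policy pi_b,
    # noisy rewards observed; split into a training and a held-out set.
    torch.manual_seed(0)
    pi_b = torch.tensor([0.7, 0.3])
    true_r = torch.tensor([0.2, 0.8])

    def sample(n):
        a = torch.multinomial(pi_b, n, replacement=True)
        return a, true_r[a] + 0.1 * torch.randn(n)

    a_tr, r_tr = sample(500)
    a_ho, r_ho = sample(200)

    def is_value(theta, a, r):
        # importance-sampled off-policy value estimate of pi_theta
        pi = torch.softmax(theta, 0)
        return (pi[a] / pi_b[a] * r).mean()

    def kl(theta):
        pi = torch.softmax(theta, 0)
        return (pi * (pi / pi_b).log()).sum()

    log_alpha = torch.zeros((), requires_grad=True)
    outer = torch.optim.Adam([log_alpha], lr=0.05)
    for _ in range(100):
        alpha = log_alpha.exp()
        theta = torch.zeros(2, requires_grad=True)
        # inner loop: differentiable ascent steps on the penalized objective
        for _ in range(20):
            inner_obj = is_value(theta, a_tr, r_tr) - alpha * kl(theta)
            g, = torch.autograd.grad(inner_obj, theta, create_graph=True)
            theta = theta + 0.5 * g
        # outer step: gradient of held-out performance w.r.t. alpha,
        # flowing back through the inner updates
        outer.zero_grad()
        (-is_value(theta, a_ho, r_ho)).backward()
        outer.step()
    print('learned alpha:', log_alpha.exp().item())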

ICML Conference 2019 Conference Paper

Per-Decision Option Discounting

  • Anna Harutyunyan
  • Peter Vrancx
  • Philippe Hamel
  • Ann Nowé
  • Doina Precup

In order to solve complex problems an agent must be able to reason over a sufficiently long horizon. Temporal abstraction, commonly modeled through options, offers the ability to reason at many timescales, but the horizon length is still determined by the discount factor of the underlying Markov Decision Process. We propose a modification to the options framework that naturally scales the agent’s horizon with option length. We show that the proposed option-step discount controls a bias-variance trade-off, with larger discounts (counter-intuitively) leading to less estimation variance.
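
One way to read the proposal, in our own notation (the paper's exact formulation may differ): the usual call-and-return bootstrap compounds the primitive discount over the option's duration d, so long options shrink the weight of everything that follows, while the per-decision variant charges a single option-level factor per decision, letting the effective horizon grow with option length.

    % Standard SMDP bootstrap: the discount compounds over the option's duration d.
    Q(s, o) = \mathbb{E}\left[ \sum_{t=0}^{d-1} \gamma^{t} r_{t+1} + \gamma^{d} \, Q(s', o') \right]

    % Per-decision variant as sketched from the abstract: one factor per option
    % step, so the horizon scales with option length (gamma_o is our notation).
    Q(s, o) = \mathbb{E}\left[ \sum_{t=0}^{d-1} \gamma^{t} r_{t+1} + \gamma_{o} \, Q(s', o') \right]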

RLDM Conference 2019 Conference Abstract

Per-Decision Option Discounting

  • Anna Harutyunyan
  • Peter Vrancx
  • Philippe Hamel
  • Ann Nowe
  • Doina Precup

In order to solve complex problems an agent must be able to reason over a sufficiently long horizon. Temporal abstraction, commonly modeled through options, offers the ability to reason at many timescales, but the horizon length is still determined by the discount factor of the underlying Markov Decision Process. We propose a modification to the options framework that allows the agent’s horizon to grow naturally as its actions become more complex and extended in time. We show that the proposed option-step discount controls a bias-variance trade-off, with larger discounts (counter-intuitively) leading to less estimation variance.

AAAI Conference 2018 Conference Paper

Learning With Options That Terminate Off-Policy

  • Anna Harutyunyan
  • Peter Vrancx
  • Pierre-Luc Bacon
  • Doina Precup
  • Ann Nowé

A temporally abstract action, or an option, is specified by a policy and a termination condition: the policy guides the option behavior, and the termination condition roughly determines its length. Generally, learning with longer options (like learning with multi-step returns) is known to be more efficient. However, if the option set for the task is not ideal, and cannot express the primitive optimal policy well, shorter options offer more flexibility and can yield a better solution. Thus, the termination condition puts learning efficiency at odds with solution quality. We propose to resolve this dilemma by decoupling the behavior and target terminations, just like it is done with policies in off-policy learning. To this end, we give a new algorithm, Q(β), that learns the solution with respect to any termination condition, regardless of how the options actually terminate. We derive Q(β) by casting learning with options into a common framework with well-studied multi-step off-policy learning. We validate our algorithm empirically, and show that it holds up to its motivating claims.
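
As a hedged one-step reading of the decoupling, in our notation: behavior options may terminate however they actually do, while the learning target evaluates a chosen target termination β. The paper's Q(β) is a multi-step off-policy algorithm, so this shows only the one-step shape of the target, not the full corrections.

    % One-step target under a *target* termination beta, evaluated regardless
    % of when the behavior option actually terminated:
    G_t = r_{t+1} + \gamma \left[ (1 - \beta(s_{t+1})) \, Q(s_{t+1}, o_t)
          + \beta(s_{t+1}) \max_{o'} Q(s_{t+1}, o') \right]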

AAAI Conference 2018 Conference Paper

Reinforcement Learning in POMDPs With Memoryless Options and Option-Observation Initiation Sets

  • Denis Steckelmacher
  • Diederik Roijers
  • Anna Harutyunyan
  • Peter Vrancx
  • Hélène Plisnier
  • Ann Nowé

Many real-world reinforcement learning problems have a hierarchical nature, and often exhibit some degree of partial observability. While hierarchy and partial observability are usually tackled separately (for instance by combining recurrent neural networks and options), we show that addressing both problems simultaneously is simpler and more efficient in many cases. More specifically, we make the initiation set of options conditional on the previously-executed option, and show that options with such Option-Observation Initiation Sets (OOIs) are at least as expressive as Finite State Controllers (FSCs), a state-of-the-art approach for learning in POMDPs. OOIs are easy to design based on an intuitive description of the task, lead to explainable policies and keep the top-level and option policies memoryless. Our experiments show that OOIs allow agents to learn optimal policies in challenging POMDPs, while being much more sample-efficient than a recurrent neural network over options.
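
A minimal sketch of the OOI mechanism as the abstract describes it: which options are available depends only on the option executed just before, so the chain of initiation sets carries memory while the top-level and option policies themselves stay memoryless. The option names and task are hypothetical.

    import random

    OPTIONS = ['goto_door', 'open_door', 'pass_door', 'wander']
    OOI = {  # previous option -> options whose initiation set allows it
        None:        ['goto_door', 'wander'],   # start of episode
        'goto_door': ['open_door'],
        'open_door': ['pass_door'],
        'pass_door': ['goto_door', 'wander'],
        'wander':    ['goto_door', 'wander'],
    }

    def pick_option(prev_option, q_values):
        """Greedy top-level choice, restricted to the initiation set."""
        available = OOI[prev_option]
        return max(available, key=lambda o: q_values.get(o, 0.0))

    q = {o: random.random() for o in OPTIONS}
    prev = None
    for _ in range(5):
        prev = pick_option(prev, q)
        print(prev)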

AAMAS Conference 2017 Conference Paper

Analysing Congestion Problems in Multi-agent Reinforcement Learning

  • Roxana Radulescu
  • Peter Vrancx
  • Ann Nowé

We extend the study of congestion problems to a more realistic scenario, the Road Network Domain (RND), where the resources are no longer independent but rather part of a network, so choosing one path also impacts the load of other paths that share road segments. We demonstrate the application of state-of-the-art multi-agent reinforcement learning methods for this new congestion model and analyse their performance. RND allows us to highlight an important limitation of resource abstraction and show that the difference rewards approach manages to better capture and inform the agents about the dynamics of the environment.
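
The difference-rewards approach mentioned above credits each agent with its marginal effect on the global utility, D_i = G(z) - G(z_{-i}). Below is a toy sketch of that computation on paths that share road segments; the path and segment names are ours, not the paper's RND instances.

    def congestion_cost(load):
        return load ** 2  # convex: crowded segments hurt superlinearly

    def global_utility(path_choices, segments_of):
        loads = {}
        for path in path_choices:
            for seg in segments_of[path]:
                loads[seg] = loads.get(seg, 0) + 1
        return -sum(congestion_cost(l) for l in loads.values())

    def difference_reward(i, path_choices, segments_of):
        # D_i = G(z) - G(z_{-i}), with agent i counterfactually removed
        g = global_utility(path_choices, segments_of)
        counterfactual = path_choices[:i] + path_choices[i + 1:]
        return g - global_utility(counterfactual, segments_of)

    segments_of = {'A': ['s1', 's2'], 'B': ['s2', 's3']}  # paths share s2
    choices = ['A', 'A', 'B']
    print([difference_reward(i, choices, segments_of) for i in range(len(choices))])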

KER Journal 2016 Journal Article

A reinforcement learning approach to coordinate exploration with limited communication in continuous action games

  • Abdel Rodríguez
  • Peter Vrancx
  • Ricardo Grau
  • Ann Nowé

Learning automata are reinforcement learners belonging to the class of policy iterators. They have already been shown to exhibit nice convergence properties in a wide range of discrete action game settings. Recently, a new formulation for a continuous action reinforcement learning automata (CARLA) was proposed. In this paper, we study the behavior of these CARLA in continuous action games and propose a novel method for coordinated exploration of the joint-action space. Our method allows a team of independent learners, using CARLA, to find the optimal joint action in common interest settings. We first show that independent agents using CARLA will converge to a local optimum of the continuous action game. We then introduce a method for coordinated exploration which allows the team of agents to find the global optimum of the game. We validate our approach in a number of experiments.
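
For reference, the CARLA density update takes roughly the following form in the learning-automata literature (our notation; see the paper for the exact kernel, parameters, and normalization): after playing action a_t and receiving feedback β_t in [0, 1], probability mass is added around a_t and the density is renormalized.

    % f_t is a probability density over a bounded continuous action range;
    % lambda and sigma are spreading parameters, alpha_t restores integral one.
    f_{t+1}(a) = \alpha_t \left[ f_t(a) + \beta_t \, \lambda \,
                 \exp\!\left( -\frac{(a - a_t)^2}{2 \sigma^{2}} \right) \right]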

TAAS Journal 2015 Journal Article

A Reinforcement Learning Approach for Interdomain Routing with Link Prices

  • Peter Vrancx
  • Pasquale Gurzi
  • Abdel Rodriguez
  • Kris Steenhaut
  • Ann Nowé

In today’s Internet, the commercial aspects of routing are gaining importance. Current technology allows Internet Service Providers (ISPs) to renegotiate contracts online to maximize profits. Changing link prices will influence interdomain routing policies that are now driven by monetary aspects as well as global resource and performance optimization. In this article, we consider an interdomain routing game in which the ISP’s action is to set the price for its transit links. Assuming a cheapest path routing scheme, the optimal action is the price setting that yields the highest utility (i.e., profit) and depends both on the network load and the actions of other ISPs. We adapt a continuous and a discrete action learning automaton (LA) to operate in this framework as a tool that can be used by ISP operators to learn optimal price setting. In our model, agents representing different ISPs learn only on the basis of local information and do not need any central coordination or sensitive information exchange. Simulation results show that a single ISP employing LAs is able to learn the optimal price in a stationary environment. By introducing a selective exploration rule, LAs are also able to operate in nonstationary environments. When two ISPs employ LAs, we show that they converge to stable and fair equilibrium strategies.
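
A minimal sketch of the discrete case: a linear reward-inaction (L_{R-I}) learning automaton over a few price levels, with a toy demand curve standing in for the ISP's profit signal. The price grid and profit function are illustrative assumptions, not the paper's simulation setup.

    import random

    def l_ri_step(probs, chosen, reward, lam=0.1):
        """Reward-inaction update: move toward the chosen action only if rewarded.

        `reward` is normalized to [0, 1]; with reward 0 nothing changes, which
        is what makes the scheme robust under noisy feedback."""
        return [p + lam * reward * (1.0 - p) if i == chosen else p - lam * reward * p
                for i, p in enumerate(probs)]

    prices = [1.0, 2.0, 3.0, 4.0]

    def profit(price):
        # toy demand curve: revenue = price * demand(price), scaled into [0, 1]
        demand = max(0.0, 1.0 - price / 6.0)
        return price * demand / 2.0

    probs = [1.0 / len(prices)] * len(prices)
    for _ in range(5000):
        i = random.choices(range(len(prices)), weights=probs)[0]
        probs = l_ri_step(probs, i, profit(prices[i]))
    print(max(zip(probs, prices)))  # probability mass piles on the best price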

AAAI Conference 2015 Conference Paper

Expressing Arbitrary Reward Functions as Potential-Based Advice

  • Anna Harutyunyan
  • Sam Devlin
  • Peter Vrancx
  • Ann Nowe

Effectively incorporating external advice is an important problem in reinforcement learning, especially as it moves into the real world. Potential-based reward shaping is a way to provide the agent with a specific form of additional reward, with the guarantee of policy invariance. In this work we give a novel way to incorporate an arbitrary reward function with the same guarantee, by implicitly translating it into the specific form of dynamic advice potentials, which are maintained as an auxiliary value function learnt at the same time. We show that advice provided in this way captures the input reward function in expectation, and demonstrate its efficacy empirically.
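
In our notation, one way to write the construction the abstract outlines: the arbitrary advice reward R^a is absorbed into a potential Φ learned online as an auxiliary value function on the negated advice reward, and the agent's reward is then shaped with the dynamic-potential difference, keeping the policy-invariance guarantee.

    % Auxiliary potential learned (e.g. by SARSA) on the negated advice reward:
    \Phi_{t+1}(s_t, a_t) = \Phi_t(s_t, a_t) + \alpha_t \left[ -R^{a}_{t}
        + \gamma \, \Phi_t(s_{t+1}, a_{t+1}) - \Phi_t(s_t, a_t) \right]

    % Dynamic-potential shaping term added to the agent's reward:
    F_t = \gamma \, \Phi_{t+1}(s_{t+1}, a_{t+1}) - \Phi_t(s_t, a_t)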

EUMAS Conference 2011 Conference Paper

Local Coordination in Online Distributed Constraint Optimization Problems

  • Tim Brys
  • Yann-Michaël De Hauwere
  • Ann Nowé
  • Peter Vrancx

In cooperative multi-agent systems, group performance often depends more on the interactions between team members than on the performance of any individual agent. Hence, coordination among agents is essential to optimize the group strategy. One solution which is common in the literature is to let the agents learn in a joint action space. Joint Action Learning (JAL) enables agents to explicitly take into account the actions of other agents, but has the significant drawback that the action space in which the agents must learn scales exponentially in the number of agents. Local coordination is a way for a team to coordinate while keeping communication and computational complexity low. It allows the exploitation of a specific dependency structure underlying the problem, such as tight couplings between specific agents. In this paper we investigate a novel approach to local coordination, in which agents learn this dependency structure, resulting in coordination which is beneficial to the group performance. We evaluate our approach in the context of online distributed constraint optimization problems.

AAMAS Conference 2011 Conference Paper

Solving Delayed Coordination Problems in MAS

  • Yann-Michaël De Hauwere
  • Peter Vrancx
  • Ann Nowé

Recent research has demonstrated that considering local interactions among agents in specific parts of the state space is a successful way of simplifying the multi-agent learning process. By taking into account other agents only when a conflict is possible, an agent can significantly reduce the state-action space in which it learns. Current approaches, however, consider only the immediate rewards for detecting conflicts. This restriction is not suitable for realistic systems, where rewards can be delayed and conflicts between agents often become apparent only several time-steps after an action has been taken. In this paper, we contribute a reinforcement learning algorithm that learns where a strategic interaction among agents is needed, several time-steps before the conflict is reflected by the (immediate) reward signal.

AAMAS Conference 2010 Conference Paper

Learning multi-agent state space representations

  • Yann-Michaël De Hauwere
  • Peter Vrancx
  • Ann Nowe

This paper describes an algorithm, called CQ-learning, which learns to adapt the state representation for multi-agent systems in order to coordinate with other agents. We propose a multi-level approach which builds a progressively more advanced representation of the learning problem. The idea is that agents start with a minimal single agent state space representation, which is expanded only when necessary. In cases where agents detect conflicts, they automatically expand their state to explicitly take into account the other agents. These conflict situations are then analyzed in an attempt to find an abstract representation which generalises over the problem states. Our system allows agents to learn effective policies, while avoiding the exponential state space growth typical in multi-agent environments. Furthermore, the method we introduce to generalise over conflict states allows knowledge to be transferred to unseen and possibly more complex situations. Our research departs from previous efforts in this area of multi-agent learning because our agents combine state space generalisation with an agent-centric point of view. The algorithms that we introduce can be used in robotic systems to automatically reduce the sensor information to what is essential to solve the problem at hand. This is a must when dealing with multiple agents, since learning in such environments is a cumbersome task due to the massive amount of information, much of which may be irrelevant. In our experiments we demonstrate a simulation of such environments using various gridworlds.
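
A toy rendition of the expansion mechanism: agents keep a single-agent state until the observed reward deviates from the single-agent baseline, then promote that state to a joint view. The real CQ-learning detects conflicts with statistical tests; the threshold rule and state names here are our simplifications.

    EXPAND_THRESHOLD = 0.5

    class CQAgent:
        def __init__(self):
            self.baseline = {}     # running single-agent reward estimate per state
            self.expanded = set()  # states promoted to a joint (conflict) view

        def state_key(self, own_state, other_state):
            # only conflict states carry the other agent's position
            if own_state in self.expanded:
                return (own_state, other_state)
            return own_state

        def observe(self, own_state, reward):
            base = self.baseline.setdefault(own_state, reward)
            if base - reward > EXPAND_THRESHOLD:
                self.expanded.add(own_state)  # reward dropped: mark as conflict
            self.baseline[own_state] = 0.95 * base + 0.05 * reward

    agent = CQAgent()
    agent.observe('cell_3', 1.0)                 # matches the solo baseline
    agent.observe('cell_3', 0.0)                 # collision: triggers expansion
    print(agent.state_key('cell_3', 'cell_7'))   # -> ('cell_3', 'cell_7')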

AAMAS Conference 2010 Conference Paper

Taking Turns in General Sum Markov Games

  • Peter Vrancx
  • Katja Verbeeck
  • Ann Nowe

This paper provides a novel approach to multi-agent coordination in general sum Markov games. Contrary to what is common in multi-agent learning, our approach does not focus on reaching a particular equilibrium between agent policies. Instead, it learns a basis set of special joint agent policies, over which it can randomize to build different solutions. The main idea is to tackle a Markov game by decomposing it into a set of multi-agent common interest problems, each reflecting one agent's preferences in the system. With only a minimum of coordination, simple reinforcement learning agents using Parameterised Learning Automata are able to solve this set of common interest problems in parallel. As a result, a team of simple learning agents becomes able to switch play between desired joint policies rather than mixing individual policies.

AAMAS Conference 2008 Conference Paper

Switching Dynamics of Multi-Agent Learning

  • Peter Vrancx
  • Karl Tuyls
  • Ronald Westra
  • Ann Nowé

This paper presents the dynamics of multi-agent reinforcement learning in multiple state problems. We extend previous work that formally modelled the relation between reinforcement learning agents and replicator dynamics in stateless multi-agent games. More precisely, in this work we use a combination of replicator dynamics and switching dynamics to model multi-agent learning automata in multi-state games. This is the first time that the dynamics of problems with more than one state are considered with replicator equations. Previously, it was unclear how the replicator dynamics of stateless games had to be extended to account for multiple states. We use our model to visualize the basin of attraction of the learning agents and the switching boundaries at which an agent may arrive in a new dynamical system. Our model allows us to analyze and predict the behavior of the different learning agents in a wide variety of multi-state problems. In our experiments we illustrate this powerful method in two games with two agents and two states.
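
For reference, the stateless building block being extended here is the standard two-population replicator dynamics for a two-player game with payoff matrices A (row player) and B (column player) and mixed strategies x and y; as we read the abstract, the multi-state model couples one such system per state, with the switching dynamics governing the jumps between them.

    \dot{x}_i = x_i \left[ (A y)_i - x^{\top} A y \right], \qquad
    \dot{y}_j = y_j \left[ (B^{\top} x)_j - y^{\top} B^{\top} x \right]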