RLDM 2015 Conference Abstract
- Halit Suay
- Sonia Chernova
- Tim Brys
- Vrije Universiteit Brussel
- Matthew Taylor
Potential-based reward shaping is a theoretically sound way of incorporating prior knowledge into a reinforcement learning setting. While providing flexibility in the choice of potential function, this method guarantees convergence to the same optimal policy, regardless of the properties of the potential function. However, this flexibility of choice may cause confusion when making a design decision for a specific domain, as the number of possible candidates for a potential function can be overwhelming. Moreover, the potential function can either be manually designed, to bias the behavior of the learner, or be recovered from prior knowledge, e.g., from human demonstrations. In this paper we investigate the efficacy of two different ways of using a potential function recovered from human demonstrations. The first approach uses a mixture of Gaussian distributions generated from samples collected during demonstrations (Gaussian-Shaping), and the second uses a reward function recovered from demonstrations with Relative Entropy Inverse Reinforcement Learning (RE-IRL-Shaping). We present our findings in the Cart-Pole, Mountain Car, and Puddle World domains. Our results show that Gaussian-Shaping can provide an efficient reward heuristic, accelerating learning through its ability to capture local information, and that RE-IRL-Shaping can be more resilient to bad demonstrations. We report a brief analysis of our findings and aim to provide a future reference for reinforcement learning agent designers who consider using reward shaping from human demonstrations.
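To make the mechanism concrete, the following is a minimal sketch (not the authors' implementation) of a demonstration-derived potential in the spirit of Gaussian-Shaping: the potential Phi(s) sums Gaussian kernels centered on demonstrated states, and the shaping term added to the environment reward is the standard potential-based form F(s, s') = gamma * Phi(s') - Phi(s). The kernel width `sigma` and the use of isotropic kernels are illustrative assumptions.

```python
import numpy as np

def gaussian_potential(state, demo_states, sigma=0.5):
    """Potential Phi(s): sum of isotropic Gaussian kernels centered on
    demonstrated states (a simple stand-in for a fitted Gaussian mixture)."""
    diffs = demo_states - state                 # shape (N, d)
    sq_dists = np.sum(diffs ** 2, axis=1)       # squared distance to each demo state
    return float(np.sum(np.exp(-sq_dists / (2.0 * sigma ** 2))))

def shaping_reward(state, next_state, demo_states, gamma=0.99, sigma=0.5):
    """Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s).
    Added to the environment reward, it leaves the optimal policy unchanged."""
    return (gamma * gaussian_potential(next_state, demo_states, sigma)
            - gaussian_potential(state, demo_states, sigma))
```

With this potential, transitions that move the agent toward regions visited in the demonstrations yield a positive shaping term, while transitions moving away yield a negative one, which is the local information the abstract credits for accelerated learning.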