Arrow Research search

Author name cluster

Kagan Tumer

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

56 papers
2 author rows

Possible papers

56

ECAI Conference 2025 Conference Paper

Multiagent Quality-Diversity for Effective Adaptation

  • Siddarth Iyer
  • Ayhan Alp Aydeniz
  • Gaurav Dixit
  • Kagan Tumer

Robust adaptation in multiagent settings requires learning not just a single optimal behavior, but a repertoire of high-performing and diverse team behaviors that can succeed under environmental contingencies. Traditional multiagent reinforcement learning methods typically converge to a single specialized team behavior, limiting their adaptability. Recent approaches like Mix-ME promote behavioral diversity but rely solely on evolutionary operators, often resulting in sample-inefficiency and uncoordinated team composition. This work introduces Multiagent Sample-Efficient Quality-Diversity (MASQD), a learning framework that produces an archive of diverse, high-performing multiagent teams. MASQD builds on the Cross-Entropy Method Reinforcement Learning algorithm and extends it to the multiagent setting by representing teams as parameter-shared neural networks, directing exploration from previously discovered behaviors, and guiding refinement through a descriptor-conditioned critic. Through this coupling of anchored exploration and targeted exploitation, MASQD produces functional diversity: teams that are not only behaviorally distinct but also robust and effective under varied conditions. Experiments across four Multiagent MuJoCo tasks show that MASQD outperforms state-of-the-art baselines in both team fitness and functional diversity.
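
Illustrative sketch of the archive loop the abstract describes: a quality-diversity archive keeps the best team per behavior cell, and new candidates are anchored at previously discovered elites. The toy fitness, the descriptor, and all names below are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
N_PARAMS, N_CELLS, SIGMA = 8, 10, 0.1

def evaluate(theta):
    """Toy stand-in for a team rollout: returns (fitness, behavior descriptor in [0, 1))."""
    fitness = -float(np.sum(theta ** 2))          # pretend team return
    descriptor = 1.0 / (1.0 + np.exp(-theta[0]))  # pretend behavior measure
    return fitness, descriptor

archive = {}  # behavior cell -> (fitness, parameters) of the best team found so far
for _ in range(2000):
    if archive:
        # Anchor exploration at a previously discovered behavior (a random elite).
        _, parent = archive[rng.choice(list(archive))]
        theta = parent + SIGMA * rng.standard_normal(N_PARAMS)
    else:
        theta = rng.standard_normal(N_PARAMS)
    fitness, desc = evaluate(theta)
    cell = min(int(desc * N_CELLS), N_CELLS - 1)
    if cell not in archive or fitness > archive[cell][0]:
        archive[cell] = (fitness, theta)          # keep the best team per cell

print(sorted((c, round(f, 3)) for c, (f, _) in archive.items()))
```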

AAMAS Conference 2025 Conference Paper

Safe Entropic Agents under Team Constraints

  • Ayhan Alp Aydeniz
  • Enrico Marchesini
  • Robert Loftin
  • Christopher Amato
  • Kagan Tumer

Safety is a critical concern in multiagent reinforcement learning (MARL), yet typical safety-aware methods constrain agent behaviors, limiting the exploration that is essential for discovering effective cooperation. Existing approaches mainly enforce individual constraints, overlooking potential benefits of joint (team) constraints. We analyze team constraints theoretically and practically, introducing entropic exploration for constrained MARL (E2C). E2C maximizes observation entropy to encourage exploration while ensuring safety at the individual and team levels. Experiments across diverse domains demonstrate that E2C matches or outperforms common baselines in task performance while reducing unsafe behaviors by up to 50%.
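
A minimal sketch of the two ingredients the abstract combines: an observation-entropy bonus to push exploration, and a Lagrange multiplier enforcing a team-level safety budget by dual ascent. The toy rewards, costs, and every name here are assumptions, not the E2C implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
LAMBDA_LR, COST_LIMIT, ENT_COEF = 0.05, 0.2, 0.1
lam = 0.0  # Lagrange multiplier on the team-level cost constraint

def entropy_bonus(observations, bins=10):
    """Crude observation-entropy estimate from a histogram (illustrative only)."""
    hist, _ = np.histogram(observations, bins=bins, range=(0.0, 1.0))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

shaped_returns = []
for _ in range(100):
    obs = rng.random(64)                   # pretend joint observations of the team
    reward = float(obs.mean())             # pretend task return
    team_cost = float((obs > 0.9).mean())  # pretend frequency of unsafe team states
    # Entropy pushes exploration up; the multiplier pushes unsafe behavior down.
    shaped_returns.append(reward + ENT_COEF * entropy_bonus(obs) - lam * team_cost)
    # Dual ascent: raise lam while the constraint is violated, relax it otherwise.
    lam = max(0.0, lam + LAMBDA_LR * (team_cost - COST_LIMIT))

print(f"mean shaped return {np.mean(shaped_returns):.3f}, final multiplier {lam:.3f}")
```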

AAMAS Conference 2024 Conference Paper

Entropy Seeking Constrained Multiagent Reinforcement Learning

  • Ayhan Alp Aydeniz
  • Enrico Marchesini
  • Christopher Amato
  • Kagan Tumer

Multiagent Reinforcement Learning (MARL) has been successfully applied to domains requiring close coordination among many agents. However, real-world tasks require safety specifications that are not generally considered by MARL algorithms. In this work, we introduce an Entropy Seeking Constrained (ESC) approach aiming to learn safe cooperative policies for multiagent systems. Unlike previous methods, ESC considers safety specifications while maximizing state-visitation entropy, addressing the exploration issues of constraint-based solutions.

AAMAS Conference 2024 Conference Paper

Indirect Credit Assignment in a Multiagent System

  • Everardo Gonzalez
  • Siddarth Viswanathan
  • Kagan Tumer

Learning in a multiagent system requires structural credit assignment to distill system performance into agent-specific feedback. Fitness shaping methods largely isolate agent credit, but struggle when an agent’s actions do not directly affect system feedback. This work introduces D-Indirect, a fitness shaping method that gives credit for both direct actions and actions that have an indirect impact on the system’s performance. We demonstrate the effectiveness of D-Indirect in a simulated shepherding scenario and our results show that learning with D-Indirect significantly outperforms learning with the standard difference evaluation and the system evaluation when agents indirectly impact system performance.
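
For reference, the standard difference evaluation that D-Indirect extends is usually written as follows (the canonical form from the difference-rewards literature; the indirect-credit extension itself is not reproduced here):

```latex
D_i(z) \;=\; G(z) \;-\; G(z_{-i} \cup c_i)
```

where G is the system evaluation, z the joint state-action, z_{-i} the system with agent i's contribution removed, and c_i a fixed counterfactual substituted for agent i. D_i isolates agent i's direct contribution, which is exactly the signal that vanishes when the agent's impact on the system is indirect.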

AAMAS Conference 2024 Conference Paper

Influence-Focused Asymmetric Island Model

  • Andrew Festa
  • Gaurav Dixit
  • Kagan Tumer

Learning good joint behaviors is challenging in multiagent settings due to the inherent non-stationarity: agents adapt their policies and act simultaneously. This is aggravated when the agents are asymmetric (agents have distinct capabilities and objectives) and must learn complementary behaviors required to work as a team. The Asymmetric Island Model partially addresses this by independently optimizing class-specific and team-wide behaviors. However, optimizing class-specific behaviors in isolation can produce egocentric behaviors that yield sub-optimal inter-class behaviors. This work introduces the Influence-Focused Asymmetric Island Model (IF-AIM), a hierarchical framework that explicitly reinforces inter-class behaviors by optimizing class-specific behaviors conditioned on the expected behaviors of the complementary agent classes. An experiment in the harvest environment highlights the effectiveness of our method in optimizing adaptable inter-class behaviors.

ECAI Conference 2024 Conference Paper

Objective-Informed Diversity for Multi-Objective Multiagent Coordination

  • Gaurav Dixit
  • Kagan Tumer

To coordinate in multiagent settings characterized by multiple objectives, asymmetric agents (agents with distinct capabilities and preferences) must learn diverse behaviors to balance trade-offs between agent-specific and team objectives. Hierarchical methods partially address this by leveraging a combination of Quality-Diversity methods that illuminate the behavior space and evolutionary algorithms that use non-dominated sorting over the explored behaviors to improve coverage in the objective space. However, optimizing diverse behaviors and trade-offs in isolation is susceptible to producing egocentric behaviors that favor agent-specific objectives at the cost of team objectives. This work introduces the Multi-Objective Informed Island Model (MOI-IM), an asymmetric multiagent learning framework that fosters diverse behaviors and rich inter-agent relationships, necessary to balance potentially conflicting and misaligned objectives. An evolutionary algorithm improves coverage in the objective space by evolving a population of teams, while a gradient-based optimization infers and progressively explores the behavior space by fluidly adapting search to regions that produce policies with non-dominated trade-offs. The two processes are coupled via shared replay buffers to ensure alignment between coverage in the behavior and objective space. Empirical results on an asymmetric multi-objective coordination problem highlight MOI-IM’s ability to produce teams that can express diverse trade-offs and robust relationships required to balance misaligned objectives.

ICRA Conference 2023 Conference Paper

Contextual Multi-Objective Path Planning

  • Anna Nickelson
  • Kagan Tumer
  • William D. Smart

Many critical robot environments, such as healthcare and security, require robots to account for context-dependent criteria when performing their functions (e.g., navigation). Such domains require decisions that balance multiple factors, making it difficult for robots to make contextually appropriate decisions. Multi-Objective Optimization (MOO) methods offer a potential solution by trading off between objectives; however, concepts like Pareto fronts are not only expensive to compute but also struggle to differentiate among solutions on the Pareto front. This work introduces the Contextual Multi-Objective Path Planning (CMOPP) algorithm, which enables the robot to trade off different complex costs depending on context. The key insight of this work is to separate path planning and path cost estimation into two independent steps, thus significantly reducing computation cost without impacting the quality of the resulting path. As a result, CMOPP is able to accurately model path costs, which provide meaningful trade-offs when choosing a path that best fits the context. We show the benefits of CMOPP on case studies that demonstrate its contextual path planning capabilities. CMOPP finds contextually appropriate paths by first reducing the search space by up to 99.9% to a near-optimal set of paths. This reduction enables the generation of accurate path cost models, using up to 90% less computation than similar methods.

ECAI Conference 2023 Conference Paper

Knowledge Injection for Multiagent Systems via Counterfactual Perception Shaping

  • Nicholas Zerbel
  • Kagan Tumer

Reward shaping can be used to train coordinated agent teams, but most learning approaches optimize for training conditions and, by design, are limited by knowledge directly captured by the reward function. Advances in adaptive systems (e.g., transfer learning) may enable agents to quickly learn new policies in response to changing conditions, but retraining agents is both difficult and risks losing team coordination altogether. In this work we introduce Counterfactual Knowledge Injection (CKI), a novel approach to injecting high-level information into a multiagent system outside of the learning process. CKI encodes knowledge into counterfactual state representations to shape agent perceptions of the system so that their current policies better match the current system conditions. We demonstrate CKI in a multiagent exploration task where agents must collaborate to observe various Points of Interest (POI). We show that CKI successfully imparts high-level system knowledge to agents in response to imperceptible changes. We also show that CKI enables agents to adjust their level of agent-to-agent coordination, ranging from tasks individuals can complete up to tasks that require the entire team.

AAMAS Conference 2023 Conference Paper

Learning Inter-Agent Synergies in Asymmetric Multiagent Systems

  • Gaurav Dixit
  • Kagan Tumer

In multiagent systems that require coordination, agents must learn diverse policies that enable them to achieve their individual and team objectives. Multiagent Quality-Diversity methods partially address this problem by filtering the joint space of policies to smaller sub-spaces that make the diversification of agent policies tractable. However, in teams of asymmetric agents (agents with different objectives and capabilities), the search for diversity is primarily driven by the need to find policies that will allow agents to assume complementary roles required to work together in teams. This work introduces Asymmetric Island Model (AIM), a multiagent framework that enables populations of asymmetric agents to learn diverse complementary policies that foster teamwork via dynamic population size allocation on a wide variety of team tasks. The key insight of AIM is that the competitive pressure arising from the distribution of policies on different team-wide tasks drives the agents to explore regions of the policy space that yield specializations that generalize across tasks. Simulation results on multiple variations of a remote habitat problem highlight the strength of AIM in discovering robust synergies that allow agents to operate near-optimally in response to the changing team composition and policies of other agents.

AAMAS Conference 2023 Conference Paper

Multi-Team Fitness Critics For Robust Teaming

  • Joshua Cook
  • Tristan Scheiner
  • Kagan Tumer

Many multiagent systems, such as search and rescue or underwater exploration, rely on generalizable teamwork abilities to achieve complex tasks. Though many ad-hoc teaming algorithms focus on finding an agent’s best fit with static team members, domains with high degrees of uncertainty and dynamic teammates require an agent to cooperate with arbitrary teams. Prior work views this as an issue of uninformative rewards, providing high-quality but potentially expensive evaluation methods to isolate an agent’s contribution. In this work, we provide a local-evaluation-based approach that leverages state trajectories of agents to better identify their impact across multiple teams. The key insight that enables this approach is that agent trajectories and previous experiences carry sufficient information to map agent abilities to team performance. As a result, we are able to train multiple agents to cooperate across arbitrary teams as well as, if not better than, current methods, while only using local information and significantly fewer team evaluations.

AAMAS Conference 2022 Conference Paper

Behavior Exploration and Team Balancing for Heterogeneous Multiagent Coordination

  • Gaurav Dixit
  • Kagan Tumer

Diversity in behaviors is instrumental for robust team performance in many multiagent tasks which require agents to coordinate. Unfortunately, exhaustive search through the agents’ behavior spaces is often intractable. This paper introduces Behavior Exploration for Heterogeneous Teams (BEHT), a multi-level learning framework that enables agents to progressively explore regions of the behavior space that promote team coordination on diverse goals. By combining diversity search to maximize agent-specific rewards and evolutionary optimization to maximize the team-based fitness, our method effectively filters regions of the behavior space that are conducive to agent coordination. We demonstrate the diverse behaviors and synergies that our method allows agents to learn on a multiagent exploration problem.

AAMAS Conference 2021 Conference Paper

Dynamic Skill Selection for Learning Joint Actions

  • Enna Sachdeva
  • Shauharda Khadka
  • Somdeb Majumdar
  • Kagan Tumer

Learning in tightly coupled multiagent settings with sparse rewards is challenging because multiple agents must reach the goal state simultaneously for the team to receive a reward. This is even more challenging under temporal coupling constraints, where agents need to sequentially complete different components of a task in a particular order. Here, a single local reward is inadequate for learning an effective policy. We introduce MADyS, Multiagent Learning via Dynamic Skill Selection, a bi-level optimization framework that learns to dynamically switch between multiple local skills to optimize sparse team objectives. MADyS adopts fast policy gradients to learn local skills using local rewards and an evolutionary algorithm to optimize the sparse team objective by recruiting the most suitable skill at any given time. This eliminates the need to generate a single dense reward via reward shaping or other mixing functions. In environments with both spatial and temporal coupling requirements, MADyS outperforms prior methods and provides intuitive visualizations of its skill switching strategy.

ICML Conference 2020 Conference Paper

Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination

  • Somdeb Majumdar
  • Shauharda Khadka
  • Santiago Miret
  • Stephen Marcus McAleer
  • Kagan Tumer

Many cooperative multiagent reinforcement learning environments provide agents with a sparse team-based reward, as well as a dense agent-specific reward that incentivizes learning basic skills. Training policies solely on the team-based reward is often difficult due to its sparsity. Also, relying solely on the agent-specific reward is sub-optimal because it usually does not capture the team coordination objective. A common approach is to use reward shaping to construct a proxy reward by combining the individual rewards. However, this requires manual tuning for each environment. We introduce Multiagent Evolutionary Reinforcement Learning (MERL), a split-level training platform that handles the two objectives separately through two optimization processes. An evolutionary algorithm maximizes the sparse team-based objective through neuroevolution on a population of teams. Concurrently, a gradient-based optimizer trains policies to only maximize the dense agent-specific rewards. The gradient-based policies are periodically added to the evolutionary population as a way of information transfer between the two optimization processes. This enables the evolutionary algorithm to use skills learned via the agent-specific rewards toward optimizing the global objective. Results demonstrate that MERL significantly outperforms state-of-the-art methods, such as MADDPG, on a number of difficult coordination benchmarks.
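
A skeleton of the split-level training pattern the abstract describes, under toy assumptions: an evolutionary population climbs the sparse team fitness while a gradient learner chases the dense agent-specific reward, with periodic migration from the gradient side into the population. Names, rewards, and update rules are illustrative stand-ins, not MERL's actual code.

```python
import numpy as np

rng = np.random.default_rng(2)
TEAM_SIZE, DIM, POP, ELITE = 3, 5, 8, 4

def team_fitness(team):
    """Sparse team objective (toy stand-in for an episodic team return)."""
    return -float(np.sum(team ** 2))

def dense_update(policy):
    """Stand-in for a policy-gradient step on the dense agent-specific reward."""
    return policy - 0.1 * policy

population = [rng.standard_normal((TEAM_SIZE, DIM)) for _ in range(POP)]
pg_team = rng.standard_normal((TEAM_SIZE, DIM))  # gradient-trained policies

for gen in range(50):
    # Gradient side: each agent maximizes only its dense local reward.
    pg_team = np.stack([dense_update(p) for p in pg_team])
    # Evolutionary side: keep elite teams on the sparse objective, mutate the rest.
    population.sort(key=team_fitness, reverse=True)
    population = population[:ELITE] + [
        population[rng.integers(ELITE)] + 0.05 * rng.standard_normal((TEAM_SIZE, DIM))
        for _ in range(POP - ELITE)
    ]
    # Migration: periodically inject the gradient-trained team into the population.
    if gen % 10 == 0:
        population[-1] = pg_team.copy()

print("best team fitness:", round(team_fitness(max(population, key=team_fitness)), 3))
```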

ICML Conference 2019 Conference Paper

Collaborative Evolutionary Reinforcement Learning

  • Shauharda Khadka
  • Somdeb Majumdar
  • Tarek Nassar
  • Zach Dwiel
  • Evren Tumer
  • Santiago Miret
  • Yinyin Liu
  • Kagan Tumer

Deep reinforcement learning algorithms have been successfully applied to a range of challenging control tasks. However, these methods typically struggle with achieving effective exploration and are extremely sensitive to the choice of hyperparameters. One reason is that most approaches use a noisy version of their operating policy to explore - thereby limiting the range of exploration. In this paper, we introduce Collaborative Evolutionary Reinforcement Learning (CERL), a scalable framework that comprises a portfolio of policies that simultaneously explore and exploit diverse regions of the solution space. A collection of learners - typically proven algorithms like TD3 - optimize over varying time-horizons leading to this diverse portfolio. All learners contribute to and use a shared replay buffer to achieve greater sample efficiency. Computational resources are dynamically distributed to favor the best learners as a form of online algorithm selection. Neuroevolution binds this entire process to generate a single emergent learner that exceeds the capabilities of any individual learner. Experiments in a range of continuous control benchmarks demonstrate that the emergent learner significantly outperforms its composite learners while remaining overall more sample-efficient - notably solving the MuJoCo Humanoid benchmark where all of its composite learners (TD3) fail entirely in isolation.

AAMAS Conference 2019 Conference Paper

Curriculum Learning for Tightly Coupled Multiagent Systems

  • Golden Rockefeller
  • Patrick Mannion
  • Kagan Tumer

In this paper, we leverage curriculum learning (CL) to improve the performance of multiagent systems (MAS) that are trained with the cooperative coevolution of artificial neural networks. We design curricula to progressively change two dimensions: scale (i.e., domain size) and coupling (i.e., the number of agents required to complete a subtask). We demonstrate that CL can successfully mitigate the challenge of learning on a sparse reward signal resulting from a high degree of coupling in complex MAS. We also show that, in most cases, the combination of difference reward shaping with CL can improve performance by up to 56%. We evaluate our CL methods on the tightly coupled multi-rover domain. CL increased converged system performance on all tasks presented. Furthermore, for most tasks, agents were only able to learn at all when trained with CL.

AAMAS Conference 2019 Conference Paper

Memory based Multiagent One Shot Learning

  • Shauharda Khadka
  • Connor Yates
  • Kagan Tumer

One-shot learning is particularly difficult in multiagent systems where the relevant information is distributed across agents, and inter-agent interactions shape global emergent behavior. This paper introduces a distributed learning framework called Distributed Modular Memory Unit (DMMU) that creates a shared external memory to enable one-shot adaptive learning in multiagent systems. In DMMU, a shared external memory is selectively accessed by agents acting asynchronously and in parallel. Each agent processes its own stream of sequential information independently while interacting with the shared external memory to identify, retain, and propagate salient information. This enables DMMU to rapidly assimilate task features from a group of distributed agents, consolidate them into a reconfigurable external memory, and use them for one-shot multiagent learning. We compare the performance of the DMMU framework on a simulated cybersecurity task with traditional feedforward ensembles, LSTM based agents, and a centralized framework. Results demonstrate that DMMU significantly outperforms the other methods and exhibits distributed one-shot learning.

AAMAS Conference 2019 Conference Paper

The Impact of Agent Definitions and Interactions on Multiagent Learning for Coordination

  • Jen Jen Chung
  • Damjan Miklić
  • Lorenzo Sabattini
  • Kagan Tumer
  • Roland Siegwart

The state-action space of an individual agent in a multiagent team fundamentally dictates how the individual interacts with the rest of the team. Thus, how an agent is defined in the context of its domain has a significant effect on team performance when learning to coordinate. In this work we explore the trade-offs associated with these design choices, for example, having fewer agents in the team that individually are able to process and act on a wider scope of information about the world versus a larger team of agents where each agent observes and acts in a more local region of the domain. We focus our study on a traffic management domain and highlight the trends in learning performance when applying different agent definitions.

AAMAS Conference 2018 Conference Paper

A Memory-based Multiagent Framework for Adaptive Decision Making

  • Shauharda Khadka
  • Connor Yates
  • Kagan Tumer

Rapid adaptation to dynamically change one’s policy based on a singular observation is a complex problem. This is especially difficult in multiagent systems where the global behavior emerges from inter-agent interactions. In this paper, we introduce a memory-based learning framework called Distributed Modular Memory Unit (DMMU) which enables rapid and adaptive decision making. In DMMU, a shared external memory is selectively accessed by agents acting independently and in parallel. Each agent processes its own stream of sequential information independently while interacting with the shared external memory to identify, retain, and propagate salient information. This enables DMMU to rapidly assimilate task features from a group of distributed agents, consolidate them into a reconfigurable external memory, and use them for one-shot multiagent learning. We compare the performance of the DMMU framework on a simulated cybersecurity task with traditional feedforward ensembles, LSTM based agents, and a centralized framework. Results demonstrate that DMMU significantly outperforms the best LSTM based method by a factor of two and exhibits adaptive decision making to effectively solve this complex task.

NeurIPS Conference 2018 Conference Paper

Evolution-Guided Policy Gradient in Reinforcement Learning

  • Shauharda Khadka
  • Kagan Tumer

Deep Reinforcement Learning (DRL) algorithms have been successfully applied to a range of challenging control tasks. However, these methods typically suffer from three core difficulties: temporal credit assignment with sparse rewards, lack of effective exploration, and brittle convergence properties that are extremely sensitive to hyperparameters. Collectively, these challenges severely limit the applicability of these approaches to real world problems. Evolutionary Algorithms (EAs), a class of black box optimization techniques inspired by natural evolution, are well suited to address each of these three challenges. However, EAs typically suffer from high sample complexity and struggle to solve problems that require optimization of a large number of parameters. In this paper, we introduce Evolutionary Reinforcement Learning (ERL), a hybrid algorithm that leverages the population of an EA to provide diversified data to train an RL agent, and reinserts the RL agent into the EA population periodically to inject gradient information into the EA. ERL inherits EA's ability of temporal credit assignment with a fitness metric, effective exploration with a diverse set of policies, and stability of a population-based approach and complements it with off-policy DRL's ability to leverage gradients for higher sample efficiency and faster learning. Experiments in a range of challenging continuous control benchmarks demonstrate that ERL significantly outperforms prior DRL and EA methods.
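
A sketch of the interaction pattern the ERL abstract lays out: population rollouts fill a shared replay buffer, an off-policy learner improves from that buffer, and the learned policy is periodically reinserted into the population. The environment, fitness, and "RL step" below are toy assumptions standing in for a real learner.

```python
import numpy as np

rng = np.random.default_rng(3)
DIM, POP = 6, 10
replay = []  # shared experience buffer filled by all evolutionary rollouts

def rollout(policy):
    """Pretend episode: returns fitness plus fake transitions for the buffer."""
    fitness = -float(np.sum((policy - 1.0) ** 2))
    transitions = [(policy + 0.01 * rng.standard_normal(DIM), fitness)] * 5
    return fitness, transitions

population = [rng.standard_normal(DIM) for _ in range(POP)]
rl_policy = rng.standard_normal(DIM)

for gen in range(100):
    scored = []
    for individual in population:
        fit, trans = rollout(individual)
        replay.extend(trans)  # diversified data from the whole population
        scored.append((fit, individual))
    scored.sort(key=lambda x: x[0], reverse=True)
    elites = [ind for _, ind in scored[: POP // 2]]
    population = elites + [e + 0.05 * rng.standard_normal(DIM) for e in elites]
    # Off-policy "RL step" nudged by buffered experience (illustrative only).
    batch = [replay[i] for i in rng.integers(len(replay), size=8)]
    rl_policy += 0.05 * np.mean([p - rl_policy for p, _ in batch], axis=0)
    if gen % 10 == 0:
        population[-1] = rl_policy.copy()  # inject gradient information into the EA
```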

AAMAS Conference 2018 Conference Paper

When Less is More: Reducing Agent Noise with Probabilistically Learning Agents

  • Jen Jen Chung
  • Scott Chow
  • Kagan Tumer

Distributed agents concurrently learning to coordinate in a multiagent system can suffer from considerable amounts of agent noise. This is the noise that arises from the non-stationarity of the learning environment for each individual agent since other agents in the system are also constantly updating their policies, thereby continually shifting the goal posts for successful coordination. In this work, we propose a method to reduce agent noise by allowing individual agents to probabilistically determine whether or not to undergo policy updates. We show that using this method to adapt the number of actively learning agents over time provides improvements in convergence speed of the team as a whole without affecting the final converged learning performance.
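
The mechanism reduces to a one-line stochastic gate: before each update cycle, every agent flips a coin to decide whether to learn this round. A minimal sketch, assuming a generic `update_policy` step (a hypothetical name, not from the paper):

```python
import random

P_LEARN = 0.3  # probability an agent updates its policy this episode (tunable)

def maybe_update(agents, update_policy):
    """Gate learning stochastically so only a subset of agents moves at once."""
    for agent in agents:
        if random.random() < P_LEARN:
            update_policy(agent)  # fewer simultaneous learners -> less agent noise

# Example with trivial stand-ins:
maybe_update(agents=range(10), update_policy=lambda a: print("agent", a, "updates"))
```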

KER Journal 2016 Journal Article

Combining reward shaping and hierarchies for scaling to large multiagent systems

  • Chris HolmesParker
  • Adrian K. Agogino
  • Kagan Tumer

Coordinating the actions of agents in multiagent systems presents a challenging problem, especially as the size of the system is increased and predicting the agent interactions becomes difficult. Many approaches to improving coordination within multiagent systems have been developed, including organizational structures, shaped rewards, coordination graphs, heuristic methods, and learning automata. However, each of these approaches still has inherent limitations with respect to coordination and scalability. We explore the potential of synergistically combining existing coordination mechanisms such that they offset each other’s limitations. More specifically, we are interested in combining existing coordination mechanisms in order to achieve improved performance, increased scalability, and reduced coordination complexity in large multiagent systems. In this work, we discuss and demonstrate the individual limitations of two well-known coordination mechanisms. We then provide a methodology for combining the two coordination mechanisms to offset their limitations and improve performance over either method individually. In particular, we combine shaped difference rewards and hierarchical organization in the Defect Combination Problem with up to 10,000 sensing agents. We show that combining hierarchical organization with difference rewards can improve both coordination and scalability by decreasing information overhead, structuring agent-to-agent connectivity and control flow, and improving the individual decision-making capabilities of agents. We show that by combining hierarchies and difference rewards, the information overheads and computational requirements of individual agents can be reduced by as much as 99% while simultaneously increasing the overall system performance. Additionally, we demonstrate the robustness of this approach to handling up to 25% agent failures under various conditions.

IROS Conference 2016 Conference Paper

D++: Structural credit assignment in tightly coupled multiagent domains

  • Aida Rahmattalabi
  • Jen Jen Chung
  • Mitchell K. Colby
  • Kagan Tumer

Autonomous multi-robot teams can be used in complex coordinated exploration tasks to improve exploration performance in terms of both speed and effectiveness. However, the use of multi-robot systems presents additional challenges. Specifically, in domains where the robots' actions are tightly coupled, coordinating multiple robots to achieve cooperative behavior at the group level is difficult. In this paper, we demonstrate that reward shaping can greatly benefit learning in multi-robot exploration tasks. We propose a novel reward framework based on the idea of counterfactuals to tackle the coordination problem in tightly coupled domains. We show that the proposed algorithm provides superior performance (a 166% performance improvement and a fourfold convergence speedup) compared to policies learned using either the global reward or the difference reward [1].
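
In the difference-rewards literature this counterfactual idea is usually presented as below; the paper's exact formulation may differ, so treat this as the standard form rather than a transcription:

```latex
D_i(z) = G(z) - G(z_{-i}), \qquad
D_i^{++}(z, n) = \frac{G(z_{+ni}) - G(z)}{n}
```

where G is the team objective, z_{-i} removes robot i, and z_{+ni} adds n counterfactual copies of robot i. In tightly coupled tasks D_i(z) is often zero (removing one robot leaves the joint task unachieved either way), and scanning for the smallest n that makes D_i^{++} nonzero recovers a learnable credit signal.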

AAMAS Conference 2016 Conference Paper

Local Approximation of Difference Evaluation Functions

  • Mitchell Colby
  • Theodore Duchow-Pressley
  • Jen Jen Chung
  • Kagan Tumer

Difference evaluation functions have resulted in excellent multiagent behavior in many domains, including air traffic and mobile robot control. However, calculating difference evaluation functions requires determining the value of a counterfactual system objective function, which is often difficult when the system objective function is unknown or global state and action information is unavailable. In this work, we demonstrate that a local estimate of the system evaluation function may be used to estimate difference evaluations using readily available information, allowing difference evaluations to be computed in multiagent systems where the mathematical form of the objective function is not known. This approximation technique is tested in two domains, and we demonstrate that approximating difference evaluation functions results in better performance and faster learning than using global evaluation functions. Finally, we demonstrate the effectiveness of the learned policies on a set of Pioneer P3-DX robots.

JAAMAS Journal 2015 Journal Article

Fitness function shaping in multiagent cooperative coevolutionary algorithms

  • Mitchell Colby
  • Kagan Tumer

Coevolution is a promising approach to evolve teams of agents which must cooperate to achieve some system objective. However, in many coevolutionary approaches, credit assignment is often subjective and context dependent, as the fitness of an individual agent strongly depends on the actions of the agents with which it collaborates. In order to alleviate this problem, we introduce a cooperative coevolutionary algorithm which biases the evolutionary search as well as shapes agent fitness functions to promote behavior that benefits the system-level performance. More specifically, we bias the search using a hall of fame approximation of optimal collaborators, and shape the agent fitness using the difference evaluation function. Our results show that shaping agent fitness with the difference evaluation improves system performance by up to 50%, and adding an additional fitness bias improves performance by up to 75% in our experiments. Finally, an analysis of system performance as a function of computational cost demonstrates that this algorithm makes extremely efficient use of computational resources, having a higher performance as a function of computational cost than any other algorithm tested.

IROS Conference 2015 Conference Paper

Implicit adaptive multi-robot coordination in dynamic environments

  • Mitchell K. Colby
  • Jen Jen Chung
  • Kagan Tumer

Multi-robot teams offer key advantages over single robots in exploration missions by increasing efficiency (explore larger areas), reducing risk (partial mission failure with robot failures), and enabling new data collection modes (multi-modal observations). However, coordinating multiple robots to achieve a system-level task is difficult, particularly if the task may change during the mission. In this work, we demonstrate how multiagent cooperative coevolutionary algorithms can develop successful control policies for dynamic and stochastic multi-robot exploration missions. We find that agents using difference evaluation functions (a technique that quantifies each individual agent's contribution to the team) provides superior system performance (up to 15%) compared to global evaluation functions and a hand-coded algorithm.

IROS Conference 2015 Conference Paper

Learning to trick cost-based planners into cooperative behavior

  • Carrie Rebhuhn
  • Ryan Skeele
  • Jen Jen Chung
  • Geoffrey A. Hollinger
  • Kagan Tumer

In this paper we consider the problem of routing autonomously guided robots by manipulating the cost space to induce safe trajectories in the work space. Specifically, we examine the domain of UAV traffic management in urban airspaces. Each robot does not explicitly coordinate with other vehicles in the airspace. Instead, the robots execute their own individual internal cost-based planner to travel between locations. Given this structure, our goal is to develop a high-level UAV traffic management (UTM) system that can dynamically adapt the cost space to reduce the number of conflict incidents in the airspace without knowing the internal planners of each robot. We propose a decentralized and distributed system of high-level traffic controllers that each learn appropriate costing strategies via a neuro-evolutionary algorithm. The policies learned by our algorithm demonstrated a 16.4% reduction in the total number of conflict incidents experienced in the airspace while maintaining throughput performance.

IROS Conference 2014 Conference Paper

Flop and roll: Learning robust goal-directed locomotion for a Tensegrity Robot

  • Atil Iscen
  • Adrian K. Agogino
  • Vytas SunSpiral
  • Kagan Tumer

Tensegrity robots are composed of compression elements (rods) that are connected via a network of tension elements (cables). Tensegrity robots provide many advantages over standard robots, such as compliance, robustness, and flexibility. Moreover, sphere-shaped tensegrity robots can provide non-traditional modes of locomotion, such as rolling. While they have advantageous physical properties, tensegrity robots are hard to control because of their nonlinear dynamics and oscillatory nature. In this paper, we present a robust, distributed, and directional rolling algorithm, “flop and roll”. The algorithm uses coevolution and exploits the distributed nature and symmetry of the tensegrity structure. We validate this algorithm using the NASA Tensegrity Robotics Toolkit (NTRT) simulator, as well as the highly accurate model of the physical SUPERBall being developed under the NASA Innovative and Advanced Concepts (NIAC) program. Flop and roll improves upon previous approaches in that it provides rolling to a desired location. It is also robust to both unexpected external forces and partial hardware failures. Additionally, it handles variable terrain (hills up to 33% grade). Finally, results are compatible with the hardware since the algorithm relies on realistic sensing and actuation capabilities of the SUPERBall.

AAMAS Conference 2013 Conference Paper

Addressing Hard Constraints in the Air Traffic Problem through Partitioning and Difference Rewards

  • William Curran
  • Adrian Agogino
  • Kagan Tumer

In the US alone, weather hazards and airport congestion cause thousands of hours of delay, costing billions of dollars annually. The task of managing delay may be modeled as a multiagent congestion problem with tightly coupled agents who collectively impact the system. Reward shaping has been effective at reducing noise caused by agent interaction and improving learning in soft constraint problems. We extend those results to hard constraints that cannot be easily learned, and must be algorithmically enforced. We present an agent partitioning algorithm in conjunction with reward shaping to simplify the learning domain. Our results show that a partitioning of the agents using system features leads to up to a 1000x speedup over the straight reward shaping approach, as well as up to a 30% improvement in performance over a greedy scheduling solution, corresponding to hundreds of hours of delay saved in a single day.

AAMAS Conference 2013 Conference Paper

CLEAN Rewards for Improving Multiagent Coordination in the Presence of Exploration

  • Chris HolmesParker
  • Adrian Agogino
  • Kagan Tumer

In cooperative multiagent systems, coordinating the joint actions of agents is difficult. One of the fundamental difficulties in such multiagent systems is the slow learning process where an agent may not only need to learn how to behave in a complex environment, but may also need to account for the actions of the other learning agents. Here, the inability of agents to distinguish the true environmental dynamics from those caused by the stochastic exploratory actions of other agents creates noise on each agent’s reward signal. To address this, we introduce Coordinated Learning without Exploratory Action Noise (CLEAN) rewards, which are agent-specific shaped rewards that effectively remove such learning noise from each agent’s reward signal. We demonstrate their performance with up to 1000 agents in a standard congestion problem.

AAMAS Conference 2013 Conference Paper

Decentralized Coordination via Task Decomposition and Reward Shaping

  • Atil Iscen
  • Kagan Tumer

In this work, we introduce a method for decentralized coordination in cooperative multiagent multi-task problems where the subtasks and agents are homogeneous. Using the proposed method, the agents cooperate at the high-level task selection using the knowledge they gather by learning subtasks. We introduce a subtask selection method for single-agent multi-task MDPs and we extend the work to multiagent multi-task MDPs by using reward shaping at the subtask level to coordinate the agents. Our results on a multi-rover problem show that agents which use the combination of task decomposition and subtask-based difference rewards achieve significant improvements in terms of both learning speed and converged policies.

AAMAS Conference 2013 Conference Paper

Exploiting Structure and Utilizing Agent-Centric Rewards to Promote Coordination in Large Multiagent Systems

  • Chris HolmesParker
  • Adrian Agogino
  • Kagan Tumer

A goal within the field of multiagent systems is to achieve scaling to large systems involving hundreds or thousands of agents. In such systems the communication requirements for agents as well as the individual agents’ ability to make decisions both play critical roles in performance. We take an incremental step towards improving scalability in such systems by introducing a novel algorithm that conglomerates three well-known existing techniques to address both agent communication requirements as well as decision making within large multiagent systems. In particular, we couple a Factored-Action Factored Markov Decision Process (FA-FMDP) framework which exploits problem structure and establishes localized rewards for agents (reducing communication requirements) with reinforcement learning using agent-centric difference rewards which addresses agent decision making and promotes coordination by addressing the structural credit assignment problem. We demonstrate our algorithm’s performance compared to two other popular reward techniques (global, local) with up to 10,000 agents.

AAMAS Conference 2013 Conference Paper

Graphical Models in Continuous Domains for Multiagent Reinforcement Learning

  • Scott Proper
  • Kagan Tumer

In this paper we test two coordination methods – difference rewards and coordination graphs – in a continuous, multiagent rover domain using reinforcement learning, and discuss the situations in which each of these methods perform better alone or together, and why. We also contribute a novel method of applying coordination graphs in a continuous domain by taking advantage of the wire-fitting approach used to handle continuous state and action spaces.

AAMAS Conference 2013 Conference Paper

Learning to Control Complex Tensegrity Robots

  • Atil Iscen
  • Adrian Agogino
  • Vytas SunSpiral
  • Kagan Tumer

Tensegrity robots are based on the idea of tensegrity structures, which provide many advantages critical to robotics, such as being lightweight and impact tolerant. Unfortunately, tensegrity robots are hard to control due to their overall complexity. We use multiagent learning to learn controls for a ball-shaped tensegrity with 6 rods and 24 cables. Our simulation results show that multiagent learning can be used to learn an efficient rolling behavior, and we test its robustness to actuation noise.

AAAI Conference 2013 Conference Paper

Multiagent Learning with a Noisy Global Reward Signal

  • Scott Proper
  • Kagan Tumer

Scaling multiagent reinforcement learning to domains with many agents is a complex problem. In particular, multiagent credit assignment becomes a key issue as the system size increases. Some multiagent systems suffer from a global reward signal that is very noisy or difficult to analyze. This makes deriving a learnable local reward signal very difficult. Difference rewards (a particular instance of reward shaping) have been used to alleviate this concern, but they remain difficult to compute in many domains. In this paper we present an approach to modeling the global reward using function approximation that allows the quick computation of local rewards. We demonstrate how this model can result in significant improvements in behavior for three congestion problems: a multiagent “bar problem”, a complex simulation of the United States airspace, and a generic air traffic domain. We show how the model of the global reward may be learned either on- or off-line using either linear functions or neural networks. For the bar problem, we show an increase in reward of nearly 200% over learning using the global reward directly. For the air traffic problem, we show a decrease in costs of 25% over learning using the global reward directly.
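
A sketch of the core trick as stated in the abstract: fit a model of the noisy global reward, then compute a cheap difference reward by re-querying the model with one agent's action replaced by a default. The linear least-squares model and the toy data below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(4)
N_AGENTS, N_SAMPLES = 5, 500

# Fake experience: joint actions and noisy global rewards from a toy system.
actions = rng.random((N_SAMPLES, N_AGENTS))
true_G = actions.sum(axis=1) - 0.5 * actions.prod(axis=1)
noisy_G = true_G + 0.3 * rng.standard_normal(N_SAMPLES)

# Fit G_hat by least squares (a linear function approximator with a bias term).
X = np.hstack([actions, np.ones((N_SAMPLES, 1))])
w, *_ = np.linalg.lstsq(X, noisy_G, rcond=None)

def g_hat(joint_action):
    return float(np.append(joint_action, 1.0) @ w)

def difference_reward(joint_action, i, default=0.0):
    """D_i ~ G_hat(z) - G_hat(z with agent i's action replaced by a default)."""
    counterfactual = joint_action.copy()
    counterfactual[i] = default
    return g_hat(joint_action) - g_hat(counterfactual)

z = rng.random(N_AGENTS)
print([round(difference_reward(z, i), 3) for i in range(N_AGENTS)])
```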

JAAMAS Journal 2012 Journal Article

Coordinating actions in congestion games: impact of top–down and bottom–up utilities

  • Kagan Tumer
  • Scott Proper

Congestion games offer a perfect environment in which to study the impact of local decisions on global utilities in multiagent systems. What is particularly interesting in such problems is that no individual action is intrinsically “good” or “bad” but that combinations of actions lead to desirable or undesirable outcomes. As a consequence, agents need to learn how to coordinate their actions with those of other agents, rather than learn a particular set of “good” actions. A congestion game can be studied from two different perspectives: (i) from the top down, where a global utility (e.g., a system-centric view of congestion) specifies the task to be achieved; or (ii) from the bottom up, where each agent has its own intrinsic utility it wants to maximize. In many cases, these two approaches are at odds with one another, where agents aiming to maximize their intrinsic utilities lead to poor values of a system-level utility. In this paper we extend results on difference utilities, a form of shaped utility that enables multiagent learning in congested, noisy conditions, to study the global behavior that arises from the agents’ choices in two types of congestion games. Our key result is that agents that aim to maximize a modified version of their own intrinsic utilities not only perform well in terms of the global utility, but also, on average, perform better with respect to their own original utilities. In addition, we show that difference utilities are robust to agents “defecting” and using their own intrinsic utilities, and that performance degrades gracefully with the number of defectors.

AAMAS Conference 2012 Conference Paper

Modeling Difference Rewards for Multiagent Learning

  • Scott Proper
  • Kagan Tumer

Difference rewards (a particular instance of reward shaping) have been used to allow multiagent domains to scale to large numbers of agents, but they remain difficult to compute in many domains. We present an approach to modeling the global reward using function approximation that allows the quick computation of shaped difference rewards. We demonstrate how this model can result in significant improvements in behavior for two air traffic control problems. We show how the model of the global reward may be either learned on- or off-line using a linear combination of neural networks.

AAMAS Conference 2012 Conference Paper

Shaping Fitness Functions for Coevolving Cooperative Multiagent Systems

  • Mitchell Colby
  • Kagan Tumer

Coevolution is a natural approach to evolve teams of agents which must cooperate to achieve some system objective. However, in many coevolutionary approaches, credit assignment is often subjective and context dependent, as the fitness of an individual agent strongly depends on the actions of the agents with which it collaborates. In order to alleviate this problem, we introduce a cooperative coevolutionary algorithm which biases the evolutionary search as well as shapes agent fitness functions to reward behavior that benefits the system. More specifically, we bias the search using a hall of fame approximation of optimal collaborators, and we shape the agent fitness using the difference objective functions. Our results show that shaping agent fitness with the difference objective improves system performance by up to 50%, and adding an additional fitness bias can improve performance by up to 75%.

JAAMAS Journal 2010 Journal Article

A multiagent approach to managing air traffic flow

  • Adrian K. Agogino
  • Kagan Tumer

Intelligent air traffic flow management is one of the fundamental challenges facing the Federal Aviation Administration (FAA) today. FAA estimates put weather, routing decisions and airport condition induced delays at 1,682,700 hours in 2007 (FAA OPSNET Data, US Department of Transportation website, http://www.faa.gov/data_statistics/), resulting in a staggering economic loss of over $41 billion (Joint Economic Commission Majority Staff, Your flight has been delayed again, 2008). New solutions to the flow management problem are needed to accommodate the threefold increase in air traffic anticipated over the next two decades. Indeed, this is a complex problem where the interactions of changing conditions (e.g., weather), conflicting priorities (e.g., different airlines), limited resources (e.g., air traffic controllers) and heavy volume (e.g., over 40,000 flights over the US airspace) demand an adaptive and robust solution. In this paper we explore a multiagent algorithm where agents use reinforcement learning (RL) to reduce congestion through local actions. Each agent is associated with a fix (a specific location in 2D space) and has one of three actions: setting separation between airplanes, ordering ground delays or performing reroutes. We simulate air traffic using FACET, an air traffic flow simulator developed at NASA and used extensively by the FAA and industry. Our FACET simulations on both artificial and real historical data from the Chicago and New York airspaces show that agents receiving personalized rewards reduce congestion by up to 80% over agents receiving a global reward and by up to 90% over a current industry approach (Monte Carlo estimation).

AAMAS Conference 2010 Conference Paper

Robot Coordination with Ad-hoc Team Formation

  • Matt Knudson
  • Kagan Tumer

Coordinating multiagent systems to maximize global information collection both presents scientific challenges (what should each agent aim to achieve?) and provides application opportunities (planetary exploration, search and rescue). In particular, in many domains where communication is expensive (for example, because of limited power or computation), the coordination must be achieved in a passive manner, without agents explicitly informing other agents of their states and/or intended actions. In this work, we extend results on such multiagent coordination algorithms to domains where the agents cannot achieve the required tasks without forming teams.

IS Journal 2009 Journal Article

Improving Air Traffic Management with a Learning Multiagent System

  • Kagan Tumer
  • Adrian Agogino

A fundamental challenge facing the aerospace industry is efficient, safe, and reliable air traffic management (ATM). On a typical day, more than 40,000 commercial flights operate in US airspace, and the number of flights is increasing rapidly. This paper shows how a learning multiagent system helps improve ATM.

AAAI Conference 2008 Conference Paper

Adaptive Management of Air Traffic Flow: A Multiagent Coordination Approach

  • Kagan Tumer

This paper summarizes recent advances in the application of multiagent coordination algorithms to air traffic flow management. Indeed, air traffic flow management is one of the fundamental challenges facing the Federal Aviation Administration (FAA) today. This problem is particularly complex as it requires the integration and/or coordination of many factors including: new data (e.g., changing weather info), potentially conflicting priorities (e.g., different airlines), limited resources (e.g., air traffic controllers) and very heavy traffic volume (e.g., over 40,000 flights over the US airspace). The multiagent approach assigns an agent to a navigational fix (a specific location in 2D space) and uses three separate actions to control the airspace: setting the separation between airplanes, setting ground holds that delay aircraft departures and rerouting aircraft. Agents then use reinforcement learning to learn the best set of actions. Results based on FACET (a commercial simulator) show that agents receiving personalized rewards reduce congestion by up to 80% over agents receiving a global reward and by up to 85% over a current industry approach (Monte Carlo estimation). These results show that with proper selection of agents, their actions and their reward structures, multiagent coordination algorithms can be successfully applied to complex real world domains.

AAMAS Conference 2008 Conference Paper

Aligning social welfare and agent preferences to alleviate traffic congestion

  • Kagan Tumer
  • Zach Welch
  • Adrian Agogino

Multiagent coordination algorithms provide unique insights into the challenging problem of alleviating traffic congestion. What is particularly interesting in this class of problem is that no individual action (e.g., leave at a given time) is intrinsically “bad” but that combinations of actions among agents lead to undesirable outcomes. As a consequence, agents need to learn how to coordinate their actions with those of other agents, rather than learn a particular set of “good” actions. In general, the traffic problem can be approached from two distinct perspectives: (i) from a city manager’s point of view, where the aim is to optimize a city-wide objective function (e.g., minimize total city-wide delays), and (ii) from the individual driver’s point of view, where each driver is aiming to optimize a personal objective function (e.g., a “timeliness” function that minimizes the difference between desired and actual arrival times at a destination). In many cases, these two objective functions are at odds with one another, where drivers aiming to optimize their own objectives lead to congestion and poor values of city objective functions. In this paper we present an objective shaping approach to both types of problems and study the system behavior that arises from the drivers’ choices. We first show a top-down approach that provides incentives to drivers and leads to good values of the city manager’s objective function. We then present a bottom-up approach that shows that drivers aiming to optimize their own personal timeliness objective lead to poor performance with respect to a city manager’s objective function. Finally, we present the intriguing result that drivers that aim to optimize a modified version of their own timeliness function not only perform well in terms of the city manager’s objective function, but also perform better with respect to their own original timeliness functions.

JAAMAS Journal 2008 Journal Article

Analyzing and visualizing multiagent rewards in dynamic and stochastic domains

  • Adrian K. Agogino
  • Kagan Tumer

The ability to analyze the effectiveness of agent reward structures is critical to the successful design of multiagent learning algorithms. Though final system performance is the best indicator of the suitability of a given reward structure, it is often preferable to analyze the reward properties that lead to good system behavior (i.e., properties promoting coordination among the agents and providing agents with strong signal-to-noise ratios). This step is particularly helpful in continuous, dynamic, stochastic domains ill-suited to the simple table backup schemes commonly used in TD(λ)/Q-learning, where the effectiveness of the reward structure is difficult to distinguish from the effectiveness of the chosen learning algorithm. In this paper, we present a new reward evaluation method that provides a visualization of the tradeoff between the level of coordination among the agents and the difficulty of the learning problem each agent faces. This method is independent of the learning algorithm and is only a function of the problem domain and the agents’ reward structure. We use this reward property visualization method to determine an effective reward without performing extensive simulations. We then test this method in both a static and a dynamic multi-rover learning domain where the agents have continuous state spaces and take noisy actions (e.g., the agents’ movement decisions are not always carried out properly). Our results show that in the more difficult dynamic domain, the reward efficiency visualization method provides a two order of magnitude speedup in selecting good rewards, compared to running a full simulation. In addition, this method facilitates the design and analysis of new rewards tailored to the observational limitations of the domain, providing rewards that combine the best properties of traditional rewards.

AAMAS Conference 2008 Conference Paper

Regulating Air Traffic Flow with Coupled Agents

  • Adrian Agogino
  • Kagan Tumer

The ability to provide flexible, automated management of air traffic is critical to meeting the ever increasing needs of the next generation air transportation systems. This problem is particularly complex as it requires the integration of many factors including updated information (e.g., changing weather info), conflicting priorities (e.g., different airlines), limited resources (e.g., air traffic controllers) and very heavy traffic volume (e.g., over 40,000 daily flights over the US airspace). Furthermore, because the Federal Aviation Administration will not accept black-box solutions, algorithmic improvements need to be consistent with current operating practices and provide explanations for each new decision. Unfortunately, current methods provide neither flexibility for future upgrades nor high enough performance in complex coupled air traffic flow problems. This paper extends agent-based methods for controlling air traffic flow to more realistic domains that have coupled flow patterns and need to be controlled through a variety of mechanisms. First, we explore an agent control structure that allows agents to control air traffic flow through one of three mechanisms (miles in trail, ground delays and rerouting). Second, we explore a new agent learning algorithm that can efficiently handle coupled flow patterns. We then test this agent solution on a series of congestion problems, showing that it is flexible enough to achieve high performance with different control mechanisms. In addition, the results show that the new solution is able to achieve up to a 20% increase in performance over previous methods that did not account for the agent coupling.

AAMAS Conference 2007 Conference Paper

Distributed Agent-Based Air Traffic Flow Management

  • Kagan Tumer
  • Adrian Agogino

Air traffic flow management is one of the fundamental challenges facing the Federal Aviation Administration (FAA) today. The FAA estimates that in 2005 alone, there were over 322,000 hours of delays at a cost to the industry in excess of three billion dollars. Finding reliable and adaptive solutions to the flow management problem is of paramount importance if the Next Generation Air Transportation Systems are to achieve the stated goal of accommodating three times the current traffic volume. This problem is particularly complex as it requires the integration and/or coordination of many factors including: new data (e.g., changing weather info), potentially conflicting priorities (e.g., different airlines), limited resources (e.g., air traffic controllers) and very heavy traffic volume (e.g., over 40,000 flights over the US airspace).

JAAMAS Journal 2006 Journal Article

Handling Communication Restrictions and Team Formation in Congestion Games

  • Adrian K. Agogino
  • Kagan Tumer

There are many domains in which a multi-agent system needs to maximize a “system utility” function which rates the performance of the entire system, while subject to communication restrictions among the agents. Such communication restrictions make it difficult for agents that take actions to optimize their own “private” utilities to also help optimize the system utility. In this article we show how previously introduced utilities that promote coordination among agents can be modified to be effective in domains with communication restrictions. The modified utilities provide performance improvements of up to 75% over previously used utilities in congestion games (i.e., games where the system utility depends solely on the number of agents choosing a particular action). In addition, we show that in the presence of severe communication restrictions, team formation for the purpose of information sharing among agents leads to an additional 25% improvement in system utility. Finally, we show that agents’ private utilities and team sizes can be manipulated to form the best compromise between how “aligned” an agent’s utility is with the system utility and how easily an agent can learn that utility.

NeurIPS Conference 1998 Conference Paper

Using Collective Intelligence to Route Internet Traffic

  • David Wolpert
  • Kagan Tumer
  • Jeremy Frank

A COllective INtelligence (COIN) is a set of interacting reinforcement learning (RL) algorithms designed in an automated fashion so that their collective behavior optimizes a global utility function. We summarize the theory of COINs, then present experiments using that theory to design COINs to control internet traffic routing. These experiments indicate that COINs outperform all previously investigated RL-based, shortest path routing algorithms.

NeurIPS Conference 1996 Conference Paper

Spectroscopic Detection of Cervical Pre-Cancer through Radial Basis Function Networks

  • Kagan Tumer
  • Nirmala Ramanujam
  • Rebecca Richards-Kortum
  • Joydeep Ghosh

The mortality related to cervical cancer can be substantially reduced through early detection and treatment. However, current detection techniques, such as Pap smear and colposcopy, fail to achieve a concurrently high sensitivity and specificity. In vivo fluorescence spectroscopy is a technique which quickly, non-invasively and quantitatively probes the biochemical and morphological changes that occur in pre-cancerous tissue. RBF ensemble algorithms based on such spectra provide automated, and near real-time implementation of pre-cancer detection in the hands of non-experts. The results are more reliable, direct and accurate than those achieved by either human experts or multivariate statistical algorithms.