Arrow Research search

Author name cluster

Kagan Tumer

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

56 papers
2 author rows

Possible papers

56

ECAI Conference 2025 Conference Paper

Multiagent Quality-Diversity for Effective Adaptation

  • Siddarth Iyer
  • Ayhan Alp Aydeniz
  • Gaurav Dixit
  • Kagan Tumer

Robust adaptation in multiagent settings requires learning not just a single optimal behavior, but a repertoire of high-performing and diverse team behaviors that can succeed under environmental contingencies. Traditional multiagent reinforcement learning methods typically converge to a single specialized team behavior, limiting their adaptability. Recent approaches like Mix-ME promote behavioral diversity but rely solely on evolutionary operators, often resulting in sample-inefficiency and uncoordinated team composition. This work introduces Multiagent Sample-Efficient Quality-Diversity (MASQD), a learning framework that produces an archive of diverse, high-performing multiagent teams. MASQD builds on the Cross-Entropy Method Reinforcement Learning algorithm and extends it to the multiagent setting by representing teams as parameter-shared neural networks, directing exploration from previously discovered behaviors, and guiding refinement through a descriptor-conditioned critic. Through this coupling of anchored exploration and targeted exploitation, MASQD produces functional diversity: teams that are not only behaviorally distinct but also robust and effective under varied conditions. Experiments across four Multiagent MuJoCo tasks show that MASQD outperforms state-of-the-art baselines in both team fitness and functional diversity.
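
Illustrative sketch of the archive loop the abstract describes: a quality-diversity archive keeps the best team per behavior cell, and new candidates are anchored at previously discovered elites. The toy fitness, the descriptor, and all names below are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
N_PARAMS, N_CELLS, SIGMA = 8, 10, 0.1

def evaluate(theta):
    """Toy stand-in for a team rollout: returns (fitness, behavior descriptor in [0, 1))."""
    fitness = -float(np.sum(theta ** 2))          # pretend team return
    descriptor = 1.0 / (1.0 + np.exp(-theta[0]))  # pretend behavior measure
    return fitness, descriptor

archive = {}  # behavior cell -> (fitness, parameters) of the best team found so far
for _ in range(2000):
    if archive:
        # Anchor exploration at a previously discovered behavior (a random elite).
        _, parent = archive[rng.choice(list(archive))]
        theta = parent + SIGMA * rng.standard_normal(N_PARAMS)
    else:
        theta = rng.standard_normal(N_PARAMS)
    fitness, desc = evaluate(theta)
    cell = min(int(desc * N_CELLS), N_CELLS - 1)
    if cell not in archive or fitness > archive[cell][0]:
        archive[cell] = (fitness, theta)          # keep the best team per cell

print(sorted((c, round(f, 3)) for c, (f, _) in archive.items()))
```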

AAMAS Conference 2025 Conference Paper

Safe Entropic Agents under Team Constraints

  • Ayhan Alp Aydeniz
  • Enrico Marchesini
  • Robert Loftin
  • Christopher Amato
  • Kagan Tumer

Safety is a critical concern in multiagent reinforcement learning (MARL), yet typical safety-aware methods constrain agent behaviors, limiting the exploration that is essential for discovering effective cooperation. Existing approaches mainly enforce individual constraints, overlooking potential benefits of joint (team) constraints. We analyze team constraints theoretically and practically, introducing entropic exploration for constrained MARL (E2C). E2C maximizes observation entropy to encourage exploration while ensuring safety at the individual and team levels. Experiments across diverse domains demonstrate that E2C matches or outperforms common baselines in task performance while reducing unsafe behaviors by up to 50%.
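
A minimal sketch of the two ingredients the abstract combines: an observation-entropy bonus to push exploration, and a Lagrange multiplier enforcing a team-level safety budget by dual ascent. The toy rewards, costs, and every name here are assumptions, not the E2C implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
LAMBDA_LR, COST_LIMIT, ENT_COEF = 0.05, 0.2, 0.1
lam = 0.0  # Lagrange multiplier on the team-level cost constraint

def entropy_bonus(observations, bins=10):
    """Crude observation-entropy estimate from a histogram (illustrative only)."""
    hist, _ = np.histogram(observations, bins=bins, range=(0.0, 1.0))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

shaped_returns = []
for _ in range(100):
    obs = rng.random(64)                   # pretend joint observations of the team
    reward = float(obs.mean())             # pretend task return
    team_cost = float((obs > 0.9).mean())  # pretend frequency of unsafe team states
    # Entropy pushes exploration up; the multiplier pushes unsafe behavior down.
    shaped_returns.append(reward + ENT_COEF * entropy_bonus(obs) - lam * team_cost)
    # Dual ascent: raise lam while the constraint is violated, relax it otherwise.
    lam = max(0.0, lam + LAMBDA_LR * (team_cost - COST_LIMIT))

print(f"mean shaped return {np.mean(shaped_returns):.3f}, final multiplier {lam:.3f}")
```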

AAMAS Conference 2024 Conference Paper

Entropy Seeking Constrained Multiagent Reinforcement Learning

  • Ayhan Alp Aydeniz
  • Enrico Marchesini
  • Christopher Amato
  • Kagan Tumer

Multiagent Reinforcement Learning (MARL) has been successfully applied to domains requiring close coordination among many agents. However, real-world tasks require safety specifications that are not generally considered by MARL algorithms. In this work, we introduce an Entropy Seeking Constrained (ESC) approach aiming to learn safe cooperative policies for multiagent systems. Unlike previous methods, ESC considers safety specifications while maximizing state-visitation entropy, addressing the exploration issues of constraint-based solutions.

AAMAS Conference 2024 Conference Paper

Indirect Credit Assignment in a Multiagent System

  • Everardo Gonzalez
  • Siddarth Viswanathan
  • Kagan Tumer

Learning in a multiagent system requires structural credit assignment to distill system performance into agent-specific feedback. Fitness shaping methods largely isolate agent credit, but struggle when an agent’s actions do not directly affect system feedback. This work introduces D-Indirect, a fitness shaping method that gives credit for both direct actions and actions that have an indirect impact on the system’s performance. We demonstrate the effectiveness of D-Indirect in a simulated shepherding scenario and our results show that learning with D-Indirect significantly outperforms learning with the standard difference evaluation and the system evaluation when agents indirectly impact system performance.
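
For reference, the standard difference evaluation that D-Indirect extends is usually written as follows (the canonical form from the difference-rewards literature; the indirect-credit extension itself is not reproduced here):

```latex
D_i(z) \;=\; G(z) \;-\; G(z_{-i} \cup c_i)
```

where G is the system evaluation, z the joint state-action, z_{-i} the system with agent i's contribution removed, and c_i a fixed counterfactual substituted for agent i. D_i isolates agent i's direct contribution, which is exactly the signal that vanishes when the agent's impact on the system is indirect.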

AAMAS Conference 2024 Conference Paper

Influence-Focused Asymmetric Island Model

  • Andrew Festa
  • Gaurav Dixit
  • Kagan Tumer

Learning good joint behaviors is challenging in multiagent settings due to the inherent non-stationarity: agents adapt their policies and act simultaneously. This is aggravated when the agents are asymmetric (agents have distinct capabilities and objectives) and must learn complementary behaviors required to work as a team. The Asymmetric Island Model partially addresses this by independently optimizing class-specific and team-wide behaviors. However, optimizing class-specific behaviors in isolation can produce egocentric behaviors that yield sub-optimal inter-class behaviors. This work introduces the Influence-Focused Asymmetric Island Model (IF-AIM), a hierarchical framework that explicitly reinforces inter-class behaviors by optimizing class-specific behaviors conditioned on the expected behaviors of the complementary agent classes. An experiment in the harvest environment highlights the effectiveness of our method in optimizing adaptable inter-class behaviors.

ECAI Conference 2024 Conference Paper

Objective-Informed Diversity for Multi-Objective Multiagent Coordination

  • Gaurav Dixit
  • Kagan Tumer

To coordinate in multiagent settings characterized by multiple objectives, asymmetric agents (agents with distinct capabilities and preferences) must learn diverse behaviors to balance trade-offs between agent-specific and team objectives. Hierarchical methods partially address this by leveraging a combination of Quality-Diversity methods that illuminate the behavior space and evolutionary algorithms that use non-dominated sorting over the explored behaviors to improve coverage in the objective space. However, optimizing diverse behaviors and trade-offs in isolation is susceptible to producing egocentric behaviors that favor agent-specific objectives at the cost of team objectives. This work introduces the Multi-Objective Informed Island Model (MOI-IM), an asymmetric multiagent learning framework that fosters diverse behaviors and rich inter-agent relationships, necessary to balance potentially conflicting and misaligned objectives. An evolutionary algorithm improves coverage in the objective space by evolving a population of teams, while a gradient-based optimization infers and progressively explores the behavior space by fluidly adapting search to regions that produce policies with non-dominated trade-offs. The two processes are coupled via shared replay buffers to ensure alignment between coverage in the behavior and objective space. Empirical results on an asymmetric multi-objective coordination problem highlight MOI-IM’s ability to produce teams that can express diverse trade-offs and robust relationships required to balance misaligned objectives.

ICRA Conference 2023 Conference Paper

Contextual Multi-Objective Path Planning

  • Anna Nickelson
  • Kagan Tumer
  • William D. Smart

Many critical robot environments, such as healthcare and security, require robots to account for context-dependent criteria when performing their functions (e.g., navigation). Such domains require decisions that balance multiple factors, making it difficult for robots to make contextually appropriate decisions. Multi-Objective Optimization (MOO) methods offer a potential solution by trading off between objectives; however, concepts like Pareto fronts are not only expensive to compute but also struggle to differentiate among solutions on the Pareto front. This work introduces the Contextual Multi-Objective Path Planning (CMOPP) algorithm, which enables the robot to trade off different complex costs depending on context. The key insight of this work is to separate path planning and path cost estimation into two independent steps, thus significantly reducing computation cost without impacting the quality of the resulting path. As a result, CMOPP is able to accurately model path costs, which provide meaningful trade-offs when choosing a path that best fits the context. We show the benefits of CMOPP on case studies that demonstrate its contextual path planning capabilities. CMOPP finds contextually appropriate paths by first reducing the search space by up to 99.9% to a near-optimal set of paths. This reduction enables the generation of accurate path cost models, using up to 90% less computation than similar methods.

ECAI Conference 2023 Conference Paper

Knowledge Injection for Multiagent Systems via Counterfactual Perception Shaping

  • Nicholas Zerbel
  • Kagan Tumer

Reward shaping can be used to train coordinated agent teams, but most learning approaches optimize for training conditions and, by design, are limited by knowledge directly captured by the reward function. Advances in adaptive systems (e.g., transfer learning) may enable agents to quickly learn new policies in response to changing conditions, but retraining agents is both difficult and risks losing team coordination altogether. In this work we introduce Counterfactual Knowledge Injection (CKI), a novel approach to injecting high-level information into a multiagent system outside of the learning process. CKI encodes knowledge into counterfactual state representations to shape agent perceptions of the system so that their current policies better match the current system conditions. We demonstrate CKI in a multiagent exploration task where agents must collaborate to observe various Points of Interest (POI). We show that CKI successfully imparts high-level system knowledge to agents in response to imperceptible changes. We also show that CKI enables agents to adjust their level of agent-to-agent coordination, ranging from tasks individuals can complete up to tasks that require the entire team.

AAMAS Conference 2023 Conference Paper

Learning Inter-Agent Synergies in Asymmetric Multiagent Systems

  • Gaurav Dixit
  • Kagan Tumer

In multiagent systems that require coordination, agents must learn diverse policies that enable them to achieve their individual and team objectives. Multiagent Quality-Diversity methods partially address this problem by filtering the joint space of policies to smaller sub-spaces that make the diversification of agent policies tractable. However, in teams of asymmetric agents (agents with different objectives and capabilities), the search for diversity is primarily driven by the need to find policies that will allow agents to assume complementary roles required to work together in teams. This work introduces Asymmetric Island Model (AIM), a multiagent framework that enables populations of asymmetric agents to learn diverse complementary policies that foster teamwork via dynamic population size allocation on a wide variety of team tasks. The key insight of AIM is that the competitive pressure arising from the distribution of policies on different team-wide tasks drives the agents to explore regions of the policy space that yield specializations that generalize across tasks. Simulation results on multiple variations of a remote habitat problem highlight the strength of AIM in discovering robust synergies that allow agents to operate near-optimally in response to the changing team composition and policies of other agents.

AAMAS Conference 2023 Conference Paper

Multi-Team Fitness Critics For Robust Teaming

  • Joshua Cook
  • Tristan Scheiner
  • Kagan Tumer

Many multiagent systems, such as search and rescue or underwater exploration, rely on generalizable teamwork abilities to achieve complex tasks. Though many ad-hoc teaming algorithms focus on finding an agent’s best fit with static team members, domains with high degrees of uncertainty and dynamic teammates require an agent to cooperate with arbitrary teams. Prior work views this as an issue of uninformative rewards, providing high-quality but potentially expensive evaluation methods to isolate an agent’s contribution. In this work, we provide a local-evaluation-based approach that leverages state trajectories of agents to better identify their impact across multiple teams. The key insight that enables this approach is that agent trajectories and previous experiences carry sufficient information to map agent abilities to team performance. As a result, we are able to train multiple agents to cooperate across arbitrary teams as well as, if not better than, current methods, while only using local information and significantly fewer team evaluations.

AAMAS Conference 2022 Conference Paper

Behavior Exploration and Team Balancing for Heterogeneous Multiagent Coordination

  • Gaurav Dixit
  • Kagan Tumer

Diversity in behaviors is instrumental for robust team performance in many multiagent tasks which require agents to coordinate. Unfortunately, exhaustive search through the agents’ behavior spaces is often intractable. This paper introduces Behavior Exploration for Heterogeneous Teams (BEHT), a multi-level learning framework that enables agents to progressively explore regions of the behavior space that promote team coordination on diverse goals. By combining diversity search to maximize agent-specific rewards and evolutionary optimization to maximize the team-based fitness, our method effectively filters regions of the behavior space that are conducive to agent coordination. We demonstrate the diverse behaviors and synergies that our method allows agents to learn on a multiagent exploration problem.

AAMAS Conference 2021 Conference Paper

Dynamic Skill Selection for Learning Joint Actions

  • Enna Sachdeva
  • Shauharda Khadka
  • Somdeb Majumdar
  • Kagan Tumer

Learning in tightly coupled multiagent settings with sparse rewards is challenging because multiple agents must reach the goal state simultaneously for the team to receive a reward. This is even more challenging under temporal coupling constraints, where agents need to sequentially complete different components of a task in a particular order. Here, a single local reward is inadequate for learning an effective policy. We introduce MADyS, Multiagent Learning via Dynamic Skill Selection, a bi-level optimization framework that learns to dynamically switch between multiple local skills to optimize sparse team objectives. MADyS adopts fast policy gradients to learn local skills using local rewards and an evolutionary algorithm to optimize the sparse team objective by recruiting the most suitable skill at any given time. This eliminates the need to generate a single dense reward via reward shaping or other mixing functions. In environments with both spatial and temporal coupling requirements, MADyS outperforms prior methods and provides intuitive visualizations of its skill switching strategy.

ICML Conference 2020 Conference Paper

Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination

  • Somdeb Majumdar
  • Shauharda Khadka
  • Santiago Miret
  • Stephen Marcus McAleer
  • Kagan Tumer

Many cooperative multiagent reinforcement learning environments provide agents with a sparse team-based reward, as well as a dense agent-specific reward that incentivizes learning basic skills. Training policies solely on the team-based reward is often difficult due to its sparsity. Also, relying solely on the agent-specific reward is sub-optimal because it usually does not capture the team coordination objective. A common approach is to use reward shaping to construct a proxy reward by combining the individual rewards. However, this requires manual tuning for each environment. We introduce Multiagent Evolutionary Reinforcement Learning (MERL), a split-level training platform that handles the two objectives separately through two optimization processes. An evolutionary algorithm maximizes the sparse team-based objective through neuroevolution on a population of teams. Concurrently, a gradient-based optimizer trains policies to only maximize the dense agent-specific rewards. The gradient-based policies are periodically added to the evolutionary population as a way of information transfer between the two optimization processes. This enables the evolutionary algorithm to use skills learned via the agent-specific rewards toward optimizing the global objective. Results demonstrate that MERL significantly outperforms state-of-the-art methods, such as MADDPG, on a number of difficult coordination benchmarks.
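
A skeleton of the split-level training pattern the abstract describes, under toy assumptions: an evolutionary population climbs the sparse team fitness while a gradient learner chases the dense agent-specific reward, with periodic migration from the gradient side into the population. Names, rewards, and update rules are illustrative stand-ins, not MERL's actual code.

```python
import numpy as np

rng = np.random.default_rng(2)
TEAM_SIZE, DIM, POP, ELITE = 3, 5, 8, 4

def team_fitness(team):
    """Sparse team objective (toy stand-in for an episodic team return)."""
    return -float(np.sum(team ** 2))

def dense_update(policy):
    """Stand-in for a policy-gradient step on the dense agent-specific reward."""
    return policy - 0.1 * policy

population = [rng.standard_normal((TEAM_SIZE, DIM)) for _ in range(POP)]
pg_team = rng.standard_normal((TEAM_SIZE, DIM))  # gradient-trained policies

for gen in range(50):
    # Gradient side: each agent maximizes only its dense local reward.
    pg_team = np.stack([dense_update(p) for p in pg_team])
    # Evolutionary side: keep elite teams on the sparse objective, mutate the rest.
    population.sort(key=team_fitness, reverse=True)
    population = population[:ELITE] + [
        population[rng.integers(ELITE)] + 0.05 * rng.standard_normal((TEAM_SIZE, DIM))
        for _ in range(POP - ELITE)
    ]
    # Migration: periodically inject the gradient-trained team into the population.
    if gen % 10 == 0:
        population[-1] = pg_team.copy()

print("best team fitness:", round(team_fitness(max(population, key=team_fitness)), 3))
```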

ICML Conference 2019 Conference Paper

Collaborative Evolutionary Reinforcement Learning

  • Shauharda Khadka
  • Somdeb Majumdar
  • Tarek Nassar
  • Zach Dwiel
  • Evren Tumer
  • Santiago Miret
  • Yinyin Liu
  • Kagan Tumer

Deep reinforcement learning algorithms have been successfully applied to a range of challenging control tasks. However, these methods typically struggle with achieving effective exploration and are extremely sensitive to the choice of hyperparameters. One reason is that most approaches use a noisy version of their operating policy to explore - thereby limiting the range of exploration. In this paper, we introduce Collaborative Evolutionary Reinforcement Learning (CERL), a scalable framework that comprises a portfolio of policies that simultaneously explore and exploit diverse regions of the solution space. A collection of learners - typically proven algorithms like TD3 - optimize over varying time-horizons leading to this diverse portfolio. All learners contribute to and use a shared replay buffer to achieve greater sample efficiency. Computational resources are dynamically distributed to favor the best learners as a form of online algorithm selection. Neuroevolution binds this entire process to generate a single emergent learner that exceeds the capabilities of any individual learner. Experiments in a range of continuous control benchmarks demonstrate that the emergent learner significantly outperforms its composite learners while remaining overall more sample-efficient - notably solving the MuJoCo Humanoid benchmark where all of its composite learners (TD3) fail entirely in isolation.

AAMAS Conference 2019 Conference Paper

Curriculum Learning for Tightly Coupled Multiagent Systems

  • Golden Rockefeller
  • Patrick Mannion
  • Kagan Tumer

In this paper, we leverage curriculum learning (CL) to improve the performance of multiagent systems (MAS) that are trained with the cooperative coevolution of artificial neural networks. We design curricula to progressively change two dimensions: scale (i.e., domain size) and coupling (i.e., the number of agents required to complete a subtask). We demonstrate that CL can successfully mitigate the challenge of learning on a sparse reward signal resulting from a high degree of coupling in complex MAS. We also show that, in most cases, the combination of difference reward shaping with CL can improve performance by up to 56%. We evaluate our CL methods on the tightly coupled multi-rover domain. CL increased converged system performance on all tasks presented. Furthermore, for most tasks, agents were only able to learn at all when trained with CL.

AAMAS Conference 2019 Conference Paper

Memory based Multiagent One Shot Learning

  • Shauharda Khadka
  • Connor Yates
  • Kagan Tumer

One-shot learning is particularly difficult in multiagent systems where the relevant information is distributed across agents, and inter-agent interactions shape global emergent behavior. This paper introduces a distributed learning framework called Distributed Modular Memory Unit (DMMU) that creates a shared external memory to enable one-shot adaptive learning in multiagent systems. In DMMU, a shared external memory is selectively accessed by agents acting asynchronously and in parallel. Each agent processes its own stream of sequential information independently while interacting with the shared external memory to identify, retain, and propagate salient information. This enables DMMU to rapidly assimilate task features from a group of distributed agents, consolidate them into a reconfigurable external memory, and use them for one-shot multiagent learning. We compare the performance of the DMMU framework on a simulated cybersecurity task with traditional feedforward ensembles, LSTM based agents, and a centralized framework. Results demonstrate that DMMU significantly outperforms the other methods and exhibits distributed one-shot learning.

AAMAS Conference 2019 Conference Paper

The Impact of Agent Definitions and Interactions on Multiagent Learning for Coordination

  • Jen Jen Chung
  • Damjan Miklić
  • Lorenzo Sabattini
  • Kagan Tumer
  • Roland Siegwart

The state-action space of an individual agent in a multiagent team fundamentally dictates how the individual interacts with the rest of the team. Thus, how an agent is defined in the context of its domain has a significant effect on team performance when learning to coordinate. In this work we explore the trade-offs associated with these design choices, for example, having fewer agents in the team that individually are able to process and act on a wider scope of information about the world versus a larger team of agents where each agent observes and acts in a more local region of the domain. We focus our study on a traffic management domain and highlight the trends in learning performance when applying different agent definitions.

AAMAS Conference 2018 Conference Paper

A Memory-based Multiagent Framework for Adaptive Decision Making

  • Shauharda Khadka
  • Connor Yates
  • Kagan Tumer

Rapid adaptation to dynamically change one’s policy based on a singular observation is a complex problem. This is especially difficult in multiagent systems where the global behavior emerges from inter-agent interactions. In this paper, we introduce a memory-based learning framework called Distributed Modular Memory Unit (DMMU) which enables rapid and adaptive decision making. In DMMU, a shared external memory is selectively accessed by agents acting independently and in parallel. Each agent processes its own stream of sequential information independently while interacting with the shared external memory to identify, retain, and propagate salient information. This enables DMMU to rapidly assimilate task features from a group of distributed agents, consolidate them into a reconfigurable external memory, and use them for one-shot multiagent learning. We compare the performance of the DMMU framework on a simulated cybersecurity task with traditional feedforward ensembles, LSTM based agents, and a centralized framework. Results demonstrate that DMMU significantly outperforms the best LSTM based method by a factor of two and exhibits adaptive decision making to effectively solve this complex task.

NeurIPS Conference 2018 Conference Paper

Evolution-Guided Policy Gradient in Reinforcement Learning

  • Shauharda Khadka
  • Kagan Tumer

Deep Reinforcement Learning (DRL) algorithms have been successfully applied to a range of challenging control tasks. However, these methods typically suffer from three core difficulties: temporal credit assignment with sparse rewards, lack of effective exploration, and brittle convergence properties that are extremely sensitive to hyperparameters. Collectively, these challenges severely limit the applicability of these approaches to real world problems. Evolutionary Algorithms (EAs), a class of black box optimization techniques inspired by natural evolution, are well suited to address each of these three challenges. However, EAs typically suffer from high sample complexity and struggle to solve problems that require optimization of a large number of parameters. In this paper, we introduce Evolutionary Reinforcement Learning (ERL), a hybrid algorithm that leverages the population of an EA to provide diversified data to train an RL agent, and reinserts the RL agent into the EA population periodically to inject gradient information into the EA. ERL inherits EA's ability of temporal credit assignment with a fitness metric, effective exploration with a diverse set of policies, and stability of a population-based approach and complements it with off-policy DRL's ability to leverage gradients for higher sample efficiency and faster learning. Experiments in a range of challenging continuous control benchmarks demonstrate that ERL significantly outperforms prior DRL and EA methods.
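
A sketch of the interaction pattern the ERL abstract lays out: population rollouts fill a shared replay buffer, an off-policy learner improves from that buffer, and the learned policy is periodically reinserted into the population. The environment, fitness, and "RL step" below are toy assumptions standing in for a real learner.

```python
import numpy as np

rng = np.random.default_rng(3)
DIM, POP = 6, 10
replay = []  # shared experience buffer filled by all evolutionary rollouts

def rollout(policy):
    """Pretend episode: returns fitness plus fake transitions for the buffer."""
    fitness = -float(np.sum((policy - 1.0) ** 2))
    transitions = [(policy + 0.01 * rng.standard_normal(DIM), fitness)] * 5
    return fitness, transitions

population = [rng.standard_normal(DIM) for _ in range(POP)]
rl_policy = rng.standard_normal(DIM)

for gen in range(100):
    scored = []
    for individual in population:
        fit, trans = rollout(individual)
        replay.extend(trans)  # diversified data from the whole population
        scored.append((fit, individual))
    scored.sort(key=lambda x: x[0], reverse=True)
    elites = [ind for _, ind in scored[: POP // 2]]
    population = elites + [e + 0.05 * rng.standard_normal(DIM) for e in elites]
    # Off-policy "RL step" nudged by buffered experience (illustrative only).
    batch = [replay[i] for i in rng.integers(len(replay), size=8)]
    rl_policy += 0.05 * np.mean([p - rl_policy for p, _ in batch], axis=0)
    if gen % 10 == 0:
        population[-1] = rl_policy.copy()  # inject gradient information into the EA
```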

AAMAS Conference 2018 Conference Paper

When Less is More: Reducing Agent Noise with Probabilistically Learning Agents

  • Jen Jen Chung
  • Scott Chow
  • Kagan Tumer

Distributed agents concurrently learning to coordinate in a multiagent system can suffer from considerable amounts of agent noise. This is the noise that arises from the non-stationarity of the learning environment for each individual agent since other agents in the system are also constantly updating their policies, thereby continually shifting the goal posts for successful coordination. In this work, we propose a method to reduce agent noise by allowing individual agents to probabilistically determine whether or not to undergo policy updates. We show that using this method to adapt the number of actively learning agents over time provides improvements in convergence speed of the team as a whole without affecting the final converged learning performance.
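
The mechanism reduces to a one-line stochastic gate: before each update cycle, every agent flips a coin to decide whether to learn this round. A minimal sketch, assuming a generic `update_policy` step (a hypothetical name, not from the paper):

```python
import random

P_LEARN = 0.3  # probability an agent updates its policy this episode (tunable)

def maybe_update(agents, update_policy):
    """Gate learning stochastically so only a subset of agents moves at once."""
    for agent in agents:
        if random.random() < P_LEARN:
            update_policy(agent)  # fewer simultaneous learners -> less agent noise

# Example with trivial stand-ins:
maybe_update(agents=range(10), update_policy=lambda a: print("agent", a, "updates"))
```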

KER Journal 2016 Journal Article

Combining reward shaping and hierarchies for scaling to large multiagent systems

  • Chris HolmesParker
  • Adrian K. Agogino
  • Kagan Tumer

Coordinating the actions of agents in multiagent systems presents a challenging problem, especially as the size of the system is increased and predicting the agent interactions becomes difficult. Many approaches to improving coordination within multiagent systems have been developed, including organizational structures, shaped rewards, coordination graphs, heuristic methods, and learning automata. However, each of these approaches still has inherent limitations with respect to coordination and scalability. We explore the potential of synergistically combining existing coordination mechanisms such that they offset each other’s limitations. More specifically, we are interested in combining existing coordination mechanisms in order to achieve improved performance, increased scalability, and reduced coordination complexity in large multiagent systems. In this work, we discuss and demonstrate the individual limitations of two well-known coordination mechanisms. We then provide a methodology for combining the two coordination mechanisms to offset their limitations and improve performance over either method individually. In particular, we combine shaped difference rewards and hierarchical organization in the Defect Combination Problem with up to 10,000 sensing agents. We show that combining hierarchical organization with difference rewards can improve both coordination and scalability by decreasing information overhead, structuring agent-to-agent connectivity and control flow, and improving the individual decision-making capabilities of agents. We show that by combining hierarchies and difference rewards, the information overheads and computational requirements of individual agents can be reduced by as much as 99% while simultaneously increasing the overall system performance. Additionally, we demonstrate the robustness of this approach to handling up to 25% agent failures under various conditions.

IROS Conference 2016 Conference Paper

D++: Structural credit assignment in tightly coupled multiagent domains

  • Aida Rahmattalabi
  • Jen Jen Chung
  • Mitchell K. Colby
  • Kagan Tumer

Autonomous multi-robot teams can be used in complex coordinated exploration tasks to improve exploration performance in terms of both speed and effectiveness. However, the use of multi-robot systems presents additional challenges. Specifically, in domains where the robots' actions are tightly coupled, coordinating multiple robots to achieve cooperative behavior at the group level is difficult. In this paper, we demonstrate that reward shaping can greatly benefit learning in multi-robot exploration tasks. We propose a novel reward framework based on the idea of counterfactuals to tackle the coordination problem in tightly coupled domains. We show that the proposed algorithm provides superior performance (a 166% performance improvement and a fourfold convergence speedup) compared to policies learned using either the global reward or the difference reward [1].
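
In the difference-rewards literature this counterfactual idea is usually presented as below; the paper's exact formulation may differ, so treat this as the standard form rather than a transcription:

```latex
D_i(z) = G(z) - G(z_{-i}), \qquad
D_i^{++}(z, n) = \frac{G(z_{+ni}) - G(z)}{n}
```

where G is the team objective, z_{-i} removes robot i, and z_{+ni} adds n counterfactual copies of robot i. In tightly coupled tasks D_i(z) is often zero (removing one robot leaves the joint task unachieved either way), and scanning for the smallest n that makes D_i^{++} nonzero recovers a learnable credit signal.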

AAMAS Conference 2016 Conference Paper

Local Approximation of Difference Evaluation Functions

  • Mitchell Colby
  • Theodore Duchow-Pressley
  • Jen Jen Chung
  • Kagan Tumer

Difference evaluation functions have resulted in excellent multiagent behavior in many domains, including air traffic and mobile robot control. However, calculating difference evaluation functions requires determining the value of a counterfactual system objective function, which is often difficult when the system objective function is unknown or global state and action information is unavailable. In this work, we demonstrate that a local estimate of the system evaluation function may be used to estimate difference evaluations using readily available information, allowing difference evaluations to be computed in multiagent systems where the mathematical form of the objective function is not known. This approximation technique is tested in two domains, and we demonstrate that approximating difference evaluation functions results in better performance and faster learning than using global evaluation functions. Finally, we demonstrate the effectiveness of the learned policies on a set of Pioneer P3-DX robots.

JAAMAS Journal 2015 Journal Article

Fitness function shaping in multiagent cooperative coevolutionary algorithms

  • Mitchell Colby
  • Kagan Tumer

Coevolution is a promising approach to evolve teams of agents which must cooperate to achieve some system objective. However, in many coevolutionary approaches, credit assignment is often subjective and context dependent, as the fitness of an individual agent strongly depends on the actions of the agents with which it collaborates. In order to alleviate this problem, we introduce a cooperative coevolutionary algorithm which biases the evolutionary search as well as shapes agent fitness functions to promote behavior that benefits the system-level performance. More specifically, we bias the search using a hall of fame approximation of optimal collaborators, and shape the agent fitness using the difference evaluation function. Our results show that shaping agent fitness with the difference evaluation improves system performance by up to 50%, and adding an additional fitness bias improves performance by up to 75% in our experiments. Finally, an analysis of system performance as a function of computational cost demonstrates that this algorithm makes extremely efficient use of computational resources, having a higher performance as a function of computational cost than any other algorithm tested.

IROS Conference 2015 Conference Paper

Implicit adaptive multi-robot coordination in dynamic environments

  • Mitchell K. Colby
  • Jen Jen Chung
  • Kagan Tumer

Multi-robot teams offer key advantages over single robots in exploration missions by increasing efficiency (explore larger areas), reducing risk (partial mission failure with robot failures), and enabling new data collection modes (multi-modal observations). However, coordinating multiple robots to achieve a system-level task is difficult, particularly if the task may change during the mission. In this work, we demonstrate how multiagent cooperative coevolutionary algorithms can develop successful control policies for dynamic and stochastic multi-robot exploration missions. We find that agents using difference evaluation functions (a technique that quantifies each individual agent's contribution to the team) provides superior system performance (up to 15%) compared to global evaluation functions and a hand-coded algorithm.

IROS Conference 2015 Conference Paper

Learning to trick cost-based planners into cooperative behavior

  • Carrie Rebhuhn
  • Ryan Skeele
  • Jen Jen Chung
  • Geoffrey A. Hollinger
  • Kagan Tumer

In this paper we consider the problem of routing autonomously guided robots by manipulating the cost space to induce safe trajectories in the work space. Specifically, we examine the domain of UAV traffic management in urban airspaces. Each robot does not explicitly coordinate with other vehicles in the airspace. Instead, the robots execute their own individual internal cost-based planner to travel between locations. Given this structure, our goal is to develop a high-level UAV traffic management (UTM) system that can dynamically adapt the cost space to reduce the number of conflict incidents in the airspace without knowing the internal planners of each robot. We propose a decentralized and distributed system of high-level traffic controllers that each learn appropriate costing strategies via a neuro-evolutionary algorithm. The policies learned by our algorithm demonstrated a 16.4% reduction in the total number of conflict incidents experienced in the airspace while maintaining throughput performance.

IROS Conference 2014 Conference Paper

Flop and roll: Learning robust goal-directed locomotion for a Tensegrity Robot

  • Atil Iscen
  • Adrian K. Agogino
  • Vytas SunSpiral
  • Kagan Tumer

Tensegrity robots are composed of compression elements (rods) that are connected via a network of tension elements (cables). Tensegrity robots provide many advantages over standard robots, such as compliance, robustness, and flexibility. Moreover, sphere-shaped tensegrity robots can provide non-traditional modes of locomotion, such as rolling. While they have advantageous physical properties, tensegrity robots are hard to control because of their nonlinear dynamics and oscillatory nature. In this paper, we present a robust, distributed, and directional rolling algorithm, “flop and roll”. The algorithm uses coevolution and exploits the distributed nature and symmetry of the tensegrity structure. We validate this algorithm using the NASA Tensegrity Robotics Toolkit (NTRT) simulator, as well as the highly accurate model of the physical SUPERBall being developed under the NASA Innovative and Advanced Concepts (NIAC) program. Flop and roll improves upon previous approaches in that it provides rolling to a desired location. It is also robust to both unexpected external forces and partial hardware failures. Additionally, it handles variable terrain (hills up to 33% grade). Finally, results are compatible with the hardware since the algorithm relies on realistic sensing and actuation capabilities of the SUPERBall.

AAMAS Conference 2013 Conference Paper

Addressing Hard Constraints in the Air Traffic Problem through Partitioning and Difference Rewards

  • William Curran
  • Adrian Agogino
  • Kagan Tumer

In the US alone, weather hazards and airport congestion cause thousands of hours of delay, costing billions of dollars annually. The task of managing delay may be modeled as a multiagent congestion problem with tightly coupled agents who collectively impact the system. Reward shaping has been effective at reducing noise caused by agent interaction and improving learning in soft constraint problems. We extend those results to hard constraints that cannot be easily learned, and must be algorithmically enforced. We present an agent partitioning algorithm in conjunction with reward shaping to simplify the learning domain. Our results show that a partitioning of the agents using system features leads to up to a 1000x speedup over the straight reward shaping approach, as well as up to a 30% improvement in performance over a greedy scheduling solution, corresponding to hundreds of hours of delay saved in a single day.

AAMAS Conference 2013 Conference Paper

CLEAN Rewards for Improving Multiagent Coordination in the Presence of Exploration

  • Chris HolmesParker
  • Adrian Agogino
  • Kagan Tumer

In cooperative multiagent systems, coordinating the joint actions of agents is difficult. One of the fundamental difficulties in such multiagent systems is the slow learning process where an agent may not only need to learn how to behave in a complex environment, but may also need to account for the actions of the other learning agents. Here, the inability of agents to distinguish the true environmental dynamics from those caused by the stochastic exploratory actions of other agents creates noise on each agent’s reward signal. To address this, we introduce Coordinated Learning without Exploratory Action Noise (CLEAN) rewards, which are agent-specific shaped rewards that effectively remove such learning noise from each agent’s reward signal. We demonstrate their performance with up to 1000 agents in a standard congestion problem.

AAMAS Conference 2013 Conference Paper

Decentralized Coordination via Task Decomposition and Reward Shaping

  • Atil Iscen
  • Kagan Tumer

In this work, we introduce a method for decentralized coordination in cooperative multiagent multi-task problems where the subtasks and agents are homogeneous. Using the proposed method, the agents cooperate at the high-level task selection using the knowledge they gather by learning subtasks. We introduce a subtask selection method for single-agent multi-task MDPs and we extend the work to multiagent multi-task MDPs by using reward shaping at the subtask level to coordinate the agents. Our results on a multi-rover problem show that agents which use the combination of task decomposition and subtask-based difference rewards achieve significant improvements in terms of both learning speed and converged policies.

AAMAS Conference 2013 Conference Paper

Exploiting Structure and Utilizing Agent-Centric Rewards to Promote Coordination in Large Multiagent Systems

  • Chris HolmesParker
  • Adrian Agogino
  • Kagan Tumer

A goal within the field of multiagent systems is to achieve scaling to large systems involving hundreds or thousands of agents. In such systems the communication requirements for agents as well as the individual agents’ ability to make decisions both play critical roles in performance. We take an incremental step towards improving scalability in such systems by introducing a novel algorithm that conglomerates three well-known existing techniques to address both agent communication requirements as well as decision making within large multiagent systems. In particular, we couple a Factored-Action Factored Markov Decision Process (FA-FMDP) framework which exploits problem structure and establishes localized rewards for agents (reducing communication requirements) with reinforcement learning using agent-centric difference rewards which addresses agent decision making and promotes coordination by addressing the structural credit assignment problem. We demonstrate our algorithm’s performance compared to two other popular reward techniques (global, local) with up to 10,000 agents.

AAMAS Conference 2013 Conference Paper

Graphical Models in Continuous Domains for Multiagent Reinforcement Learning

  • Scott Proper
  • Kagan Tumer

In this paper we test two coordination methods – difference rewards and coordination graphs – in a continuous, multiagent rover domain using reinforcement learning, and discuss the situations in which each of these methods perform better alone or together, and why. We also contribute a novel method of applying coordination graphs in a continuous domain by taking advantage of the wire-fitting approach used to handle continuous state and action spaces.

AAMAS Conference 2013 Conference Paper

Learning to Control Complex Tensegrity Robots

  • Atil Iscen
  • Adrian Agogino
  • Vytas SunSpiral
  • Kagan Tumer

Tensegrity robots are based on the idea of tensegrity structures, which provide many advantages critical to robotics, such as being lightweight and impact tolerant. Unfortunately, tensegrity robots are hard to control due to their overall complexity. We use multiagent learning to learn controls for a ball-shaped tensegrity with 6 rods and 24 cables. Our simulation results show that multiagent learning can be used to learn an efficient rolling behavior, and we test its robustness to actuation noise.

AAAI Conference 2013 Conference Paper

Multiagent Learning with a Noisy Global Reward Signal

  • Scott Proper
  • Kagan Tumer

Scaling multiagent reinforcement learning to domains with many agents is a complex problem. In particular, multiagent credit assignment becomes a key issue as the system size increases. Some multiagent systems suffer from a global reward signal that is very noisy or difficult to analyze. This makes deriving a learnable local reward signal very difficult. Difference rewards (a particular instance of reward shaping) have been used to alleviate this concern, but they remain difficult to compute in many domains. In this paper we present an approach to modeling the global reward using function approximation that allows the quick computation of local rewards. We demonstrate how this model can result in significant improvements in behavior for three congestion problems: a multiagent “bar problem”, a complex simulation of the United States airspace, and a generic air traffic domain. We show how the model of the global reward may be learned either on- or off-line using either linear functions or neural networks. For the bar problem, we show an increase in reward of nearly 200% over learning using the global reward directly. For the air traffic problem, we show a decrease in costs of 25% over learning using the global reward directly.
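
A sketch of the core trick as stated in the abstract: fit a model of the noisy global reward, then compute a cheap difference reward by re-querying the model with one agent's action replaced by a default. The linear least-squares model and the toy data below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(4)
N_AGENTS, N_SAMPLES = 5, 500

# Fake experience: joint actions and noisy global rewards from a toy system.
actions = rng.random((N_SAMPLES, N_AGENTS))
true_G = actions.sum(axis=1) - 0.5 * actions.prod(axis=1)
noisy_G = true_G + 0.3 * rng.standard_normal(N_SAMPLES)

# Fit G_hat by least squares (a linear function approximator with a bias term).
X = np.hstack([actions, np.ones((N_SAMPLES, 1))])
w, *_ = np.linalg.lstsq(X, noisy_G, rcond=None)

def g_hat(joint_action):
    return float(np.append(joint_action, 1.0) @ w)

def difference_reward(joint_action, i, default=0.0):
    """D_i ~ G_hat(z) - G_hat(z with agent i's action replaced by a default)."""
    counterfactual = joint_action.copy()
    counterfactual[i] = default
    return g_hat(joint_action) - g_hat(counterfactual)

z = rng.random(N_AGENTS)
print([round(difference_reward(z, i), 3) for i in range(N_AGENTS)])
```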

JAAMAS Journal 2012 Journal Article

Coordinating actions in congestion games: impact of top–down and bottom–up utilities

  • Kagan Tumer
  • Scott Proper

Congestion games offer a perfect environment in which to study the impact of local decisions on global utilities in multiagent systems. What is particularly interesting in such problems is that no individual action is intrinsically “good” or “bad” but that combinations of actions lead to desirable or undesirable outcomes. As a consequence, agents need to learn how to coordinate their actions with those of other agents, rather than learn a particular set of “good” actions. A congestion game can be studied from two different perspectives: (i) from the top down, where a global utility (e.g., a system-centric view of congestion) specifies the task to be achieved; or (ii) from the bottom up, where each agent has its own intrinsic utility it wants to maximize. In many cases, these two approaches are at odds with one another, where agents aiming to maximize their intrinsic utilities lead to poor values of a system-level utility. In this paper we extend results on difference utilities, a form of shaped utility that enables multiagent learning in congested, noisy conditions, to study the global behavior that arises from the agents’ choices in two types of congestion games. Our key result is that agents that aim to maximize a modified version of their own intrinsic utilities not only perform well in terms of the global utility, but also, on average, perform better with respect to their own original utilities. In addition, we show that difference utilities are robust to agents “defecting” and using their own intrinsic utilities, and that performance degrades gracefully with the number of defectors.

AAMAS Conference 2012 Conference Paper

Modeling Difference Rewards for Multiagent Learning

  • Scott Proper
  • Kagan Tumer

Difference rewards (a particular instance of reward shaping) have been used to allow multiagent domains to scale to large numbers of agents, but they remain difficult to compute in many domains. We present an approach to modeling the global reward using function approximation that allows the quick computation of shaped difference rewards. We demonstrate how this model can result in significant improvements in behavior for two air traffic control problems. We show how the model of the global reward may be either learned on- or off-line using a linear combination of neural networks.

AAMAS Conference 2012 Conference Paper

Shaping Fitness Functions for Coevolving Cooperative Multiagent Systems

  • Mitchell Colby
  • Kagan Tumer

Coevolution is a natural approach to evolve teams of agents which must cooperate to achieve some system objective. However, in many coevolutionary approaches, credit assignment is often subjective and context dependent, as the fitness of an individual agent strongly depends on the actions of the agents with which it collaborates. In order to alleviate this problem, we introduce a cooperative coevolutionary algorithm which biases the evolutionary search as well as shapes agent fitness functions to reward behavior that benefits the system. More specifically, we bias the search using a hall of fame approximation of optimal collaborators, and we shape the agent fitness using the difference objective functions. Our results show that shaping agent fitness with the difference objective improves system performance by up to 50%, and adding an additional fitness bias can improve performance by up to 75%.

JAAMAS Journal 2010 Journal Article

A multiagent approach to managing air traffic flow

  • Adrian K. Agogino
  • Kagan Tumer

Intelligent air traffic flow management is one of the fundamental challenges facing the Federal Aviation Administration (FAA) today. FAA estimates put weather, routing decisions and airport condition induced delays at 1,682,700 hours in 2007 (FAA OPSNET Data, US Department of Transportation website, http://www.faa.gov/data_statistics/), resulting in a staggering economic loss of over $41 billion (Joint Economic Commission Majority Staff, Your flight has been delayed again, 2008). New solutions to the flow management problem are needed to accommodate the threefold increase in air traffic anticipated over the next two decades. Indeed, this is a complex problem where the interactions of changing conditions (e.g., weather), conflicting priorities (e.g., different airlines), limited resources (e.g., air traffic controllers) and heavy volume (e.g., over 40,000 flights over the US airspace) demand an adaptive and robust solution. In this paper we explore a multiagent algorithm where agents use reinforcement learning (RL) to reduce congestion through local actions. Each agent is associated with a fix (a specific location in 2D space) and has one of three actions: setting separation between airplanes, ordering ground delays or performing reroutes. We simulate air traffic using FACET, an air traffic flow simulator developed at NASA and used extensively by the FAA and industry. Our FACET simulations on both artificial and real historical data from the Chicago and New York airspaces show that agents receiving personalized rewards reduce congestion by up to 80% over agents receiving a global reward and by up to 90% over a current industry approach (Monte Carlo estimation).

AAMAS Conference 2010 Conference Paper

Robot Coordination with Ad-hoc Team Formation

  • Matt Knudson
  • Kagan Tumer

Coordinating multiagent systems to maximize global information collection both presents scientific challenges (what should each agent aim to achieve?) and provides application opportunities (planetary exploration, search and rescue). In particular, in many domains where communication is expensive (for example, because of limited power or computation), the coordination must be achieved in a passive manner, without agents explicitly informing other agents of their states and/or intended actions. In this work, we extend results on such multiagent coordination algorithms to domains where the agents cannot achieve the required tasks without forming teams.

IS Journal 2009 Journal Article

Improving Air Traffic Management with a Learning Multiagent System

  • Kagan Tumer
  • Adrian Agogino

A fundamental challenge facing the aerospace industry is efficient, safe, and reliable air traffic management (ATM). On a typical day, more than 40,000 commercial flights operate in US airspace, and the number of flights is increasing rapidly. This paper shows how a learning multiagent system helps improve ATM.

AAAI Conference 2008 Conference Paper

Adaptive Management of Air Traffic Flow: A Multiagent Coordination Approach

  • Kagan Tumer

This paper summarizes recent advances in the application of multiagent coordination algorithms to air traffic flow management. Indeed, air traffic flow management is one of the fundamental challenges facing the Federal Aviation Administration (FAA) today. This problem is particularly complex as it requires the integration and/or coordination of many factors including: new data (e.g., changing weather info), potentially conflicting priorities (e.g., different airlines), limited resources (e.g., air traffic controllers) and very heavy traffic volume (e.g., over 40,000 flights over the US airspace). The multiagent approach assigns an agent to a navigational fix (a specific location in 2D space) and uses three separate actions to control the airspace: setting the separation between airplanes, setting ground holds that delay aircraft departures and rerouting aircraft. Agents then use reinforcement learning to learn the best set of actions. Results based on FACET (a commercial simulator) show that agents receiving personalized rewards reduce congestion by up to 80% over agents receiving a global reward and by up to 85% over a current industry approach (Monte Carlo estimation). These results show that with proper selection of agents, their actions and their reward structures, multiagent coordination algorithms can be successfully applied to complex real world domains.

AAMAS Conference 2008 Conference Paper

Aligning social welfare and agent preferences to alleviate traffic congestion

  • Kagan Tumer
  • Zach Welch
  • Adrian Agogino

Multiagent coordination algorithms provide unique insights into the challenging problem of alleviating traffic congestion. What is particularly interesting in this class of problem is that no individual action (e.g., leave at a given time) is intrinsically “bad” but that combinations of actions among agents lead to undesirable outcomes. As a consequence, agents need to learn how to coordinate their actions with those of other agents, rather than learn a particular set of “good” actions. In general, the traffic problem can be approached from two distinct perspectives: (i) from a city manager’s point of view, where the aim is to optimize a city-wide objective function (e.g., minimize total city-wide delays), and (ii) from the individual driver’s point of view, where each driver is aiming to optimize a personal objective function (e.g., a “timeliness” function that minimizes the difference between desired and actual arrival times at a destination). In many cases, these two objective functions are at odds with one another, where drivers aiming to optimize their own objectives lead to congestion and poor values of city objective functions. In this paper we present an objective shaping approach to both types of problems and study the system behavior that arises from the drivers’ choices. We first show a top-down approach that provides incentives to drivers and leads to good values of the city manager’s objective function. We then present a bottom-up approach that shows that drivers aiming to optimize their own personal timeliness objective lead to poor performance with respect to a city manager’s objective function. Finally, we present the intriguing result that drivers that aim to optimize a modified version of their own timeliness function not only perform well in terms of the city manager’s objective function, but also perform better with respect to their own original timeliness functions.

JAAMAS Journal 2008 Journal Article

Analyzing and visualizing multiagent rewards in dynamic and stochastic domains

  • Adrian K. Agogino
  • Kagan Tumer

The ability to analyze the effectiveness of agent reward structures is critical to the successful design of multiagent learning algorithms. Though final system performance is the best indicator of the suitability of a given reward structure, it is often preferable to analyze the reward properties that lead to good system behavior (i.e., properties promoting coordination among the agents and providing agents with strong signal-to-noise ratios). This step is particularly helpful in continuous, dynamic, stochastic domains ill-suited to the simple table backup schemes commonly used in TD(λ)/Q-learning, where the effectiveness of the reward structure is difficult to distinguish from the effectiveness of the chosen learning algorithm. In this paper, we present a new reward evaluation method that provides a visualization of the tradeoff between the level of coordination among the agents and the difficulty of the learning problem each agent faces. This method is independent of the learning algorithm and is only a function of the problem domain and the agents’ reward structure. We use this reward property visualization method to determine an effective reward without performing extensive simulations. We then test this method in both a static and a dynamic multi-rover learning domain where the agents have continuous state spaces and take noisy actions (e.g., the agents’ movement decisions are not always carried out properly). Our results show that in the more difficult dynamic domain, the reward efficiency visualization method provides a two order of magnitude speedup in selecting good rewards, compared to running a full simulation. In addition, this method facilitates the design and analysis of new rewards tailored to the observational limitations of the domain, providing rewards that combine the best properties of traditional rewards.

AAMAS Conference 2008 Conference Paper

Regulating Air Traffic Flow with Coupled Agents

  • Adrian Agogino
  • Kagan Tumer

The ability to provide flexible, automated management of air traffic is critical to meeting the ever increasing needs of the next generation air transportation systems. This problem is particularly complex as it requires the integration of many factors including updated information (e.g., changing weather info), conflicting priorities (e.g., different airlines), limited resources (e.g., air traffic controllers) and very heavy traffic volume (e.g., over 40,000 daily flights over the US airspace). Furthermore, because the Federal Aviation Administration will not accept black-box solutions, algorithmic improvements need to be consistent with current operating practices and provide explanations for each new decision. Unfortunately, current methods provide neither flexibility for future upgrades nor high enough performance in complex coupled air traffic flow problems. This paper extends agent-based methods for controlling air traffic flow to more realistic domains that have coupled flow patterns and need to be controlled through a variety of mechanisms. First, we explore an agent control structure that allows agents to control air traffic flow through one of three mechanisms (miles in trail, ground delays and rerouting). Second, we explore a new agent learning algorithm that can efficiently handle coupled flow patterns. We then test this agent solution on a series of congestion problems, showing that it is flexible enough to achieve high performance with different control mechanisms. In addition, the results show that the new solution is able to achieve up to a 20% increase in performance over previous methods that did not account for the agent coupling.

AAMAS Conference 2007 Conference Paper

Distributed Agent-Based Air Traffic Flow Management

  • Kagan Tumer
  • Adrian Agogino

Air traffic flow management is one of the fundamental challenges facing the Federal Aviation Administration (FAA) today. The FAA estimates that in 2005 alone, there were over 322,000 hours of delays at a cost to the industry in excess of three billion dollars. Finding reliable and adaptive solutions to the flow management problem is of paramount importance if the Next Generation Air Transportation Systems are to achieve the stated goal of accommodating three times the current traffic volume. This problem is particularly complex as it requires the integration and/or coordination of many factors including: new data (e.g., changing weather info), potentially conflicting priorities (e.g., different airlines), limited resources (e.g., air traffic controllers) and very heavy traffic volume (e.g., over 40,000 flights over the US airspace).

JAAMAS Journal 2006 Journal Article

Handling Communication Restrictions and Team Formation in Congestion Games

  • Adrian K. Agogino
  • Kagan Tumer

There are many domains in which a multi-agent system needs to maximize a “system utility” function which rates the performance of the entire system, while subject to communication restrictions among the agents. Such communication restrictions make it difficult for agents that take actions to optimize their own “private” utilities to also help optimize the system utility. In this article we show how previously introduced utilities that promote coordination among agents can be modified to be effective in domains with communication restrictions. The modified utilities provide performance improvements of up to 75% over previously used utilities in congestion games (i.e., games where the system utility depends solely on the number of agents choosing a particular action). In addition, we show that in the presence of severe communication restrictions, team formation for the purpose of information sharing among agents leads to an additional 25% improvement in system utility. Finally, we show that agents’ private utilities and team sizes can be manipulated to form the best compromise between how “aligned” an agent’s utility is with the system utility and how easily an agent can learn that utility.

NeurIPS Conference 1998 Conference Paper

Using Collective Intelligence to Route Internet Traffic

  • David Wolpert
  • Kagan Tumer
  • Jeremy Frank

A COllective INtelligence (COIN) is a set of interacting reinforcement learning (RL) algorithms designed in an automated fashion so that their collective behavior optimizes a global utility function. We summarize the theory of COINs, then present experiments using that theory to design COINs to control internet traffic routing. These experiments indicate that COINs outperform all previously investigated RL-based, shortest path routing algorithms.

NeurIPS Conference 1996 Conference Paper

Spectroscopic Detection of Cervical Pre-Cancer through Radial Basis Function Networks

  • Kagan Tumer
  • Nirmala Ramanujam
  • Rebecca Richards-Kortum
  • Joydeep Ghosh

The mortality related to cervical cancer can be substantially reduced through early detection and treatment. However, current detection techniques, such as Pap smear and colposcopy, fail to achieve a concurrently high sensitivity and specificity. In vivo fluorescence spectroscopy is a technique which quickly, non-invasively and quantitatively probes the biochemical and morphological changes that occur in pre-cancerous tissue. RBF ensemble algorithms based on such spectra provide automated, and near real-time implementation of pre-cancer detection in the hands of non-experts. The results are more reliable, direct and accurate than those achieved by either human experts or multivariate statistical algorithms.