Arrow Research search

Author name cluster

Prashant Doshi

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

74 papers
2 author rows

Possible papers

74

AIJ Journal 2026 Journal Article

Decision-theoretic planning and cognitive modeling for active cyber deception

  • Aditya Shinde
  • Prashant Doshi

Cyber defense is evolving to include deception as a key strategy to thwart adversaries. Cyber deception elevates cyber defense by shifting the focus from intrusion detection and prevention to strategically influencing the attacker’s beliefs and perceptions. However, in its current form, deception is employed passively to mislead and misdirect adversaries using decoy systems called honeypots. We present a decision-theoretic approach to active intent recognition using honeypots. We model cyber deception as a sequential decision-making problem in a two-agent context situated on a single honeypot host. To explicitly reason about the influence of deception on the attacker’s beliefs, we introduce factored finitely-nested interactive POMDPs (I-POMDPX), a factored variant of the I-POMDP framework. We utilize the I-POMDPX framework to model the problem with multiple candidate attacker types, each of which models a cyber attack across various stages from the attacker’s initial entry to reaching its adversarial objective. Recursive reasoning facilitated by I-POMDPs enables the defender to simulate interactions where the attacker is oblivious of a defender, and also scenarios where the attacker reasons about the defender’s actions. The defending I-POMDPX-based agent uses decoys to engage the attacker at multiple phases to form increasingly accurate predictions of the attacker’s behavior and intent. Subsequently, we leverage the explicit and subjective reasoning capability of the I-POMDPX to model cognitive biases known to play a role in deception. Specifically, we model the fundamental attribution error (FAE) and confirmation bias. We show that the cognitive modeling of these biases using the I-POMDPX framework plays a crucial role in deceiving sophisticated adversaries. We evaluate our framework in both simulations and with the I-POMDPX agent deployed on a honeypot host with instrumentation. Our experiments show that the I-POMDPX-based agent outperforms commonly used deception strategies in intent recognition on honeypots. We explore how the defender’s deception evolves as the attacker becomes more strategic. At higher levels of reasoning, we demonstrate how the defender can leverage the computational modeling of the attacker’s cognitive biases to facilitate deception against sophisticated adversaries. This emerging application of autonomous agents offers a new approach to cyber defense that contrasts with the traditional action-reaction dynamic that has defined interactions between cyber attackers and defenders for years.

ICRA Conference 2025 Conference Paper

A Novel Computational Framework of Robot Trust for Human-Robot Teams

  • Bhavana Nare
  • John Frericks
  • Anusha Challa
  • Prashant Doshi
  • Kyle Johnsen 0001

When humans collaborate, they form positive or negative experiences with each other. These experiences depend on various factors such as the individual's skills, abilities, and agency. In this paper, we consider human-robot collaborations and present a novel model of an autonomous robot's trust in humans based on the probability of the robot having a positive experience with the human. The model defines a dynamic trust-building process that translates into a computationally accessible implementation. We hypothesize predictors of a positive experience with human teammates and derive trust in individual humans. As the interactions continue, team members develop an affinity toward each other. The robot's affinity towards humans can be viewed as kinship, and we also investigate how kinship affects trust and distrust. We present an algorithm for how the robot may use kinship-mediated trust in its decision-making, and demonstrate its use in simulated missions truly requiring human-robot collaboration.
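The abstract frames trust as the probability of the robot having a positive experience with the human, built up dynamically over repeated interactions. As a minimal sketch of that core idea only, assuming a simple Beta-Bernoulli update (the paper's actual model, with skills, agency, and kinship, is not given here; the class and variable names are illustrative):

```python
# Hypothetical sketch: trust as the estimated probability of a positive
# experience, maintained with a conjugate Beta-Bernoulli update.
# Not the paper's model -- names and structure are illustrative.

class TrustModel:
    def __init__(self, alpha: float = 1.0, beta: float = 1.0):
        # Beta(alpha, beta) prior over P(positive experience)
        self.alpha = alpha
        self.beta = beta

    def observe(self, positive: bool) -> None:
        # Conjugate update after each interaction outcome
        if positive:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def trust(self) -> float:
        # Posterior mean of P(positive experience)
        return self.alpha / (self.alpha + self.beta)

robot_trust = TrustModel()
for outcome in [True, True, False, True]:  # three positive, one negative
    robot_trust.observe(outcome)
print(round(robot_trust.trust, 3))  # 4/6 -> 0.667
```

The conjugacy is what makes such a process "computationally accessible": each interaction updates two counters, and the trust estimate is a single division.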

AIJ Journal 2025 Journal Article

Active legibility in multiagent reinforcement learning

  • Yanyu Liu
  • Yinghui Pan
  • Yifeng Zeng
  • Biyang Ma
  • Prashant Doshi

Multiagent sequential decision problems arise in many critical applications, including urban transportation, autonomous driving, and military operations. Their widely known solution, multiagent reinforcement learning, has evolved tremendously in recent years. Among these, the paradigm of modeling other agents attracts our interest, as it differs from traditional value decomposition or communication mechanisms. It enables agents to understand and anticipate others' behaviors and facilitates their collaboration. Inspired by recent research on legibility, which allows agents to reveal their intentions through their behavior, we propose a multiagent active legibility framework to improve their performance. The legibility-oriented framework drives agents to conduct legible actions so as to help others optimise their behaviors. In addition, we design a series of problem domains that emulate common scenarios where legibility is needed and effectively characterize legibility in multiagent reinforcement learning. The experimental results demonstrate that the new framework is more efficient and requires less training time compared to several multiagent reinforcement learning algorithms.

  • We propose the multiagent active legibility framework to develop legible plans in MARL.
  • We propose the legibility reward shaping technique and prove its correctness.
  • We design multiple problem domains to showcase the plan recognition and legibility.

UAI Conference 2025 Conference Paper

Adaptive Human-Robot Collaboration using Type-Based IRL

  • Prasanth Sengadu Suresh
  • Prashant Doshi
  • Bikramjit Banerjee

Human-robot collaboration (HRC) integrates the consistency and precision of robotic systems with the dexterity and cognitive abilities of humans to create synergy. However, human performance may degrade due to various factors (e.g., fatigue, trust) which can manifest unpredictably, typically resulting in diminished output and reduced quality. To address this challenge toward successful HRCs, we present a human-aware approach to collaboration using a novel multi-agent decision-making framework. Type-based decentralized Markov decision processes (TB-DecMDP) additionally model latent, causal decision-making factors influencing agent behavior (e.g., fatigue), leading to dynamic agent types. In this framework, agents can switch between types and each maintains a belief about others’ current type based on observed actions while aiming to achieve a shared objective. We introduce a new inverse reinforcement learning (IRL) algorithm, TB-DecAIRL, which uses TB-DecMDP to model complex HRCs. TB-DecAIRL learns a type-contingent reward function and corresponding vector of policies from team demonstrations. Our evaluations in a realistic HRC problem setting establish that modeling human types in TB-DecAIRL improves robot behavior over the default of ignoring human factors, by increasing throughput in a human-robot produce sorting task.

IROS Conference 2025 Conference Paper

Analyzing Human Perceptions of a MEDEVAC Robot in a Simulated Evacuation Scenario

  • Tyson Jordan
  • Pranav Pandey
  • Prashant Doshi
  • Ramviyas Parasuraman
  • Adam Goodie

The use of autonomous systems in medical evacuation (MEDEVAC) scenarios is promising, but existing implementations overlook key insights from human-robot interaction (HRI) research. Studies on human-machine teams demonstrate that human perceptions of a machine teammate are critical in governing the machine’s performance. Consequently, it is essential to identify the factors that contribute to positive human perceptions in human-machine teams. Here, we present a mixed factorial design to assess human perceptions of a MEDEVAC robot in a simulated evacuation scenario. Participants were assigned to the role of casualty (CAS) or bystander (BYS) and subjected to three within-subjects conditions based on the MEDEVAC robot’s operating mode: autonomous-slow (AS), autonomous-fast (AF), and teleoperation (TO). During each trial, a MEDEVAC robot navigated an 11-meter path, acquiring a casualty and transporting them to an ambulance exchange point while avoiding an idle bystander. Following each trial, subjects completed a questionnaire measuring their emotional states, perceived safety, and social compatibility with the robot. Results indicate a consistent main effect of operating mode on reported emotional states and perceived safety. Pairwise analyses suggest that the employment of the AF operating mode negatively impacted perceptions along these dimensions. There were no persistent differences between CAS and BYS responses.

ECAI Conference 2025 Conference Paper

Inferring Hidden Behavioral Signatures of Cyber Adversaries Using Inverse Reinforcement Learning

  • Aditya Shinde
  • Prashant Doshi

This paper presents an emerging approach to attacker preference modeling from system-level audit logs using inverse reinforcement learning (IRL). Adversary modeling is an important capability in cybersecurity that lets defenders characterize behaviors of potential attackers, which enables attribution to known cyber adversary groups. Existing approaches rely on documenting an ever-evolving set of attacker tools and techniques to track known threat actors. Although attacks evolve constantly, attacker behavioral preferences are intrinsic and less volatile. Our approach learns the behavioral preferences of cyber adversaries from forensics data on their tools and techniques. We model the attacker as an expert decision-making agent with unknown behavioral preferences situated in a computer host. We leverage attack provenance graphs of audit logs to derive a state-action trajectory of the attack. We test our approach on open datasets of audit logs containing real attack data. Our results demonstrate for the first time that low-level forensics data can automatically reveal an adversary’s subjective preferences, which serves as an additional dimension to modeling and documenting cyber adversaries. Attackers’ preferences tend to be less dynamic despite their different tools and indicate predispositions that are inherent to the attacker. As such, these inferred preferences can potentially serve as unique behavioral signatures of attackers and improve threat attribution.

UAI Conference 2025 Conference Paper

MOHITO: Multi-Agent Reinforcement Learning using Hypergraphs for Task-Open Systems

  • Gayathri Anil
  • Prashant Doshi
  • Daniel Redder
  • Adam Eck
  • Leen-Kiat Soh

Open agent systems are prevalent in the real world, where the sets of agents and tasks change over time. In this paper, we focus on task-open multi-agent systems, exemplified by applications such as ridesharing, where passengers (tasks) appear spontaneously over time and disappear if not attended to promptly. Task-open settings challenge us with an action space which changes dynamically. This renders existing reinforcement learning (RL) methods, intended for fixed state and action spaces, inapplicable. Whereas multi-task learning approaches learn policies generalized to multiple known and related tasks, they struggle to adapt to previously unseen tasks. Conversely, lifelong learning adapts to new tasks over time, but generally assumes that tasks come sequentially from a static and known distribution rather than simultaneously and unpredictably. We introduce a novel category of RL for addressing task openness, modeled using a task-open Markov game. Our approach, MOHITO, is a multi-agent actor-critic schema which represents knowledge about the relationships between agents and changing tasks and actions as dynamically evolving 3-uniform hypergraphs. As popular multi-agent RL testbeds do not exhibit task openness, we evaluate MOHITO on two realistic and naturally task-open domains to establish its efficacy and provide a benchmark for future work in this setting.

NeurIPS Conference 2024 Conference Paper

An Autoencoder-Like Nonnegative Matrix Co-Factorization for Improved Student Cognitive Modeling

  • Shenbao Yu
  • Yinghui Pan
  • Yifeng Zeng
  • Prashant Doshi
  • Guoquan Liu
  • Kim-Leng Poh
  • Mingwei Lin

Student cognitive modeling (SCM) is a fundamental task in intelligent education, with applications ranging from personalized learning to educational resource allocation. By exploiting students' response logs, SCM aims to predict their exercise performance as well as estimate knowledge proficiency in a subject. Data mining approaches such as matrix factorization can obtain high accuracy in predicting student performance on exercises, but the knowledge proficiency is unknown or poorly estimated. The situation is further exacerbated if only sparse interactions exist between exercises and students (or knowledge concepts). To solve this dilemma, we root monotonicity (a fundamental psychometric theory on educational assessments) in a co-factorization framework and present an autoencoder-like nonnegative matrix co-factorization (AE-NMCF), which improves the accuracy of estimating the student's knowledge proficiency via an encoder-decoder learning pipeline. The resulting estimation problem is nonconvex with nonnegative constraints. We introduce a projected gradient method based on block coordinate descent with Lipschitz constants and guarantee the method's theoretical convergence. Experiments on several real-world data sets demonstrate the efficacy of our approach in terms of both performance prediction accuracy and knowledge estimation ability, when compared with existing student cognitive models.

JAAMAS Journal 2024 Journal Article

Modeling and reinforcement learning in partially observable many-agent systems

  • Keyang He
  • Prashant Doshi
  • Bikramjit Banerjee

There is a prevalence of multiagent reinforcement learning (MARL) methods that engage in centralized training. These methods rely on all the agents sharing various types of information, such as their actions or gradients, with a centralized trainer or each other during the learning. Subsequently, the methods produce agent policies whose prescriptions and performance are contingent on other agents engaging in behavior assumed by the centralized training. But, in many contexts, such as mixed or adversarial settings, this assumption may not be feasible. In this article, we present a new line of methods that relaxes this assumption and engages in decentralized training resulting in the agent’s individual policy. The interactive advantage actor-critic (IA2C) maintains and updates beliefs over other agents’ candidate behaviors based on (noisy) observations, thus enabling learning at the agent’s own level. We also address MARL’s prohibitive curse of dimensionality due to the presence of many agents in the system. Under assumptions of action anonymity and population homogeneity, often exhibited in practice, large numbers of other agents can be modeled aggregately by the count vectors of their actions instead of individual agent models. More importantly, we may model the distribution of these vectors and its update using the Dirichlet-multinomial model, which offers an elegant way to scale IA2C to many-agent systems. We evaluate the performance of the fully decentralized IA2C along with other known baselines on a novel Organization domain, which we introduce, and on instances of two existing domains. Experimental comparisons with prominent and recent baselines show that IA2C is more sample efficient, more robust to noise, and can scale to learning in systems with up to a hundred agents.

AAMAS Conference 2024 Conference Paper

Modeling Cognitive Biases in Decision-theoretic Planning for Active Cyber Deception

  • Aditya Shinde
  • Prashant Doshi

This paper presents an approach to modeling and exploiting cognitive biases of cyber attackers in planning for active deception. Sophisticated cyber attacks are primarily orchestrated by human actors. Hence, we focus on the human aspect of the attacker’s decision-making process. Humans deviate from rational decision-making due to various cognitive biases. Here, we focus on fundamental attribution error (FAE) and confirmation bias and their role in cyber deception because these biases contribute to humans being deceived. We use the decision-theoretic planning framework of finitely-nested factored I-POMDP (I-POMDPX), which allows us to explicitly model FAE in multi-agent settings and build cognitive models of the attackers. We show how these biases impact their beliefs as they act and obtain more information about the environment and the adversary. The tractability of the I-POMDPX also allows for modeling agents at a higher strategy level where the optimal policy relies on induction and exploitation of these biases. Hence, we also present an I-POMDPX-based rational defender agent that can model the attacker’s beliefs under the influence of FAE and confirmation bias from a higher strategic level, and exploit them. Our experiments in simulated interactions show that the I-POMDPX-based defender agent can induce FAE in an attacker to distort the attacker’s beliefs. Consequently, the defender agent can exploit the attacker’s cognitive biases to extend the duration of the attack to facilitate the attacker’s intent recognition in a controlled environment. Our work provides a general decision-theoretic formulation of FAE and confirmation bias, and demonstrates its role in planning for agent-based active cyber deception.

IROS Conference 2024 Conference Paper

Open Human-Robot Collaboration using Decentralized Inverse Reinforcement Learning

  • Prasanth Sengadu Suresh
  • Siddarth Jain
  • Prashant Doshi
  • Diego Romeres

The growing interest in human-robot collaboration (HRC), where humans and robots cooperate towards shared goals, has seen significant advancements over the past decade. While previous research has addressed various challenges, several key issues remain unresolved. Many domains within HRC involve activities that do not necessarily require human presence throughout the entire task. Existing literature typically models HRC as a closed system, where all agents are present for the entire duration of the task. In contrast, an open model offers flexibility by allowing an agent to enter and exit the collaboration as needed, enabling them to concurrently manage other tasks. In this paper, we introduce a novel multiagent framework called oDec-MDP, designed specifically to model open HRC scenarios where agents can join or leave tasks flexibly during execution. We generalize a recent multiagent inverse reinforcement learning method, Dec-AIRL, to learn from open systems modeled using the oDec-MDP. Our method is validated through experiments conducted in both a simplified toy firefighting domain and a realistic dyadic human-robot collaborative assembly. Results show that our framework and learning method improve upon their closed-system counterparts.

AAMAS Conference 2023 Conference Paper

Dec-AIRL: Decentralized Adversarial IRL for Human-Robot Teaming

  • Prasanth Sengadu Suresh
  • Yikang Gui
  • Prashant Doshi

We present a new method for inverse reinforcement learning (IRL) that allows an agent to learn from expert demonstrations and then spontaneously collaborate with a human on the same task. We generalize adversarial IRL (AIRL) to work in a decentralized setting using a decentralized Markov decision process (Dec-MDP) as the underlying model. We posit that a Dec-MDP is a better-suited model for pragmatic multi-agent IRL compared to the multi-agent Markov decision process (MMDP) or the Markov game, which have been utilized thus far. This is because the latter models require an agent to know the global state of the environment, which is impractical in the real world as it may include agent-specific attributes (e.g., joint angles) that may not be directly observable by the other agents. We test our method on two domains: a formative simulated patient assistance scenario and a summative real-world use-inspired domain of sorting onions on a line conveyor. Our method (Dec-AIRL) significantly improves on the previous techniques in both domains. These results indicate that a decentralized multi-agent IRL formalism promotes effective teaming in human-robot collaborative tasks.

AAMAS Conference 2022 Conference Paper

A Hierarchical Bayesian Process for Inverse RL in Partially-Controlled Environments

  • Kenneth Bogert
  • Prashant Doshi

Robots learning from observations in the real world may encounter objects or agents in the environment, other than the expert giving the demonstration, that cause nuisance observations. These confounding elements are typically removed in fully-controlled environments such as virtual simulations or lab settings. When complete removal is impossible the nuisance observations must be filtered out. However, identifying the sources of observations when large amounts of observations are made is difficult. To address this, we present a hierarchical Bayesian process that models both the expert’s and the confounding elements’ observations thereby explicitly modeling the diverse observations a robot may receive. We extend an existing inverse reinforcement learning algorithm originally designed to work under partial occlusion of the expert to consider the diverse and noisy observations. In a simulated robotic produce-sorting domain containing both occlusion and confounding elements, we demonstrate the model’s effectiveness. In particular, our technique outperforms several other comparative methods, second only to having perfect knowledge of the subject’s trajectory.

UAI Conference 2022 Conference Paper

Decision-theoretic planning with communication in open multiagent systems

  • Anirudh Kakarlapudi
  • Gayathri Anil
  • Adam Eck
  • Prashant Doshi
  • Leen-Kiat Soh

In open multiagent systems, the set of agents operating in the environment changes over time and in ways that are nontrivial to predict. For example, if collaborative robots were tasked with fighting wildfires, they may run out of suppressants and be temporarily unavailable to assist their peers. Because an agent’s optimal action depends on the actions of others, each agent must not only predict the actions of its peers, but, before that, reason whether they are even present to perform an action. Addressing openness thus requires agents to model each other’s presence, which can be enhanced through agents communicating about their presence in the environment. At the same time, communicative acts can also incur costs (e.g., consuming limited bandwidth), and thus an agent must trade off the benefits of enhanced coordination with the costs of communication. We present a new principled, decision-theoretic method in the context provided by the recent communicative interactive POMDP framework for planning in open agent settings that balances this tradeoff. Simulations of multiagent wildfire suppression problems demonstrate how communication can improve planning in open agent environments, as well as how agents trade off the benefits and costs of communication under different scenarios.

UAI Conference 2022 Conference Paper

Marginal MAP estimation for inverse RL under occlusion with observer noise

  • Prasanth Sengadu Suresh
  • Prashant Doshi

We consider the problem of learning the behavioral preferences of an expert engaged in a task from noisy and partially-observable demonstrations. This is motivated by real-world applications such as a line robot learning from observing a human worker, where some observations are occluded by environmental elements. Furthermore, robotic perception tends to be imperfect and noisy. Previous techniques for inverse reinforcement learning (IRL) take the approach of either omitting the missing portions or inferring it as part of expectation-maximization, which tends to be slow and prone to local optima. We present a new method that generalizes the well-known Bayesian maximum-a-posteriori (MAP) IRL method by marginalizing the occluded portions of the trajectory. This is then extended with an observation model to account for perception noise. This novel application of marginal MAP (MMAP) to IRL significantly improves on the previous IRL technique under occlusion in both formative evaluations on a toy problem and in a summative evaluation on a produce sorting line task by a physical robot.

UAI Conference 2022 Conference Paper

Reinforcement learning in many-agent settings under partial observability

  • Keyang He
  • Prashant Doshi
  • Bikramjit Banerjee

Recent renewed interest in multi-agent reinforcement learning (MARL) has generated an impressive array of techniques that leverage deep RL, primarily actor-critic architectures, and can be applied to a limited range of settings in terms of observability and communication. However, a continuing limitation of much of this work is the curse of dimensionality when it comes to representations based on joint actions, which grow exponentially with the number of agents. In this paper, we squarely focus on this challenge of scalability. We apply the key insight of action anonymity to a recently presented actor-critic based MARL algorithm, interactive A2C. We introduce a Dirichlet-multinomial model for maintaining beliefs over the agent population when agents’ actions are not perfectly observable. We show that the posterior is a mixture of Dirichlet distributions that we approximate as a single component for tractability. We also show that the prediction accuracy of this method increases with more agents. Finally, we show empirically that our method can learn optimal behaviors in two recently introduced pragmatic domains with large agent populations, and is robust in partially observable environments.
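The aggregate count-vector idea in this and the JAAMAS 2024 abstract above can be illustrated with a generic Dirichlet-multinomial update. This is a minimal sketch under assumed names and an assumed toy action set, not the papers' implementation: under action anonymity, only the counts of actions matter, so the belief representation does not grow with the number of agents.

```python
# Illustrative sketch (not the papers' code): modeling a large agent
# population aggregately via counts of observed actions, using a
# Dirichlet-multinomial conjugate update. Action set is assumed.

from collections import Counter

ACTIONS = ["left", "stay", "right"]  # assumed toy action set

def dirichlet_posterior(prior: dict, observed_actions: list) -> dict:
    """Add observed action counts to the Dirichlet concentration parameters."""
    counts = Counter(observed_actions)
    return {a: prior[a] + counts.get(a, 0) for a in ACTIONS}

def predictive(params: dict) -> dict:
    """Posterior-mean prediction of the population's action distribution."""
    total = sum(params.values())
    return {a: params[a] / total for a in ACTIONS}

prior = {a: 1.0 for a in ACTIONS}  # uniform Dirichlet(1, 1, 1)
post = dirichlet_posterior(prior, ["left", "left", "stay", "left"])
print(predictive(post))  # posterior means: left 4/7, stay 2/7, right 1/7
```

Whether four agents or four hundred acted, the update touches one counter per action type, which is the scalability property the abstract highlights; handling imperfectly observed actions (the mixture-of-Dirichlets posterior) is the paper's additional contribution and is not sketched here.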

AIJ Journal 2021 Journal Article

A survey of inverse reinforcement learning: Challenges, methods and progress

  • Saurabh Arora
  • Prashant Doshi

Inverse reinforcement learning (IRL) is the problem of inferring the reward function of an agent, given its policy or observed behavior. Analogous to RL, IRL is perceived both as a problem and as a class of methods. By categorically surveying the extant literature in IRL, this article serves as a comprehensive reference for researchers and practitioners of machine learning as well as those new to it to understand the challenges of IRL and select the approaches best suited for the problem at hand. The survey formally introduces the IRL problem along with its central challenges such as the difficulty in performing accurate inference and its generalizability, its sensitivity to prior knowledge, and the disproportionate growth in solution complexity with problem size. The article surveys a vast collection of foundational methods grouped together by the commonality of their objectives, and elaborates how these methods mitigate the challenges. We further discuss extensions to the traditional IRL methods for handling imperfect perception, an incomplete model, learning multiple reward functions and nonlinear reward functions. The article concludes the survey with a discussion of some broad advances in the research area and currently open research questions.

AAMAS Conference 2021 Conference Paper

Cooperative-Competitive Reinforcement Learning with History-Dependent Rewards

  • Keyang He
  • Bikramjit Banerjee
  • Prashant Doshi

Consider a typical organization whose worker agents seek to collectively cooperate for its general betterment. However, each individual agent simultaneously seeks to act to secure a larger chunk than its co-workers of the annual increment in compensation, which usually comes from a fixed pot. As such, the agents in an organization must cooperate and compete. Another feature of many organizations is that a worker receives a bonus, which is often a fraction of the previous year’s total profit. As such, the agent derives a reward that is also partly dependent on historical performance. How should the individual agent decide to act in this context? Few methods for the mixed cooperative-competitive setting have been presented in recent years, but these are challenged by problem domains whose reward functions additionally depend on historical information. Recent deep multi-agent reinforcement learning (MARL) methods using long short-term memory (LSTM) may be used, but these adopt a joint perspective to the interaction or require explicit exchange of information among the agents to promote cooperation, which may not be possible under competition. In this paper, we first show that the agent’s decision-making problem can be modeled as an interactive partially observable Markov decision process (I-POMDP) that captures the dynamics of a history-dependent reward. We present an interactive advantage actor-critic method (IA2C+), which combines the independent advantage actor-critic network with a belief filter that maintains a belief distribution over other agents’ models. Empirical results show that IA2C+ learns the optimal policy faster and more robustly than several baselines.

AAMAS Conference 2021 Conference Paper

Cyber Attack Intent Recognition and Active Deception using Factored Interactive POMDPs

  • Aditya Shinde
  • Prashant Doshi
  • Omid Setayeshfar

This paper presents an intelligent and adaptive agent that employs deception to recognize a cyber adversary’s intent on a honeypot host. Unlike previous approaches to cyber deception, which mainly focus on delaying or confusing the attackers, we focus on engaging with them to learn their intent. We model cyber deception as a sequential decision-making problem in a two-agent context. We introduce factored finitely-nested interactive POMDPs (I-POMDPX) and use this framework to model the problem with multiple attacker types. Our approach models cyber attacks on a single honeypot host across multiple phases from the attacker’s initial entry to reaching its adversarial objective. The defending I-POMDPX-based agent uses decoys to engage with the attacker at multiple phases to form increasingly accurate predictions of the attacker’s behavior and intent. The use of I-POMDPs also enables us to model the adversary’s mental state and investigate how deception affects their beliefs. Our experiments in both simulation and with the agent deployed on a host system show that the I-POMDPX-based agent performs significantly better at intent recognition than commonly used deception strategies on honeypots. This emerging application of autonomous agents offers a new approach that contrasts with the traditional action-reaction dynamic that has defined interactions between cyber attackers and defenders for years.

ICAPS Conference 2021 Conference Paper

Data-Driven Decision-Theoretic Planning using Recurrent Sum-Product-Max Networks

  • Hari Teja Tatavarti
  • Prashant Doshi
  • Layton Hayes

Sum-product networks (SPN) are knowledge compilation models and are related to other graphical models for efficient probabilistic inference such as arithmetic circuits and AND/OR graphs. Recent investigations into generalizing SPNs have yielded sum-product-max networks (SPMN) which offer a data-driven alternative for decision making that has predominantly relied on handcrafted models. However, SPMNs are not suited for decision-theoretic planning which involves sequential decision making over multiple time steps. In this paper, we present recurrent SPMNs (RSPMN) that learn from and model decision-making data over time. RSPMNs utilize a template network that is unfolded as needed depending on the length of the data sequence. This is significant as RSPMNs not only inherit the benefits of SPNs in being data driven and mostly tractable, they are also well suited for planning problems. We establish soundness conditions on the template network, which guarantee that the resulting SPMN is valid, and present a structure learning algorithm to learn a sound template. RSPMNs learned on a testbed of data sets, some generated using RDDLSim, yield maximum expected utilities (MEUs) and policies that are close to the optimal on perfectly-observed domains and easily improve on a recent batch-constrained RL method, which is important because RSPMNs offer a new model-based approach to offline RL.

ICRA Conference 2021 Conference Paper

Min-Max Entropy Inverse RL of Multiple Tasks

  • Saurabh Arora
  • Prashant Doshi
  • Bikramjit Banerjee

Multi-task IRL recognizes that expert(s) could be switching between multiple ways of solving the same problem, or interleaving demonstrations of multiple tasks. The learner aims to learn the reward functions that individually guide these distinct ways. We present a new method for multi-task IRL that generalizes the well-known maximum entropy approach by combining it with a Dirichlet process based minimum entropy clustering of the observed data. This yields a single nonlinear optimization problem, called MinMaxEnt Multi-task IRL (MME-MTIRL), which can be solved using the Lagrangian relaxation and gradient descent methods. We evaluate MME-MTIRL on the robotic task of sorting onions on a processing line where the expert utilizes multiple ways of detecting and removing blemished onions. The method is able to learn the underlying reward functions to a high level of accuracy and it improves on the previous approaches.

IJCAI Conference 2021 Conference Paper

State-Based Recurrent SPMNs for Decision-Theoretic Planning under Partial Observability

  • Layton Hayes
  • Prashant Doshi
  • Swaraj Pawar
  • Hari Teja Tatavarti

The sum-product network (SPN) has been extended to model sequence data with the recurrent SPN (RSPN), and to decision-making problems with sum-product-max networks (SPMN). In this paper, we build on the concepts introduced by these extensions and present state-based recurrent SPMNs (S-RSPMNs) as a generalization of SPMNs to sequential decision-making problems where the state may not be perfectly observed. As with recurrent SPNs, S-RSPMNs utilize a repeatable template network to model sequences of arbitrary lengths. We present an algorithm for learning compact template structures by identifying unique belief states and the transitions between them through a state matching process that utilizes augmented data. To our knowledge, this is the first data-driven approach that learns graphical models for planning under partial observability, which can be solved efficiently. S-RSPMNs retain the linear solution complexity of SPMNs, and we demonstrate significant improvements in compactness of representation and the run time of structure learning and inference in sequential domains.

JAAMAS Journal 2020 Journal Article

I2RL: online inverse reinforcement learning under occlusion

  • Saurabh Arora
  • Prashant Doshi
  • Bikramjit Banerjee

Inverse reinforcement learning (IRL) is the problem of learning the preferences of an agent from observing its behavior on a task. It inverts RL which focuses on learning an agent’s behavior on a task based on the reward signals received. IRL is witnessing sustained attention due to promising applications in robotics, computer games, and finance, as well as in other sectors. Methods for IRL have, for the most part, focused on batch settings where the observed agent’s behavioral data has already been collected. However, the related problem of online IRL—where observations are incrementally accrued, yet the real-time demands of the application often prohibit a full rerun of an IRL method—has received significantly less attention. We introduce the first formal framework for online IRL, called incremental IRL (I2RL), which can serve as a common ground for online IRL methods. We demonstrate the usefulness of this framework by casting existing online IRL techniques into this framework. Importantly, we present a new method that advances maximum entropy IRL with hidden variables to the online setting. Our analysis shows that the new method has monotonically improving performance with more demonstration data as well as probabilistically bounded error, both under full and partial observability. Simulated and physical robot experiments in a multi-robot patrolling application situated in varied-sized worlds, which involves learning under high levels of occlusion, show a significantly improved performance of I2RL as compared to both batch IRL and an online imitation learning method.

AIJ Journal 2020 Journal Article

Recursively modeling other agents for decision making: A research perspective

  • Prashant Doshi
  • Piotr Gmytrasiewicz
  • Edmund Durfee

Individuals exhibit theory of mind, attributing beliefs, intent, and mental states to others as explanations of observed actions. Dennett's intentional stance offers an analogous abstraction for computational agents seeking to understand, explain, or predict others' behaviors. These recognized theories provide a formal basis to ongoing investigations of recursive modeling. We review and situate various frameworks for recursive modeling that have been studied in game- and decision- theories, and have yielded methods useful to AI researchers. Sustained attention given to these frameworks has produced new analyses and methods with an aim toward making recursive modeling practicable. Indeed, we also review some emerging uses and the insights these yielded, which are indicative of pragmatic progress in this area. The significance of these frameworks is that higher-order reasoning is critical to correctly recognizing others' intent or outthinking opponents. Such reasoning has been utilized in academic, business, military, security, and other contexts both to train and inform decision-making agents in organizational and strategic contexts, and also to more realistically predict and best respond to other agents' intent.

ICRA Conference 2020 Conference Paper

SA-Net: Robust State-Action Recognition for Learning from Observations

  • Nihal Soans
  • Ehsan Asali
  • Yi Hong
  • Prashant Doshi

Learning from observation (LfO) offers a new paradigm for transferring task behavior to robots. LfO requires the robot to observe the task being performed and decompose the sensed streaming data into sequences of state-action pairs, which are then input to LfO methods. Thus, recognizing the state-action pairs correctly and quickly in sensed data is a crucial prerequisite. We present SA-Net, a deep neural network architecture that recognizes state-action pairs from RGB-D data streams. SA-Net performs well in two replicated robotic applications of LfO - one involving mobile ground robots and another involving a robotic manipulator - which demonstrates that the architecture could generalize well to differing contexts. Comprehensive evaluations including deployment on a physical robot show that SA-Net significantly improves on the accuracy of the previous methods under various conditions.

AAAI Conference 2020 Conference Paper

Scalable Decision-Theoretic Planning in Open and Typed Multiagent Systems

  • Adam Eck
  • Maulik Shah
  • Prashant Doshi
  • Leen-Kiat Soh

In open agent systems, the set of agents that are cooperating or competing changes over time and in ways that are nontrivial to predict. For example, if collaborative robots were tasked with fighting wildfires, they may run out of suppressants and be temporarily unavailable to assist their peers. We consider the problem of planning in these contexts with the additional challenges that the agents are unable to communicate with each other and that there are many of them. Because an agent’s optimal action depends on the actions of others, each agent must not only predict the actions of its peers, but, before that, reason whether they are even present to perform an action. Addressing openness thus requires agents to model each other’s presence, which becomes computationally intractable with high numbers of agents. We present a novel, principled, and scalable method in this context that enables an agent to reason about others’ presence in its shared environment and their actions. Our method extrapolates models of a few peers to the overall behavior of the many-agent system, and combines it with a generalization of Monte Carlo tree search to perform individual agent reasoning in many-agent open environments. Theoretical analyses establish the number of agents to model in order to achieve acceptable worst case bounds on extrapolation error, as well as regret bounds on the agent’s utility from modeling only some neighbors. Simulations of multiagent wildfire suppression problems demonstrate our approach’s efficacy compared with alternative baselines.

UAI Conference 2019 Conference Paper

Evacuate or Not? A POMDP Model of the Decision Making of Individuals in Hurricane Evacuation Zones

  • Adithya Raam Sankar
  • Prashant Doshi
  • Adam Goodie

Recent hurricanes in the Atlantic region of the southern United States triggered a series of evacuation orders in the coastal cities of Florida, Georgia, and Texas. While some of these urged voluntary evacuations, most were mandatory orders. Despite governments asking people to vacate their homes for their own safety, many do not. We aim to understand the observable and hidden variables involved in the decision-making process and model these in a partially observable Markov decision process, which predicts whether a person will evacuate or not given his or her current situation. We consider the features of the particular hurricane, the dynamic situation that the individual is experiencing, and demographic factors that influence the decision making of individuals. The process model is represented as a dynamic influence diagram and evaluated on data collected via a comprehensive survey of hurricane-impacted individuals.

AAAI Conference 2019 Conference Paper

Model-Free IRL Using Maximum Likelihood Estimation

  • Vinamra Jain
  • Prashant Doshi
  • Bikramjit Banerjee

We propose a probabilistic model for estimating population flow, which is defined as populations of the transition between areas over time, given aggregated spatio-temporal population data. Since there is no information about individual trajectories in the aggregated data, it is not straightforward to estimate population flow. With the proposed method, we utilize a collective graphical model with which we can learn individual transition models from the aggregated data by analytically marginalizing the individual locations. Learning a spatio-temporal collective graphical model only from the aggregated data is an ill-posed problem since the number of parameters to be estimated exceeds the number of observations. The proposed method reduces the effective number of parameters by modeling the transition probabilities with a neural network that takes the locations of the origin and the destination areas and the time of day as inputs. By this modeling, we can automatically learn nonlinear spatio-temporal relationships flexibly among transitions, locations, and times. With four real-world population data sets in Japan and China, we demonstrate that the proposed method can estimate the transition population more accurately than existing methods.

RLDM Conference 2019 Conference Abstract

Modeling cooperative and competitive decision-making in the Tiger Task

  • Saurabh A Kumar
  • Prashant Doshi
  • Michael Spezio
  • Jan P Gläscher

The mathematical models underlying reinforcement learning help us understand how agents navigate the world and maximize future reward. Partially Observable Markov Decision Processes (POMDPs), an extension of classic RL, allow for action planning in uncertain environments. In this study we set out to investigate human decision-making under these circumstances in the context of cooperation and competition using the iconic Tiger Task (TT) in single-player, cooperative, and competitive multi-player versions. The task mimics the setting of a game show, in which the participant has to choose between two doors hiding either a tiger (-100 points) or a treasure (+10 points), or take a probabilistic hint about the tiger location (-1 point). In addition to the probabilistic location hints, the multi-player TT also includes probabilistic information about the other player’s actions. POMDPs have been successfully used in simulations of the single-player TT. A critical feature is the belief (a probability distribution) over the current position in the state space. Here, we leverage interactive POMDPs (I-POMDPs) to model choice data from the cooperative and competitive multi-player TT. I-POMDPs construct a model of the other player’s beliefs, which is incorporated into the agent’s own valuation process. We demonstrate using hierarchical logistic regression modeling that the cooperative context elicits better choices and more accurate predictions of the other player’s actions. Furthermore, we show that participants generate Bayesian beliefs to guide their actions. Critically, including the social information in the belief updating improves model performance, underlining that participants use this information in their belief computations. In the next step we will use I-POMDPs that explicitly model other players as intentional agents to investigate the generation of mental models and Theory of Mind in cooperative and competitive decision-making in humans.
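The belief update at the heart of the single-player task can be sketched in a few lines, using the payoffs stated in the abstract; the 85% hint accuracy below is an assumed illustrative value, not a figure from the paper:

```python
# Minimal sketch of the Tiger Task belief update: after paying -1 point for a
# probabilistic hint, the player Bayes-updates P(tiger behind left door).
# The hint accuracy (acc=0.85) is an assumed illustrative value.

def belief_update(b_left, heard_left, acc=0.85):
    """Bayes update of P(tiger left) given a hint heard from the left or right."""
    if heard_left:
        num = acc * b_left
        den = acc * b_left + (1 - acc) * (1 - b_left)
    else:
        num = (1 - acc) * b_left
        den = (1 - acc) * b_left + acc * (1 - b_left)
    return num / den

b = 0.5                                  # uniform prior over the tiger's location
b = belief_update(b, heard_left=True)    # one "growl from the left" hint
print(round(b, 3))                       # belief shifts toward the left door
```

With a symmetric prior, a single accurate-hint observation moves the belief from 0.5 to the hint's accuracy, which is why repeated listening actions are worth their small cost.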

RLDM Conference 2019 Conference Abstract

Modeling models of others’ mental states: characterizing Theory of Mind during cooperative interaction

  • Tessa Rusch
  • Prashant Doshi
  • Martin Hebart
  • Michael Spezio
  • Jan P Gläscher

Humans are experts in cooperation. To effectively engage with others they have to apply Theory of Mind (ToM); that is, they have to model others' beliefs, desires, and intentions and predict their behavior from these mental states. Here, we investigate ToM processes during real-time reciprocal coordination between two players engaging in a cooperative decision game. The game consists of a noisy and unstable environment. To succeed, participants have to model the state of the world and their partner’s belief about it and integrate both pieces of information into a coherent decision. Thereby the game combines social and non-social learning into a single decision problem. To quantify the learning processes underlying participants’ actions, we modeled the behavior with Interactive Partially Observable Markov Decision Processes (I-POMDP). The I-POMDP framework extends single-agent action planning under uncertainty to the multi-agent domain by including intentional models of other agents. Using this framework we successfully predicted interactive behavior. Furthermore, we extracted participants’ beliefs about the environment and their beliefs about the mental states of their partners, giving us direct access to the cognitive operations underlying cooperative behavior. By relating players’ own beliefs with their partners’ models of themselves, we show that dyads whose beliefs are more aligned coordinate more successfully. This provides strong evidence that behavioral coordination relies on mental alignment.

AAMAS Conference 2019 Conference Paper

Online Inverse Reinforcement Learning Under Occlusion

  • Saurabh Arora
  • Prashant Doshi
  • Bikramjit Banerjee

Inverse reinforcement learning (IRL) is the problem of learning the preferences of an agent from observing its behavior on a task. While this problem is witnessing sustained attention, the related problem of online IRL – where the observations are incrementally accrued, yet the real-time demands of the application often prohibit a full rerun of an IRL method – has received much less attention. We introduce a formal framework for online IRL, called incremental IRL (I2RL), and a new method that advances maximum entropy IRL with hidden variables, to this setting. Our analysis shows that the new method has a monotonically improving performance with more demonstration data, as well as probabilistically bounded error, both under full and partial observability. Experiments in a simulated robotic application, which involves learning under occlusion, show the significantly improved performance of I2RL as compared to both batch IRL and an online imitation learning method.
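The maximum-entropy machinery this line of work builds on has a simple core: the gradient of the max-entropy IRL objective is the difference between the expert's and the learner's feature expectations. The feature vectors and learning rate in this sketch are illustrative stand-ins, not values from the paper:

```python
# Sketch of one max-entropy IRL update: adjust linear reward weights so that
# feature expectations under the learner's policy move toward those observed
# in the (possibly occluded) demonstrations. Feature values are illustrative.

def maxent_step(theta, mu_expert, mu_policy, lr=0.1):
    """One gradient-ascent step; the gradient of the max-entropy
    log-likelihood is mu_expert - mu_policy."""
    return [t + lr * (e - p) for t, e, p in zip(theta, mu_expert, mu_policy)]

theta = [0.0, 0.0, 0.0]
mu_expert = [0.8, 0.1, 0.1]   # empirical feature counts from demonstrations
mu_policy = [0.4, 0.3, 0.3]   # expected features under the current soft policy
theta = maxent_step(theta, mu_expert, mu_policy)
print(theta)   # weights grow for features the expert visits more often
```

In the online (I2RL) setting, such a step would be applied incrementally as new demonstration segments arrive, rather than after rerunning on the full batch.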

IROS Conference 2018 Conference Paper

Inverse Learning of Robot Behavior for Collaborative Planning

  • Maulesh Trivedi
  • Prashant Doshi

Inverse reinforcement learning (IRL) is an important basis for learning from demonstrations. Observing an agent, human or robotic, perform a task provides information and facilitates learning the task. We show how the agent's preferences learned using IRL can be incorporated in a subject robot's decision making and planning, to enable the robot to spontaneously collaborate with the previously observed agent on the task. We prioritize a real-world application, where a line robot will autonomously collaborate with another robot in sorting ripe and unripe fruit such as oranges. Toward this, our evaluations utilize a colored-ball sorting task as an analog using simulated TurtleBots equipped with Phantom X arms. Our method is comprehensive, providing first answers to questions such as how should the robot acquire the complete model for the collaborative planning problem and how should it solve the problem to obtain a plan that permits collaboration without disrupting the line robot's behavior.

AIJ Journal 2018 Journal Article

Multi-robot inverse reinforcement learning under occlusion with estimation of state transitions

  • Kenneth Bogert
  • Prashant Doshi

Inverse reinforcement learning (IRL), analogously to RL, refers to both the problem and associated methods by which an agent passively observing another agent's actions over time, seeks to learn the latter's reward function. The learning agent is typically called the learner while the observed agent is often an expert in popular applications such as in learning from demonstrations. Some of the assumptions that underlie current IRL methods are impractical for many robotic applications. Specifically, they assume that the learner has full observability of the expert as it performs its task; that the learner has full knowledge of the expert's dynamics; and that there is always only one expert agent in the environment. For example, these assumptions are particularly restrictive in our application scenario where a subject robot is tasked with penetrating a perimeter patrol by two other robots after observing them from a vantage point. In our instance of this problem, the learner can observe at most 10% of the patrol. We relax these assumptions and systematically generalize a known IRL method, Maximum Entropy IRL, to enable the subject to learn the preferences of the patrolling robots, subsequently their behaviors, and predict their future positions well enough to plan a route to its goal state without being spotted. Challenged by occlusion, multiple interacting robots, and partially known dynamics we demonstrate empirically that the generalization improves significantly on several baselines in its ability to inversely learn in this application setting. Of note, it leads to significant improvement in the learner's overall success rate of penetrating the patrols. Our methods represent significant steps towards making IRL pragmatic and applicable to real-world contexts.

NeurIPS Conference 2018 Conference Paper

Online Structure Learning for Feed-Forward and Recurrent Sum-Product Networks

  • Agastya Kalra
  • Abdullah Rashwan
  • Wei-Shou Hsu
  • Pascal Poupart
  • Prashant Doshi
  • Georgios Trimponias

Sum-product networks have recently emerged as an attractive representation due to their dual view as a special type of deep neural network with clear semantics and a special type of probabilistic graphical model for which inference is always tractable. Those properties follow from some conditions (i.e., completeness and decomposability) that must be respected by the structure of the network. As a result, it is not easy to specify a valid sum-product network by hand and therefore structure learning techniques are typically used in practice. This paper describes a new online structure learning technique for feed-forward and recurrent SPNs. The algorithm is demonstrated on real-world datasets with continuous features for which it is not clear what network architecture might be best, including sequence datasets of varying length.

ICRA Conference 2017 Conference Paper

A layered HMM for predicting motion of a leader in multi-robot settings

  • Sina Solaimanpour
  • Prashant Doshi

We focus on a mobile robot that must learn another robot's motion model from observations to track it in a given map. This problem has several real-world applications such as self-driving cars being electronically towed by other cars and for telepresence robots. Our context is a nested particle filter, a generalization of the traditional particle filter, that allows both self-localization and tracking of another robot simultaneously. While the robot's observations are used to weight nested particles, the problem arises during the propagation step of the nested particles during which a motion model is needed. We introduce a novel layered hidden Markov model for this problem and present an on-line algorithm which learns the HMM parameters from observations gathered during the run. We demonstrate significantly improved tracking accuracy when using this new model to predict the motion of a leading mobile robot, in comparison to pre-defined and random motion models as previously used in literature.
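The propagate-weight-resample loop that nested particle filtering generalizes can be sketched in a plain, non-nested 1-D form; the motion and sensor models below are toy stand-ins, not the paper's layered HMM:

```python
import random

# Minimal particle-filter sketch of the propagate/weight/resample cycle.
# The leader drifts right one unit per step; the sensor model favors
# particles close to the observation. Both models are illustrative toys.

def pf_step(particles, motion, observe, z):
    # 1. propagate each particle through the (learned) motion model
    moved = [motion(p) for p in particles]
    # 2. weight by the likelihood of the observation z
    w = [observe(z, p) for p in moved]
    total = sum(w)
    w = [wi / total for wi in w]
    # 3. resample in proportion to the weights
    return random.choices(moved, weights=w, k=len(particles))

random.seed(0)
motion = lambda p: p + 1.0 + random.gauss(0, 0.1)   # leader drifts right
observe = lambda z, p: 1.0 / (1e-6 + abs(z - p))    # closer particles score higher
particles = [0.0] * 100
for z in [1.0, 2.0, 3.0]:
    particles = pf_step(particles, motion, observe, z)
est = sum(particles) / len(particles)
print(round(est, 2))   # estimate tracks the leader near position 3
```

In the nested setting, each particle of the self-localizing robot would itself carry a particle set over the tracked robot's state, with the learned motion model driving the inner propagation step.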

JAIR Journal 2017 Journal Article

Decision-Theoretic Planning Under Anonymity in Agent Populations

  • Ekhlas Sonu
  • Yingke Chen
  • Prashant Doshi

We study the problem of self-interested planning under uncertainty in settings shared with more than a thousand other agents, each of which plans at its own individual level. We refer to such large numbers of agents as an agent population. The decision-theoretic formalism of interactive partially observable Markov decision process (I-POMDP) is used to model the agent's self-interested planning. The first contribution of this article is a method for drastically scaling the finitely-nested I-POMDP to certain agent populations for the first time. Our method exploits two types of structure that are often exhibited by agent populations -- anonymity and context-specific independence. We present a variant called the many-agent I-POMDP that models both these types of structure to plan efficiently under uncertainty in multiagent settings. In particular, the complexity of the belief update and solution in the many-agent I-POMDP is polynomial in the number of agents compared with the exponential growth that challenges the original framework. While exploiting structure helps mitigate the curse of many agents, the well-known curse of history that afflicts I-POMDPs continues to challenge scalability in terms of the planning horizon. The second contribution of this article is an application of the branch-and-bound scheme to reduce the exponential growth of the search tree for look ahead. For this, we introduce new fast-computing upper and lower bounds for the exact value function of the many-agent I-POMDP. This speeds up the look-ahead computations without trading off optimality, and reduces both memory and run time complexity. The third contribution is a comprehensive empirical evaluation of the methods on three new problem domains -- policing large protests, controlling traffic congestion at a busy intersection, and improving the AI for the popular Clash of Clans multiplayer game. We demonstrate the feasibility of exact self-interested planning in these large problems, and that our methods for speeding up the planning are effective. Altogether, these contributions represent a principled and significant advance toward moving self-interested planning under uncertainty to real-world applications.

AAAI Conference 2017 Conference Paper

On Markov Games Played by Bayesian and Boundedly-Rational Players

  • Muthukumaran Chandrasekaran
  • Yingke Chen
  • Prashant Doshi

We present a new game-theoretic framework in which Bayesian players with bounded rationality engage in a Markov game and each has private but incomplete information regarding other players’ types. Instead of utilizing Harsanyi’s abstract types and a common prior, we construct intentional player types whose structure is explicit and induces a finite-level belief hierarchy. We characterize an equilibrium in this game and establish the conditions for existence of the equilibrium. The computation of finding such equilibria is formalized as a constraint satisfaction problem and its effectiveness is demonstrated on two cooperative domains.

UAI Conference 2017 Conference Paper

Robust Model Equivalence using Stochastic Bisimulation for N-Agent Interactive DIDs

  • Muthukumaran Chandrasekaran
  • Junhuan Zhang
  • Prashant Doshi
  • Yifeng Zeng

I-DIDs suffer disproportionately from the curse of dimensionality, dominated by the exponential growth in the number of models over time. Previous methods for scaling I-DIDs identify notions of equivalence between models, such as behavioral equivalence (BE). But this requires that the models be solved first. Also, model space compression across agents has not been previously investigated. We present a way to compress the space of models across agents, possibly with different frames, and do so without having to solve them first, using stochastic bisimulation. We test our approach on two non-cooperative partially observable domains with up to 20 agents.

JAAMAS Journal 2016 Journal Article

Can bounded and self-interested agents be teammates? Application to planning in ad hoc teams

  • Muthukumaran Chandrasekaran
  • Prashant Doshi
  • Yingke Chen

Planning for ad hoc teamwork is challenging because it involves agents collaborating without any prior coordination or communication. The focus is on principled methods for a single agent to cooperate with others. This motivates investigating the ad hoc teamwork problem in the context of self-interested decision-making frameworks. Agents engaged in individual decision making in multiagent settings face the task of having to reason about other agents’ actions, which may in turn involve reasoning about others. An established approximation that operationalizes this approach is to bound the infinite nesting from below by introducing level 0 models. For the purposes of this study, individual, self-interested decision making in multiagent settings is modeled using interactive dynamic influence diagrams (I-DID). These are graphical models with the benefit that they naturally offer a factored representation of the problem, allowing agents to ascribe dynamic models to others and reason about them. We demonstrate that an implication of bounded, finitely-nested reasoning by a self-interested agent is that we may not obtain optimal team solutions in cooperative settings, if it is part of a team. We address this limitation by including models at level 0 whose solutions involve reinforcement learning. We show how the learning is integrated into planning in the context of I-DIDs. This facilitates optimal teammate behavior, and we demonstrate its applicability to ad hoc teamwork on several problem domains and configurations.

AAAI Conference 2016 Conference Paper

Decision Sum-Product-Max Networks

  • Mazen Melibari
  • Pascal Poupart
  • Prashant Doshi

Sum-Product Networks (SPNs) were recently proposed as a new class of probabilistic graphical models that guarantee tractable inference, even on models with high-treewidth. In this paper, we propose a new extension to SPNs, called Decision Sum-Product-Max Networks (Decision-SPMNs), that makes SPNs suitable for discrete multi-stage decision problems. We present an algorithm that solves Decision-SPMNs in a time that is linear in the size of the network. We also present algorithms to learn the parameters of the network from data.

UAI Conference 2016 Conference Paper

Individual Planning in Open and Typed Agent Systems

  • Muthukumaran Chandrasekaran
  • Adam Eck
  • Prashant Doshi
  • Leen-Kiat Soh

Open agent systems are multiagent systems in which one or more agents may leave the system at any time, possibly resuming after some interval, and in which new agents may also join. Planning in such systems becomes challenging in the absence of inter-agent communication because agents must predict if others have left the system or new agents are now present to decide on possibly choosing a different line of action. In this paper, we prioritize open systems where agents of differing types may leave and possibly reenter, but new agents do not join. With the help of a realistic domain – wildfire suppression – we motivate the need for individual planning in open environments and present a first approach for robust decision-theoretic planning in such multiagent systems. Evaluations in domain simulations clearly demonstrate the improved performance compared to previous methods that disregard the openness.

IJCAI Conference 2016 Conference Paper

Sum-Product-Max Networks for Tractable Decision Making

  • Mazen Melibari
  • Pascal Poupart
  • Prashant Doshi

Investigations into probabilistic graphical models for decision making have predominantly centered on influence diagrams (IDs) and decision circuits (DCs) for representation and computation of decision rules that maximize expected utility. Since IDs are typically handcrafted and DCs are compiled from IDs, in this paper we propose an approach to learn the structure and parameters of decision-making problems directly from data. We present a new representation called sum-product-max network (SPMN) that generalizes a sum-product network (SPN) to the class of decision-making problems and whose solution, analogous to DCs, scales linearly in the size of the network. We show that SPMNs may be reduced to DCs linearly and present a first method for learning SPMNs from data. This approach is significant because it facilitates a novel paradigm of tractable decision making driven by data.
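Why the solution scales linearly in network size can be seen in a toy bottom-up pass over sum, product, and max nodes; the tuple encoding of nodes here is illustrative, not the paper's representation:

```python
# Sketch of SPMN-style evaluation: one bottom-up pass visits each node once,
# so computing the maximum expected utility (MEU) is linear in network size.
# Node encoding (nested tuples) is a hypothetical stand-in.

def meu(node):
    kind = node[0]
    if kind == "leaf":                      # ("leaf", utility)
        return node[1]
    if kind == "sum":                       # ("sum", [(weight, child), ...]) — chance
        return sum(w * meu(c) for w, c in node[1])
    if kind == "prod":                      # ("prod", [child, ...]) — independent factors
        out = 1.0
        for c in node[1]:
            out *= meu(c)
        return out
    if kind == "max":                       # ("max", [child, ...]) — decision node
        return max(meu(c) for c in node[1])

# Toy network: a decision (max) over two chance (sum) nodes.
net = ("max", [
    ("sum", [(0.7, ("leaf", 10.0)), (0.3, ("leaf", -100.0))]),
    ("sum", [(0.5, ("leaf", 4.0)),  (0.5, ("leaf", 4.0))]),
])
print(meu(net))   # → 4.0  (the safe branch beats 0.7*10 + 0.3*(-100) = -23)
```

The max nodes also record which child attained the maximum, which is how a decision rule, rather than just a value, is read off the same pass.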

ICAPS Conference 2015 Conference Paper

Individual Planning in Agent Populations: Exploiting Anonymity and Frame-Action Hypergraphs

  • Ekhlas Sonu
  • Yingke Chen
  • Prashant Doshi

Interactive partially observable Markov decision processes (I-POMDP) provide a formal framework for planning for a self-interested agent in multiagent settings. An agent operating in a multiagent environment must deliberate about the actions that other agents may take and the effect these actions have on the environment and the rewards it receives. Traditional I-POMDPs model this dependence on the actions of other agents using joint action and model spaces. Therefore, the solution complexity grows exponentially with the number of agents thereby complicating scalability. In this paper, we model and extend anonymity and context-specific independence — problem structures often present in agent populations — for computational gain. We empirically demonstrate the efficiency from exploiting these problem structures by solving a new multiagent problem involving more than 1,000 agents.
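The gain from anonymity can be illustrated with a quick count: when the agent's value depends only on how many peers take each frame-action pair, not on who takes it, the space of joint behaviors collapses from exponential to polynomial in the population size. The stars-and-bars count below is standard combinatorics, used here purely as an illustration:

```python
from math import comb

# With N anonymous peers and k frame-action pairs, joint actions number k**N,
# but distinct count vectors ("configurations") number C(N + k - 1, k - 1),
# which is polynomial in N for fixed k.

def num_configurations(n_agents, k):
    return comb(n_agents + k - 1, k - 1)

print(num_configurations(1000, 3))                    # → 501501
print(3 ** 1000 > num_configurations(1000, 3))        # → True (exponential vs polynomial)
```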

NeurIPS Conference 2015 Conference Paper

Individual Planning in Infinite-Horizon Multiagent Settings: Inference, Structure and Scalability

  • Xia Qu
  • Prashant Doshi

This paper provides the first formalization of self-interested planning in multiagent settings using expectation-maximization (EM). Our formalization in the context of infinite-horizon and finitely-nested interactive POMDPs (I-POMDP) is distinct from EM formulations for POMDPs and cooperative multiagent planning frameworks. We exploit the graphical model structure specific to I-POMDPs, and present a new approach based on block-coordinate descent for further speed up. Forward filtering-backward sampling -- a combination of exact filtering with sampling -- is explored to exploit problem structure.

IROS Conference 2015 Conference Paper

Localization and tracking under extreme and persistent sensory occlusion

  • Kedar Marathe
  • Prashant Doshi

We focus on a mobile robot that must keep itself localized while closely following another robot or human. This problem has many real-world applications including that of a co-bot engaged in a follow-the-leader behavior or a robot that is participating in a convoy. If the robot is expected to eventually break away and reach its own goal, then the robot must stay self-localized. A key challenge for localization while tailing another is the extreme and persistent occlusion of the robot's sensors by the dynamic obstacle in front of it that is not modeled in its map. Current Monte Carlo localization (MCL) methods use sensor models with random noise, which are inadequate under such occlusion. We utilize a particle filter that simultaneously tracks the subject robot and the leader. We introduce novel particle weighting and adaptive sampling schemes that significantly improve the follower's localization. The result is a robust and adaptive MCL for applications involving persistent occlusion.

IJCAI Conference 2015 Conference Paper

Toward Estimating Others' Transition Models Under Occlusion for Multi-Robot IRL

  • Kenneth Bogert
  • Prashant Doshi

Multi-robot inverse reinforcement learning (mIRL) is broadly useful for learning, from observations, the behaviors of multiple robots executing fixed trajectories and interacting with each other. In this paper, we relax a crucial assumption in IRL to make it better suited for wider robotic applications: we allow the transition functions of other robots to be stochastic and do not assume that the transition error probabilities are known to the learner. Challenged by occlusion where large portions of others’ state spaces are fully hidden, we present a new approach that maps stochastic transitions to distributions over features. Then, the underconstrained problem is solved using nonlinear optimization that maximizes entropy to learn the transition function of each robot from occluded observations. Our methods represent significant and first steps toward making mIRL pragmatic.

JAAMAS Journal 2014 Journal Article

Scalable solutions of interactive POMDPs using generalized and bounded policy iteration

  • Ekhlas Sonu
  • Prashant Doshi

Policy iteration algorithms for partially observable Markov decision processes (POMDPs) offer the benefits of quicker convergence compared to value iteration and the ability to operate directly on the solution, which usually takes the form of a finite state automaton. However, the finite state controller tends to grow quickly in size across iterations, making its evaluation and improvement computationally costly. Bounded policy iteration provides a way of keeping the controller size fixed while improving it monotonically until convergence, although it is susceptible to getting trapped in local optima. Despite these limitations, policy iteration algorithms are viable alternatives to value iteration, and allow POMDPs to scale. In this article, we generalize the bounded policy iteration technique to problems involving multiple agents. Specifically, we show how we may perform bounded policy iteration with anytime behavior in settings formalized by the interactive POMDP framework, which generalizes POMDPs to non-stationary contexts shared with multiple other agents. Although policy iteration has been extended to decentralized POMDPs, the context there is strictly cooperative. Its novel generalization in this article makes it useful in non-cooperative settings as well. As interactive POMDPs involve modeling other agents sharing the environment, we ascribe controllers to predict others' actions, with the benefit that the controllers compactly represent the model space. We show how we may exploit the agent's initial belief, often available, toward further improving the controller, particularly in large domains, though at the expense of increased computation, which we compensate for.
We extensively evaluate the approach on multiple problem domains with some that are significantly large in their dimensions, and in contexts with uncertainty about the other agent’s frames and those involving multiple other agents, and demonstrate its properties and scalability.

IJCAI Conference 2013 Conference Paper

Bimodal Switching for Online Planning in Multiagent Settings

  • Ekhlas Sonu
  • Prashant Doshi

We present a bimodal method for online planning in partially observable multiagent settings as formalized by a finitely-nested interactive partially observable Markov decision process (I-POMDP). An agent planning in an environment shared with another updates beliefs both over the physical state and the other agent's models. In problems where we do not observe the other's action explicitly but must infer it from sensing its effect on the state, observations are more informative about the other when the belief over the state space has reduced uncertainty. For typical, uncertain initial beliefs, we model the agent as if it were acting alone and utilize fast online planning for POMDPs. Subsequently, the agent switches to online planning in multiagent settings. We maintain tight lower and upper bounds at each step, and switch over when the difference between them falls below a threshold.
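The switching criterion itself is simple to state; the following is a minimal sketch under the assumption that the planner exposes scalar value bounds, with the threshold and mode names invented for illustration.

```python
def choose_mode(lower_bound, upper_bound, eps=0.05):
    """Bimodal switching rule (sketch): while the gap between the value
    bounds is still large, plan as if acting alone ('pomdp' mode); once
    the bounds tighten to within eps, switch to full multiagent planning
    ('ipomdp' mode). eps and the mode names are illustrative placeholders."""
    return "pomdp" if upper_bound - lower_bound >= eps else "ipomdp"
```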

AAMAS Conference 2012 Conference Paper

GaTAC: A Scalable and Realistic Testbed for Multiagent Decision Making

  • Ekhlas Sonu
  • Prashant Doshi

In an attempt to bridge the gap between theoretical advances in multiagent decision-making algorithms and their application in real-world scenarios, we present the Georgia testbed for autonomous control of vehicles (GaTAC). GaTAC provides a low-cost, open-source and flexible environment for realistically simulating and evaluating policies generated by multiagent decision-making algorithms in real-world problem domains pertaining to the control of autonomous uninhabited aerial vehicles (AUAVs). We describe GaTAC in detail and demonstrate how it can be used to simulate an example AUAV problem. We expect GaTAC to facilitate the development and evaluation of scalable decision-making algorithms with results that have immediate practical implications.

AAMAS Conference 2012 Conference Paper

Generalized and Bounded Policy Iteration for Finitely-Nested Interactive POMDPs: Scaling Up

  • Ekhlas Sonu
  • Prashant Doshi

Policy iteration algorithms for partially observable Markov decision processes (POMDPs) offer the benefits of quick convergence and the ability to operate directly on the solution, which usually takes the form of a finite state controller. However, the controller tends to grow quickly in size across iterations, making its evaluation and improvement costly. Bounded policy iteration provides a way of keeping the controller size fixed while improving it monotonically until convergence, although it is susceptible to getting trapped in local optima. Despite these limitations, policy iteration algorithms are viable alternatives to value iteration. In this paper, we generalize the bounded policy iteration technique to problems involving multiple agents. Specifically, we show how we may perform policy iteration in settings formalized by the interactive POMDP framework. Although policy iteration has been extended to decentralized POMDPs, the context there is strictly cooperative. Its generalization here makes it useful in non-cooperative settings as well. As interactive POMDPs involve modeling others, we ascribe nested controllers to predict others' actions, with the benefit that the controllers compactly represent the model space. We evaluate our approach on multiple problem domains, and demonstrate its properties and scalability.

AAAI Conference 2012 Conference Paper

Improved Convergence of Iterative Ontology Alignment using Block-Coordinate Descent

  • Uthayasanker Thayasivam
  • Prashant Doshi

A wealth of ontologies, many of which overlap in their scope, has made aligning ontologies an important problem for the semantic Web. Consequently, several algorithms now exist for automatically aligning ontologies, with mixed success in their performances. Crucial challenges for these algorithms involve scaling to large ontologies, and as applications of ontology alignment evolve, performing the alignment in a reasonable amount of time without compromising on the quality of the alignment. A class of alignment algorithms is iterative and often consumes more time than others while delivering solutions of high quality. We present a novel and general approach for speeding up the multivariable optimization process utilized by these algorithms. Specifically, we use the technique of block-coordinate descent in order to possibly improve the speed of convergence of the iterative alignment techniques. We integrate this approach into three well-known alignment systems and show that the enhanced systems generate similar or improved alignments in significantly less time on a comprehensive testbed of ontology pairs. This represents an important step toward making alignment techniques computationally more feasible.
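Block-coordinate descent in its generic form alternately minimizes over one block of variables while holding the others fixed. A minimal sketch on a made-up two-block quadratic (a stand-in for an alignment objective, not any of the three systems' actual optimization):

```python
def bcd_quadratic(iters=50):
    """Block-coordinate descent on f(x, y) = (x-1)^2 + (y+2)^2 + 0.5*x*y:
    alternately minimize exactly over one block while the other is fixed.
    The objective is an illustrative stand-in, not an alignment objective."""
    x, y = 0.0, 0.0
    for _ in range(iters):
        x = 1.0 - 0.25 * y   # argmin over x: solve df/dx = 2(x-1) + 0.5y = 0
        y = -2.0 - 0.25 * x  # argmin over y: solve df/dy = 2(y+2) + 0.5x = 0
    return x, y
```

Because each block update is an exact minimization, the objective decreases monotonically, which is the property the paper leverages to speed convergence of iterative aligners.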

AAMAS Conference 2012 Conference Paper

Modeling Deep Strategic Reasoning by Humans in Competitive Games

  • Xia Qu
  • Prashant Doshi
  • Adam Goodie

Prior literature on strategic reasoning by humans, of the sort "what do you think that I think that you think," holds that humans generally do not reason beyond a single level. However, recent evidence suggests that if games are made competitive and therefore representationally simpler, humans exhibit behavior that is more consistent with deeper levels of recursive reasoning. We seek to computationally model behavioral data that is consistent with deep recursive reasoning in competitive games. We use generative, process models built from agent frameworks that simulate the observed data well and also exhibit psychological intuition.

AAMAS Conference 2011 Conference Paper

Approximating Behavioral Equivalence of Models Using Top-K Policy Paths

  • Yifeng Zeng
  • Yingke Chen
  • Prashant Doshi

Decision making and game play in multiagent settings must often contend with behavioral models of other agents in order to predict their actions. One approach that reduces the complexity of the unconstrained model space is to group models that tend to be behaviorally equivalent. In this paper, we seek to further compress the model space by introducing an approximate measure of behavioral equivalence and using it to group models.

AAMAS Conference 2011 Conference Paper

Identifying and Exploiting Weak-Information Inducing Actions in Solving POMDPs

  • Ekhlas Sonu
  • Prashant Doshi

We present a method for identifying actions that lead to observations which are only weakly informative in the context of partially observable Markov decision processes (POMDP). We call such actions weak- (inclusive of zero-) information inducing. Policy subtrees rooted at these actions may be computed more efficiently. While zero-information inducing actions may be exploited without error, we show that the error due to the quicker backup for weak but non-zero information inducing actions is bounded. We demonstrate the substantial computational savings that exploiting such actions may bring to exact and approximate solutions of POMDPs.

AAAI Conference 2011 Conference Paper

Utilizing Partial Policies for Identifying Equivalence of Behavioral Models

  • Yifeng Zeng
  • Prashant Doshi
  • Yinghui Pan
  • Hua Mao
  • Muthukumaran Chandrasekaran
  • Jian Luo

We present a novel approach for identifying exact and approximate behavioral equivalence between models of agents. This is significant because both decision making and game play in multiagent settings must contend with behavioral models of other agents in order to predict their actions. One approach that reduces the complexity of the model space is to group models that are behaviorally equivalent. Identifying equivalence between models requires solving them and comparing entire policy trees. Because the trees grow exponentially with the horizon, our approach is to focus on partial policy trees for comparison and determining the distance between updated beliefs at the leaves of the trees. We propose a principled way to determine how much of the policy trees to consider, which trades off solution quality for efficiency. We investigate this approach in the context of the interactive dynamic influence diagram and evaluate its performance.

AAMAS Conference 2010 Conference Paper

Modeling Recursive Reasoning by Humans Using Empirically Informed Interactive POMDPs

  • Prashant Doshi
  • Xia Qu
  • Adam Goodie
  • Diana Young

Recursive reasoning of the form "what do I think that you think that I think" (and so on) arises often while acting rationally in multiagent settings. Several multiagent decision-making frameworks, such as RMM, I-POMDP and the theory of mind, model recursive reasoning as integral to an agent's rational choice. Real-world application settings for multiagent decision making are often mixed, involving humans and human-controlled agents. In two large experiments, we studied the level of recursive reasoning generally displayed by humans while playing sequential general-sum and fixed-sum, two-player games. Our results show that subjects experiencing a general-sum strategic game display the first or second level of recursive thinking, with the first level being more prominent. However, if the game is made simpler and more competitive with fixed-sum payoffs, subjects predominantly attribute first-level recursive thinking to opponents, thereby acting using the second level of reasoning. Subsequently, we model the behavioral data obtained from the studies using the I-POMDP framework, appropriately augmented with well-known human judgment and decision models. The accuracy of the predictions by our models suggests that these could be viable ways of computationally modeling strategic behavioral data.

IJCAI Conference 2009 Conference Paper

Speeding Up Exact Solutions of Interactive Dynamic Influence Diagrams Using Action Equivalence

  • Yifeng Zeng
  • Prashant Doshi

Interactive dynamic influence diagrams (I-DIDs) are graphical models for sequential decision making in partially observable settings shared by other agents. Algorithms for solving I-DIDs face the challenge of an exponentially growing space of candidate models ascribed to other agents, over time. The previous approach for exactly solving I-DIDs groups together models having similar solutions into behaviorally equivalent classes and updates these classes. We present a new method that, in addition to aggregating behaviorally equivalent models, further groups models that prescribe identical actions at a single time step. We show how to update these augmented classes and prove that our method is exact. The new approach enables us to bound the aggregated model space by the cardinality of other agents' actions. We evaluate its performance and provide empirical results in support.

AAMAS Conference 2009 Conference Paper

Improved Approximation of Interactive Dynamic Influence Diagrams Using Discriminative Model Updates

  • Prashant Doshi
  • Yifeng Zeng

Interactive dynamic influence diagrams (I-DIDs) are graphical models for sequential decision making in uncertain settings shared by other agents. Algorithms for solving I-DIDs face the challenge of an exponentially growing space of candidate models ascribed to other agents, over time. We formalize the concept of a minimal model set, which facilitates qualitative comparisons between different approximation techniques. We then present a new approximation technique that minimizes the space of candidate models by discriminating between model updates. We empirically demonstrate that our approach significantly outperforms the previous clustering-based approximation technique.

AAAI Conference 2008 Conference Paper

Generalized Point Based Value Iteration for Interactive POMDPs

  • Prashant Doshi

We develop a point based method for solving finitely nested interactive POMDPs approximately. Analogously to point based value iteration (PBVI) in POMDPs, we maintain a set of belief points and form value functions composed of those value vectors that are optimal at these points. However, as we focus on multiagent settings, the beliefs are nested and computation of the value vectors relies on predicted actions of others. Consequently, we develop a novel interactive generalization of PBVI applicable to multiagent settings.
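The single-agent PBVI backup this paper generalizes can be sketched on a tiny POMDP. The two-state model below is entirely made up for illustration; the interactive generalization additionally conditions the backup on predicted actions of the other agents' models, which this sketch omits.

```python
import numpy as np

# Made-up two-state, two-action, two-observation POMDP for illustration.
T = np.array([[[0.9, 0.1], [0.1, 0.9]],   # T[a][s][s']
              [[0.5, 0.5], [0.5, 0.5]]])
Z = np.array([[[0.8, 0.2], [0.2, 0.8]],   # Z[a][s'][o]
              [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[1.0, 0.0], [0.0, 1.0]])    # R[a][s]
gamma = 0.9

def pbvi_backup(beliefs, alphas):
    """One point-based backup: for each belief point, keep only the
    alpha-vector that is optimal at that point."""
    new = []
    for b in beliefs:
        best_vec, best_val = None, -np.inf
        for a in range(len(R)):
            vec = R[a].astype(float).copy()
            for o in range(Z.shape[2]):
                # g[s] = sum_s' T[a][s][s'] * Z[a][s'][o] * alpha[s']
                cands = [(T[a] * Z[a][:, o][None, :]) @ al for al in alphas]
                vec = vec + gamma * max(cands, key=lambda g: float(b @ g))
            if float(b @ vec) > best_val:
                best_vec, best_val = vec, float(b @ vec)
        new.append(best_vec)
    return new
```

Repeating the backup from a zero value function yields a monotonically improving, point-wise optimal set of vectors whose size stays bounded by the number of belief points.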

JAAMAS Journal 2008 Journal Article

Graphical models for interactive POMDPs: representations and solutions

  • Prashant Doshi
  • Yifeng Zeng
  • Qiongyu Chen

We develop new graphical representations for the problem of sequential decision making in partially observable multiagent environments, as formalized by interactive partially observable Markov decision processes (I-POMDPs). The graphical models called interactive influence diagrams (I-IDs) and their dynamic counterparts, interactive dynamic influence diagrams (I-DIDs), seek to explicitly model the structure that is often present in real-world problems by decomposing the situation into chance and decision variables, and the dependencies between the variables. I-DIDs generalize DIDs, which may be viewed as graphical representations of POMDPs, to multiagent settings in the same way that I-POMDPs generalize POMDPs. I-DIDs may be used to compute the policy of an agent given its belief as the agent acts and observes in a setting that is populated by other interacting agents. Using several examples, we show how I-IDs and I-DIDs may be applied and demonstrate their usefulness. We also show how the models may be solved using the standard algorithms that are applicable to DIDs. Solving I-DIDs exactly involves knowing the solutions of possible models of the other agents. The space of models grows exponentially with the number of time steps. We present a method of solving I-DIDs approximately by limiting the number of other agents’ candidate models at each time step to a constant. We do this by clustering models that are likely to be behaviorally equivalent and selecting a representative set from the clusters. We discuss the error bound of the approximation technique and demonstrate its empirical performance.

AAMAS Conference 2007 Conference Paper

Approximate State Estimation in Multiagent Settings with Continuous or Large Discrete State Spaces

  • Prashant Doshi

We present a new method for carrying out state estimation in multiagent settings that are characterized by continuous or large discrete state spaces. State estimation in multiagent settings involves updating an agent's belief over the physical states and the space of other agents' models. We factor out the models of the other agents and update the agent's belief over these models, as exactly as possible. Simultaneously, we sample particles from the distribution over the large physical state space and project the particles in time.

AAMAS Conference 2007 Conference Paper

Graphical Models for Online Solutions to Interactive POMDPs

  • Prashant Doshi
  • Yifeng Zeng
  • Qiongyu Chen

We develop a new graphical representation for interactive partially observable Markov decision processes (I-POMDPs) that is significantly more transparent and semantically clear than the previous representation. These graphical models called interactive dynamic influence diagrams (I-DIDs) seek to explicitly model the structure that is often present in real-world problems by decomposing the situation into chance and decision variables, and the dependencies between the variables. I-DIDs generalize DIDs, which may be viewed as graphical representations of POMDPs, to multiagent settings in the same way that I-POMDPs generalize POMDPs. I-DIDs may be used to compute the policy of an agent online as the agent acts and observes in a setting that is populated by other interacting agents. Using several examples, we show how I-DIDs may be applied and demonstrate their usefulness.

AAAI Conference 2007 Conference Paper

Improved State Estimation in Multiagent Settings with Continuous or Large Discrete State Spaces

  • Prashant Doshi

State estimation in multiagent settings involves updating an agent’s belief over the physical states and the space of other agents’ models. Performance of the previous approach to state estimation, the interactive particle filter, degrades with large state spaces because it distributes the particles over both the physical state space and the other agents’ models. We present an improved method for estimating the state in a class of multiagent settings that are characterized in part by continuous or large discrete state spaces. We factor out the models of the other agents and update the agent’s belief over these models, as exactly as possible. Simultaneously, we sample particles from the distribution over the large physical state space and project the particles in time. This approach is equivalent to Rao-Blackwellising the interactive particle filter. We focus our analysis on the special class of problems where the nested beliefs are represented using Gaussians, the problem dynamics using conditional linear Gaussians (CLGs) and the observation functions using softmax or CLGs. These distributions adequately represent many realistic applications.

AAAI Conference 2006 Conference Paper

Inexact Matching of Ontology Graphs Using Expectation-Maximization

  • Prashant Doshi

We present a new method for mapping ontology schemas that address similar domains. The problem of ontology mapping is crucial since we are witnessing a decentralized development and publication of ontological data. We formulate the problem of inferring a match between two ontologies as a maximum likelihood problem, and solve it using the technique of expectation-maximization (EM). Specifically, we adopt directed graphs as our model for ontologies and use a generalized version of EM to arrive at a mapping between the nodes of the graphs. We exploit the structural and lexical similarity between the graphs, and improve on previous approaches by generating a many-one correspondence between the concept nodes. We provide preliminary experimental results in support of our method and outline its limitations.

AAAI Conference 2006 Conference Paper

On the Difficulty of Achieving Equilibrium in Interactive POMDPs

  • Prashant Doshi

We analyze the asymptotic behavior of agents engaged in an infinite horizon partially observable stochastic game as formalized by the interactive POMDP framework. We show that when agents’ initial beliefs satisfy a truth compatibility condition, their behavior converges to a subjective ε-equilibrium in finite time, and to subjective equilibrium in the limit. This result is a generalization of a similar result in repeated games, to partially observable stochastic games. However, it turns out that the equilibrating process is difficult to demonstrate computationally because of the difficulty in coming up with initial beliefs that are both natural and satisfy the truth compatibility condition. Our results, therefore, shed some negative light on using equilibria as a solution concept for decision making in partially observable stochastic games.

AAAI Conference 2005 Conference Paper

A Particle Filtering Based Approach to Approximating Interactive POMDPs

  • Prashant Doshi

POMDPs provide a principled framework for sequential planning in single agent settings. An extension of POMDPs to multiagent settings, called interactive POMDPs (I-POMDPs), replaces POMDP belief spaces with interactive hierarchical belief systems which represent an agent’s belief about the physical world, about beliefs of the other agent(s), about their beliefs about others’ beliefs, and so on. This modification makes the difficulties of obtaining solutions due to complexity of the belief and policy spaces even more acute. We describe a method for obtaining approximate solutions to I-POMDPs based on particle filtering (PF). We utilize the interactive PF which descends the levels of interactive belief hierarchies and samples and propagates beliefs at each level. The interactive PF is able to deal with the belief space complexity, but it does not address the policy space complexity. We provide experimental results and chart future work.