Arrow Research search

Author name cluster

Milind Tambe

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

257 papers
2 author rows

Possible papers

257

JAAMAS Journal 2026 Journal Article

An Automated Teamwork Infrastructure for Heterogeneous Software Agents and Humans

  • David V. Pynadath
  • Milind Tambe

Abstract Agent integration architectures enable a heterogeneous, distributed set of agents to work together to address problems of greater complexity than those addressed by the individual agents themselves. Unfortunately, integrating software agents and humans to perform real-world tasks in a large-scale system remains difficult, especially due to three main challenges: ensuring robust execution in the face of a dynamic environment, providing abstract task specifications without all the low-level coordination details, and finding appropriate agents for inclusion in the overall system. To address these challenges, our Teamcore project provides the integration architecture with general-purpose teamwork coordination capabilities. We make each agent team-ready by providing it with a proxy capable of general teamwork reasoning. Thus, a key novelty and strength of our framework is that powerful teamwork capabilities are built into its foundations by providing the proxies themselves with a teamwork model. Given this teamwork model, the Teamcore proxies address the first agent integration challenge, robust execution, by automatically generating the required coordination actions for the agents they represent. We can also exploit the proxies' reusable general teamwork knowledge to address the second agent integration challenge. Through team-oriented programming, a developer specifies a hierarchical organization and its goals and plans, abstracting away from coordination details. Finally, KARMA, our Knowledgeable Agent Resources Manager Assistant, can aid the developer in conquering the third agent integration challenge by locating agents that match the specified organization's requirements. Our integration architecture enables teamwork among agents with no coordination capabilities, and it establishes and automates consistent teamwork among agents with some coordination capabilities.
Thus, team-oriented programming provides a level of abstraction that can be used on top of previous approaches to agent-oriented programming. We illustrate how the Teamcore architecture successfully addressed the challenges of agent integration in two application domains: simulated rehearsal of a military evacuation mission and facilitation of human collaboration.

JAAMAS Journal 2026 Journal Article

Automated Assistants for Analyzing Team Behaviors

  • Ranjit Nair
  • Milind Tambe
  • Taylor Raines

Abstract Multi-agent teamwork is critical in a large number of agent applications, including training, education, virtual enterprises and collective robotics. The complex interactions of agents in a team as well as with other agents make it extremely difficult for human developers to understand and analyze agent-team behavior. It has thus become increasingly important to develop tools that can help humans analyze, evaluate, and understand team behaviors. However, the problem of automated team analysis is largely unaddressed in previous work. In this article, we identify several key constraints faced by team analysts. Most fundamentally, multiple types of models of team behavior are necessary to analyze different granularities of team events, including agent actions, interactions, and global performance. In addition, effective ways of presenting the analysis to humans are critical and the presentation techniques depend on the model being presented. Finally, analysis should be independent of underlying team architecture and implementation. We also demonstrate an approach to addressing these constraints by building an automated team analyst called ISAAC for post-hoc, off-line agent-team analysis. ISAAC acquires multiple, heterogeneous team models via machine learning over teams' external behavior traces, where the specific learning techniques are tailored to the particular model learned. Additionally, ISAAC employs multiple presentation techniques that can aid human understanding of the analyses. ISAAC also provides feedback on team improvement in two novel ways: (i) It supports principled “what-if” reasoning about possible agent improvements; (ii) It allows the user to compare different teams based on their patterns of interactions. This paper presents ISAAC's general conceptual framework, motivating its design, as well as its concrete application in two domains: (i) RoboCup Soccer; (ii) software agent teams participating in a simulated evacuation scenario.
In the RoboCup domain, ISAAC was used prior to and during the RoboCup '99 tournament, and was awarded the RoboCup Scientific Challenge Award. In the evacuation domain, ISAAC was used to analyze patterns of message exchanges among software agents, illustrating the generality of ISAAC's techniques. We present detailed algorithms and experimental results from ISAAC's application.

JMLR Journal 2026 Journal Article

Contrasting Local and Global Modeling with Machine Learning and Satellite Data: A Case Study Estimating Tree Canopy Height in African Savannas

  • Esther Rolf
  • Lucia Gordon
  • Milind Tambe
  • Andrew Davies

While advances in machine learning with satellite imagery (SatML) are facilitating environmental monitoring at a global scale, developing SatML models that are accurate and useful for local regions remains critical to understanding and acting on an ever-changing planet. As increasing attention and resources are being devoted to training SatML models with global data, it is important to understand when improvements in global models will make it easier to train or fine-tune models that are accurate in specific regions. To explore this question, we design the first study that explicitly contrasts local and global training paradigms for SatML, through a case study of tree canopy height (TCH) mapping in the Karingani Game Reserve, Mozambique. We find that recent advances in global TCH mapping do not necessarily translate to better local modeling abilities in our study region. Specifically, small models trained only with locally-collected data outperform published global TCH maps, and even outperform globally pretrained models that we fine-tune using local data. Analyzing these results further, we identify specific points of conflict and synergy between local and global modeling paradigms that can inform future research toward aligning local and global performance objectives in geospatial machine learning.

JAAMAS Journal 2026 Journal Article

Experiences Acquired in the Design of RoboCup Teams: A Comparison of Two Fielded Teams

  • Stacy Marsella
  • Milind Tambe
  • Ion Muslea

Abstract Increasingly, multi-agent systems are being designed for a variety of complex, dynamic domains. Effective agent interactions in such domains raise some of the most fundamental research challenges for agent-based systems, in teamwork, multi-agent learning and agent modelling. The RoboCup research initiative, particularly the simulation league, has been proposed to pursue such multi-agent research challenges, using the common testbed of simulation soccer. Despite the significant popularity of RoboCup within the research community, general lessons have not often been extracted from participation in RoboCup. This is what we attempt to do here. We have fielded two teams, ISIS97 and ISIS98, in RoboCup competitions. Both teams finished among the top four in these competitions. We compare the teams, and attempt to analyze and generalize the lessons learned. This analysis reveals several surprises, pointing out lessons for teamwork and for multi-agent learning.

IS Journal 2026 Journal Article

Generative Artificial Intelligence for Social Impact

  • Lingkai Kong
  • Cheol Woo Kim
  • Davin Choo
  • Milind Tambe

Artificial intelligence for social impact has achieved compelling results in public health, conservation, and security, yet scaling these successes remains difficult due to a persistent deployment bottleneck. We characterize this bottleneck through three coupled gaps: observational scarcity resulting from limited or unreliable data, policy synthesis challenges involving combinatorial decisions and nonstationarity, and the friction of human–AI alignment when incorporating tacit expert knowledge and dynamic constraints. We argue that generative AI offers a unified pathway to bridge these gaps. Large language model agents assist in human–AI alignment by translating natural-language guidance into executable objectives and constraints for downstream planners, while diffusion models generate realistic synthetic data and support uncertainty-aware modeling to improve policy robustness and transfer across deployments. Together, these tools enable scalable, adaptable, and human-aligned AI systems for resource optimization in high-stakes settings.

AAAI Conference 2026 Conference Paper

Optimizing Health Coverage in Ethiopia: A Learning-augmented Approach and Persistent Proportionality Under an Online Budget

  • Davin Choo
  • Yohai Trabelsi
  • Fentabil Getnet
  • Samson Warkaye Lamma
  • Wondesen Nigatu
  • Kasahun Sime
  • Lisa Matay
  • Milind Tambe

As part of nationwide efforts aligned with the United Nations' Sustainable Development Goal 3 on Universal Health Coverage, Ethiopia's Ministry of Health is strengthening health posts to expand access to essential healthcare services. However, only a fraction of this health system strengthening effort can be implemented each year due to limited budgets and other competing priorities, hence the need for an optimization framework to guide prioritization across the regions of Ethiopia. In this paper, we develop a tool, Health Access Resource Planner (HARP), based on a principled decision-support optimization framework for sequential facility planning that aims to maximize population coverage under budget uncertainty while satisfying region-specific proportionality targets at every time step. We then propose two algorithms: (i) a learning-augmented approach that improves upon expert recommendations at any single step; and (ii) a greedy algorithm for multi-step planning, both with strong worst-case approximation guarantees. In collaboration with the Ethiopian Public Health Institute and Ministry of Health, we demonstrated the empirical efficacy of our method on three regions across various planning scenarios.

AAAI Conference 2026 Conference Paper

Preference Robustness for DPO with Applications to Public Health

  • Cheol Woo Kim
  • Shresth Verma
  • Mauricio Tec
  • Milind Tambe

We study an LLM fine-tuning task for designing reward functions for sequential resource allocation problems in public health, guided by human preferences expressed in natural language. This setting presents a challenging testbed for alignment due to complex and ambiguous objectives and limited data availability. We propose DPO-PRO, a robust fine-tuning algorithm based on Direct Preference Optimization (DPO), which accounts for uncertainty in the preference distribution using a lightweight Distributionally Robust Optimization (DRO) formulation. Unlike prior DRO-based variants, DPO-PRO focuses solely on uncertainty in preferences, avoiding unnecessary conservatism and incurring negligible computational overhead. We evaluate DPO-PRO on a real-world maternal mobile health program operated by the non-profit organization ARMMAN, as well as on standard alignment benchmarks. Experimental results demonstrate that our method consistently improves robustness to noisy preference signals compared to existing DPO variants. Moreover, DPO-PRO achieves comparable performance to a prior self-reflection-based baseline for reward function design, while requiring significantly lower inference-time cost.
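The robustness-over-preferences idea can be illustrated with a toy sketch. Assuming, hypothetically, that uncertainty is modeled as a label-flip probability lying in an interval `[0, eps]`, the robust loss is the worst-case mixture of the DPO logistic loss on the observed and flipped preference; the actual DPO-PRO objective is the one defined in the paper.

```python
import math

def dpo_loss(delta):
    # Standard DPO logistic loss, where delta is the implicit reward margin:
    # beta * (log-ratio of the chosen response minus log-ratio of the rejected).
    return -math.log(1.0 / (1.0 + math.exp(-delta)))

def dpo_pro_loss(delta, eps=0.2):
    # Toy DRO over preference noise (hypothetical formulation): the observed
    # label may be flipped with probability q <= eps, and we take the worst
    # case. The mixture is linear in q, so the maximum sits at an endpoint.
    return max((1.0 - q) * dpo_loss(delta) + q * dpo_loss(-delta)
               for q in (0.0, eps))
```

Note how a confident margin (large positive `delta`) is discounted under the robust loss, since the adversary places weight `eps` on the flipped preference; at `eps = 0` the standard DPO loss is recovered.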

JAAMAS Journal 2026 Journal Article

Towards Flexible Teamwork in Persistent Teams: Extended Report

  • Milind Tambe
  • Weixiong Zhang

Abstract Teamwork is a critical capability in multi-agent environments. Many such environments mandate that the agents and agent-teams must be persistent, i.e., exist over long periods of time. Agents in such persistent teams are bound together by their long-term common interests and goals. This paper focuses on flexible teamwork in such persistent teams. Unfortunately, while previous work has investigated flexible teamwork, persistent teams remain unexplored. For flexible teamwork, one promising approach that has emerged is model-based, i.e., providing agents with general models of teamwork that explicitly specify their commitments in teamwork. Such models enable agents to autonomously reason about coordination. Unfortunately, for persistent teams, such models may lead to coordination and communication actions that, while locally optimal, are highly problematic for the team's long-term goals. We present a decision-theoretic technique based on Markov decision processes to enable persistent teams to overcome such limitations of the model-based approach. In particular, agents reason about expected team utilities of future team states that are projected to result from actions recommended by the teamwork model, as well as lower-cost (or higher-cost) variations on these actions. To accommodate real-time constraints, this reasoning is done in an any-time fashion. Implemented examples from an analytic search tree and some real-world domains are presented.

AAAI Conference 2026 Conference Paper

VORTEX: Aligning Task Utility and Human Preferences Through LLM-Guided Reward Shaping

  • Guojun Xiong
  • Milind Tambe

In social impact optimization, AI decision systems often rely on solvers that optimize well-calibrated mathematical objectives. However, these solvers cannot directly accommodate evolving human preferences, typically expressed in natural language rather than formal constraints. Recent approaches address this by using large language models (LLMs) to generate new reward functions from preference descriptions. While flexible, they risk sacrificing the system's core utility guarantees. In this paper, we propose VORTEX, a language-guided reward shaping framework that preserves established optimization goals while adaptively incorporating human feedback. By formalizing the problem as multi-objective optimization, we use LLMs to iteratively generate shaping rewards based on verbal reinforcement and text-gradient prompt updates. This allows stakeholders to steer decision behavior via natural language without modifying solvers or specifying trade-off weights. We provide theoretical guarantees that VORTEX converges to Pareto-optimal trade-offs between utility and preference satisfaction. Empirical results in real-world allocation tasks demonstrate that VORTEX outperforms baselines in satisfying human-aligned coverage goals while maintaining high task performance. This work introduces a practical and theoretically grounded paradigm for human-AI collaborative optimization guided by natural language.

NeurIPS Conference 2025 Conference Paper

Adaptive Frontier Exploration on Graphs with Applications to Network-Based Disease Testing

  • Davin Choo
  • Yuqi Pan
  • Tonghan Wang
  • Milind Tambe
  • Alastair van Heerden
  • Cheryl Johnson

We study a sequential decision-making problem on an $n$-node graph $\mathcal{G}$ where each node has an unknown label from a finite set $\mathbf{\Omega}$, drawn from a joint distribution $\mathcal{P}$ that is Markov with respect to $\mathcal{G}$. At each step, selecting a node reveals its label and yields a label-dependent reward. The goal is to adaptively choose nodes to maximize expected accumulated discounted rewards. We impose a frontier exploration constraint, where actions are limited to neighbors of previously selected nodes, reflecting practical constraints in settings such as contact tracing and robotic exploration. We design a Gittins index-based policy that applies to general graphs and is provably optimal when $\mathcal{G}$ is a forest. Our implementation runs in $\mathcal{O}(n^2 \cdot |\mathbf{\Omega}|^2)$ time while using $\mathcal{O}(n \cdot |\mathbf{\Omega}|^2)$ oracle calls to $\mathcal{P}$ and $\mathcal{O}(n^2 \cdot |\mathbf{\Omega}|)$ space. Experiments on synthetic and real-world graphs show that our method consistently outperforms natural baselines, including in non-tree, budget-limited, and undiscounted settings. For example, in HIV testing simulations on real-world sexual interaction networks, our policy detects nearly all positive cases with only half the population tested, substantially outperforming other baselines.
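The frontier constraint itself is simple to state in code. The sketch below takes a caller-supplied myopic `score` in place of the paper's Gittins index (the index computation is the paper's contribution and is not reproduced here); `adj` is an adjacency list, and a node's label is revealed only when it is selected.

```python
def frontier_explore(adj, labels, reward, score, start, budget, gamma=0.95):
    # Frontier exploration: after the first pick, only neighbors of
    # already-selected nodes are eligible, as in contact tracing.
    selected = {start}
    total = reward(labels[start])        # the start node's label is revealed
    frontier = set(adj[start]) - selected
    disc = gamma
    for _ in range(budget - 1):
        if not frontier:
            break
        node = max(frontier, key=score)  # stand-in for the Gittins index
        selected.add(node)
        total += disc * reward(labels[node])
        disc *= gamma
        frontier = (frontier | set(adj[node])) - selected
    return selected, total

# Path graph 0-1-2-3; labels are revealed on selection.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
labels = {0: 1, 1: 0, 2: 1, 3: 1}
sel, tot = frontier_explore(adj, labels, reward=lambda x: x,
                            score=lambda v: v, start=0, budget=4, gamma=1.0)
print(sel, tot)  # {0, 1, 2, 3} 3
```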

AAMAS Conference 2025 Conference Paper

Bayesian Collaborative Bandits with Thompson Sampling for Improved Outreach in Maternal Health

  • Arpan Dasgupta
  • Gagan Jain
  • Arun Suggala
  • Karthikeyan Shanmugam
  • Milind Tambe
  • Aparna Taneja

Mobile health (mHealth) programs face a critical challenge in optimizing the timing of automated health information calls to beneficiaries. This challenge has been formulated as a collaborative multi-armed bandit problem, requiring online learning of a low-rank reward matrix. Existing solutions often rely on heuristic combinations of offline matrix completion and exploration strategies. In this work, we propose a principled Bayesian approach using Thompson Sampling for this collaborative bandit problem. Our method leverages prior information through efficient Gibbs sampling for posterior inference over the low-rank matrix factors, enabling faster convergence. We demonstrate significant improvements over state-of-the-art baselines on a real-world dataset from the world's largest maternal mHealth program. Our approach achieves a 16% reduction in the number of calls compared to existing methods and a 47% reduction compared to the deployed random policy. This efficiency gain translates to a potential increase in program capacity by 0.5 to 1.4 million beneficiaries, granting them access to vital antenatal and post-natal care information. Furthermore, we observe a 7% and 29% improvement in beneficiary retention (an extremely hard metric to impact) compared to state-of-the-art and deployed baselines, respectively. Synthetic simulations further demonstrate the superiority of our approach, particularly in low-data regimes and in effectively utilizing prior information. We also provide a theoretical analysis of our algorithm in a special setting using Eluder dimension.
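As a simplified illustration of the Thompson-sampling idea (not the paper's Gibbs sampler, which alternates over both low-rank factors), suppose the arm factors `V` were known: each beneficiary's latent factor then has a conjugate Gaussian posterior, and one sampling step looks like this.

```python
import numpy as np

def thompson_step(V, A, b, sigma2=1.0, rng=None):
    # One Thompson-sampling step for a single beneficiary under the reward
    # model r = u . v_arm + Gaussian noise, with ridge statistics (A, b).
    # Simplification: the arm factors V are treated as known here; the paper
    # instead Gibbs-samples both low-rank factors jointly.
    rng = rng if rng is not None else np.random.default_rng(0)
    cov = sigma2 * np.linalg.inv(A)
    mean = np.linalg.inv(A) @ b
    u_sample = rng.multivariate_normal(mean, cov)  # draw from the posterior
    return int(np.argmax(V @ u_sample))            # act greedily on the draw

def update(A, b, v_arm, r):
    # Conjugate (ridge-statistic) update after observing reward r.
    return A + np.outer(v_arm, v_arm), b + r * v_arm
```

Starting from `A = np.eye(d)` and `b = np.zeros(d)`, alternating `thompson_step` and `update` concentrates the posterior, which is the mechanism behind the faster convergence claimed above.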

ECAI Conference 2025 Conference Paper

Beyond Listenership: AI-Predicted Interventions Drive Improvements in Maternal Health Behaviours

  • Arpan Dasgupta
  • Sarvesh Gharat
  • Neha Madhiwalla
  • Aparna Hegde
  • Milind Tambe
  • Aparna Taneja

Automated voice calls with health information are a proven method for disseminating maternal and child health information among beneficiaries and are deployed in several programs around the world. However, these programs often suffer from beneficiary dropoffs and poor engagement. In previous work, through real-world trials, we showed that an AI model, specifically a restless bandit model, could identify beneficiaries who would benefit most from live service call interventions, preventing dropoffs and boosting engagement. However, one key question has remained open so far: does such improved listenership via AI-targeted interventions translate into beneficiaries’ improved knowledge and health behaviors? We present a first study that shows not only listenership improvements due to AI interventions, but also simultaneously links these improvements to health behavior changes. Specifically, we demonstrate that AI-scheduled interventions, which enhance listenership, lead to statistically significant improvements in beneficiaries’ health behaviors such as taking iron or calcium supplements in the postnatal period, as well as understanding of critical health topics during pregnancy and infancy. This underscores the potential of AI to drive meaningful improvements in maternal and child health.

NeurIPS Conference 2025 Conference Paper

Composite Flow Matching for Reinforcement Learning with Shifted-Dynamics Data

  • Lingkai Kong
  • Haichuan Wang
  • Tonghan Wang
  • Guojun Xiong
  • Milind Tambe

Incorporating pre-collected offline data can substantially improve the sample efficiency of reinforcement learning (RL), but its benefits can break down when the transition dynamics in the offline dataset differ from those encountered online. Existing approaches typically mitigate this issue by penalizing or filtering offline transitions in regions with large dynamics gap. However, their dynamics-gap estimators often rely on KL divergence or mutual information, which can be ill-defined when offline and online dynamics have mismatched support. To address this challenge, we propose CompFlow, a principled framework built on the theoretical connection between flow matching and optimal transport. Specifically, we model the online dynamics as a conditional flow built upon the output distribution of a pretrained offline flow, rather than learning it directly from a Gaussian prior. This composite structure provides two advantages: (1) improved generalization when learning online dynamics under limited interaction data, and (2) a well-defined and stable estimate of the dynamics gap via the Wasserstein distance between offline and online transitions. Building on this dynamics-gap estimator, we further develop an optimistic active data collection strategy that prioritizes exploration in high-gap regions, and show theoretically that it reduces the performance gap to the optimal policy. Empirically, CompFlow consistently outperforms strong baselines across a range of RL benchmarks with shifted-dynamics data.
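The support-mismatch point is easy to see in one dimension: the empirical Wasserstein-1 distance below stays finite even when the two samples have disjoint supports, exactly where a KL-based gap estimate would be ill-defined. This is only a 1-D stand-in for the paper's flow-matching-based estimator.

```python
import numpy as np

def w1_gap(offline_next, online_next):
    # Empirical 1-D Wasserstein-1 distance between equally sized samples of
    # next states: sort both samples and average the pointwise gaps.
    # Unlike KL, this remains well-defined with non-overlapping supports.
    a = np.sort(np.asarray(offline_next, dtype=float))
    b = np.sort(np.asarray(online_next, dtype=float))
    assert a.shape == b.shape, "sketch assumes equal sample counts"
    return float(np.mean(np.abs(a - b)))

print(w1_gap([0.0, 1.0], [2.0, 3.0]))  # 2.0: disjoint supports, finite gap
```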

AAAI Conference 2025 Conference Paper

Context in Public Health for Underserved Communities: A Bayesian Approach to Online Restless Bandits

  • Biyonka Liang
  • Lily Xu
  • Aparna Taneja
  • Milind Tambe
  • Lucas Janson

Public health programs often provide interventions to encourage program adherence, and effectively allocating interventions is vital for producing the greatest overall health outcomes, especially in underserved communities where resources are limited. Such resource allocation problems are often modeled as restless multi-armed bandits (RMABs) with unknown underlying transition dynamics, hence requiring online reinforcement learning (RL). We present Bayesian Learning for Contextual RMABs (BCoR), an online RL approach for RMABs that novelly combines techniques in Bayesian modeling with Thompson sampling to flexibly model the complex RMAB settings present in public health program adherence problems, namely context and non-stationarity. BCoR's key strength is the ability to leverage shared information within and between arms to learn the unknown RMAB transition dynamics quickly in intervention-scarce settings with relatively short time horizons, which is common in public health applications. Empirically, BCoR achieves substantially higher finite-sample performance over a range of experimental settings, including a setting using real-world adherence data that was developed in collaboration with ARMMAN, an NGO in India which runs a large-scale maternal mHealth program, showcasing BCoR's practical utility and potential for real-world deployment.

AAAI Conference 2025 Conference Paper

Evaluating Index-based Treatment Allocation in Underresourced Communities

  • Niclas Boehmer
  • Yash Nair
  • Sanket Shah
  • Lucas Janson
  • Aparna Taneja
  • Milind Tambe

In many applications of AI for Social Impact (e.g., when allocating spots in support programs for underserved communities), resources are scarce and an allocation policy is needed to decide who receives a resource. Before being deployed at scale, a rigorous evaluation of an AI-powered allocation policy is vital. In this paper, we introduce the methods necessary to evaluate index-based allocation policies, which allocate a limited number of resources to those who need them the most. Such policies create dependencies between agents, rendering standard statistical tests invalid and ineffective. Addressing the arising practical and technical challenges, we describe an efficient estimator and methods for drawing valid statistical conclusions. Our extensive experiments validate our methodology in practical settings while also showcasing its statistical power. We conclude by proposing and empirically verifying extensions of our methodology that enable us to reevaluate a past randomized control trial conducted with 10,000 beneficiaries for an mHealth program for pregnant women. Our new methodology allows us to draw previously invisible conclusions when comparing two different ML allocation policies.

AAMAS Conference 2025 Conference Paper

Finite-Horizon Single-Pull Restless Bandits: An Efficient Index Policy For Scarce Resource Allocation

  • Guojun Xiong
  • Haichuan Wang
  • Yuqi Pan
  • Saptarshi Mandal
  • Sanket Shah
  • Niclas Boehmer
  • Milind Tambe

Restless multi-armed bandits (RMABs) have been highly successful in optimizing sequential resource allocation across many domains. However, in many practical settings with highly scarce resources, where each agent can only receive at most one resource, such as healthcare intervention programs, the standard RMAB framework falls short. To tackle such scenarios, we introduce Finite-Horizon Single-Pull RMABs (SPRMABs), a novel variant in which each arm can only be pulled once. This single-pull constraint introduces additional complexity, rendering many existing RMAB solutions suboptimal or ineffective. To address this shortcoming, we propose using dummy states that expand the system and enforce the one-pull constraint. We then design a lightweight index policy for this expanded system. For the first time, we demonstrate that our index policy achieves a sub-linearly decaying average optimality gap of $\tilde{\mathcal{O}}(1/\rho^{1/2})$ for a finite number of arms, where $\rho$ is the scaling factor for each arm cluster. Extensive simulations validate the proposed method, showing robust performance across various domains compared to existing benchmarks.
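The dummy-state construction can be sketched directly: give each arm an extra absorbing "spent" state and reroute the pull action into it, so a second pull is impossible by construction. The exact state/action encoding below is illustrative; the index computation on the expanded system follows the paper.

```python
def add_dummy_state(P, R):
    # Expand one arm's two-action MDP with an absorbing "spent" state so the
    # arm can be pulled at most once. P[a][s][s'] are transition probabilities
    # for action a (1 = pull), R[a][s] are rewards. Simplified sketch.
    n = len(P[0])                                   # original state count
    P2 = {a: [list(row) + [0.0] for row in P[a]] for a in (0, 1)}
    for a in (0, 1):
        P2[a].append([0.0] * n + [1.0])             # the spent state absorbs
    for s in range(n):
        P2[1][s] = [0.0] * n + [1.0]                # any pull leads to spent
    R2 = {a: list(R[a]) + [0.0] for a in (0, 1)}    # spent state pays nothing
    return P2, R2
```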

ICML Conference 2025 Conference Paper

Navigating the Social Welfare Frontier: Portfolios for Multi-objective Reinforcement Learning

  • Cheol Woo Kim
  • Jai Moondra
  • Shresth Verma
  • Madeleine Pollack
  • Lingkai Kong
  • Milind Tambe
  • Swati Gupta

In many real-world applications of Reinforcement Learning (RL), deployed policies have varied impacts on different stakeholders, creating challenges in reaching consensus on how to effectively aggregate their preferences. Generalized $p$-means form a widely used class of social welfare functions for this purpose, with broad applications in fair resource allocation, AI alignment, and decision-making. This class includes well-known welfare functions such as Egalitarian, Nash, and Utilitarian welfare. However, selecting the appropriate social welfare function is challenging for decision-makers, as the structure and outcomes of optimal policies can be highly sensitive to the choice of $p$. To address this challenge, we study the concept of an $\alpha$-approximate portfolio in RL, a set of policies that are approximately optimal across the family of generalized $p$-means for all $p \in [-\infty, 1]$. We propose algorithms to compute such portfolios and provide theoretical guarantees on the trade-offs among approximation factor, portfolio size, and computational efficiency. Experimental results on synthetic and real-world datasets demonstrate the effectiveness of our approach in summarizing the policy space induced by varying $p$ values, empowering decision-makers to navigate this landscape more effectively.
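The generalized $p$-means named above have a compact closed form, $M_p(u) = (\frac{1}{n}\sum_i u_i^p)^{1/p}$ for positive utilities, with the limits $p \to 0$ and $p \to -\infty$ giving the Nash (geometric-mean) and Egalitarian (minimum) welfare. A small sketch:

```python
import numpy as np

def p_mean(utilities, p):
    # Generalized p-mean welfare of a positive utility vector:
    #   p = 1      -> Utilitarian (arithmetic mean)
    #   p -> 0     -> Nash (geometric mean)
    #   p -> -inf  -> Egalitarian (minimum)
    u = np.asarray(utilities, dtype=float)
    if p == float("-inf"):
        return float(u.min())
    if p == 0:
        return float(np.exp(np.mean(np.log(u))))
    return float(np.mean(u ** p) ** (1.0 / p))

u = [1.0, 4.0, 9.0]
print(p_mean(u, 1))              # utilitarian: 14/3
print(p_mean(u, 0))              # Nash: 36 ** (1/3)
print(p_mean(u, float("-inf")))  # egalitarian: 1.0
```

The sensitivity to $p$ mentioned in the abstract is visible even in this tiny example: the three welfare values differ substantially, so the policy that optimizes each can differ too.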

AAMAS Conference 2025 Conference Paper

On Diffusion Models for Multi-Agent Partial Observability: Shared Attractors, Error Bounds, and Composite Flow

  • Tonghan Wang
  • Heng Dong
  • Yanchen Jiang
  • David C. Parkes
  • Milind Tambe

Multiagent systems grapple with partial observability (PO), and the decentralized POMDP (Dec-POMDP) model highlights the fundamental nature of this challenge. Whereas recent approaches to addressing PO have appealed to deep learning models, providing a rigorous understanding of how these models and their approximation errors affect agents' handling of PO and their interactions remains a challenge. In addressing this challenge, we investigate reconstructing global states from local action-observation histories in Dec-POMDPs using diffusion models. We first find that diffusion models conditioned on local history represent possible states as stable fixed points. In collectively observable (CO) Dec-POMDPs, individual diffusion models conditioned on agents' local histories share a unique fixed point corresponding to the global state, while in non-CO settings, shared fixed points yield a distribution of possible states given joint history. We further find that, with deep learning approximation errors, fixed points can deviate from true states and the deviation is negatively correlated to the Jacobian rank. Inspired by this low-rank property, we bound a deviation by constructing a surrogate linear regression model that approximates the local behavior of a diffusion model. With this bound, we propose a composite diffusion process iterating over agents with theoretical convergence guarantees to the true state.

AAAI Conference 2025 System Paper

PRIORITY2REWARD: Incorporating Healthworker Preferences for Resource Allocation Planning

  • Shresth Verma
  • Alayna Nguyen
  • Niclas Boehmer
  • Lingkai Kong
  • Milind Tambe

In this paper, we present PRIORITY2REWARD, a Large Language Model (LLM)-based application which incorporates health worker preferences for resource allocation planning in public health programs. LLMs are increasingly used to design reward functions based on human preferences in Reinforcement Learning problems. We focus on LLM-designed rewards for Restless Multi-Armed Bandits, a framework for allocating limited resources among agents. In the context of public health, our approach empowers grassroots health workers to tailor automated allocation decisions to community needs. We showcase a simulated application of PRIORITY2REWARD for a large-scale mobile health program in India. The tool allows health workers to enter natural language preferences and leverages LLMs to search for reward functions aligned with these preferences. Our tool then dynamically showcases how the LLM-generated reward function modifies the policy outcomes with respect to different demographic groups in the population. This can help inform policy implementation at a community level.

ICLR Conference 2025 Conference Paper

Reinforcement learning with combinatorial actions for coupled restless bandits

  • Lily Xu
  • Bryan Wilder
  • Elias B. Khalil
  • Milind Tambe

Reinforcement learning (RL) has increasingly been applied to solve real-world planning problems, with progress in handling large state spaces and time horizons. However, a key bottleneck in many domains is that RL methods cannot accommodate large, combinatorially structured action spaces. In such settings, even representing the set of feasible actions at a single step may require a complex discrete optimization formulation. We leverage recent advances in embedding trained neural networks into optimization problems to propose SEQUOIA, an RL algorithm that directly optimizes for long-term reward over the feasible action space. Our approach embeds a Q-network into a mixed-integer program to select a combinatorial action in each timestep. Here, we focus on planning over restless bandits, a class of planning problems which capture many real-world examples of sequential decision making. We introduce coRMAB, a broader class of restless bandits with combinatorial actions that cannot be decoupled across the arms of the restless bandit, requiring direct solving over the joint, exponentially large action space. We empirically validate SEQUOIA on four novel restless bandit problems with combinatorial constraints: multiple interventions, path constraints, bipartite matching, and capacity constraints. Our approach significantly outperforms existing methods—which cannot address sequential planning and combinatorial selection simultaneously—by an average of 24.8% on these difficult instances.
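To see why a mixed-integer program is needed at each step, consider the brute-force alternative: enumerate every feasible combinatorial action and evaluate the learned Q-function on each. The sketch below does exactly that for a simple budget constraint; it is exponential in the number of arms, which is what SEQUOIA's Q-network-in-a-MIP formulation avoids. `q_value` is a hypothetical learned Q-function.

```python
from itertools import combinations

def best_action_bruteforce(q_value, n_arms, budget):
    # Enumerate all subsets of arms of size <= budget (the feasible
    # combinatorial actions under a budget constraint) and return the one
    # maximizing the Q-value. Exponential in n_arms; SEQUOIA instead embeds
    # the trained Q-network in a mixed-integer program.
    feasible = (frozenset(c)
                for k in range(budget + 1)
                for c in combinations(range(n_arms), k))
    return max(feasible, key=q_value)

# Toy Q-function: arm 2 is worth 10, every other arm is worth 1.
q = lambda action: sum(10 if i == 2 else 1 for i in action)
print(best_action_bruteforce(q, n_arms=4, budget=2))  # frozenset({0, 2})
```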

UAI Conference 2025 Conference Paper

Robust Optimization with Diffusion Models for Green Security

  • Lingkai Kong
  • Haichuan Wang
  • Yuqi Pan
  • Cheol Woo Kim
  • Mingxiao Song
  • Alayna Nguyen
  • Tonghan Wang 0001
  • Haifeng Xu

In green security, defenders must forecast adversarial behavior, such as poaching, illegal logging, and illegal fishing, to plan effective patrols. These behaviors are often highly uncertain and complex. Prior work has leveraged game theory to design robust patrol strategies to handle uncertainty, but existing adversarial behavior models primarily rely on Gaussian processes or linear models, which lack the expressiveness needed to capture intricate behavioral patterns. To address this limitation, we propose a conditional diffusion model for adversary behavior modeling, leveraging its strong distribution-fitting capabilities. To the best of our knowledge, this is the first application of diffusion models in the green security domain. Integrating diffusion models into game-theoretic optimization, however, presents new challenges, including a constrained mixed strategy space and the need to sample from an unnormalized distribution to estimate utilities. To tackle these challenges, we introduce a mixed strategy of mixed strategies and employ a twisted Sequential Monte Carlo (SMC) sampler for accurate sampling. Theoretically, our algorithm is guaranteed to converge to an \(\epsilon\)-equilibrium with high probability using a finite number of iterations and samples. Empirically, we evaluate our approach on both synthetic and real-world poaching datasets, demonstrating its effectiveness.

AAAI Conference 2025 Conference Paper

The Bandit Whisperer: Communication Learning for Restless Bandits

  • Yunfan Zhao
  • Tonghan Wang
  • Dheeraj Mysore Nagaraj
  • Aparna Taneja
  • Milind Tambe

Applying Reinforcement Learning (RL) to Restless Multi-Arm Bandits (RMABs) offers a promising avenue for addressing allocation problems with resource constraints and temporal dynamics. However, classic RMAB models largely overlook the challenges of (systematic) data errors - a common occurrence in real-world scenarios due to factors like varying data collection protocols and intentional noise for differential privacy. We demonstrate that conventional RL algorithms used to train RMABs can struggle to perform well in such settings. To solve this problem, we propose the first communication learning approach in RMABs, where we study which arms, when involved in communication, are most effective in mitigating the influence of such systematic data errors. In our setup, the arms receive Q-function parameters from similar arms as messages to guide behavioral policies, steering Q-function updates. We learn communication strategies by considering the joint utility of messages across all pairs of arms and using a Q-network architecture that decomposes the joint utility. Both theoretical and empirical evidence validate the effectiveness of our method in significantly improving RMAB performance across diverse problems.

IS Journal 2025 Journal Article

The Next Wave of AI for Social Impact: Challenges and Opportunities

  • Milind Tambe
  • Fei Fang
  • Andrew Perrault
  • Bryan Wilder

The burgeoning field of artificial intelligence for social impact (AI4SI) represents a significant evolution in artificial intelligence, prioritizing measurable positive impact for vulnerable and under-resourced populations. This article examines the historical context and recent surge in AI4SI, driven by technological advancements and a growing awareness of societal challenges. It highlights the crucial role of interdisciplinary collaboration, ethical considerations, and the potential of emerging AI trends in addressing issues such as poverty, health, and environmental sustainability. Furthermore, the article delves into key research questions and challenges facing the field, including the need for contextually relevant AI design, overcoming data limitations, ensuring scalable and sustainable deployments in resource-constrained environments, and establishing robust evaluation frameworks. Realizing the full potential of AI to address pressing societal needs in the coming decade and beyond will hinge on effectively navigating these challenges and fostering a deeply impact-driven approach to research and development.

AAMAS Conference 2025 Conference Paper

Towards Foundation-model-based Multiagent System to Accelerate AI for Social Impact

  • Yunfan Zhao
  • Niclas Boehmer
  • Aparna Taneja
  • Milind Tambe

AI for social impact (AI4SI) offers significant potential for addressing complex societal challenges in areas such as public health, agriculture, education, conservation, and public safety. However, existing AI4SI research is often labor-intensive and resource-demanding, limiting its accessibility and scalability; the standard approach is to design a (base-level) system tailored to a specific AI4SI problem. We propose the development of a novel meta-level multi-agent system designed to accelerate the development of such base-level systems, thereby reducing the computational cost and the burden on social impact domain experts and AI researchers. Leveraging advancements in foundation models and large language models, our proposed approach focuses on resource allocation problems, providing help across the full AI4SI pipeline from problem formulation through solution design to impact evaluation. We highlight the ethical considerations and challenges inherent in deploying such systems and emphasize the importance of a human-in-the-loop approach to ensure the responsible and effective application of AI systems.

UAI Conference 2025 Conference Paper

What is the Right Notion of Distance between Predict-then-Optimize Tasks?

  • Paula Rodriguez Diaz
  • Lingkai Kong
  • Kai Wang 0040
  • David Alvarez-Melis
  • Milind Tambe

Comparing datasets is a fundamental task in machine learning, essential for various learning paradigms, from evaluating train and test datasets for model generalization to using dataset similarity for detecting data drift. While traditional notions of dataset distances offer principled measures of similarity, their utility has largely been assessed through prediction error minimization. However, in Predict-then-Optimize (PtO) frameworks, where predictions serve as inputs for downstream optimization tasks, model performance is measured through decision regret rather than prediction error. In this work, we propose OTD$^3$ (Optimal Transport Decision-aware Dataset Distance), a novel dataset distance that incorporates downstream decisions in addition to features and labels. We show that traditional feature-label distances lack informativeness in PtO settings, while OTD$^3$ more effectively captures adaptation success. We also derive a PtO-specific adaptation bound based on this distance. Empirically, we show that our proposed distance accurately predicts model transferability across three different PtO tasks from the literature. Code is available at https://github.com/paularodr/OTD3

NeurIPS Conference 2024 Conference Paper

A Decision-Language Model (DLM) for Dynamic Restless Multi-Armed Bandit Tasks in Public Health

  • Nikhil Behari
  • Edwin Zhang
  • Yunfan Zhao
  • Aparna Taneja
  • Dheeraj Nagaraj
  • Milind Tambe

Restless multi-armed bandits (RMAB) have demonstrated success in optimizing resource allocation for large beneficiary populations in public health settings. Unfortunately, RMAB models lack flexibility to adapt to evolving public health policy priorities. Concurrently, Large Language Models (LLMs) have emerged as adept automated planners across domains of robotic control and navigation. In this paper, we propose a Decision Language Model (DLM) for RMABs, enabling dynamic fine-tuning of RMAB policies in public health settings using human-language commands. We propose using LLMs as automated planners to (1) interpret human policy preference prompts, (2) propose reward functions as code for a multi-agent RMAB environment, and (3) iterate on the generated reward functions using feedback from grounded RMAB simulations. We illustrate the application of DLM in collaboration with ARMMAN, an India-based non-profit promoting preventative care for pregnant mothers, which currently relies on RMAB policies to optimally allocate health worker calls to low-resource populations. We conduct a technology demonstration in simulation using the Gemini Pro model, showing DLM can dynamically shape policy outcomes using only human prompts as input.

ECAI Conference 2024 Conference Paper

Combining Diverse Information for Coordinated Action: Stochastic Bandit Algorithms for Heterogeneous Agents

  • Lucia Gordon
  • Esther Rolf
  • Milind Tambe

Stochastic multi-agent multi-armed bandits typically assume that the rewards from each arm follow a fixed distribution, regardless of which agent pulls the arm. However, in many real-world settings, rewards can depend on the sensitivity of each agent to their environment. In medical screening, disease detection rates can vary by test type; in preference matching, rewards can depend on user preferences; and in environmental sensing, observation quality can vary across sensors. Since past work does not specify how to allocate agents with heterogeneous but known sensitivities of these types in a stochastic bandit setting, we introduce a UCB-style algorithm, Min-Width, which aggregates information from diverse agents. In doing so, we address the joint challenges of (i) aggregating the rewards, which follow different distributions for each agent-arm pair, and (ii) coordinating the assignments of agents to arms. Min-Width facilitates efficient collaboration among heterogeneous agents, exploiting the known structure in the agents’ reward functions to weight their rewards accordingly. We analyze the regret of Min-Width and conduct pseudo-synthetic and fully synthetic experiments to study the performance of different levels of information sharing. Our results confirm that the gains to modeling agent heterogeneity tend to be greater when the sensitivities are more varied across agents, while combining more information does not always improve performance.

AAMAS Conference 2024 Conference Paper

Efficient Public Health Intervention Planning Using Decomposition-Based Decision-focused Learning

  • Sanket Shah
  • Arun Suggala
  • Milind Tambe
  • Aparna Taneja

The declining participation of beneficiaries over time is a key concern in public health programs. A popular strategy for improving retention is to have health workers ‘intervene’ on beneficiaries at risk of dropping out. However, the availability and time of these health workers are limited resources. As a result, there has been a line of research on optimizing these limited intervention resources using Restless Multi-Armed Bandits (RMABs). The key technical barrier to using this framework in practice lies in the need to estimate the beneficiaries’ RMAB parameters from historical data. Recent research has shown that Decision-Focused Learning (DFL), which focuses on maximizing the beneficiaries’ adherence rather than predictive accuracy, improves the performance of intervention targeting using RMABs. Unfortunately, these gains come at a high computational cost because of the need to solve and evaluate the RMAB in each DFL training step. In this paper, we provide a principled way to exploit the structure of RMABs to speed up intervention planning by cleverly decoupling the planning for different beneficiaries. We use real-world data from an Indian NGO, ARMMAN, to show that our approach is up to two orders of magnitude faster than the state-of-the-art approach while also yielding superior model performance. This would enable the NGO to scale up deployments using DFL to potentially millions of mothers, ultimately advancing progress toward UN SDG 3.1.

ECAI Conference 2024 Conference Paper

Escape Sensing Games: Detection-vs-Evasion in Security Applications

  • Niclas Boehmer
  • Minbiao Han
  • Haifeng Xu
  • Milind Tambe

Traditional game-theoretic research for security applications primarily focuses on the allocation of external protection resources to defend targets. This work puts forward the study of a new class of games centered around strategically arranging targets to protect them against a constrained adversary, with motivations from varied domains such as peacekeeping resource transit and cybersecurity. Specifically, we introduce Escape Sensing Games (ESGs). In ESGs, a blue player manages the order in which targets pass through a channel, while her opponent tries to capture the targets using a set of sensors that need some time to recharge after each activation. We present a thorough computational study of ESGs. Among others, we show that it is NP-hard to compute best responses and equilibria. Nevertheless, we propose a variety of effective (heuristic) algorithms whose quality we demonstrate in extensive computational experiments.

UAI Conference 2024 Conference Paper

Group Fairness in Predict-Then-Optimize Settings for Restless Bandits

  • Shresth Verma
  • Yunfan Zhao
  • Sanket Shah
  • Niclas Boehmer
  • Aparna Taneja
  • Milind Tambe

Restless multi-arm bandits (RMABs) are a model for sequentially allocating a limited number of resources to agents modeled as Markov Decision Processes. RMABs have applications in cellular networks, anti-poaching, and in particular, healthcare. For such high-stakes use cases, allocations are often required to treat different groups of agents (e.g., defined by sensitive attributes) fairly. In addition to the fairness challenge, agents’ transition probabilities are often unknown and need to be learned in real-world problems. Thus, group fairness in RMABs requires us to simultaneously learn transition probabilities and how much budget we allocate to each group. Overcoming this key challenge ignored by previous work, we develop a decision-focused-learning pipeline to solve equitable RMABs, using a novel budget allocation algorithm to prevent disparity between groups. Our results on both synthetic and real-world large-scale datasets demonstrate that incorporating fair planning into the learning step greatly improves equity with little sacrifice in utility.

AAMAS Conference 2024 Conference Paper

Improving Mobile Maternal and Child Health Care Programs: Collaborative Bandits for Time Slot Selection

  • Soumyabrata Pal
  • Milind Tambe
  • Arun Suggala
  • Karthikeyan Shanmugam
  • Aparna Taneja

Maternal and child health is a global priority, reflected in the UN Sustainable Development Goal 3.1. Mobile health (mHealth) programs, using automated voice messages, are a vital tool for NGOs to disseminate health information in underserved communities. However, these programs face challenges: limited beneficiary phone access and unknown time preferences hinder timely outreach, leading to poor engagement. We address this by formulating the time preference inference problem as a multi-agent multi-armed bandit optimization problem, where beneficiaries are modeled as agents, and time slots as arms. We introduce a novel online collaborative filtering framework that infers preferred time slots by collaborating across beneficiaries to quickly identify their preferred time slots. To highlight the scope and impact of this problem, we are working with Kilkari, the world’s largest maternal and child mHealth program serving millions in India every week. Kilkari faces substantial reattempt costs to improve call answer rates. Through extensive experiments on real-world data obtained from Kilkari, we demonstrate that our collaborative bandit framework significantly outperforms both existing policies used by the NGO, and popular non-collaborative bandit algorithms (e.g., Upper Confidence Bound), both in terms of number of call retries, saving critical bandwidth that enables wider outreach, and by rapidly learning optimal time slots, improving beneficiary engagement and retention.

AAAI Conference 2024 Conference Paper

Leaving the Nest: Going beyond Local Loss Functions for Predict-Then-Optimize

  • Sanket Shah
  • Bryan Wilder
  • Andrew Perrault
  • Milind Tambe

Predict-then-Optimize is a framework for using machine learning to perform decision-making under uncertainty. The central research question it asks is, "How can we use the structure of a decision-making task to tailor ML models for that specific task?" To this end, recent work has proposed learning task-specific loss functions that capture this underlying structure. However, current approaches make restrictive assumptions about the form of these losses and their impact on ML model behavior. These assumptions lead to approaches with high computational cost and, when they are violated in practice, poor performance. In this paper, we propose solutions to these issues, avoiding the aforementioned assumptions and utilizing the ML model's features to increase the sample efficiency of learning loss functions. We empirically show that our method achieves state-of-the-art results in four domains from the literature, often requiring an order of magnitude fewer samples than comparable methods from past work. Moreover, our approach outperforms the best existing method by nearly 200% when the localness assumption is broken.

ICML Conference 2024 Conference Paper

Position: Application-Driven Innovation in Machine Learning

  • David Rolnick
  • Alán Aspuru-Guzik
  • Sara Beery
  • Bistra Dilkina
  • Priya L. Donti
  • Marzyeh Ghassemi
  • Hannah Kerner
  • Claire Monteleoni

In this position paper, we argue that application-driven research has been systemically under-valued in the machine learning community. As applications of machine learning proliferate, innovative algorithms inspired by specific real-world challenges have become increasingly important. Such work offers the potential for significant impact not merely in domains of application but also in machine learning itself. In this paper, we describe the paradigm of application-driven research in machine learning, contrasting it with the more standard paradigm of methods-driven research. We illustrate the benefits of application-driven machine learning and how this approach can productively synergize with methods-driven work. Despite these benefits, we find that reviewing, hiring, and teaching practices in machine learning often hold back application-driven innovation. We outline how these processes may be improved.

ICML Conference 2024 Conference Paper

Position: Social Environment Design Should be Further Developed for AI-based Policy-Making

  • Edwin Zhang
  • Sadie Zhao
  • Tonghan Wang 0001
  • Safwan Hossain
  • Henry Gasztowtt
  • Stephan Zheng
  • David C. Parkes
  • Milind Tambe

Artificial Intelligence (AI) holds promise as a technology that can be used to improve government and economic policy-making. This paper proposes a new research agenda towards this end by introducing Social Environment Design, a general framework for the use of AI in automated policy-making that connects with the Reinforcement Learning, EconCS, and Computational Social Choice communities. The framework seeks to capture general economic environments, includes voting on policy objectives, and gives a direction for the systematic analysis of government and economic policy through AI simulation. We highlight key open problems for future research in AI-based policymaking. By solving these challenges, we hope to achieve various social welfare objectives, thereby promoting more ethical and responsible decision making.

IJCAI Conference 2024 Conference Paper

Towards a Pretrained Model for Restless Bandits via Multi-arm Generalization

  • Yunfan Zhao
  • Nikhil Behari
  • Edward Hughes
  • Edwin Zhang
  • Dheeraj Nagaraj
  • Karl Tuyls
  • Aparna Taneja
  • Milind Tambe

Restless multi-arm bandits (RMABs) are a class of resource allocation problems with broad application in areas such as healthcare, online advertising, and anti-poaching. We explore several important questions, such as how to handle arms opting in and opting out over time without frequent retraining from scratch, and how to deal with continuous state settings with nonlinear reward functions, which appear naturally in practical contexts. We address these questions by developing a pre-trained model (PreFeRMAB) based on a novel combination of three key ideas: (i) to enable fast generalization, we train agents to learn from each other's experience; (ii) to accommodate streaming RMABs, we derive a new update rule for a crucial $\lambda$-network; (iii) to handle more complex continuous state settings, we design the algorithm to automatically define an abstract state based on raw observation and reward data. PreFeRMAB allows general zero-shot ability on previously unseen RMABs, and can be fine-tuned on specific instances in a more sample-efficient way than retraining from scratch. We theoretically prove the benefits of multi-arm generalization and empirically demonstrate the advantages of our approach on several challenging, real-world inspired problems.

AAMAS Conference 2024 Conference Paper

Towards Zero Shot Learning in Restless Multi-armed Bandits

  • Yunfan Zhao
  • Nikhil Behari
  • Edward Hughes
  • Edwin Zhang
  • Dheeraj Nagaraj
  • Karl Tuyls
  • Aparna Taneja
  • Milind Tambe

Restless multi-arm bandits (RMABs), a class of resource allocation problems with broad application in areas such as healthcare, online advertising, and anti-poaching, have recently been studied from a multi-agent reinforcement learning perspective. Prior RMAB research suffers from several limitations, e.g., it fails to adequately address continuous states, and requires retraining from scratch when arms opt in and opt out over time, a common challenge in many real-world applications. We propose a neural network-based pre-trained model that has general zero-shot ability on a wide range of previously unseen RMABs.

NeurIPS Conference 2024 Conference Paper

Transcendence: Generative Models Can Outperform The Experts That Train Them

  • Edwin Zhang
  • Vincent Zhu
  • Naomi Saphra
  • Anat Kleiman
  • Benjamin L. Edelman
  • Milind Tambe
  • Sham Kakade
  • Eran Malach

Generative models are trained with the simple objective of imitating the conditional probability distribution induced by the data they are trained on. Therefore, when trained on data generated by humans, we may not expect the artificial model to outperform the humans on their original objectives. In this work, we study the phenomenon of transcendence: when a generative model achieves capabilities that surpass the abilities of the experts generating its data. We demonstrate transcendence by training an autoregressive transformer to play chess from game transcripts, and show that the trained model can sometimes achieve better performance than all players in the dataset. We theoretically prove that transcendence is enabled by low-temperature sampling, and rigorously assess this experimentally. Finally, we discuss other sources of transcendence, laying the groundwork for future investigation of this phenomenon in a broader setting.

AAMAS Conference 2023 Conference Paper

A Learning Approach to Complex Contagion Influence Maximization

  • Haipeng Chen
  • Bryan Wilder
  • Wei Qiu
  • Bo An
  • Eric Rice
  • Milind Tambe

Influence maximization (IM) aims to find a set of seed nodes in a social network that maximizes the influence spread. While most IM problems focus on classical influence cascades (e.g., Independent Cascade and Linear Threshold) which assume individual influence cascade probability is independent of the number of neighbors, recent studies by sociologists show that many influence cascades follow a pattern called complex contagion (CC), where influence cascade probability is much higher when more neighbors are influenced. Nonetheless, there are very limited studies on complex contagion influence maximization (CCIM) problems. This is partly because CC is non-submodular, the solution of which has been an open challenge. In this study, we propose the first reinforcement learning (RL) approach to CCIM. We find that a key obstacle in applying existing RL approaches to CCIM is the reward sparseness issue, which comes from two distinct sources. We then design a new RL algorithm that uses the CCIM problem structure to address the issue. Empirical results show that our approach achieves the state-of-the-art performance on four real-world networks.

AAMAS Conference 2023 Conference Paper

AI-driven Prices for Externalities and Sustainability in Production Markets

  • Panayiotis Danassis
  • Aris Filos-Ratsikas
  • Haipeng Chen
  • Milind Tambe
  • Boi Faltings

Markets do not account for negative externalities; indirect costs that some participants impose on others, such as the cost of overappropriating a common-pool resource (which diminishes future stock, and thus harvest, for everyone). Quantifying appropriate interventions to market prices has proven to be quite challenging. We propose a practical approach to computing market prices and allocations via a deep reinforcement learning policymaker agent, operating in an environment of other learning agents. Our policymaker allows us to tune the prices with regard to diverse objectives such as sustainability and resource wastefulness, fairness, buyers’ and sellers’ welfare, etc. As a highlight of our findings, our policymaker is significantly more successful in maintaining resource sustainability, compared to the market equilibrium outcome, in scarce resource environments.

IJCAI Conference 2023 Conference Paper

Complex Contagion Influence Maximization: A Reinforcement Learning Approach

  • Haipeng Chen
  • Bryan Wilder
  • Wei Qiu
  • Bo An
  • Eric Rice
  • Milind Tambe

In influence maximization (IM), the goal is to find a set of seed nodes in a social network that maximizes the influence spread. While most IM problems focus on classical influence cascades (e.g., Independent Cascade and Linear Threshold) which assume individual influence cascade probability is independent of the number of neighbors, recent studies by sociologists show that many influence cascades follow a pattern called complex contagion (CC), where influence cascade probability is much higher when more neighbors are influenced. Nonetheless, there are very limited studies for complex contagion influence maximization (CCIM) problems. This is partly because CC is non-submodular, the solution of which has been an open challenge. In this study, we propose the first reinforcement learning (RL) approach to CCIM. We find that a key obstacle in applying existing RL approaches to CCIM is the reward sparseness issue, which comes from two distinct sources. We then design a new RL algorithm that uses the CCIM problem structure to address the issue. Empirical results show that our approach achieves the state-of-the-art performance on 9 real-world networks.

AAMAS Conference 2023 Conference Paper

Fairness for Workers Who Pull the Arms: An Index Based Policy for Allocation of Restless Bandit Tasks

  • Arpita Biswas
  • Jackson A. Killian
  • Paula Rodriguez Diaz
  • Susobhan Ghosh
  • Milind Tambe

Motivated by applications such as machine repair, project monitoring, and anti-poaching patrol scheduling, we study intervention planning of stochastic processes under resource constraints. This planning problem has previously been modeled as restless multi-armed bandits (RMAB), where each arm is an intervention-dependent Markov Decision Process. However, the existing literature assumes all intervention resources belong to a single uniform pool, limiting their applicability to real-world settings where interventions are carried out by a set of workers, each with their own costs, budgets, and intervention effects. In this work, we consider a novel RMAB setting, called multi-worker restless bandits (MWRMAB) with heterogeneous workers. The goal is to plan an intervention schedule that maximizes the expected reward while satisfying budget constraints on each worker as well as fairness in terms of the load assigned to each worker. Our contributions are two-fold: (1) we provide a multi-worker extension of the Whittle index to tackle heterogeneous costs and per-worker budget and (2) we develop an index-based scheduling policy to achieve fairness. Further, we evaluate our method on various cost structures and show that our method significantly outperforms other baselines in terms of fairness without sacrificing much in reward accumulated.

IJCAI Conference 2023 Conference Paper

Find Rhinos without Finding Rhinos: Active Learning with Multimodal Imagery of South African Rhino Habitats

  • Lucia Gordon
  • Nikhil Behari
  • Samuel Collier
  • Elizabeth Bondi-Kelly
  • Jackson A. Killian
  • Catherine Ressijac
  • Peter Boucher
  • Andrew Davies

Much of Earth's charismatic megafauna is endangered by human activities, particularly the rhino, which is at risk of extinction due to the poaching crisis in Africa. Monitoring rhinos' movement is crucial to their protection but has unfortunately proven difficult because rhinos are elusive. Therefore, instead of tracking rhinos, we propose the novel approach of mapping communal defecation sites, called middens, which give information about rhinos' spatial behavior valuable to anti-poaching, management, and reintroduction efforts. This paper provides the first-ever mapping of rhino midden locations by building classifiers to detect them using remotely sensed thermal, RGB, and LiDAR imagery in passive and active learning settings. As existing active learning methods perform poorly due to the extreme class imbalance in our dataset, we design MultimodAL, an active learning system employing a ranking technique and multimodality to achieve competitive performance with passive learning models with 94% fewer labels. Our methods could therefore save over 76 hours in labeling time when used on a similarly-sized dataset. Unexpectedly, our midden map reveals that rhino middens are not randomly distributed throughout the landscape; rather, they are clustered. Consequently, ranger patrols should be targeted at areas with high midden densities to strengthen anti-poaching efforts, in line with UN Target 15.7.

AAAI Conference 2023 Conference Paper

Flexible Budgets in Restless Bandits: A Primal-Dual Algorithm for Efficient Budget Allocation

  • Paula Rodriguez Diaz
  • Jackson A. Killian
  • Lily Xu
  • Arun Sai Suggala
  • Aparna Taneja
  • Milind Tambe

Restless multi-armed bandits (RMABs) are an important model to optimize allocation of limited resources in sequential decision-making settings. Typical RMABs assume the budget --- the number of arms pulled --- to be fixed for each step in the planning horizon. However, for realistic real-world planning, resources are not necessarily limited at each planning step; we may be able to distribute surplus resources in one round to an earlier or later round. In real-world planning settings, this flexibility in budget is often constrained to within a subset of consecutive planning steps, e.g., weekly planning of a monthly budget. In this paper we define a general class of RMABs with flexible budget, which we term F-RMABs, and provide an algorithm to optimally solve for them. We derive a min-max formulation to find optimal policies for F-RMABs and leverage gradient primal-dual algorithms to solve for reward-maximizing policies with flexible budgets. We introduce a scheme to sample expected gradients to apply primal-dual algorithms to the F-RMAB setting and make an otherwise computationally expensive approach tractable. Additionally, we provide heuristics that trade off solution quality for efficiency and present experimental comparisons of different F-RMAB solution approaches.

ICML Conference 2023 Conference Paper

Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation

  • Aditya Mate
  • Bryan Wilder
  • Aparna Taneja
  • Milind Tambe

We consider the task of evaluating policies of algorithmic resource allocation through randomized controlled trials (RCTs). Such policies are tasked with optimizing the utilization of limited intervention resources, with the goal of maximizing the benefits derived. Evaluation of such allocation policies through RCTs proves difficult, notwithstanding the scale of the trial, because the individuals’ outcomes are inextricably interlinked through resource constraints controlling the policy decisions. Our key contribution is to present a new estimator leveraging our proposed novel concept, that involves retrospective reshuffling of participants across experimental arms at the end of an RCT. We identify conditions under which such reassignments are permissible and can be leveraged to construct counterfactual trials, whose outcomes can be accurately ascertained, for free. We prove theoretically that such an estimator is more accurate than common estimators based on sample means – we show that it returns an unbiased estimate and simultaneously reduces variance. We demonstrate the value of our approach through empirical experiments on synthetic, semisynthetic as well as real case study data and show improved estimation accuracy across the board.

AAMAS Conference 2023 Conference Paper

Indexability is Not Enough for Whittle: Improved, Near-Optimal Algorithms for Restless Bandits

  • Abheek Ghosh
  • Dheeraj Nagaraj
  • Manish Jain
  • Milind Tambe

We study the problem of planning restless multi-armed bandits (RMABs) with multiple actions. This is a popular model for multiagent systems with applications like multi-channel communication, monitoring and machine maintenance tasks, and healthcare. Whittle index policies, which are based on Lagrangian relaxations, are widely used in these settings due to their simplicity and near-optimality under certain conditions. In this work, we first show that Whittle index policies can fail in simple and practically relevant RMAB settings, even when the RMABs are indexable. We discuss why the optimality guarantees fail and why asymptotic optimality may not translate well to practically relevant planning horizons. We then propose an alternate planning algorithm based on the mean-field method, which can provably and efficiently obtain near-optimal policies with a large number of arms, without the stringent structural assumptions required by the Whittle index policies. This borrows ideas from existing research with some improvements: our approach is hyper-parameter free, and we provide an improved non-asymptotic analysis which has: (a) no requirement for exogenous hyper-parameters and tighter polynomial dependence on known problem parameters; (b) high probability bounds which show that the reward of the policy is reliable; and (c) matching sub-optimality lower bounds for this algorithm with respect to the number of arms, thus demonstrating the tightness of our bounds. Our extensive experimental analysis shows that the mean-field approach matches or outperforms other baselines.

IJCAI Conference 2023 Conference Paper

Limited Resource Allocation in a Non-Markovian World: The Case of Maternal and Child Healthcare

  • Panayiotis Danassis
  • Shresth Verma
  • Jackson A. Killian
  • Aparna Taneja
  • Milind Tambe

The success of many healthcare programs depends on participants' adherence. We consider the problem of scheduling interventions in low resource settings (e.g., placing timely support calls from health workers) to increase adherence and/or engagement. Past works have successfully developed several classes of Restless Multi-armed Bandit (RMAB) based solutions for this problem. Nevertheless, all past RMAB approaches assume that the participants' behaviour follows the Markov property. We demonstrate significant deviations from the Markov assumption on real-world data on a maternal health awareness program from our partner NGO, ARMMAN. Moreover, we extend RMABs to continuous state spaces, a previously understudied area. To tackle the generalised non-Markovian RMAB setting we (i) model each participant's trajectory as a time-series, (ii) leverage the power of time-series forecasting models to learn complex patterns and dynamics to predict future states, and (iii) propose the Time-series Arm Ranking Index (TARI) policy, a novel algorithm that selects the RMAB arms that will benefit the most from an intervention, given our future state predictions. We evaluate our approach on both synthetic data, and a secondary analysis on real data from ARMMAN, and demonstrate significant increase in engagement compared to the SOTA, deployed Whittle index solution. This translates to 16.3 hours of additional content listened, 90.8% more engagement drops prevented, and reaching more than twice as many high dropout-risk beneficiaries.

AAMAS Conference 2023 Conference Paper

Modeling Robustness in Decision-Focused Learning as a Stackelberg Game

  • Sonja Johnson-Yu
  • Kai Wang
  • Jessie Finocchiaro
  • Aparna Taneja
  • Milind Tambe

Predict-then-optimize is a common paradigm for optimization tasks situated in incomplete informational settings, in which an agent estimates missing parameters and then optimizes over these predicted parameters. One proposed improvement to this predict-then-optimize framework is decision-focused learning, which establishes an end-to-end learning pipeline, allowing a predictive model to be tailored to the particular optimization task. The behavior of this predict-then-optimize framework in the presence of noise, however, is not well-understood. This is problematic because many data collection and annotation systems are inherently noisy, and the introduction of such noise could lead to poor downstream optimization. In this work, we aim to present results on robustness to label noise in decision-focused learning and traditional predict-then-optimize tasks using a Stackelberg game as the underlying framework of explanation. Our results suggest that playing the Stackelberg game in anticipation of label noise yields robustness in the predict-then-optimize framework at large, and that the optimal decision-focused learning Stackelberg solution continues to outperform the optimal traditional predict-then-optimize Stackelberg solution.

AAAI Conference 2023 Conference Paper

Optimistic Whittle Index Policy: Online Learning for Restless Bandits

  • Kai Wang
  • Lily Xu
  • Aparna Taneja
  • Milind Tambe

Restless multi-armed bandits (RMABs) extend multi-armed bandits to allow for stateful arms, where the state of each arm evolves restlessly with different transitions depending on whether that arm is pulled. Solving RMABs requires information on transition dynamics, which are often unknown upfront. To plan in RMAB settings with unknown transitions, we propose the first online learning algorithm based on the Whittle index policy, using an upper confidence bound (UCB) approach to learn transition dynamics. Specifically, we estimate confidence bounds of the transition probabilities and formulate a bilinear program to compute optimistic Whittle indices using these estimates. Our algorithm, UCWhittle, achieves sublinear O(H√(T log T)) frequentist regret to solve RMABs with unknown transitions in T episodes with a constant horizon H. Empirically, we demonstrate that UCWhittle leverages the structure of RMABs and the Whittle index policy solution to achieve better performance than existing online learning baselines across three domains, including one constructed from a real-world maternal and childcare dataset.
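The Whittle index that recurs throughout these RMAB papers has a compact textbook characterization: the index of a state is the passive-action subsidy at which acting and not acting become equally attractive. The following is a minimal sketch for a single two-state arm with known transitions, assuming indexability; it illustrates the generic construction only, not the bilinear-program computation described in the abstract above.

```python
import numpy as np

def arm_q_values(P, R, lam, gamma=0.95, iters=500):
    # Value iteration for one 2-state restless arm, paying subsidy `lam`
    # for the passive action. P[a][s] = P(next state = 1 | action a, state s).
    V = np.zeros(2)
    Q = np.zeros((2, 2))
    for _ in range(iters):
        for s in (0, 1):
            for a in (0, 1):
                ev = P[a][s] * V[1] + (1 - P[a][s]) * V[0]
                Q[s, a] = R[s] + (lam if a == 0 else 0.0) + gamma * ev
        V = Q.max(axis=1)
    return Q

def whittle_index(P, R, state, lo=-5.0, hi=5.0, tol=1e-6):
    # Binary search for the subsidy at which the arm is indifferent
    # between acting and staying passive in `state`.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        Q = arm_q_values(P, R, mid)
        if Q[state, 1] > Q[state, 0]:  # acting still preferred: raise subsidy
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With a positive-reward "engaged" state 1 and an action that raises the chance of reaching it, the index of state 0 comes out positive: the planner would need to be paid to stay passive there.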

AAMAS Conference 2023 Conference Paper

Restless Multi-Armed Bandits for Maternal and Child Health: Results from Decision-Focused Learning

  • Shresth Verma
  • Aditya Mate
  • Kai Wang
  • Neha Madhiwalla
  • Aparna Hegde
  • Aparna Taneja
  • Milind Tambe

Mobile Health Awareness programs in underserved communities often suffer from diminishing engagement over time and health workers have to make live service calls to encourage beneficiaries’ participation. Owing to health workers’ limited availability, we consider the optimization problem of scheduling live service calls in a Maternal and Child Health Awareness Program and model it using Restless Multi-Armed Bandits (RMAB). Since the parameters of the RMAB formulation are unknown, a model is learnt to first predict the parameters of the RMAB problem, which is subsequently solved using the Whittle Index algorithm. However, this Predict-then-Optimize framework maximises predictive accuracy rather than the quality of the final solution. Decision Focused Learning (DFL) solves this mismatch by integrating the optimization problem in the learning pipeline. Previous works have only shown the applicability of DFL in simulation settings. In collaboration with an NGO, we conduct a large-scale field study consisting of 9000 beneficiaries for 6 weeks and track key engagement metrics in a mobile health awareness program. To the best of our knowledge this is the first real-world study involving Decision Focused Learning. We demonstrate that beneficiaries in the DFL group experience statistically significant reductions in cumulative engagement drop, while those in the Predict-then-Optimize group do not. This establishes the practicality of decision-focused learning for real-world problems. We also demonstrate that DFL learns a better decision boundary between the RMAB actions, and strategically predicts parameters for arms which contribute most to the final decision outcome.

AAAI Conference 2023 Conference Paper

Robust Planning over Restless Groups: Engagement Interventions for a Large-Scale Maternal Telehealth Program

  • Jackson A. Killian
  • Arpita Biswas
  • Lily Xu
  • Shresth Verma
  • Vineet Nair
  • Aparna Taneja
  • Aparna Hegde
  • Neha Madhiwalla

In 2020, maternal mortality in India was estimated to be as high as 130 deaths per 100K live births, nearly twice the UN's target. To improve health outcomes, the non-profit ARMMAN sends automated voice messages to expecting and new mothers across India. However, 38% of mothers stop listening to these calls, missing critical preventative care information. To improve engagement, ARMMAN employs health workers to intervene by making service calls, but workers can only call a fraction of the 100K enrolled mothers. Partnering with ARMMAN, we model the problem of allocating limited interventions across mothers as a restless multi-armed bandit (RMAB), where the realities of large scale and model uncertainty present key new technical challenges. We address these with GROUPS, a double oracle–based algorithm for robust planning in RMABs with scalable grouped arms. Robustness over grouped arms requires several methodological advances. First, to adversarially select stochastic group dynamics, we develop a new method to optimize Whittle indices over transition probability intervals. Second, to learn group-level RMAB policy best responses to these adversarial environments, we introduce a weighted index heuristic. Third, we prove a key theoretical result that planning over grouped arms achieves the same minimax regret-optimal strategy as planning over individual arms, under a technical condition. Finally, using real-world data from ARMMAN, we show that GROUPS produces robust policies that reduce minimax regret by up to 50%, halving the number of preventable missed voice messages to connect more mothers with life-saving maternal health information.

AAAI Conference 2023 Conference Paper

Scalable Decision-Focused Learning in Restless Multi-Armed Bandits with Application to Maternal and Child Health

  • Kai Wang
  • Shresth Verma
  • Aditya Mate
  • Sanket Shah
  • Aparna Taneja
  • Neha Madhiwalla
  • Aparna Hegde
  • Milind Tambe

This paper studies restless multi-armed bandit (RMAB) problems with unknown arm transition dynamics but with known correlated arm features. The goal is to learn a model to predict transition dynamics given features, where the Whittle index policy solves the RMAB problems using predicted transitions. However, prior works often learn the model by maximizing the predictive accuracy instead of final RMAB solution quality, causing a mismatch between training and evaluation objectives. To address this shortcoming, we propose a novel approach for decision-focused learning in RMAB that directly trains the predictive model to maximize the Whittle index solution quality. We present three key contributions: (i) we establish differentiability of the Whittle index policy to support decision-focused learning; (ii) we significantly improve the scalability of decision-focused learning approaches in sequential problems, specifically RMAB problems; (iii) we apply our algorithm to a previously collected dataset of maternal and child health to demonstrate its performance. Indeed, our algorithm is the first for decision-focused learning in RMAB that scales to real-world problem sizes.

IJCAI Conference 2022 Conference Paper

ADVISER: AI-Driven Vaccination Intervention Optimiser for Increasing Vaccine Uptake in Nigeria

  • Vineet Nair
  • Kritika Prakash
  • Michael Wilbur
  • Aparna Taneja
  • Corrine Namblard
  • Oyindamola Adeyemo
  • Abhishek Dubey
  • Abiodun Adereni

More than 5 million children under five years die from largely preventable or treatable medical conditions every year, with an overwhelmingly large proportion of deaths occurring in under-developed countries with low vaccination uptake. One of the United Nations' sustainable development goals (SDG 3) aims to end preventable deaths of newborns and children under five years of age. We focus on Nigeria, where the rate of infant mortality is appalling. We collaborate with HelpMum, a large non-profit organization in Nigeria, to design and optimize the allocation of heterogeneous health interventions under uncertainty to increase vaccination uptake, the first such collaboration in Nigeria. Our framework, ADVISER: AI-Driven Vaccination Intervention Optimiser, is based on an integer linear program that seeks to maximize the cumulative probability of successful vaccination. Our optimization formulation is intractable in practice. We present a heuristic approach that enables us to solve the problem for real-world use-cases. We also present theoretical bounds for the heuristic method. Finally, we show that the proposed approach outperforms baseline methods in terms of vaccination uptake through experimental evaluation. HelpMum is currently planning a pilot program based on our approach to be deployed in the largest city of Nigeria, which would be the first deployment of an AI-driven vaccination uptake program in the country and hopefully, pave the way for other data-driven programs to improve health outcomes in Nigeria.

AAAI Conference 2022 Conference Paper

Coordinating Followers to Reach Better Equilibria: End-to-End Gradient Descent for Stackelberg Games

  • Kai Wang
  • Lily Xu
  • Andrew Perrault
  • Michael K. Reiter
  • Milind Tambe

A growing body of work in game theory extends the traditional Stackelberg game to settings with one leader and multiple followers who play a Nash equilibrium. Standard approaches for computing equilibria in these games reformulate the followers’ best response as constraints in the leader’s optimization problem. These reformulation approaches can sometimes be effective, but make limiting assumptions on the followers’ objectives and the equilibrium reached by followers, e.g., uniqueness, optimism, or pessimism. To overcome these limitations, we run gradient descent to update the leader’s strategy by differentiating through the equilibrium reached by followers. Our approach generalizes to any stochastic equilibrium selection procedure that chooses from multiple equilibria, where we compute the stochastic gradient by back-propagating through a sampled Nash equilibrium using the solution to a partial differential equation to establish the unbiasedness of the stochastic gradient. Using the unbiased gradient estimate, we implement the gradient-based approach to solve three Stackelberg problems with multiple followers. Our approach consistently outperforms existing baselines to achieve higher utility for the leader.

NeurIPS Conference 2022 Conference Paper

Decision-Focused Learning without Decision-Making: Learning Locally Optimized Decision Losses

  • Sanket Shah
  • Kai Wang
  • Bryan Wilder
  • Andrew Perrault
  • Milind Tambe

Decision-Focused Learning (DFL) is a paradigm for tailoring a predictive model to a downstream optimization task that uses its predictions in order to perform better on that specific task. The main technical challenge associated with DFL is that it requires being able to differentiate through the optimization problem, which is difficult due to discontinuous solutions and other challenges. Past work has largely gotten around this issue by handcrafting task-specific surrogates to the original optimization problem that provide informative gradients when differentiated through. However, the need to handcraft surrogates for each new task limits the usability of DFL. In addition, there are often no guarantees about the convexity of the resulting surrogates and, as a result, training a predictive model using them can lead to inferior local optima. In this paper, we do away with surrogates altogether and instead learn loss functions that capture task-specific information. To the best of our knowledge, ours is the first approach that entirely replaces the optimization component of decision-focused learning with a loss that is automatically learned. Our approach (a) only requires access to a black-box oracle that can solve the optimization problem and is thus generalizable, and (b) can be convex by construction and so can be easily optimized over. We evaluate our approach on three resource allocation problems from the literature and find that our approach outperforms learning without taking into account task-structure in all three domains, and even hand-crafted surrogates from the literature.
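The "learned, convex-by-construction loss" idea in the abstract above can be illustrated with a toy sketch: sample candidate predictions, query the black-box oracle for the decision regret each induces, and fit a convex function to those observations. Everything here, including the quadratic parameterization and the function name `fit_quadratic_loss`, is an illustrative assumption, not the paper's actual method.

```python
import numpy as np

def fit_quadratic_loss(y_true, y_samples, decision_regret):
    # Fit L(y_hat) = a * (y_hat - y_true)^2 + b to the decision regret
    # observed at sampled predictions (least squares, then clamp a >= 0
    # so the learned loss is convex in y_hat by construction).
    z = (np.asarray(y_samples, dtype=float) - y_true) ** 2
    A = np.vstack([z, np.ones_like(z)]).T
    (a, b), *_ = np.linalg.lstsq(A, np.asarray(decision_regret, dtype=float),
                                 rcond=None)
    return max(a, 0.0), b
```

The fitted loss can then be minimized with ordinary gradient descent when training the predictive model, sidestepping differentiation through the optimizer itself.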

AAMAS Conference 2022 Conference Paper

Efficient Algorithms for Finite Horizon and Streaming Restless Multi-Armed Bandit Problems

  • Aditya S. Mate
  • Arpita Biswas
  • Christoph Siebenbrunner
  • Susobhan Ghosh
  • Milind Tambe

We propose Streaming Bandits, a Restless Multi-Armed Bandit (RMAB) framework in which heterogeneous arms may arrive and leave the system after staying on for a finite lifetime. Streaming Bandits naturally capture the health-intervention planning problem, where health workers must manage the health outcomes of a patient cohort while new patients join and existing patients leave the cohort each day. Our contributions are as follows: (1) We derive conditions under which our problem satisfies indexability, a precondition that guarantees the existence and asymptotic optimality of the Whittle Index solution for RMABs. We establish the conditions using a polytime reduction of the Streaming Bandit setup to regular RMABs. (2) We further prove a phenomenon that we call index decay — whereby the Whittle index values are low for short residual lifetimes — driving the intuition underpinning our algorithm. (3) We propose a novel and efficient algorithm to compute the index-based solution for Streaming Bandits. Unlike previous methods, our algorithm does not rely on solving the costly finite horizon problem on each arm of the RMAB, thereby lowering the computational complexity compared to existing methods. (4) Finally, we evaluate our approach via simulations run on real-world data sets from a tuberculosis patient monitoring task and an intervention planning task for improving maternal healthcare, in addition to other synthetic domains. Across the board, our algorithm achieves a 2-orders-of-magnitude speed-up over existing methods while maintaining the same solution quality. The full paper is available at: https://arxiv.org/pdf/2103.04730.pdf

IJCAI Conference 2022 Conference Paper

Evolutionary Approach to Security Games with Signaling

  • Adam Żychowski
  • Jacek Mańdziuk
  • Elizabeth Bondi
  • Aravind Venugopal
  • Milind Tambe
  • Balaraman Ravindran

Green Security Games have become a popular way to model scenarios involving the protection of natural resources, such as wildlife. Sensors (e.g. drones equipped with cameras) have also begun to play a role in these scenarios by providing real-time information. Incorporating both human and sensor defender resources strategically is the subject of recent work on Security Games with Signaling (SGS). However, current methods to solve SGS do not scale well in terms of time or memory. We therefore propose a novel approach to SGS, which, for the first time in this domain, employs an Evolutionary Computation paradigm: EASGS. EASGS effectively searches the huge SGS solution space via suitable solution encoding in a chromosome and a specially-designed set of operators. The operators include three types of mutations, each focusing on a particular aspect of the SGS solution, optimized crossover and a local coverage improvement scheme (a memetic aspect of EASGS). We also introduce a new set of benchmark games, based on dense or locally-dense graphs that reflect real-world SGS settings. In the majority of 342 test game instances, EASGS outperforms state-of-the-art methods, including a reinforcement learning method, in terms of time scalability, nearly constant memory utilization, and quality of the returned defender's strategies (expected payoffs).

AAAI Conference 2022 Conference Paper

Field Study in Deploying Restless Multi-Armed Bandits: Assisting Non-profits in Improving Maternal and Child Health

  • Aditya Mate
  • Lovish Madaan
  • Aparna Taneja
  • Neha Madhiwalla
  • Shresth Verma
  • Gargi Singh
  • Aparna Hegde
  • Pradeep Varakantham

The widespread availability of cell phones has enabled nonprofits to deliver critical health information to their beneficiaries in a timely manner. This paper describes our work to assist non-profits that employ automated messaging programs to deliver timely preventive care information to beneficiaries (new and expecting mothers) during pregnancy and after delivery. Unfortunately, a key challenge in such information delivery programs is that a significant fraction of beneficiaries drop out of the program. Yet, non-profits often have limited health-worker resources (time) to place crucial service calls for live interaction with beneficiaries to prevent such engagement drops. To assist non-profits in optimizing this limited resource, we developed a Restless Multi-Armed Bandits (RMABs) system. One key technical contribution in this system is a novel clustering method of offline historical data to infer unknown RMAB parameters. Our second major contribution is evaluation of our RMAB system in collaboration with an NGO, via a real-world service quality improvement study. The study compared strategies for optimizing service calls to 23003 participants over a period of 7 weeks to reduce engagement drops. We show that the RMAB group provides statistically significant improvement over other comparison groups, reducing engagement drops by ∼30%. To the best of our knowledge, this is the first study demonstrating the utility of RMABs in real-world public health settings. We are transitioning our RMAB system to the NGO for real-world use.

AAMAS Conference 2022 Conference Paper

Networked Restless Multi-Armed Bandits for Mobile Interventions

  • Han-Ching Ou
  • Christoph Siebenbrunner
  • Jackson Killian
  • Meredith B. Brooks
  • David Kempe
  • Yevgeniy Vorobeychik
  • Milind Tambe

Motivated by a broad class of mobile intervention problems, we propose and study restless multi-armed bandits (RMABs) with network effects. In our model, arms are partially recharging and connected through a graph, so that pulling one arm also improves the state of neighboring arms, significantly extending the previously studied setting of fully recharging bandits with no network effects. In mobile interventions, network effects may arise due to regular population movements (such as commuting between home and work). We show that network effects in RMABs induce strong reward coupling that is not accounted for by existing solution methods. We propose a new solution approach for networked RMABs, exploiting concavity properties which arise under natural assumptions on the structure of intervention effects. We provide sufficient conditions for optimality of our approach in idealized settings and demonstrate that it empirically outperforms state-of-the-art baselines in three mobile intervention domains using real-world graphs.

IJCAI Conference 2022 Conference Paper

Ranked Prioritization of Groups in Combinatorial Bandit Allocation

  • Lily Xu
  • Arpita Biswas
  • Fei Fang
  • Milind Tambe

Preventing poaching through ranger patrols protects endangered wildlife, directly contributing to the UN Sustainable Development Goal 15 of life on land. Combinatorial bandits have been used to allocate limited patrol resources, but existing approaches overlook the fact that each location is home to multiple species in varying proportions, so a patrol benefits each species to differing degrees. When some species are more vulnerable, we ought to offer more protection to these animals; unfortunately, existing combinatorial bandit approaches do not offer a way to prioritize important species. To bridge this gap, (1) We propose a novel combinatorial bandit objective that trades off reward maximization against prioritization over species, which we call ranked prioritization. We show this objective can be expressed as a weighted linear sum of Lipschitz-continuous reward functions. (2) We provide RankedCUCB, an algorithm to select combinatorial actions that optimize our prioritization-based objective, and prove that it achieves asymptotic no-regret. (3) We demonstrate empirically that RankedCUCB leads to up to 38% improvement in outcomes for endangered species using real-world wildlife conservation data. Along with adapting to other challenges such as preventing illegal logging and overfishing, our no-regret algorithm addresses the general combinatorial bandit problem with a weighted linear objective.
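The combinatorial UCB machinery underlying this line of work can be sketched generically: keep a running mean and pull count per location, and each round patrol the k locations with the highest upper confidence bound on the (weighted) reward. This is plain CUCB with illustrative weights standing in for species priorities, not the paper's RankedCUCB objective; all names below are assumptions.

```python
import math
import random

def cucb_select(means, counts, t, k, weights):
    # One CUCB step: choose the k arms with the highest upper confidence
    # bound on the weighted reward.
    ucb = [weights[i] * (means[i] + math.sqrt(1.5 * math.log(t) / counts[i]))
           for i in range(len(means))]
    return sorted(range(len(means)), key=lambda i: -ucb[i])[:k]

def run_cucb(probs, weights, k, T, seed=0):
    # Simulate T rounds of patrolling k of len(probs) locations with
    # Bernoulli rewards; returns how often each location was chosen.
    rng = random.Random(seed)
    n = len(probs)
    means, counts, picks = [0.0] * n, [0] * n, [0] * n
    for i in range(n):  # pull every arm once to initialise estimates
        counts[i], means[i] = 1, float(rng.random() < probs[i])
    for t in range(2, T + 2):
        for i in cucb_select(means, counts, t, k, weights):
            picks[i] += 1
            r = float(rng.random() < probs[i])
            counts[i] += 1
            means[i] += (r - means[i]) / counts[i]  # incremental mean update
    return picks
```

With equal weights this reduces to reward-maximizing CUCB: over many rounds the high-probability locations absorb almost all of the patrol budget.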

UAI Conference 2022 Conference Paper

Restless and uncertain: Robust policies for restless bandits via deep multi-agent reinforcement learning

  • Jackson A. Killian
  • Lily Xu
  • Arpita Biswas
  • Milind Tambe

We introduce robustness in restless multi-armed bandits (RMABs), a popular model for constrained resource allocation among independent stochastic processes (arms). Nearly all RMAB techniques assume stochastic dynamics are precisely known. However, in many real-world settings, dynamics are estimated with significant uncertainty, e.g., via historical data, which can lead to bad outcomes if ignored. To address this, we develop an algorithm to compute minimax regret–robust policies for RMABs. Our approach uses a double oracle framework (oracles for agent and nature), which is often used for single-process robust planning but requires significant new techniques to accommodate the combinatorial nature of RMABs. Specifically, we design a deep reinforcement learning (RL) algorithm, DDLPO, which tackles the combinatorial challenge by learning an auxiliary “λ-network” in tandem with policy networks per arm, greatly reducing sample complexity, with guarantees on convergence. DDLPO, of general interest, implements our reward-maximizing agent oracle. We then tackle the challenging regret-maximizing nature oracle, a non-stationary RL challenge, by formulating it as a multi-agent RL problem between a policy optimizer and adversarial nature. This formulation is of general interest—we solve it for RMABs by creating a multi-agent extension of DDLPO with a shared critic. We show our approaches work well in three experimental domains.

UAI Conference 2022 Conference Paper

Solving structured hierarchical games using differential backward induction

  • Zun Li 0002
  • Feiran Jia
  • Aditya Mate
  • Shahin Jabbari
  • Mithun Chakraborty
  • Milind Tambe
  • Yevgeniy Vorobeychik

From large-scale organizations to decentralized political systems, hierarchical strategic decision making is commonplace. We introduce a novel class of structured hierarchical games (SHGs) that formally capture such hierarchical strategic interactions. In an SHG, each player is a node in a tree, and strategic choices of players are sequenced from root to leaves, with root moving first, followed by its children, then followed by their children, and so on until the leaves. A player’s utility in an SHG depends on its own decision, and on the choices of its parent and all the tree leaves. SHGs thus generalize simultaneous-move games, as well as Stackelberg games with many followers. We leverage the structure of both the sequence of player moves as well as payoff dependence to develop a gradient-based back propagation-style algorithm, which we call Differential Backward Induction (DBI), for approximating equilibria of SHGs. We provide a sufficient condition for convergence of DBI and demonstrate its efficacy in finding approximate equilibrium solutions to several SHG models of hierarchical policy-making problems.

AAMAS Conference 2021 Conference Paper

Active Screening for Recurrent Diseases: A Reinforcement Learning Approach

  • Han-Ching Ou
  • Haipeng Chen
  • Shahin Jabbari
  • Milind Tambe

Active screening is a common approach in controlling the spread of recurring infectious diseases such as tuberculosis and influenza. In this approach, health workers periodically select a subset of population for screening. However, given the limited number of health workers, only a small subset of the population can be visited in any given time period. Given the recurrent nature of the disease and rapid spreading, the goal is to minimize the number of infections over a long time horizon. Active screening can be formalized as a sequential combinatorial optimization over the network of people and their connections. The main computational challenges in this formalization arise from i) the combinatorial nature of the problem, ii) the need of sequential planning and iii) the uncertainties in the infectiousness states of the population. Previous works on active screening fail to scale to large time horizons while fully considering the future effect of current interventions. In this paper, we propose a novel reinforcement learning (RL) approach based on Deep Q-Networks (DQN), with several innovative adaptations that are designed to address the above challenges. First, we use graph convolutional networks (GCNs) to represent the Q-function that exploit the node correlations of the underlying contact network. Second, to avoid solving a combinatorial optimization problem in each time period, we decompose the node set selection as a sub-sequence of decisions, and further design a two-level RL framework that solves the problem in a hierarchical way. Finally, to speed up the slow convergence of RL which arises from reward sparseness, we incorporate ideas from curriculum learning into our hierarchical RL approach. We evaluate our RL algorithm on several real-world networks. Results show that our RL algorithm can scale up to 10 times the problem size of state-of-the-art (the variant that considers the effect of future interventions but is unscalable) in terms of planning time horizon. Meanwhile, it outperforms state-of-the-art (the variant that scales up but does not consider the effect of future interventions) by up to 33% in solution quality.
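The sequential decomposition described in this abstract — replacing one size-k combinatorial choice with k one-node-at-a-time choices — can be sketched generically. Here `q_fn` stands in for a learned Q-function over (nodes already chosen, candidate node); the name and greedy scoring rule are illustrative assumptions, not the paper's trained GCN.

```python
def select_nodes(q_fn, candidates, k):
    # Decompose a size-k combinatorial screening action into k sequential
    # choices, each conditioned on the nodes already chosen.
    chosen = []
    for _ in range(k):
        best = max((c for c in candidates if c not in chosen),
                   key=lambda c: q_fn(chosen, c))
        chosen.append(best)
    return chosen
```

This shrinks the per-step action space from C(n, k) joint actions to k passes over at most n candidates, which is what makes RL training tractable at scale.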

AAMAS Conference 2021 Conference Paper

Beyond "To Act or Not to Act": Fast Lagrangian Approaches to General Multi-Action Restless Bandits

  • Jackson A. Killian
  • Andrew Perrault
  • Milind Tambe

This paper presents new algorithms and theoretical results for solutions to Multi-action Multi-armed Restless Bandits, an important but insufficiently studied generalization of traditional Multi-armed Restless Bandits (MARBs). Though MARBs are popular for modeling many problems, they are restricted to binary actions, i.e., "to act or not to act". This renders them unable to capture critical complexities faced by planners in real domains, such as a system manager balancing maintenance, repair, and job scheduling, or a health worker deciding among treatments for a given patient. Limited previous work on Multi-action MARBs has only been specialized to subproblems. Here we derive multiple algorithms for use on general Multi-action MARBs using Lagrangian relaxation techniques, leading to the following contributions: (i) We develop BLam, a bound-optimization algorithm which leverages problem convexity to quickly and provably converge to the well-performing Lagrange policy; (ii) We develop SampleLam, a fast sampling technique for estimating the Lagrange policy, and derive a concentration bound to investigate its convergence properties; (iii) We derive best- and worst-case computational complexities for our algorithms as well as our main competitor; (iv) We provide experimental results comparing our algorithms to baselines on simulated distributions, including one motivated by a real-world community health intervention task. Our approach achieves significant, up to ten-fold speedups over more general methods without sacrificing performance and is widely applicable across general Multi-action MARBs. Code is available at https://github.com/killian-34/MAMARB-Lagrange-Policies.

AAAI Conference 2021 Conference Paper

Clinical Trial of an AI-Augmented Intervention for HIV Prevention in Youth Experiencing Homelessness

  • Bryan Wilder
  • Laura Onasch-Vera
  • Graham Diguiseppi
  • Robin Petering
  • Chyna Hill
  • Amulya Yadav
  • Eric Rice
  • Milind Tambe

Youth experiencing homelessness (YEH) are subject to substantially greater risk of HIV infection, compounded both by their lack of access to stable housing and the disproportionate representation of youth of marginalized racial, ethnic, and gender identity groups among YEH. A key goal for health equity is to improve adoption of protective behaviors in this population. One promising strategy for intervention is to recruit peer leaders from the population of YEH to promote behaviors such as condom usage and regular HIV testing to their social contacts. This raises a computational question: which youth should be selected as peer leaders to maximize the overall impact of the intervention? We developed an artificial intelligence system to optimize such social network interventions in a community health setting. We conducted a clinical trial enrolling 713 YEH at drop-in centers in a large US city. The clinical trial compared interventions planned with the algorithm to those where the highest-degree nodes in the youths’ social network were recruited as peer leaders (the standard method in public health) and to an observation-only control group. Results from the clinical trial show that youth in the AI group experience statistically significant reductions in key risk behaviors for HIV transmission, while those in the other groups do not. This provides, to our knowledge, the first empirical validation of the usage of AI methods to optimize social network interventions for health. We conclude by discussing lessons learned over the course of the project which may inform future attempts to use AI in community-level interventions.

UAI Conference 2021 Conference Paper

Contingency-aware influence maximization: A reinforcement learning approach

  • Haipeng Chen 0001
  • Wei Qiu 0001
  • Han-Ching Ou
  • Bo An 0001
  • Milind Tambe

The influence maximization (IM) problem aims at finding a subset of seed nodes in a social network that maximizes the spread of influence. In this study, we focus on a sub-class of IM problems in which it is uncertain whether the nodes are willing to be seeds when invited, called contingency-aware IM. Such contingency-aware IM is critical for applications run by non-profit organizations in low-resource communities (e.g., spreading awareness of disease prevention). Despite initial successes, a major practical obstacle to promoting the solutions to more communities is the tremendous runtime of the greedy algorithms and the non-profits' lack of access to high-performance computing (HPC) in the field: whenever there is a new social network, the non-profits usually do not have the HPC resources to recalculate the solutions. Motivated by this, and inspired by the line of work that uses reinforcement learning (RL) to address combinatorial optimization on graphs, we formalize the problem as a Markov Decision Process (MDP) and use RL to learn an IM policy over historically seen networks that generalizes to unseen networks with negligible runtime at the test phase. To fully exploit the properties of our targeted problem, we propose two technical innovations that improve on existing methods: state abstraction and theoretically grounded reward shaping. Empirical results show that our method achieves influence as high as the state-of-the-art methods for contingency-aware IM, while having negligible runtime at the test phase.

AAAI Conference 2021 Conference Paper

Dual-Mandate Patrols: Multi-Armed Bandits for Green Security

  • Lily Xu
  • Elizabeth Bondi
  • Fei Fang
  • Andrew Perrault
  • Kai Wang
  • Milind Tambe

Conservation efforts in green security domains to protect wildlife and forests are constrained by the limited availability of defenders (i.e., patrollers), who must patrol vast areas to protect against attackers (e.g., poachers or illegal loggers). Defenders must choose how much time to spend in each region of the protected area, balancing exploration of infrequently visited regions and exploitation of known hotspots. We formulate the problem as a stochastic multi-armed bandit, where each action represents a patrol strategy, enabling us to guarantee the rate of convergence of the patrolling policy. However, a naive bandit approach would compromise short-term performance for long-term optimality, resulting in animals poached and forests destroyed. To speed up performance, we leverage smoothness in the reward function and decomposability of actions. We show a synergy between Lipschitz continuity and decomposition, as each aids the convergence of the other. In doing so, we bridge the gap between combinatorial and Lipschitz bandits, presenting a no-regret approach that tightens existing guarantees while optimizing for short-term performance. We demonstrate that our algorithm, LIZARD, improves performance on real-world poaching data from Cambodia.
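The way smoothness tightens confidence bounds can be sketched as follows: under a Lipschitz assumption, a rarely visited region can borrow a tight bound from a well-sampled neighbor (a simplified illustration of the idea, not the LIZARD algorithm itself; all names below are hypothetical):

```python
import math

def lipschitz_ucb(means, counts, positions, L, t):
    """Upper confidence bounds tightened by Lipschitz smoothness: arm i's
    bound is the minimum over all arms j of mean_j + conf_j + L*d(i, j),
    so a rarely pulled arm inherits a tight bound from a nearby,
    well-sampled one."""
    naive = [m + math.sqrt(2 * math.log(t) / max(n, 1))
             for m, n in zip(means, counts)]
    return [min(naive[j] + L * abs(positions[i] - positions[j])
                for j in range(len(means)))
            for i in range(len(means))]

# arm 1 was pulled once, but the nearby arm 0 was pulled 100 times:
bounds = lipschitz_ucb(means=[0.5, 0.5], counts=[100, 1],
                       positions=[0.0, 1.0], L=0.1, t=100)
# bounds[1] is far below the naive bound 0.5 + sqrt(2*ln(100)/1) ~ 3.5
```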

AAAI Conference 2021 Conference Paper

Fair Influence Maximization: a Welfare Optimization Approach

  • Aida Rahmattalabi
  • Shahin Jabbari
  • Himabindu Lakkaraju
  • Phebe Vayanos
  • Max Izenberg
  • Ryan Brown
  • Eric Rice
  • Milind Tambe

Several behavioral, social, and public health interventions, such as suicide/HIV prevention or community preparedness against natural disasters, leverage social network information to maximize outreach. Algorithmic influence maximization techniques have been proposed to aid with the choice of “peer leaders” or “influencers” in such interventions. Yet, traditional algorithms for influence maximization have not been designed with these interventions in mind. As a result, they may disproportionately exclude minority communities from the benefits of the intervention. This has motivated research on fair influence maximization. Existing techniques come with two major drawbacks. First, they require committing to a single fairness measure. Second, these measures are typically imposed as strict constraints leading to undesirable properties such as wastage of resources. To address these shortcomings, we provide a principled characterization of the properties that a fair influence maximization algorithm should satisfy. In particular, we propose a framework based on social welfare theory, wherein the cardinal utilities derived by each community are aggregated using the isoelastic social welfare functions. Under this framework, the trade-off between fairness and efficiency can be controlled by a single inequality aversion design parameter. We then show under what circumstances our proposed principles can be satisfied by a welfare function. The resulting optimization problem is monotone and submodular and can be solved efficiently with optimality guarantees. Our framework encompasses as special cases leximin and proportional fairness. Extensive experiments on synthetic and real-world datasets including a case study on landslide risk management demonstrate the efficacy of the proposed framework.
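The isoelastic aggregation described above has a compact form; a minimal sketch shows how a single inequality-aversion parameter trades efficiency against fairness (illustrative only; the paper's exact parameterization may differ):

```python
import math

def isoelastic_welfare(utilities, aversion):
    """Aggregate per-community utilities with an isoelastic social
    welfare function; `aversion` >= 0 is the inequality-aversion
    parameter (0 = utilitarian sum, 1 = log welfare)."""
    if aversion == 1.0:
        return sum(math.log(u) for u in utilities)
    a = 1.0 - aversion
    return sum(u ** a / a for u in utilities)

# equal utilities vs. unequal utilities with the same total:
even = isoelastic_welfare([2.0, 2.0], aversion=2.0)
uneven = isoelastic_welfare([3.0, 1.0], aversion=2.0)
print(even > uneven)  # inequality is penalized -> True
```

With `aversion=0` the function reduces to the plain sum of utilities (pure efficiency); larger values increasingly favor the worst-off community.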

IJCAI Conference 2021 Conference Paper

Learn to Intervene: An Adaptive Learning Policy for Restless Bandits in Application to Preventive Healthcare

  • Arpita Biswas
  • Gaurav Aggarwal
  • Pradeep Varakantham
  • Milind Tambe

In many public health settings, it is important for patients to adhere to health programs, such as taking medications and periodic health checks. Unfortunately, beneficiaries may gradually disengage from such programs, which is detrimental to their health. A concrete example of gradual disengagement has been observed by an organization that carries out a free automated call-based program for spreading preventive care information among pregnant women. Many women stop picking up calls after being enrolled for a few months. To avoid such disengagements, it is important to provide timely interventions. Such interventions are often expensive and can be provided to only a small fraction of the beneficiaries. We model this scenario as a restless multi-armed bandit (RMAB) problem, where each beneficiary is assumed to transition from one state to another depending on the intervention. Moreover, since the transition probabilities are unknown a priori, we propose a Whittle index based Q-Learning mechanism and show that it converges to the optimal solution. Our method improves over existing learning-based methods for RMABs on multiple benchmarks from literature and also on the maternal healthcare dataset.
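The Whittle-index-based Q-learning idea can be sketched for a single two-state arm: learn Q-values and read an index estimate off as the gap between acting and staying passive (a simplified illustration under these assumptions, not the paper's exact mechanism; all names are hypothetical):

```python
import random

def learn_arm(p_good, episodes=5000, alpha=0.1, gamma=0.9,
              subsidy=0.0, eps=0.2, seed=0):
    """Tabular Q-learning for one two-state restless arm.
    p_good[s][a] = probability of landing in the 'good' state (1) from
    state s under action a (0 = passive, 1 = intervene); reward is 1 in
    the good state, plus a subsidy for staying passive."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0], [0.0, 0.0]]  # Q[state][action]
    s = 0
    for _ in range(episodes):
        # epsilon-greedy action selection
        a = rng.randrange(2) if rng.random() < eps else max((0, 1), key=lambda x: Q[s][x])
        s2 = 1 if rng.random() < p_good[s][a] else 0
        r = float(s2) + (subsidy if a == 0 else 0.0)
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2
    # Whittle-style index estimate: how much intervening beats staying passive
    index = [Q[st][1] - Q[st][0] for st in (0, 1)]
    return Q, index

Q, index = learn_arm(p_good=[[0.2, 0.9], [0.2, 0.9]])
```

In the multi-arm setting, one would run such a learner per beneficiary and intervene on the arms with the largest index estimates each round.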

AAMAS Conference 2021 Conference Paper

Learning Index Policies for Restless Bandits with Application to Maternal Healthcare

  • Arpita Biswas
  • Gaurav Aggarwal
  • Pradeep Varakantham
  • Milind Tambe

In many community health settings, it is crucial to have a systematic monitoring and intervention process to ensure that patients adhere to healthcare programs, such as periodic health checks or taking medications. When these interventions are expensive, they can be provided to only a fixed small fraction of the patients at any period of time. Hence, it is important to carefully choose the beneficiaries who should be provided with interventions and when. We model this scenario as a restless multi-armed bandit (RMAB) problem, where each beneficiary is assumed to transition from one state to another depending on the intervention provided to them. In practice, the transition probabilities are unknown a priori, and hence we propose a mechanism that balances the explore-exploit trade-off. Empirically, we find that our proposed mechanism outperforms the baseline intervention scheme on a maternal healthcare dataset.

NeurIPS Conference 2021 Conference Paper

Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Making by Reinforcement Learning

  • Kai Wang
  • Sanket Shah
  • Haipeng Chen
  • Andrew Perrault
  • Finale Doshi-Velez
  • Milind Tambe

In the predict-then-optimize framework, the objective is to train a predictive model, mapping from environment features to parameters of an optimization problem, which maximizes decision quality when the optimization is subsequently solved. Recent work on decision-focused learning shows that embedding the optimization problem in the training pipeline can improve decision quality and help generalize better to unseen tasks compared to relying on an intermediate loss function for evaluating prediction quality. We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) that are solved via reinforcement learning. In particular, we are given environment features and a set of trajectories from training MDPs, which we use to train a predictive model that generalizes to unseen test MDPs without trajectories. Two significant computational challenges arise in applying decision-focused learning to MDPs: (i) large state and action spaces make it infeasible for existing techniques to differentiate through MDP problems, and (ii) the high-dimensional policy space, as parameterized by a neural network, makes differentiating through a policy expensive. We resolve the first challenge by sampling provably unbiased derivatives to approximate and differentiate through optimality conditions, and the second challenge by using a low-rank approximation to the high-dimensional sample-based derivatives. We implement both Bellman-based and policy gradient-based decision-focused learning on three different MDP problems with missing parameters, and show that decision-focused learning performs better in generalization to unseen tasks.

AAMAS Conference 2021 Conference Paper

Reinforcement Learning for Unified Allocation and Patrolling in Signaling Games with Uncertainty

  • Aravind Venugopal
  • Elizabeth Bondi
  • Harshavardhan Kamarthi
  • Keval Dholakia
  • Balaraman Ravindran
  • Milind Tambe

Green Security Games (GSGs) have been successfully used in the protection of valuable resources such as fisheries, forests, and wildlife. Real-world deployment involves both resource allocation and subsequent coordinated patrolling with communication in the presence of real-time, uncertain information. Previous game models do not address both of these stages simultaneously. Furthermore, adopting existing solution strategies is difficult since they do not scale well for larger, more complex variants of the game models. We propose a novel GSG model to address these challenges. We also present a novel algorithm, CombSGPO, to compute a defender strategy for this game model. CombSGPO performs policy search over a multidimensional, discrete action space to compute an allocation strategy that is best suited to a best-response patrolling strategy for the defender, learnt by training a multi-agent Deep Q-Network. We show via experiments that CombSGPO converges to better strategies and is more scalable than comparable approaches. From a detailed analysis of the coordination and signaling behavior learnt by CombSGPO, we find that strategic signaling emerges in the final learnt strategy.

AAMAS Conference 2021 Conference Paper

Risk-Aware Interventions in Public Health: Planning with Restless Multi-Armed Bandits

  • Aditya Mate
  • Andrew Perrault
  • Milind Tambe

Community Health Workers (CHWs) form an important component of health-care systems globally, especially in low-resource settings. CHWs are often tasked with monitoring the health of and intervening on their patient cohort. Previous work has developed several classes of Restless Multi-Armed Bandits (RMABs) that are computationally tractable and indexable, a condition that guarantees asymptotic optimality, for solving such health monitoring and intervention problems (HMIPs). However, existing solutions to HMIPs fail to account for risk-sensitivity considerations of CHWs in the planning stage and may run the danger of ignoring some patients completely because they are deemed less valuable to intervene on. They also rely on patients reporting their state of adherence accurately when intervened upon. Towards tackling these issues, our contributions in this paper are as follows: (1) We develop an RMAB solution to HMIPs that allows for reward functions that are monotone increasing, rather than linear, in the belief state and also supports a wider class of observations. (2) We prove theoretical guarantees on the asymptotic optimality of our algorithm for any arbitrary reward function. Additionally, we show that for the specific reward function considered in previous work, our theoretical conditions are stronger than the state-of-the-art guarantees. (3) We show the applicability of these new results for addressing the three issues pertaining to risk-sensitive planning, equitable allocation, and reliance on perfect observations as highlighted above. We evaluate these techniques on both simulated as well as real data from a prevalent CHW task of monitoring adherence of tuberculosis patients to their prescribed medication in Mumbai, India, and show improved performance over the state of the art. Full paper and code are available at: https://github.com/AdityaMate/risk-aware-bandits.

UAI Conference 2021 Conference Paper

Robust reinforcement learning under minimax regret for green security

  • Lily Xu
  • Andrew Perrault
  • Fei Fang 0001
  • Haipeng Chen 0001
  • Milind Tambe

Green security domains feature defenders who plan patrols in the face of uncertainty about the adversarial behavior of poachers, illegal loggers, and illegal fishers. Importantly, the deterrence effect of patrols on adversaries’ future behavior makes patrol planning a sequential decision-making problem. Therefore, we focus on robust sequential patrol planning for green security following the minimax regret criterion, which has not been considered in the literature. We formulate the problem as a game between the defender and nature who controls the parameter values of the adversarial behavior and design an algorithm MIRROR to find a robust policy. MIRROR uses two reinforcement learning–based oracles and solves a restricted game considering limited defender strategies and parameter values. We evaluate MIRROR on real-world poaching data.
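For a finite restricted game, the minimax regret criterion at the core of this formulation reduces to a small computation over a payoff table (a toy sketch of the criterion itself, not of MIRROR's oracle-based algorithm):

```python
def minimax_regret_policy(payoff):
    """Pick the row (defender policy) minimizing worst-case regret over
    columns (nature's parameter settings), where
    regret(i, j) = best payoff in column j minus payoff[i][j]."""
    n_cols = len(payoff[0])
    col_best = [max(row[j] for row in payoff) for j in range(n_cols)]
    regrets = [max(col_best[j] - row[j] for j in range(n_cols))
               for row in payoff]
    best = min(range(len(payoff)), key=lambda i: regrets[i])
    return best, regrets

payoff = [[3, 0],
          [2, 2],
          [0, 3]]
best, regrets = minimax_regret_policy(payoff)
# col_best = [3, 3]; regrets = [3, 1, 3] -> policy 1 has minimax regret
```

Note that policy 1 is never the best response to either parameter setting, yet it is the robust choice: its regret is bounded whichever parameters nature picks.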

AAAI Conference 2021 Conference Paper

Tracking Disease Outbreaks from Sparse Data with Bayesian Inference

  • Bryan Wilder
  • Michael Mina
  • Milind Tambe

The COVID-19 pandemic provides new motivation for a classic problem in epidemiology: estimating the empirical rate of transmission during an outbreak (formally, the time-varying reproduction number) from case counts. While standard methods exist, they work best at coarse-grained national or state scales with abundant data, and struggle to accommodate the partial observability and sparse data common at finer scales (e.g., individual schools or towns). For example, case counts may be sparse when only a small fraction of infections are caught by a testing program. Or, whether an infected individual tests positive may depend on the kind of test and the point in time when they are tested. We propose a Bayesian framework which accommodates partial observability in a principled manner. Our model places a Gaussian process prior over the unknown reproduction number at each time step and models observations sampled from the distribution of a specific testing program. For example, our framework can accommodate a variety of kinds of tests (viral RNA, antibody, antigen, etc.) and sampling schemes (e.g., longitudinal or cross-sectional screening). Inference in this framework is complicated by the presence of tens or hundreds of thousands of discrete latent variables. To address this challenge, we propose an efficient stochastic variational inference method which relies on a novel gradient estimator for the variational objective. Experimental results for an example motivated by COVID-19 show that our method produces an accurate and well-calibrated posterior, while standard methods for estimating the reproduction number can fail badly.

NeurIPS Conference 2020 Conference Paper

Automatically Learning Compact Quality-aware Surrogates for Optimization Problems

  • Kai Wang
  • Bryan Wilder
  • Andrew Perrault
  • Milind Tambe

Solving optimization problems with unknown parameters often requires learning a predictive model to predict the values of the unknown parameters and then solving the problem using these values. Recent work has shown that including the optimization problem as a layer in the model training pipeline results in predictions of the unobserved parameters that lead to higher decision quality. Unfortunately, this process comes at a large computational cost because the optimization problem must be solved and differentiated through in each training iteration; furthermore, it may also sometimes fail to improve solution quality due to non-smoothness issues that arise when training through a complex optimization layer. To address these shortcomings, we learn a low-dimensional surrogate model of a large optimization problem by representing the feasible space in terms of meta-variables, each of which is a linear combination of the original variables. By training a low-dimensional surrogate model end-to-end, and jointly with the predictive model, we achieve: i) a large reduction in training and inference time; and ii) improved performance by focusing attention on the more important variables in the optimization and learning in a smoother space. Empirically, we demonstrate these improvements on a non-convex adversary modeling task, a submodular recommendation task and a convex portfolio optimization task.

NeurIPS Conference 2020 Conference Paper

Collapsing Bandits and Their Application to Public Health Intervention

  • Aditya Mate
  • Jackson Killian
  • Haifeng Xu
  • Andrew Perrault
  • Milind Tambe

We propose and study Collapsing Bandits, a new restless multi-armed bandit (RMAB) setting in which each arm follows a binary-state Markovian process with a special structure: when an arm is played, the state is fully observed, thus “collapsing” any uncertainty, but when an arm is passive, no observation is made, thus allowing uncertainty to evolve. The goal is to keep as many arms in the “good” state as possible by planning a limited budget of actions per round. Such Collapsing Bandits are natural models for many healthcare domains in which health workers must simultaneously monitor patients and deliver interventions in a way that maximizes the health of their patient cohort. Our main contributions are as follows: (i) Building on the Whittle index technique for RMABs, we derive conditions under which the Collapsing Bandits problem is indexable. Our derivation hinges on novel conditions that characterize when the optimal policies may take the form of either “forward” or “reverse” threshold policies. (ii) We exploit the optimality of threshold policies to build fast algorithms for computing the Whittle index, including a closed-form expression. (iii) We evaluate our algorithm on several data distributions, including data from a real-world healthcare task in which a worker must monitor and deliver interventions to maximize their patients’ adherence to tuberculosis medication. Our algorithm achieves a 3-order-of-magnitude speedup compared to state-of-the-art RMAB techniques, while achieving similar performance. The code is available at: https://github.com/AdityaMate/collapsing_bandits
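The “collapsing” belief dynamics can be sketched for a single binary-state arm: playing the arm reveals its state exactly, while a passive arm's belief relaxes toward the chain's stationary distribution (a minimal illustration; the parameter names below are hypothetical):

```python
def evolve_belief(belief, p_stay_good, p_become_good, steps):
    """Passive-arm belief update for a binary-state collapsing bandit:
    with no observation, the belief of being in the 'good' state evolves
    under the Markov chain b' = b*P(good|good) + (1-b)*P(good|bad)."""
    b = belief
    for _ in range(steps):
        b = b * p_stay_good + (1.0 - b) * p_become_good
    return b

def collapse(observed_good):
    """Playing the arm fully observes the state, collapsing uncertainty."""
    return 1.0 if observed_good else 0.0

# after a play observes the good state, belief decays toward the
# chain's stationary probability 0.3/(1 - 0.9 + 0.3) = 0.75:
b = collapse(True)
print(round(evolve_belief(b, 0.9, 0.3, 1), 3))   # 0.9
print(round(evolve_belief(b, 0.9, 0.3, 50), 3))  # 0.75
```

Threshold policies in this setting act on an arm once its belief crosses a cutoff, which is what makes fast Whittle-index computation possible.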

AAAI Conference 2020 Conference Paper

End-to-End Game-Focused Learning of Adversary Behavior in Security Games

  • Andrew Perrault
  • Bryan Wilder
  • Eric Ewing
  • Aditya Mate
  • Bistra Dilkina
  • Milind Tambe

Stackelberg security games are a critical tool for maximizing the utility of limited defense resources to protect important targets from an intelligent adversary. Motivated by green security, where the defender may only observe an adversary’s response to defense on a limited set of targets, we study the problem of learning a defense that generalizes well to a new set of targets with novel feature values and combinations. Traditionally, this problem has been addressed via a two-stage approach where an adversary model is trained to maximize predictive accuracy without considering the defender’s optimization problem. We develop an end-to-end game-focused approach, where the adversary model is trained to maximize a surrogate for the defender’s expected utility. We show both in theory and experimental results that our game-focused approach achieves higher defender expected utility than the two-stage alternative when there is limited data.

UAI Conference 2020 Conference Paper

Robust Spatial-Temporal Incident Prediction

  • Ayan Mukhopadhyay
  • Kai Wang 0040
  • Andrew Perrault
  • Mykel J. Kochenderfer
  • Milind Tambe
  • Yevgeniy Vorobeychik

Spatio-temporal incident prediction is a central issue in law enforcement, with applications in fighting crimes like poaching, human trafficking, illegal fishing, burglaries and smuggling. However, state-of-the-art approaches fail to account for evasion in response to predictive models, a common form of which is spatial shift in incident occurrence. We present a general approach for incident forecasting that is robust to spatial shifts. We propose two techniques for solving the resulting robust optimization problem: first, a constraint generation method guaranteed to yield an optimal solution, and second, a more scalable gradient-based approach. We then apply these techniques to both discrete-time and continuous-time robust incident forecasting. We evaluate our algorithms on two different real-world datasets, demonstrating that our approach is significantly more robust than conventional methods.

AAAI Conference 2020 Conference Paper

Solving Online Threat Screening Games using Constrained Action Space Reinforcement Learning

  • Sanket Shah
  • Arunesh Sinha
  • Pradeep Varakantham
  • Andrew Perrault
  • Milind Tambe

Large-scale screening for potential threats with limited resources and capacity for screening is a problem of interest at airports, seaports, and other ports of entry. Adversaries can observe screening procedures and arrive at a time when there will be gaps in screening due to limited resource capacities. To capture this game between ports and adversaries, this problem has been previously represented as a Stackelberg game, referred to as a Threat Screening Game (TSG). Given the significant complexity associated with solving TSGs and uncertainty in arrivals of customers, existing work has assumed that screenees arrive and are allocated security resources at the beginning of the time window. In practice, screenees such as airport passengers arrive in bursts correlated with flight time and are not bound by fixed time windows. To address this, we propose an online threat screening model in which the screening strategy is determined adaptively as each passenger arrives, while satisfying a hard bound on the acceptable risk of not screening a threat. To solve the online problem, we first reformulate it as a Markov Decision Process (MDP) in which the hard bound on risk translates to a constraint on the action space, and then solve the resultant MDP using Deep Reinforcement Learning (DRL). To this end, we provide a novel way to efficiently enforce linear inequality constraints on the action output in DRL. We show that our solution allows us to significantly reduce screenee wait time without compromising on the risk.
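One simple way to keep a learned screening action inside a linear risk constraint, in the spirit of enforcing inequality constraints on the DRL action output, is projection by uniform scaling (the paper's exact enforcement mechanism may differ; this is an illustrative sketch with hypothetical names):

```python
def enforce_risk_bound(action, risk, bound):
    """Project a screening-allocation action so the linear constraint
    sum(risk_i * action_i) <= bound holds, by uniform scaling back
    toward the feasible set. One simple feasibility trick."""
    total = sum(r * a for r, a in zip(risk, action))
    if total <= bound:
        return list(action)  # already feasible
    scale = bound / total
    return [a * scale for a in action]

# a raw network output with total risk 0.8*1.0 + 0.6*2.0 = 2.0
# gets scaled by 0.5 to meet the bound of 1.0:
a = enforce_risk_bound([0.8, 0.6], risk=[1.0, 2.0], bound=1.0)
```

Layered after the policy network's output, such a map guarantees every executed action respects the hard risk bound, while leaving already-feasible actions untouched.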

AAAI Conference 2020 Conference Paper

To Signal or Not To Signal: Exploiting Uncertain Real-Time Information in Signaling Games for Security and Sustainability

  • Elizabeth Bondi
  • Hoon Oh
  • Haifeng Xu
  • Fei Fang
  • Bistra Dilkina
  • Milind Tambe

Motivated by real-world deployment of drones for conservation, this paper advances the state-of-the-art in security games with signaling. The well-known defender-attacker security games framework can help in planning for such strategic deployments of sensors and human patrollers, and warning signals to ward off adversaries. However, we show that defenders can suffer significant losses when ignoring real-world uncertainties despite carefully planned security game strategies with signaling. In fact, defenders may perform worse than forgoing drones completely in this case. We address this shortcoming by proposing a novel game model that integrates signaling and sensor uncertainty; perhaps surprisingly, we show that defenders can still perform well via a signaling strategy that exploits uncertain real-time information. For example, even in the presence of uncertainty, the defender still has an informational advantage in knowing that she has or has not actually detected the attacker; and she can design a signaling scheme to “mislead” the attacker who is uncertain as to whether he has been detected. We provide theoretical results, a novel algorithm, scale-up techniques, and experimental results from simulation based on our ongoing deployment of a conservation drone system in South Africa.

AAMAS Conference 2019 Conference Paper

Broken Signals in Security Games: Coordinating Patrollers and Sensors in the Real World

  • Elizabeth Bondi
  • Hoon Oh
  • Haifeng Xu
  • Fei Fang
  • Bistra Dilkina
  • Milind Tambe

Mobile sensors, e.g., unmanned aerial vehicles (UAVs), are becoming increasingly important in security domains and can be used for tasks such as searching for poachers in conservation areas. Such mobile sensors augment human patrollers by assisting in surveillance and in signaling potentially deceptive information to adversaries, and their coordinated deployment could be modeled via the well-known security games framework. Unfortunately, real-world uncertainty in the sensor’s detection of adversaries and adversaries’ observation of the sensor’s signals present major challenges in the sensors’ use. This leads to significant detriments in security performance. We first discuss the current shortcomings in more detail, and then propose a novel game model that incorporates uncertainty with sensors. The defender strategy in this game model will consist of three interdependent stages: an allocation stage, a signaling stage, and a reaction stage.

AAMAS Conference 2019 Conference Paper

Deep Fictitious Play for Games with Continuous Action Spaces

  • Nitin Kamra
  • Umang Gupta
  • Kai Wang
  • Fei Fang
  • Yan Liu
  • Milind Tambe

Fictitious play has been a classic algorithm to solve two-player adversarial games with discrete action spaces. In this work we develop an approximate extension of fictitious play to two-player games with high-dimensional continuous action spaces. We use generative neural networks to approximate players’ best responses while also learning a differentiable approximate model to the players’ rewards given their actions. Both these networks are trained jointly with gradient-based optimization to emulate fictitious play. We explore our approach in zero-sum games, non zero-sum games and security game domains.

AAMAS Conference 2019 Conference Paper

Don't Put All Your Strategies in One Basket: Playing Green Security Games with Imperfect Prior Knowledge

  • Shahrzad Gholami
  • Amulya Yadav
  • Long Tran-Thanh
  • Bistra Dilkina
  • Milind Tambe

Security efforts for wildlife monitoring and protection of endangered species (e.g., elephants, rhinos, etc.) are constrained by limited resources available to law enforcement agencies. Recent progress in Green Security Games (GSGs) has led to patrol planning algorithms for strategic allocation of limited patrollers to deter adversaries in environmental settings. Unfortunately, previous approaches to these problems suffer from several limitations. Most notably, (i) previous work in GSG literature relies on exploitation of error-prone machine learning (ML) models of poachers’ behavior trained on (spatially) biased historical data; and (ii) online learning approaches for repeated security games (similar to GSGs) do not account for spatio-temporal scheduling constraints while planning patrols, potentially causing significant shortcomings in the effectiveness of the planned patrols. Thus, this paper makes the following novel contributions: (I) We propose MINION-sm, a novel online learning algorithm for GSGs which does not rely on any prior error-prone model of attacker behavior; instead, it builds an implicit model of the attacker on-the-fly while simultaneously generating scheduling-constraint-aware patrols. MINION-sm achieves a sublinear regret against an optimal hindsight patrol strategy. (II) We also propose MINION, a hybrid approach where our MINION-sm model and an ML model (based on historical data) are considered as two patrol planning experts and we obtain a balance between them based on their observed empirical performance. (III) We show that our online learning algorithms significantly outperform existing state-of-the-art solvers for GSGs.

NeurIPS Conference 2019 Conference Paper

End to end learning and optimization on graphs

  • Bryan Wilder
  • Eric Ewing
  • Bistra Dilkina
  • Milind Tambe

Real-world applications often combine learning and optimization problems on graphs. For instance, our objective may be to cluster the graph in order to detect meaningful communities (or solve other common graph optimization problems such as facility location, maxcut, and so on). However, graphs or related attributes are often only partially observed, introducing learning problems such as link prediction which must be solved prior to optimization. Standard approaches treat learning and optimization entirely separately, while recent machine learning work aims to predict the optimal solution directly from the inputs. Here, we propose an alternative decision-focused learning approach that integrates a differentiable proxy for common graph optimization problems as a layer in learned systems. The main idea is to learn a representation that maps the original optimization problem onto a simpler proxy problem that can be efficiently differentiated through. Experimental results show that our ClusterNet system outperforms both pure end-to-end approaches (that directly predict the optimal solution) and standard approaches that entirely separate learning and optimization. Code for our system is available at https://github.com/bwilder0/clusternet.

NeurIPS Conference 2019 Conference Paper

Exploring Algorithmic Fairness in Robust Graph Covering Problems

  • Aida Rahmattalabi
  • Phebe Vayanos
  • Anthony Fulginiti
  • Eric Rice
  • Bryan Wilder
  • Amulya Yadav
  • Milind Tambe

Fueled by algorithmic advances, AI algorithms are increasingly being deployed in settings subject to unanticipated challenges with complex social effects. Motivated by real-world deployments of AI-driven, social-network-based suicide prevention and landslide risk management interventions, this paper focuses on a robust graph covering problem subject to group fairness constraints. We show that, in the absence of fairness constraints, state-of-the-art algorithms for the robust graph covering problem result in biased node coverage: they tend to discriminate against individuals (nodes) based on membership in traditionally marginalized groups. To remediate this issue, we propose a novel formulation of the robust covering problem with fairness constraints and a tractable approximation scheme applicable to real-world instances. We provide a formal analysis of the price of group fairness (PoF) for this problem, where we show that uncertainty can lead to a greater PoF. We demonstrate the effectiveness of our approach on several real-world social networks. Our method yields competitive node coverage while significantly improving group fairness relative to state-of-the-art methods.

IJCAI Conference 2019 Conference Paper

Group-Fairness in Influence Maximization

  • Alan Tsang
  • Bryan Wilder
  • Eric Rice
  • Milind Tambe
  • Yair Zick

Influence maximization is a widely used model for information dissemination in social networks. Recent work has employed such interventions across a wide range of social problems, spanning public health, substance abuse, and international development (to name a few examples). A critical but understudied question is whether the benefits of such interventions are fairly distributed across different groups in the population; e.g., avoiding discrimination with respect to sensitive attributes such as race or gender. Drawing on legal and game-theoretic concepts, we introduce formal definitions of fairness in influence maximization. We provide an algorithmic framework to find solutions which satisfy fairness constraints, and in the process improve the state of the art for general multi-objective submodular maximization problems. Experimental results on real data from an HIV prevention intervention for homeless youth show that standard influence maximization techniques oftentimes neglect smaller groups which contribute less to overall utility, resulting in a disparity which our proposed algorithms substantially reduce.
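For reference, the standard greedy baseline for influence maximization under the independent cascade model (the textbook approach the abstract contrasts with, not the paper's fairness-constrained algorithm) can be sketched as follows; the toy graph and the comment on fairness are our own illustration:

```python
import random

def simulate_spread(graph, seeds, p=0.3, trials=200):
    """Monte-Carlo estimate of expected independent-cascade spread."""
    rng = random.Random(0)  # fixed seed so estimates are reproducible
    total = 0
    for _ in range(trials):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            node = frontier.pop()
            for nbr in graph[node]:
                if nbr not in active and rng.random() < p:
                    active.add(nbr)
                    frontier.append(nbr)
        total += len(active)
    return total / trials

def greedy_seeds(graph, k):
    """Textbook greedy seed selection maximizing total spread.

    A group-fair variant would instead track spread per demographic group
    and optimize for the worst-off group rather than the overall total.
    """
    seeds = []
    for _ in range(k):
        rest = [v for v in graph if v not in seeds]
        seeds.append(max(rest, key=lambda v: simulate_spread(graph, seeds + [v])))
    return seeds

# A hub community (node 0 with five spokes) plus a small two-node group:
# plain greedy seeds the hub, and the small group receives nothing.
graph = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0],
         4: [0], 5: [0], 6: [7], 7: [6]}
seeds = greedy_seeds(graph, 1)
```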

AAAI Conference 2019 Conference Paper

Melding the Data-Decisions Pipeline: Decision-Focused Learning for Combinatorial Optimization

  • Bryan Wilder
  • Bistra Dilkina
  • Milind Tambe

Creating impact in real-world settings requires artificial intelligence techniques to span the full pipeline from data, to predictive models, to decisions. These components are typically approached separately: a machine learning model is first trained via a measure of predictive accuracy, and then its predictions are used as input into an optimization algorithm which produces a decision. However, the loss function used to train the model may easily be misaligned with the end goal, which is to make the best decisions possible. Hand-tuning the loss function to align with optimization is a difficult and error-prone process (which is often skipped entirely). We focus on combinatorial optimization problems and introduce a general framework for decision-focused learning, where the machine learning model is directly trained in conjunction with the optimization algorithm to produce high-quality decisions. Technically, our contribution is a means of integrating common classes of discrete optimization problems into deep learning or other predictive models, which are typically trained via gradient descent. The main idea is to use a continuous relaxation of the discrete problem to propagate gradients through the optimization procedure. We instantiate this framework for two broad classes of combinatorial problems: linear programs and submodular maximization. Experimental results across a variety of domains show that decision-focused learning often leads to improved optimization performance compared to traditional methods. We find that standard measures of accuracy are not a reliable proxy for a predictive model’s utility in optimization, and our method’s ability to specify the true goal as the model’s training objective yields substantial dividends across a range of decision problems.
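A minimal numerical illustration (our own toy, not the paper's framework) of the accuracy/decision misalignment the abstract describes: two models with identical mean squared error that nonetheless induce different decisions and different realized utility.

```python
import numpy as np

# True (unknown at training time) values of four candidate items; the
# downstream decision is simply to pick one item, earning its true value.
true_vals = np.array([1.0, 0.9, 0.1, 0.0])

# Two hypothetical models with identical MSE (each off by 0.2 on one item)...
model_a = np.array([1.2, 0.9, 0.1, 0.0])   # over-predicts the best item
model_b = np.array([1.0, 1.1, 0.1, 0.0])   # over-predicts the runner-up

mse_a = float(np.mean((model_a - true_vals) ** 2))
mse_b = float(np.mean((model_b - true_vals) ** 2))

# ...yet model A's decision still selects the truly best item, while
# model B's error flips the argmax and costs real utility.
utility_a = float(true_vals[int(np.argmax(model_a))])
utility_b = float(true_vals[int(np.argmax(model_b))])
```

Decision-focused training, as described in the abstract, targets the decision utility directly instead of a symmetric error measure like MSE.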

AAAI Conference 2019 Conference Paper

On the Inducibility of Stackelberg Equilibrium for Security Games

  • Qingyu Guo
  • Jiarui Gan
  • Fei Fang
  • Long Tran-Thanh
  • Milind Tambe
  • Bo An

Strong Stackelberg equilibrium (SSE) is the standard solution concept of Stackelberg security games. As opposed to the weak Stackelberg equilibrium (WSE), the SSE assumes that the follower breaks ties in favor of the leader; this is widely acknowledged and justified by the assertion that the defender can often induce the attacker to choose a preferred action by making an infinitesimal adjustment to her strategy. Unfortunately, in security games with resource assignment constraints, the assertion might not be valid; it is possible that the defender cannot induce the desired outcome. As a result, many results claimed in the literature may be overly optimistic. To remedy this, we first formally define the utility guarantee of a defender strategy and provide examples showing that the utility of the SSE can be higher than its utility guarantee. Second, inspired by the analysis of the leader’s payoff by Von Stengel and Zamir (2004), we propose a solution concept called the inducible Stackelberg equilibrium (ISE), which has the highest utility guarantee and always exists. Third, we characterize the conditions under which the ISE coincides with the SSE, and show that in the general case the SSE can be far worse with respect to its utility guarantee. Moreover, introducing the ISE does not invalidate existing algorithmic results, as the problem of computing an ISE polynomially reduces to that of computing an SSE. We also provide an algorithmic implementation for computing the ISE, with which our experiments unveil the empirical advantage of the ISE over the SSE.
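For context, the classical multiple-LPs approach to computing an SSE in a simple security game without assignment constraints (the setting where the tie-breaking assertion does hold) can be sketched as below; the payoff argument names and the two-target toy instance are our own illustration, not the paper's code:

```python
import numpy as np
from scipy.optimize import linprog

def solve_sse(def_cov, def_unc, att_cov, att_unc, resources):
    """Strong Stackelberg equilibrium for a simple security game.

    Payoffs are per-target lists: `def_cov[t]` is the defender's utility if
    target t is attacked while covered, `def_unc[t]` if uncovered, and
    similarly for the attacker. The defender commits to marginal coverage
    c with sum(c) <= resources. For each candidate attack target t, solve
    an LP maximizing defender utility subject to t being an attacker best
    response; return the best feasible (value, target, coverage).
    """
    n = len(def_cov)
    best = (-np.inf, None, None)
    for t in range(n):
        # Maximize def_unc[t] + c[t]*(def_cov[t]-def_unc[t]); linprog minimizes.
        obj = np.zeros(n)
        obj[t] = -(def_cov[t] - def_unc[t])
        A_ub, b_ub = [], []
        for s in range(n):
            if s == t:
                continue
            # Attacker weakly prefers t over s: U_a(t) >= U_a(s).
            row = np.zeros(n)
            row[t] = -(att_cov[t] - att_unc[t])
            row[s] = att_cov[s] - att_unc[s]
            A_ub.append(row)
            b_ub.append(att_unc[t] - att_unc[s])
        A_ub.append(np.ones(n))       # total coverage budget
        b_ub.append(resources)
        res = linprog(obj, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                      bounds=[(0.0, 1.0)] * n)
        if res.success:
            val = def_unc[t] + res.x[t] * (def_cov[t] - def_unc[t])
            if val > best[0]:
                best = (val, t, res.x)
    return best

# Two identical targets, one resource: the SSE splits coverage evenly
# and the defender's expected utility is -0.5.
val, target, cov = solve_sse([0.0, 0.0], [-1.0, -1.0],
                             [0.0, 0.0], [1.0, 1.0], 1.0)
```

The paper's observation is that, once resource assignment constraints are added, the attacker may not actually be inducible to the target selected this way, which motivates the ISE.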

AAMAS Conference 2019 Conference Paper

Robust Peer-Monitoring on Graphs with an Application to Suicide Prevention in Social Networks

  • Aida Rahmattalabi
  • Phebe Vayanos
  • Anthony Fulginiti
  • Milind Tambe

We consider the problem of selecting a subset of nodes (individuals) in a (social) network that can act as monitors capable of “watching out” for their neighbors (friends) when the availability or performance of the chosen monitors is uncertain. Such problems arise, for example, in the context of “Gatekeeper Trainings” for suicide prevention. We formulate this problem as a two-stage robust optimization problem that aims to maximize the worst-case number of covered nodes. Our model is capable of incorporating domain-specific constraints, e.g., fairness constraints. We propose a practically tractable approximation scheme, and we provide empirical results that demonstrate the effectiveness of our approach.

AAMAS Conference 2019 Conference Paper

Using Game Theory in Real Time in the Real World: A Conservation Case Study

  • Elizabeth Bondi
  • Hoon Oh
  • Haifeng Xu
  • Fei Fang
  • Bistra Dilkina
  • Milind Tambe

In the real world, real-time data are now widely available, especially in security domains. Security cameras, aerial imagery, and even social media keep defenders informed when protecting important events, locations, and people. Further, advances in artificial intelligence have led to tools that can interpret these data automatically. Game-theoretic models, for example, have shown great success in security. However, most of them ignore real-time information. In this paper, we demonstrate the potential to use real-time information from imagery to better inform our decisions in game-theoretic models for security. As a concrete example, a conservation group called Air Shepherd uses conservation drones equipped with thermal infrared cameras to locate poachers at night and alert park rangers. They have also used lights aboard the drones to signal to poachers, warning them of the patrollers’ presence, which often deters the poachers. We propose a system that (i) allocates drones and humans strategically throughout a protected area, (ii) detects poachers in the thermal infrared videos recorded by the conservation drones flying through the protected area at the predetermined locations, and (iii) recommends moving to the location and/or signaling to the poacher that a patroller is nearby, depending on real-time detections. View the demonstration: http://bit.ly/aamas19-demo-bondi-et-al.

AAMAS Conference 2019 Conference Paper

Warning Time: Optimizing Strategic Signaling for Security Against Boundedly Rational Adversaries

  • Sarah Cooney
  • Phebe Vayanos
  • Thanh H. Nguyen
  • Cleotilde Gonzalez
  • Christian Lebiere
  • Edward A. Cranford
  • Milind Tambe

Defender-attacker Stackelberg security games (SSGs) have been applied for solving many real-world security problems. Recent work in SSGs has incorporated a deceptive signaling scheme into the SSG model, where the defender strategically reveals information about her defensive strategy to the attacker, in order to influence the attacker’s decision making for the defender’s own benefit. In this work, we study the problem of signaling in security games against a boundedly rational attacker.

AAMAS Conference 2018 Conference Paper

Activating the "Breakfast Club": Modeling Influence Spread in Natural-World Social Networks

  • Lily Hu
  • Bryan Wilder
  • Amulya Yadav
  • Eric Rice
  • Milind Tambe

While reigning models of diffusion have privileged the structure of a given social network as the key to informational exchange, real human interactions do not appear to take place on a single graph of connections. Using data collected from a pilot study of the spread of HIV awareness in social networks of homeless youth, we show that health information did not diffuse in the field according to the processes outlined by dominant models. Since physical network diffusion scenarios often diverge from their better-studied counterparts on digital networks, we propose an alternative Activation Jump Model (AJM) that describes information diffusion on physical networks from a multi-agent team perspective. Our model exhibits two main differentiating features from leading cascade and threshold models of influence spread: 1) the structural composition of a seed set team impacts each individual node’s influencing behavior, and 2) an influencing node may spread information to non-neighbors. We show that the AJM significantly outperforms existing models in its fit to the observed node-level influence data on the youth networks. We then prove theoretical results, showing that the AJM exhibits many well-behaved properties shared by dominant models. Our results suggest that the AJM presents a flexible and more accurate model of network diffusion that may better inform influence maximization in the field.

AAMAS Conference 2018 Conference Paper

Adversary Models Account for Imperfect Crime Data: Forecasting and Planning against Real-world Poachers

  • Shahrzad Gholami
  • Sara Mc Carthy
  • Bistra Dilkina
  • Andrew Plumptre
  • Milind Tambe
  • Margaret Driciru
  • Fred Wanyama
  • Aggrey Rwetsiba

Poachers are engaged in extinction-level wholesale slaughter, so it is critical to harness historical data for predicting poachers’ behavior. However, in these domains, data collected about adversarial actions are remarkably imperfect, where reported negative instances of crime may be mislabeled or uncertain. Unfortunately, past attempts to develop predictive and prescriptive models to address this problem suffer from shortcomings from a modeling perspective as well as in the implementability of their techniques. Most notably, these models (i) neglect the uncertainty in crime data, leading to inaccurate and biased predictions of adversary behavior, (ii) use coarse-grained crime analysis, and (iii) do not provide a convincing evaluation, as they only look at a single protected area. Additionally, (iv) they propose time-consuming techniques which cannot be directly integrated into low-resource outposts. In this innovative application paper, we (I) introduce a novel imperfect-observation aWare Ensemble (iWare-E) technique, which is designed to handle the uncertainty in crime information efficiently. This approach leads to superior accuracy for adversary behavior prediction (up to 34% increase in AUC) compared to the previous state-of-the-art. We also demonstrate the country-wide efficiency of the models and are the first to (II) evaluate our adversary behavioral model across different protected areas in Uganda, i.e., Murchison Falls and Queen Elizabeth National Park (totaling about 7,500 square km), as well as (III) on fine-grained temporal resolutions. Lastly, (IV) we provide a scalable planning algorithm to design fine-grained patrol routes for the rangers, which achieves up to 150% improvement in the number of predicted attacks detected.

IJCAI Conference 2018 Conference Paper

Bridging the Gap Between Theory and Practice in Influence Maximization: Raising Awareness about HIV among Homeless Youth

  • Amulya Yadav
  • Bryan Wilder
  • Eric Rice
  • Robin Petering
  • Jaih Craddock
  • Amanda Yoshioka-Maxwell
  • Mary Hemler
  • Laura Onasch-Vera

This paper reports on results obtained by deploying HEALER and DOSIM (two AI agents for social influence maximization) in the real world, which assist service providers in maximizing HIV awareness in real-world homeless-youth social networks. These agents recommend key "seed" nodes in social networks, i.e., homeless youth who would maximize HIV awareness in their real-world social network. While prior research on these agents published promising simulation results from the lab, the usability of these AI agents in the real world was unknown. This paper presents results from three real-world pilot studies involving 173 homeless youth across two different homeless shelters in Los Angeles. The results from these pilot studies illustrate that HEALER and DOSIM outperform the current modus operandi of service providers by ~160% in terms of information spread about HIV among homeless youth.

AAMAS Conference 2018 Conference Paper

Deceiving Cyber Adversaries: A Game Theoretic Approach

  • Aaron Schlenker
  • Omkar Thakoor
  • Haifeng Xu
  • Fei Fang
  • Milind Tambe
  • Long Tran-Thanh
  • Phebe Vayanos
  • Yevgeniy Vorobeychik

An important way cyber adversaries find vulnerabilities in modern networks is through reconnaissance, in which they attempt to identify configuration specifics of network hosts. To increase uncertainty of adversarial reconnaissance, the network administrator (henceforth, defender) can introduce deception into responses to network scans, such as obscuring certain system characteristics. We introduce a novel game-theoretic model of deceptive interactions of this kind between a defender and a cyber attacker, which we call the Cyber Deception Game. We consider both a powerful (rational) attacker, who is aware of the defender’s exact deception strategy, and a naive attacker who is not. We show that computing the optimal deception strategy is NP-hard for both types of attackers. For the case with a powerful attacker, we provide a mixed-integer linear program solution as well as a fast and effective greedy algorithm. Similarly, we provide complexity results and propose exact and heuristic approaches when the attacker is naive. Our extensive experimental analysis demonstrates the effectiveness of our approaches.

AAMAS Conference 2018 Conference Paper

End-to-End Influence Maximization in the Field

  • Bryan Wilder
  • Laura Onasch-Vera
  • Juliana Hudson
  • Jose Luna
  • Nicole Wilson
  • Robin Petering
  • Darlene Woo
  • Milind Tambe

This work aims to overcome the challenges in deploying influence maximization to support community-driven interventions. Influence maximization is a crucial technique used in preventative health interventions, such as HIV prevention amongst homeless youth. Drop-in centers for homeless youth train a subset of youth as peer leaders who will disseminate information about HIV through their social networks. The challenge is to find a small set of peer leaders who will have the greatest possible influence. While many algorithms have been proposed for influence maximization, none can be feasibly deployed by a service provider: existing algorithms require costly surveys of the entire social network of the youth to provide input data, and high-performance computing resources to run the algorithm itself. Both are crucial bottlenecks to widespread use of influence maximization in real-world interventions. To address the above challenges, this paper introduces the CHANGE agent for influence maximization. CHANGE handles the end-to-end process of influence maximization, from data collection to peer leader selection. Crucially, CHANGE only surveys a fraction of the youth to gather network data and minimizes computational cost while providing comparable performance to previously proposed algorithms. We carried out a pilot study of CHANGE in collaboration with a drop-in center serving homeless youth in a major U.S. city. CHANGE surveyed only 18% of the youth to construct its social network. However, the peer leaders it selected reached just as many youth as previously field-tested algorithms which surveyed the entire network. This is the first real-world study of a network sampling algorithm for influence maximization. Simulation results on real-world networks also support our claims.

AAMAS Conference 2018 Conference Paper

Equilibrium Refinement in Security Games with Arbitrary Scheduling Constraints

  • Kai Wang
  • Qingyu Guo
  • Phebe Vayanos
  • Milind Tambe
  • Bo An

Significant research effort in security games has focused on devising strategies that perform well even when the attacker deviates from optimal (rational) behavior. In most of these frameworks, a price needs to be paid to ensure robustness against this unpredictability. However, equilibrium refinement is an attractive alternative to boost solution robustness at no cost, even though it has not received as much attention in the security game literature. In this framework, resources are strategically allocated to secure an optimal outcome against a rational adversary while simultaneously protecting other targets to ensure good outcomes against boundedly rational or constrained attackers. Unfortunately, existing approaches for equilibrium refinement in security games cannot effectively address scheduling constraints that arise frequently in real-world applications. In this paper, we aim to fill this gap and make several key contributions. First, we show that existing approaches for equilibrium refinement can fail in the presence of scheduling constraints. Second, we investigate the properties of the best response of the attacker. Third, we leverage these properties to devise novel iterative algorithms to compute the optimally refined equilibrium, with polynomially many calls to an LP oracle for zero-sum games. Finally, we conduct extensive experimental evaluations that showcase i) the superior performance of our approach in the face of a boundedly rational attacker and ii) the attractive scalability properties of our algorithm, which can solve realistic-sized instances.

AAMAS Conference 2018 Conference Paper

Inducible Equilibrium for Security Games

  • Qingyu Guo
  • Jiarui Gan
  • Fei Fang
  • Long Tran-Thanh
  • Milind Tambe
  • Bo An

Strong Stackelberg equilibrium (SSE) is the standard solution concept of Stackelberg security games. The SSE assumes that the follower breaks ties in favor of the leader and this is widely acknowledged and justified by the assertion that the defender can often induce the attacker to choose a preferred action by making an infinitesimal adjustment to her strategy. Unfortunately, in security games with resource assignment constraints, the assertion might not be valid. To overcome this issue, inspired by the notion of inducibility and the pessimistic Stackelberg equilibrium [20, 21], this paper presents the inducible Stackelberg equilibrium (ISE), which is guaranteed to exist and avoids overoptimism as the outcome can always be induced with infinitesimal strategy deviation. Experimental evaluation unveils the significant overoptimism and sub-optimality of SSE and thus, verifies the advantage of the ISE as an alternative solution concept.

AAAI Conference 2018 Short Paper

Influence Maximization for Social Network Based Substance Abuse Prevention

  • Aida Rahmattalabi
  • Anamika Barman Adhikari
  • Phebe Vayanos
  • Milind Tambe
  • Eric Rice
  • Robin Baker

A major barrier to personalized Human Activity Recognition using wearable sensors is that the performance of the recognition model drops significantly upon adoption of the system by new users or changes in the physical/behavioral status of users. Therefore, the model needs to be retrained by collecting new labeled data in the new context. In this study, we develop a transfer learning framework using convolutional neural networks to build a personalized activity recognition model with minimal user supervision.

AAAI Conference 2018 Conference Paper

Maximizing Influence in an Unknown Social Network

  • Bryan Wilder
  • Nicole Immorlica
  • Eric Rice
  • Milind Tambe

In many real-world applications of influence maximization, practitioners intervene in a population whose social structure is initially unknown. This poses a multiagent systems challenge to act under uncertainty about how the agents are connected. We formalize this problem by introducing exploratory influence maximization, in which an algorithm queries individual network nodes (agents) to learn their links. The goal is to locate a seed set nearly as influential as the global optimum using very few queries. We show that this problem is intractable for general graphs. However, real-world networks typically have community structure, where nodes are arranged in densely connected subgroups. We present the ARISEN algorithm, which leverages community structure to find an influential seed set. Experiments on real-world networks of homeless youth, village populations in India, and others demonstrate ARISEN’s strong empirical performance. To formally demonstrate how ARISEN exploits community structure, we prove an approximation guarantee for ARISEN on graphs drawn from the Stochastic Block Model.
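A drastically simplified sketch of the query-limited setting (random sampling plus degree-based seeding; our own toy illustration, not the ARISEN algorithm, which additionally exploits community structure):

```python
import random

def exploratory_seed(query_neighbors, nodes, n_queries, n_seeds):
    """Pick seeds in an initially unknown network under a query budget.

    `query_neighbors(v)` reveals v's neighbor list (one costly query,
    e.g. surveying one individual). We query a random sample of nodes and
    seed those with the highest observed degree, illustrating that only a
    fraction of the network ever needs to be surveyed.
    """
    rng = random.Random(0)  # fixed seed for reproducibility
    sampled = rng.sample(nodes, n_queries)
    degree = {v: len(query_neighbors(v)) for v in sampled}
    return sorted(degree, key=degree.get, reverse=True)[:n_seeds]

# Toy network: node 0 is a hub. Querying all 8 nodes always finds it;
# a budget of 5 queries may miss it, which is the price of exploration.
graph = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0],
         4: [0], 5: [0], 6: [7], 7: [6]}
full_budget = exploratory_seed(graph.__getitem__, list(graph), 8, 1)
small_budget = exploratory_seed(graph.__getitem__, list(graph), 5, 1)
```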

AAMAS Conference 2018 Conference Paper

Mitigating the Curse of Correlation in Security Games by Entropy Maximization

  • Haifeng Xu
  • Shaddin Dughmi
  • Milind Tambe
  • Venil Loyd Noronha

In Stackelberg security games, a defender seeks to randomly allocate limited security resources to protect critical targets from an attack. In this paper, we study a fundamental, yet underexplored, phenomenon in security games, which we term the Curse of Correlation (CoC). Specifically, we observe that there are inevitable correlations among the protection status of different targets. Such correlation is a crucial concern, especially in spatio-temporal domains like conservation area patrolling, where attackers can surveil patrollers at certain areas and then infer their patrolling routes using such correlations. To mitigate this issue, we propose to design entropy-maximizing defending strategies for spatio-temporal security games, which frequently suffer from CoC. We prove that the problem is #P-hard in general. However, it admits efficient algorithms in well-motivated special settings.

IJCAI Conference 2018 Conference Paper

Near Real-Time Detection of Poachers from Drones in AirSim

  • Elizabeth Bondi
  • Ashish Kapoor
  • Debadeepta Dey
  • James Piavis
  • Shital Shah
  • Robert Hannaford
  • Arvind Iyer
  • Lucas Joppa

The unrelenting threat of poaching has led to increased development of new technologies to combat it. One such example is the use of thermal infrared cameras mounted on unmanned aerial vehicles (UAVs or drones) to spot poachers at night and report them to park rangers before they are able to harm any animals. However, monitoring the live video stream from these conservation UAVs all night is an arduous task. Therefore, we discuss SPOT (Systematic Poacher deTector), a novel application that augments conservation drones with the ability to automatically detect poachers and animals in near real time. SPOT illustrates the feasibility of building upon state-of-the-art AI techniques, such as Faster RCNN, to address the challenges of automatically detecting animals and poachers in infrared images. This paper reports (i) the design of SPOT, (ii) efficient processing techniques to ensure usability in the field, (iii) evaluation of SPOT based on historical videos and a real-world test run by the end-users, Air Shepherd, in the field, and (iv) the use of AirSim for live demonstration of SPOT. The promising results from a field test have led to a plan for larger-scale deployment in a national park in southern Africa. While SPOT is developed for conservation drones, its design and novel techniques have wider application for automated detection from UAV videos.

AAMAS Conference 2018 Conference Paper

Optimizing Network Structure for Preventative Health

  • Bryan Wilder
  • Han Ching Ou
  • Kayla de la Haye
  • Milind Tambe

Diseases such as heart disease, stroke, or diabetes affect hundreds of millions of people. Such conditions are strongly impacted by obesity, and establishing healthy lifestyle behaviors is a critical public health challenge with many applications. Changing health behaviors is inherently a multiagent problem since people’s behavior is strongly influenced by those around them. Hence, practitioners often attempt to modify the social network of a community by adding or removing edges in ways that will lead to desirable behavior change. To our knowledge, no previous work considers the algorithmic problem of finding the optimal set of edges to add and remove. We propose the RECONNECT algorithm, which efficiently finds high-quality solutions for a range of different network intervention problems. We evaluate RECONNECT in a highly realistic simulated environment based on the Antelope Valley region in California which draws on demographic, social, and health-related data. We find that RECONNECT outperforms an array of baseline policies, in some cases yielding a 150% improvement over the best alternative.

AAMAS Conference 2018 Conference Paper

Please be an Influencer? Contingency-Aware Influence Maximization

  • Amulya Yadav
  • Ritesh Noothigattu
  • Eric Rice
  • Laura Onasch-Vera
  • Leandro Soriano Marcolino
  • Milind Tambe

Most previous work on influence maximization in social networks assumes that the chosen influencers (or seed nodes) can be influenced with certainty (i.e., with no contingencies). In this paper, we focus on using influence maximization in public health domains for assisting low-resource communities, where contingencies are common. It is very difficult in these domains to ensure that the seed nodes are influenced, as influencing them entails contacting/convincing them to attend training sessions, which may not always be possible. Unfortunately, previous state-of-the-art algorithms for influence maximization are unusable in this setting. This paper tackles this challenge via the following four contributions: (i) we propose the Contingency Aware Influence Maximization problem and analyze it theoretically; (ii) we cast this problem as a Partially Observable Markov Decision Process and propose CAIMS (a novel POMDP planner) to solve it, which leverages a natural action space factorization associated with real-world social networks; and (iii) we provide extensive simulation results to compare CAIMS with existing state-of-the-art influence maximization algorithms. Finally, (iv) we provide results from a real-world feasibility trial conducted to evaluate CAIMS, in which key influencers in homeless youth social networks were influenced in order to spread awareness about HIV.

AAAI Conference 2018 Conference Paper

Policy Learning for Continuous Space Security Games Using Neural Networks

  • Nitin Kamra
  • Umang Gupta
  • Fei Fang
  • Yan Liu
  • Milind Tambe

A wealth of algorithms centered around (integer) linear programming have been proposed to compute equilibrium strategies in security games with discrete states and actions. However, in practice many domains possess continuous state and action spaces. In this paper, we consider a continuous space security game model with infinite-size action sets for players and present a novel deep learning based approach to extend the existing toolkit for solving security games. Specifically, we present (i) OptGradFP, a novel and general algorithm that searches for the optimal defender strategy in a parameterized continuous search space, and can also be used to learn policies over multiple game states simultaneously; (ii) OptGradFP-NN, a convolutional neural network based implementation of OptGradFP for continuous space security games. We demonstrate the potential to predict good defender strategies via experiments and analysis of OptGradFP and OptGradFP-NN on discrete and continuous game settings.

AAAI Conference 2018 Conference Paper

Preventing Infectious Disease in Dynamic Populations Under Uncertainty

  • Bryan Wilder
  • Sze-Chuan Suen
  • Milind Tambe

Treatable infectious diseases are a critical challenge for public health. Outreach campaigns can encourage undiagnosed patients to seek treatment but must be carefully targeted to make the most efficient use of limited resources. We present an algorithm to optimally allocate limited outreach resources among demographic groups in the population. The algorithm uses a novel multiagent model of disease spread which both captures the underlying population dynamics and is amenable to optimization. Our algorithm extends, with provable guarantees, to a stochastic setting where we have only a distribution over parameters such as the contact pattern between agents. We evaluate our algorithm on two instances where this distribution is inferred from real-world data: tuberculosis in India and gonorrhea in the United States. Our algorithm produces a policy which is predicted to avert an average of at least 8,000 person-years of tuberculosis and 20,000 person-years of gonorrhea annually compared to current policy.

AAAI Conference 2018 Conference Paper

SPOT Poachers in Action: Augmenting Conservation Drones With Automatic Detection in Near Real Time

  • Elizabeth Bondi
  • Fei Fang
  • Mark Hamilton
  • Debarun Kar
  • Donnabell Dmello
  • Jongmoo Choi
  • Robert Hannaford
  • Arvind Iyer

The unrelenting threat of poaching has led to increased development of new technologies to combat it. One such example is the use of long wave thermal infrared cameras mounted on unmanned aerial vehicles (UAVs or drones) to spot poachers at night and report them to park rangers before they are able to harm animals. However, monitoring the live video stream from these conservation UAVs all night is an arduous task. Therefore, we build SPOT (Systematic POacher deTector), a novel application that augments conservation drones with the ability to automatically detect poachers and animals in near real time. SPOT illustrates the feasibility of building upon state-of-the-art AI techniques, such as Faster RCNN, to address the challenges of automatically detecting animals and poachers in infrared images. This paper reports (i) the design and architecture of SPOT, (ii) a series of efforts towards more robust and faster processing to make SPOT usable in the field and provide detections in near real time, and (iii) evaluation of SPOT based on both historical videos and a real-world test run by the end users in the field. The promising results from the test in the field have led to a plan for larger-scale deployment in a national park in Botswana. While SPOT is developed for conservation drones, its design and novel techniques have wider application for automated detection from UAV videos.

IJCAI Conference 2018 Conference Paper

Stackelberg Security Games: Looking Beyond a Decade of Success

  • Arunesh Sinha
  • Fei Fang
  • Bo An
  • Christopher Kiekintveld
  • Milind Tambe

The Stackelberg Security Game (SSG) model has been immensely influential in security research since it was introduced roughly a decade ago. Furthermore, deployed SSG-based applications are among the most successful examples of game theory applied in the real world. We present a broad survey of recent technical advances in SSG and related literature, and then look to the future by highlighting new potential applications and open research problems in SSG.
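
For readers new to SSGs, the core computation the survey refers to can be sketched on a toy instance: the defender commits to a randomized coverage of targets, the attacker best-responds, and exact ties are broken in the defender's favor (the Strong Stackelberg Equilibrium convention). The two targets and all payoff numbers below are hypothetical.

```python
# Toy Strong Stackelberg Equilibrium for a 2-target, 1-resource security
# game (all payoffs hypothetical). The defender commits to coverage
# probabilities; the attacker attacks the target with the highest expected
# utility, breaking exact ties in the defender's favor.

TARGETS = [  # per-target payoffs: defender/attacker, covered/uncovered
    {"d_cov": 0.0, "d_unc": -10.0, "a_cov": -1.0, "a_unc": 5.0},
    {"d_cov": 0.0, "d_unc": -4.0,  "a_cov": -1.0, "a_unc": 2.0},
]

def expected_utils(c0):
    cov = [c0, 1.0 - c0]  # one resource split across the two targets
    att = [cov[i] * TARGETS[i]["a_cov"] + (1 - cov[i]) * TARGETS[i]["a_unc"]
           for i in range(2)]
    dfd = [cov[i] * TARGETS[i]["d_cov"] + (1 - cov[i]) * TARGETS[i]["d_unc"]
           for i in range(2)]
    return att, dfd

def defender_value(c0):
    att, dfd = expected_utils(c0)
    best = max(att)
    # SSE tie-breaking: among the attacker's best targets, assume the one
    # that is best for the defender is attacked.
    return max(dfd[i] for i in range(2) if abs(att[i] - best) < 1e-9)

best_c0 = max((i / 1000 for i in range(1001)), key=defender_value)
```

The grid search lands near coverage 2/3 on the high-value target, the point where the attacker is indifferent between targets; real solvers replace this 1-D search with linear or mixed-integer programming.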

AAAI Conference 2018 Conference Paper

Strategic Coordination of Human Patrollers and Mobile Sensors With Signaling for Security Games

  • Haifeng Xu
  • Kai Wang
  • Phebe Vayanos
  • Milind Tambe

Traditional security games concern the optimal randomized allocation of human patrollers, who can directly catch attackers or interdict attacks. Motivated by the emerging application of utilizing mobile sensors (e.g., UAVs) for patrolling, in this paper we propose the novel Sensor-Empowered security Game (SEG) model which captures the joint allocation of human patrollers and mobile sensors. Sensors differ from patrollers in that they cannot directly interdict attacks, but they can notify nearby patrollers (if any). Moreover, SEGs incorporate mobile sensors’ natural functionality of strategic signaling. On the technical side, we first prove that solving SEGs is NP-hard even in zero-sum cases. We then develop a scalable algorithm SEGer based on the branch-and-price framework with two key novelties: (1) a novel MILP formulation for the slave; (2) an efficient relaxation of the problem for pruning. To further accelerate SEGer, we design a faster combinatorial algorithm for the slave problem, which is provably a constant-approximation to the slave problem in zero-sum cases and serves as a useful heuristic for general-sum SEGs. Our experiments demonstrate the significant benefit of utilizing mobile sensors.

IJCAI Conference 2018 Conference Paper

The Price of Usability: Designing Operationalizable Strategies for Security Games

  • Sara Marie Mc Carthy
  • Corine M. Laan
  • Kai Wang
  • Phebe Vayanos
  • Arunesh Sinha
  • Milind Tambe

We consider the problem of allocating scarce security resources among heterogeneous targets to thwart a possible attack. It is well known that deterministic solutions to this problem, being highly predictable, are severely suboptimal. To mitigate this predictability, the game-theoretic security game model was proposed, which randomizes over pure (deterministic) strategies, causing confusion for the adversary. Unfortunately, such mixed strategies typically involve randomizing over a large number of strategies, requiring security personnel to be familiar with numerous protocols, making them hard to operationalize. Motivated by these practical considerations, we propose an approach for computing strategies that are easy to operationalize and that bridge the gap between the static solution and the optimal mixed strategy. These strategies only randomize over an optimally chosen subset of pure strategies whose cardinality is selected by the defender, enabling them to conveniently tune the trade-off between ease of operationalization and efficiency using a single design parameter. We show that the problem of computing such operationalizable strategies is NP-hard, formulate it as a mixed-integer optimization problem, provide an algorithm for computing epsilon-optimal equilibria, and an efficient heuristic. We evaluate the performance of our approach on the problem of screening for threats at airport checkpoints and show that the Price of Usability, i.e., the loss in optimality incurred to obtain a strategy that is easier to operationalize, is typically not high.
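
The support-size trade-off described here can be sketched by brute force on a toy zero-sum screening game: compare the defender's worst-case loss when the mixed strategy may use supports of size k = 1, 2, 3. The target values and the grid-search resolution below are made up, and the paper's actual formulation is a mixed-integer program, not enumeration.

```python
# Hypothetical "price of usability" illustration: a zero-sum toy game where
# each defender pure strategy covers one of three targets, and we compare
# the best mixed strategy restricted to supports of size k = 1, 2, 3.
# Larger supports are harder to operationalize but can reduce worst-case loss.

from itertools import combinations

VALUES = [5.0, 3.0, 1.0]  # attacker's payoff on an uncovered target (made up)

def loss(coverage):
    # Attacker best-responds to the coverage vector; defender loses that much.
    return max((1 - c) * v for c, v in zip(coverage, VALUES))

def best_loss_with_support(support, steps=200):
    # Grid search over mixing weights on the chosen support.
    if len(support) == 1:
        weights_list = [(1.0,)]
    elif len(support) == 2:
        weights_list = [(i / steps, 1 - i / steps) for i in range(steps + 1)]
    else:
        weights_list = [(i / steps, j / steps, 1 - (i + j) / steps)
                        for i in range(steps + 1)
                        for j in range(steps + 1 - i)]
    best = float("inf")
    for w in weights_list:
        cov = [0.0] * len(VALUES)
        for t, p in zip(support, w):
            cov[t] = p
        best = min(best, loss(cov))
    return best

def best_loss_with_k(k):
    return min(best_loss_with_support(s)
               for s in combinations(range(len(VALUES)), k))

losses = [best_loss_with_k(k) for k in (1, 2, 3)]
```

In this toy instance the loss drops from 3.0 with a single pure strategy to 1.875 with a support of two, and a third support element adds nothing, echoing the abstract's finding that the price of usability is typically not high.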

AAMAS Conference 2017 Conference Paper

Cloudy with a Chance of Poaching: Adversary Behavior Modeling and Forecasting with Real-World Poaching Data

  • Debarun Kar
  • Benjamin Ford
  • Shahrzad Gholami
  • Fei Fang
  • Andrew Plumptre
  • Milind Tambe
  • Margaret Driciru
  • Fred Wanyama

Wildlife conservation organizations task rangers to deter and capture wildlife poachers. Since rangers are responsible for patrolling vast areas, adversary behavior modeling can help more effectively direct future patrols. In this innovative application track paper, we present an adversary behavior modeling system, INTERCEPT (INTERpretable Classification Ensemble to Protect Threatened species), and provide the most extensive evaluation in the AI literature of one of the largest poaching datasets from Queen Elizabeth National Park (QENP) in Uganda, comparing INTERCEPT with its competitors; we also present results from a month-long test of INTERCEPT in the field. We present three major contributions. First, we present a paradigm shift in modeling and forecasting wildlife poacher behavior. Some of the latest work in the AI literature (and in Conservation) has relied on models similar to the Quantal Response model from Behavioral Game Theory for poacher behavior prediction. In contrast, INTERCEPT presents a behavior model based on an ensemble of decision trees (i) that more effectively predicts poacher attacks and (ii) that is more effectively interpretable and verifiable. We augment this model to account for spatial correlations and construct an ensemble of the best models, significantly improving performance. Second, we conduct an extensive evaluation on the QENP dataset, comparing 41 models in prediction performance over two years. Third, we present the results of deploying INTERCEPT for a one-month field test in QENP - a first for adversary behavior modeling applications in this domain. This field test has led to finding a poached elephant and more than a dozen snares (including a roll of elephant snares) before they were deployed, potentially saving the lives of multiple animals - including elephants.
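
The ensemble-of-decision-trees idea can be sketched in miniature with hand-built depth-1 stumps combined by majority vote. The features, thresholds, and grid cells below are invented; INTERCEPT learns its trees from ranger patrol data and additionally models spatial correlations.

```python
# Miniature sketch of an interpretable tree ensemble for attack prediction
# (not the INTERCEPT code): one hand-built depth-1 "stump" per feature,
# combined by majority vote. All features and thresholds are synthetic.

def make_stump(feature_index, threshold):
    # Predict "attack" (1) when the feature exceeds the threshold.
    return lambda x: 1 if x[feature_index] > threshold else 0

def majority_vote(stumps, x):
    votes = sum(stump(x) for stump in stumps)
    return 1 if 2 * votes > len(stumps) else 0

# Hypothetical grid-cell features: (animal density, road-proximity score,
# past attack count); each stump votes "attack" when its feature is high.
stumps = [make_stump(0, 0.5), make_stump(1, 0.5), make_stump(2, 0)]

hot_cell = (0.9, 0.8, 3)    # all indicators high
cold_cell = (0.1, 0.2, 0)   # all indicators low
mixed_cell = (0.9, 0.1, 0)  # only one indicator high
```

Each stump is individually readable ("attack likely where animal density exceeds 0.5"), which is the interpretability property the abstract contrasts with quantal-response-style behavioral models.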

IJCAI Conference 2017 Conference Paper

Don't Bury your Head in Warnings: A Game-Theoretic Approach for Intelligent Allocation of Cyber-security Alerts

  • Aaron Schlenker
  • Haifeng Xu
  • Mina Guirguis
  • Christopher Kiekintveld
  • Arunesh Sinha
  • Milind Tambe
  • Solomon Sonya
  • Darryl Balderas

In recent years, there have been a number of successful cyber attacks on enterprise networks by malicious actors which have caused severe damage. These networks have Intrusion Detection and Prevention Systems in place to protect them, but they are notorious for producing a high volume of alerts. These alerts must be investigated by cyber analysts to determine whether they are an attack or benign. Unfortunately, there are orders of magnitude more alerts generated than there are cyber analysts to investigate them. This trend is expected to continue into the future, creating a need for tools that find optimal assignments of the incoming alerts to analysts in the presence of a strategic adversary. We address this challenge with the following four contributions: (1) a cyber screening game (CSG) model for the cyber network protection domain, (2) an NP-hardness proof for computing the optimal strategy for the defender, (3) an algorithm that finds the optimal allocation of experts to alerts in the CSG, and (4) heuristic improvements for computing allocations in CSGs that achieve significant scale-up while, as we show empirically, closely matching the solution quality of the optimal algorithm.

IJCAI Conference 2017 Conference Paper

Maximizing Awareness about HIV in Social Networks of Homeless Youth with Limited Information

  • Amulya Yadav
  • Hau Chan
  • Albert Xin Jiang
  • Haifeng Xu
  • Eric Rice
  • Milind Tambe

This paper presents HEALER, a software agent that recommends sequential intervention plans for use by homeless shelters, which organize these interventions to raise awareness about HIV among homeless youth. HEALER's sequential plans (built using knowledge of social networks of homeless youth) choose intervention participants strategically to maximize influence spread, while reasoning about uncertainties in the network. While previous work presents influence maximization techniques to choose intervention participants, it does not address two real-world issues: (i) it completely fails to scale up to real-world sizes; and (ii) it does not handle deviations in execution of intervention plans. HEALER handles these issues via two major contributions: (i) HEALER casts this influence maximization problem as a POMDP and solves it using a novel planner which scales up to previously unsolvable real-world sizes; and (ii) HEALER allows shelter officials to modify its recommendations, and updates its future plans in a deviation-tolerant manner. HEALER was deployed in the real world in Spring 2016 with considerable success.

IJCAI Conference 2017 Conference Paper

Staying Ahead of the Game: Adaptive Robust Optimization for Dynamic Allocation of Threat Screening Resources

  • Sara Marie Mc Carthy
  • Phebe Vayanos
  • Milind Tambe

We consider the problem of dynamically allocating screening resources of different efficacies (e.g., magnetic or X-ray imaging) at checkpoints (e.g., at airports or ports) to successfully avert an attack by one of the screenees. Previously, the Threat Screening Game model was introduced to address this problem under the assumption that screenee arrival times are perfectly known. In reality, arrival times are uncertain, which severely impedes the implementability and performance of this approach. We thus propose a novel framework for dynamic allocation of threat screening resources that explicitly accounts for uncertainty in the screenee arrival times. We model the problem as a multistage robust optimization problem and propose a tractable solution approach using compact linear decision rules combined with robust reformulation and constraint randomization. We perform extensive numerical experiments which showcase that our approach outperforms (a) exact solution methods in terms of tractability, while incurring only a very minor loss in optimality, and (b) methods that ignore uncertainty in terms of both feasibility and optimality.

AAMAS Conference 2017 Conference Paper

Uncharted but not Uninfluenced: Influence Maximization with an Uncertain Network

  • Bryan Wilder
  • Amulya Yadav
  • Nicole Immorlica
  • Eric Rice
  • Milind Tambe

This paper focuses on new challenges in influence maximization inspired by non-profits’ use of social networks to effect behavioral change in their target populations. Influence maximization is a multiagent problem where the challenge is to select the most influential agents from a population connected by a social network. Specifically, our work is motivated by the problem of spreading messages about HIV prevention among homeless youth using their social network. We show how to compute solutions which are provably close to optimal when the parameters of the influence process are unknown. We then extend our algorithm to a dynamic setting where information about the network is revealed at each stage. Simulation experiments using real-world networks collected by a homeless shelter show the advantages of our approach.
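
The subroutine this abstract builds on, greedy seed selection with Monte Carlo spread estimation under the independent cascade model, can be sketched as follows. The network, edge probabilities, and trial counts are made up; the paper's contribution is the approximation guarantees under parameter uncertainty, not this basic loop.

```python
# Sketch of greedy influence maximization with Monte Carlo estimation of
# spread under the independent cascade model (illustrative only; the graph
# and activation probabilities are invented).

import random

# Hypothetical social network: node -> list of (neighbor, activation prob.)
GRAPH = {
    0: [(1, 0.8), (2, 0.5)],
    1: [(3, 0.6)],
    2: [(3, 0.4), (4, 0.7)],
    3: [(5, 0.5)],
    4: [(5, 0.3)],
    5: [],
}

def cascade(seeds, rng):
    # One random realization of the independent cascade process.
    active, frontier = set(seeds), list(seeds)
    while frontier:
        node = frontier.pop()
        for nbr, p in GRAPH[node]:
            if nbr not in active and rng.random() < p:
                active.add(nbr)
                frontier.append(nbr)
    return len(active)

def estimate_spread(seeds, trials=2000, seed=0):
    rng = random.Random(seed)  # fixed seed for reproducibility
    return sum(cascade(seeds, rng) for _ in range(trials)) / trials

def greedy_seeds(k):
    seeds = []
    for _ in range(k):
        candidates = [n for n in GRAPH if n not in seeds]
        best = max(candidates, key=lambda n: estimate_spread(seeds + [n]))
        seeds.append(best)
    return seeds

chosen = greedy_seeds(2)
```

Greedy selection inherits a (1 - 1/e) approximation guarantee when the spread function is submodular; handling uncertainty in the edge probabilities themselves is what the paper adds on top of this loop.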

AAMAS Conference 2016 Conference Paper

CAPTURE: A New Predictive Anti-Poaching Tool for Wildlife Protection

  • Thanh H. Nguyen
  • Arunesh Sinha
  • Shahrzad Gholami
  • Andrew Plumptre
  • Lucas Joppa
  • Milind Tambe
  • Margaret Driciru
  • Fred Wanyama

Wildlife poaching presents a serious extinction threat to many animal species. Agencies (“defenders”) focused on protecting such animals need tools that help analyze, model and predict poacher activities, so they can more effectively combat such poaching; such tools could also assist in planning effective defender patrols, building on previous security games research. To that end, we have built a new predictive anti-poaching tool, CAPTURE (Comprehensive Anti-Poaching tool with Temporal and observation Uncertainty REasoning). CAPTURE provides four main contributions. First, CAPTURE’s modeling of poachers provides significant advances over previous models from behavioral game theory and conservation biology. This modeling accounts for: (i) the defender’s imperfect detection of poaching signs; (ii) complex temporal dependencies in the poacher’s behaviors; (iii) lack of knowledge of the number of poachers. Second, we provide two new heuristics, parameter separation and target abstraction, to reduce the computational complexity in learning the poacher models. Third, we present a new game-theoretic algorithm for computing the defender’s optimal patrolling given the complex poacher model. Finally, we present detailed models and analysis of real-world poaching data collected over 12 years in Queen Elizabeth National Park in Uganda to evaluate our new model’s prediction accuracy. This paper thus presents the largest dataset of real-world defender-adversary interactions analyzed in the security games literature. CAPTURE will be tested in Uganda in early 2016.

AAAI Conference 2016 Conference Paper

Deploying PAWS to Combat Poaching: Game-Theoretic Patrolling in Areas with Complex Terrain (Demonstration)

  • Fei Fang
  • Thanh Nguyen
  • Rob Pickles
  • Wai Lam
  • Gopalasamy Clements
  • Bo An
  • Amandeep Singh
  • Milind Tambe

The conservation of key wildlife species such as tigers and elephants is threatened by poaching activities. In many conservation areas, foot patrols are conducted to prevent poaching, but they may not be well planned to make the best use of the limited patrolling resources. While prior work has introduced PAWS (Protection Assistant for Wildlife Security) as a game-theoretic decision aid to design effective foot patrol strategies to protect wildlife, the patrol routes generated by PAWS may be difficult to follow in areas with complex terrain. Subsequent research has worked on the significant evolution of PAWS, from an emerging application to regularly deployed software. A key advance of the deployed version of PAWS is that it incorporates the complex terrain information and generates a strategy consisting of easy-to-follow routes. In this demonstration, we provide 1) a video introducing the PAWS system; 2) an interactive visualization of the patrol routes generated by PAWS in an example area with complex terrain; and 3) a machine-human competition in designing patrol strategy given complex terrain and animal distribution.

JAAMAS Journal 2016 Journal Article

Every team deserves a second chance: an extended study on predicting team performance

  • Leandro Soriano Marcolino
  • Aravind S. Lakshminarayanan
  • Milind Tambe

Abstract Voting among different agents is a powerful tool in problem solving, and it has been widely applied to improve the performance in finding the correct answer to complex problems. We present a novel benefit of voting, that has not been observed before: we can use the voting patterns to assess the performance of a team and predict their final outcome. This prediction can be executed at any moment during problem-solving and it is completely domain independent. Hence, it can be used to identify when a team is failing, allowing an operator to take remedial procedures (such as changing team members, the voting rule, or increasing the allocation of resources). We present three main theoretical results: (1) we show a theoretical explanation of why our prediction method works; (2) contrary to what would be expected based on a simpler explanation using classical voting models, we show that we can make accurate predictions irrespective of the strength (i.e., performance) of the teams, and that in fact, the prediction can work better for diverse teams composed of different agents than uniform teams made of copies of the best agent; (3) we show that the quality of our prediction increases with the size of the action space. We perform extensive experimentation in two different domains: Computer Go and Ensemble Learning. In Computer Go, we obtain high quality predictions about the final outcome of games. We analyze the prediction accuracy for three different teams with different levels of diversity and strength, and show that the prediction works significantly better for a diverse team. Additionally, we show that our method still works well when trained with games against one adversary, but tested with games against another, showing the generality of the learned functions. Moreover, we evaluate four different board sizes, and experimentally confirm better predictions in larger board sizes. We analyze in detail the learned prediction functions, and how they change according to each team and action space size. In order to show that our method is domain independent, we also present results in Ensemble Learning, where we make online predictions about the performance of a team of classifiers, while they are voting to classify sets of items. We study a set of classical classification algorithms from machine learning, in a data-set of hand-written digits, and we are able to make high-quality predictions about the final performance of two different teams. Since our approach is domain independent, it can be easily applied to a variety of other domains.

AAAI Conference 2016 Conference Paper

From the Lab to the Classroom and Beyond: Extending a Game-Based Research Platform for Teaching AI to Diverse Audiences

  • Nicole Sintov
  • Debarun Kar
  • Thanh Nguyen
  • Fei Fang
  • Kevin Hoffman
  • Arnaud Lyet
  • Milind Tambe

Recent years have seen increasing interest in AI from outside the AI community. This is partly due to applications based on AI that have been used in real-world domains, for example, the successful deployment of game theory-based decision aids in security domains. This paper describes our teaching approach for introducing the AI concepts underlying security games to diverse audiences. We adapted a game-based research platform that served as a testbed for recent research advances in computational game theory into a set of interactive role-playing games. We guided learners in playing these games as part of our teaching strategy, which also included didactic instruction and interactive exercises on broader AI topics. We describe our experience in applying this teaching approach to diverse audiences, including students of an urban public high school, university undergraduates, and security domain experts who protect wildlife. We evaluate our approach based on results from the games and participant surveys.

ECAI Conference 2016 Conference Paper

Get Me to My GATE on Time: Efficiently Solving General-Sum Bayesian Threat Screening Games

  • Aaron Schlenker
  • Matthew Brown 0002
  • Arunesh Sinha
  • Milind Tambe
  • Ruta Mehta

Threat Screening Games (TSGs) are used in domains where there is a set of individuals or objects to screen with a limited amount of screening resources available to screen them. TSGs are broadly applicable to domains like airport passenger screening, stadium screening, cargo container screening, etc. Previous work on TSGs focused only on the Bayesian zero-sum case and provided the MGA algorithm to solve these games. In this paper, we solve Bayesian general-sum TSGs, which we prove are NP-hard even when exploiting a compact marginal representation. We also present an algorithm based upon an adversary-type hierarchical tree decomposition and an efficient branch-and-bound search to solve Bayesian general-sum TSGs. With this we provide four contributions: (1) GATE, the first algorithm for solving Bayesian general-sum TSGs, which uses hierarchical type trees and a novel branch-and-bound search, (2) the Branch-and-Guide approach, which combines branch-and-bound search with the MGA algorithm for the first time, (3) heuristics based on properties of TSGs for accelerated computation of GATE, and (4) experimental results showing the scalability of GATE needed for real-world domains.

AAAI Conference 2016 Conference Paper

One Size Does Not Fit All: A Game-Theoretic Approach for Dynamically and Effectively Screening for Threats

  • Matthew Brown
  • Arunesh Sinha
  • Aaron Schlenker
  • Milind Tambe

An effective way of preventing attacks in secure areas is to screen for threats (people, objects) before entry, e.g., screening of airport passengers. However, screening every entity at the same level may be both ineffective and undesirable. The challenge then is to find a dynamic approach for randomized screening, allowing for more effective use of limited screening resources, leading to improved security. We address this challenge with the following contributions: (1) a threat screening game (TSG) model for general screening domains; (2) an NP-hardness proof for computing the optimal strategy of TSGs; (3) a scheme for decomposing TSGs into subgames to improve scalability; (4) a novel algorithm that exploits a compact game representation to efficiently solve TSGs, providing the optimal solution under certain conditions; and (5) an empirical comparison of our proposed algorithm against the current state-of-the-art optimal approach for large-scale game-theoretic resource allocation problems.

AAMAS Conference 2016 Conference Paper

Restless Poachers: Handling Exploration-Exploitation Tradeoffs in Security Domains

  • Yundi Qian
  • Chao Zhang
  • Bhaskar Krishnamachari
  • Milind Tambe

The success of Stackelberg Security Games (SSGs) in counterterrorism domains has inspired researchers’ interest in applying game-theoretic models to other security domains with frequent interactions between defenders and attackers, e.g., wildlife protection. Previous research optimizes defenders’ strategies by modeling this problem as a repeated Stackelberg game, capturing the special property of this domain — frequent interactions between defenders and attackers. However, this research fails to handle the exploration-exploitation tradeoff in this domain, caused by the fact that defenders only have knowledge of attack activities at targets they protect. This paper addresses this shortcoming and provides the following contributions: (i) We formulate the problem as a restless multi-armed bandit (RMAB) model to address this challenge. (ii) To use the Whittle index policy to plan patrol strategies in the RMAB, we provide two sufficient conditions for indexability and an algorithm to numerically evaluate indexability. (iii) Given indexability, we propose a binary search based algorithm to find the Whittle index policy efficiently.
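
The binary-search recipe in contribution (iii) can be illustrated on a single two-state arm: the Whittle index of a state is the passivity subsidy that makes acting and not acting equally attractive there. The transition probabilities, rewards, and discount below are invented, and this toy instance is simply assumed to be indexable rather than verified as the paper requires.

```python
# Whittle index for one toy two-state restless arm, found by binary search
# on the passivity subsidy. All numbers (transitions, rewards, discount)
# are made up, and indexability is assumed rather than checked.

P_PASSIVE = [0.1, 0.7]  # P(next state is "good" | current state), when passive
P_ACTIVE = [0.6, 0.9]   # same, when the arm is played
DISCOUNT = 0.9          # per-step reward equals the state label: 0 or 1

def action_values(state, subsidy, V):
    # Q-values of staying passive (earning the subsidy) vs. playing the arm.
    passive = subsidy + state + DISCOUNT * (
        P_PASSIVE[state] * V[1] + (1 - P_PASSIVE[state]) * V[0])
    active = state + DISCOUNT * (
        P_ACTIVE[state] * V[1] + (1 - P_ACTIVE[state]) * V[0])
    return passive, active

def q_values(state, subsidy, iters=500):
    # Value iteration for the single subsidized arm.
    V = [0.0, 0.0]
    for _ in range(iters):
        V = [max(action_values(0, subsidy, V)),
             max(action_values(1, subsidy, V))]
    return action_values(state, subsidy, V)

def whittle_index(state, lo=0.0, hi=5.0, tol=1e-4):
    # Smallest subsidy at which staying passive is as good as playing the arm.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        passive, active = q_values(state, mid)
        if passive >= active:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

idx_bad, idx_good = whittle_index(0), whittle_index(1)
```

In this instance the "bad" state gets the higher index because playing the arm improves its transition probabilities more, so the index policy would prioritize it; the paper's algorithms additionally certify indexability, which this sketch takes for granted.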

AAMAS Conference 2016 Conference Paper

Signaling in Bayesian Stackelberg Games

  • Haifeng Xu
  • Rupert Freeman
  • Vincent Conitzer
  • Shaddin Dughmi
  • Milind Tambe

Algorithms for solving Stackelberg games are used in an ever-growing variety of real-world domains. Previous work has extended this framework to allow the leader to commit not only to a distribution over actions, but also to a scheme for stochastically signaling information about these actions to the follower. This can result in higher utility for the leader. In this paper, we extend this methodology to Bayesian games, in which the leader, the follower, or both have payoff-relevant private information. This leads to novel variants of the model, for example by imposing an incentive compatibility constraint for each type to listen to the signal intended for it. We show that, in contrast to previous hardness results for the case without signaling [5, 16], we can solve unrestricted games in time polynomial in their natural representation. For security games, we obtain hardness results as well as efficient algorithms, depending on the settings. We show the benefits of our approach in experimental evaluations of our algorithms.

AAMAS Conference 2016 Conference Paper

SPECTRE: A Game Theoretic Framework for Preventing Collusion in Security Games (Demonstration)

  • Shahrzad Gholami
  • Bryan Wilder
  • Matthew Brown
  • Arunesh Sinha
  • Nicole Sintov
  • Milind Tambe

Several models have been proposed for Stackelberg security games (SSGs) and protection against perfectly rational and boundedly rational adversaries; however, none of these existing models addresses the destructive cooperation mechanism between adversaries. SPECTRE (Strategic Patrol planner to Extinguish Collusive ThREats) takes into account the synergistic destructive collusion between two groups of adversaries in security games. The framework is designed for efficient patrol scheduling for security agents in security games in the presence of collusion, and is mainly built upon game-theoretic approaches, optimization techniques, machine learning methods, and theories of human decision making under risk. A major advantage of SPECTRE is its use of real-world data from human subject experiments with participants on Amazon Mechanical Turk (AMT).

IJCAI Conference 2016 Conference Paper

Three Strategies to Success: Learning Adversary Models in Security Games

  • Nika Haghtalab
  • Fei Fang
  • Thanh H. Nguyen
  • Arunesh Sinha
  • Ariel D. Procaccia
  • Milind Tambe

State-of-the-art applications of Stackelberg security games - including wildlife protection - offer a wealth of data, which can be used to learn the behavior of the adversary. But existing approaches either make strong assumptions about the structure of the data, or gather new data through online algorithms that are likely to play severely suboptimal strategies. We develop a new approach to learning the parameters of the behavioral model of a bounded rational attacker (thereby pinpointing a near optimal strategy), by observing how the attacker responds to only three defender strategies. We also validate our approach using experiments on real and synthetic data.
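
The kind of behavioral-model learning this abstract describes can be sketched with the simplest quantal-response attacker: attack probabilities proportional to exp(λ·utility), with λ recovered by maximum likelihood over a grid. The target utilities, the true λ, and the noise-free "observations" below are fabricated for illustration; the paper learns a richer bounded-rationality model from observations of only three defender strategies.

```python
# Fitting a quantal-response attacker by grid-search maximum likelihood.
# Target utilities, the true lambda, and the idealized observations are
# all invented; this is a stand-in for the paper's learning setup.

import math

ATTACKER_UTILS = [3.0, 2.0, 1.0, 0.5]  # attacker's utility per target

def qr_probs(lam):
    # Attack probability proportional to exp(lam * utility).
    weights = [math.exp(lam * u) for u in ATTACKER_UTILS]
    total = sum(weights)
    return [w / total for w in weights]

TRUE_LAMBDA = 0.8
observed = qr_probs(TRUE_LAMBDA)  # noise-free attack frequencies

def log_likelihood(lam):
    return sum(f * math.log(p) for f, p in zip(observed, qr_probs(lam)))

grid = [i / 100 for i in range(301)]  # candidate lambdas in [0, 3]
best_lambda = max(grid, key=log_likelihood)
```

Because the observed frequencies here are generated noise-free from the model itself, the grid search recovers the true λ exactly; with real attack data the estimate would carry sampling error, which is where the paper's sample-complexity analysis matters.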

ECAI Conference 2016 Conference Paper

Toward Addressing Collusion Among Human Adversaries in Security Games

  • Shahrzad Gholami
  • Bryan Wilder
  • Matthew Brown 0002
  • Dana Thomas
  • Nicole D. Sintov
  • Milind Tambe

Stackelberg security games and related algorithms have been deployed by major security agencies, including the US Coast Guard, the Federal Air Marshal Service, and the Los Angeles Airport police, to protect against a single adversary or multiple, independent adversaries acting strategically. However, there are a variety of real-world security domains where adversaries may benefit from colluding in their actions against the defender. Given the potential negative effect of these collusive actions, the defender has an incentive to break up collusion by playing off the self-interest of individual adversaries. This paper deals with the problem of collusive security games for rational and boundedly rational adversaries. The theoretical results, verified with human subject experiments, show that a behavioral model that optimizes against boundedly rational adversaries provides demonstrably better-performing defender strategies against human subjects.

AAMAS Conference 2016 Conference Paper

Using Abstractions to Solve Opportunistic Crime Security Games at Scale

  • Chao Zhang
  • Victor Bucarey
  • Ayan Mukhopadhyay
  • Arunesh Sinha
  • Yundi Qian
  • Yevgeniy Vorobeychik
  • Milind Tambe

In this paper, we aim to deter urban crime by recommending optimal police patrol strategies against opportunistic criminals in large-scale urban problems. While previous work has tried to learn criminals’ behavior from real world data and generate patrol strategies against opportunistic crimes, it cannot scale up to large-scale urban problems. Our first contribution is a game abstraction framework that can handle opportunistic crimes in large-scale urban areas. In this game abstraction framework, we model the interaction between officers and opportunistic criminals as a game with discrete targets. By merging similar targets, we obtain an abstract game with fewer total targets. We use real world data to learn and plan against opportunistic criminals in this abstract game, and then propagate the results of this abstract game back to the original game. Our second contribution is the layer-generating algorithm used to merge targets as described in the framework above. This algorithm applies a mixed integer linear program (MILP) to merge similar and geographically neighboring targets in the large-scale problem. As our third contribution, we propose a planning algorithm that recommends a mixed strategy against opportunistic criminals. Finally, our fourth contribution is a heuristic propagation model to handle the problem of limited data we occasionally encounter in large-scale problems. As part of our collaboration with local police departments, we apply our model in two large-scale urban problems: a university campus and a city. Our approach provides high prediction accuracy in the real datasets; furthermore, we project significant crime rate reduction using our planning strategy compared to current police strategy.
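
The target-merging step of the abstraction framework can be caricatured with a greedy single-pass clustering that fuses targets that are geographically close and feature-similar. The coordinates, the attractiveness feature, and both thresholds are invented; the paper performs this layer generation with a MILP rather than a greedy pass.

```python
# Caricature of the abstraction step: greedily merge targets that are
# geographically close and have similar features. Data and thresholds are
# invented; the paper's layer-generating algorithm uses a MILP instead.

# Hypothetical targets: (x, y, crime_attractiveness)
targets = [
    (0.0, 0.0, 0.9),
    (0.1, 0.1, 0.8),  # near target 0, similar attractiveness -> merged
    (5.0, 5.0, 0.2),
    (5.1, 5.0, 0.3),  # near target 2 -> merged
    (9.0, 0.0, 0.5),  # isolated -> its own abstract target
]

def similar(a, b, max_dist=0.5, max_feat_gap=0.2):
    dist = ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    return dist <= max_dist and abs(a[2] - b[2]) <= max_feat_gap

def merge_targets(targets):
    clusters = []
    for t in targets:
        for cluster in clusters:
            # Join the first cluster whose representative (its first member)
            # is similar enough; otherwise start a new cluster.
            if similar(cluster[0], t):
                cluster.append(t)
                break
        else:
            clusters.append([t])
    return clusters

abstract_targets = merge_targets(targets)
```

Each resulting cluster becomes one target of the abstract game; learning and planning then run on the smaller game before results are propagated back, as the abstract describes.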

AAAI Conference 2015 Conference Paper

Combining Compact Representation and Incremental Generation in Large Games with Sequential Strategies

  • Branislav Bosansky
  • Albert Xin Jiang
  • Milind Tambe
  • Christopher Kiekintveld

Many search and security games played on a graph can be modeled as normal-form zero-sum games with strategies consisting of sequences of actions. The size of the strategy space provides a computational challenge when solving these games. This complexity is tackled either by using the compact representation of sequential strategies and linear programming, or by incremental strategy generation of iterative double-oracle methods. In this paper, we present a novel hybrid of these two approaches: the compact-strategy double-oracle (CS-DO) algorithm, which combines the advantages of the compact representation with incremental strategy generation. We experimentally compare CS-DO with the standard approaches and analyze the impact of the size of the support on the performance of the algorithms. Results show that CS-DO dramatically improves the convergence rate in games with non-trivial support.
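
The incremental-generation half of the hybrid can be sketched with a plain double oracle on rock-paper-scissors: start from one pure strategy per player, repeatedly solve the restricted game, and add each player's exact best response until nothing new appears. Here the restricted game is solved approximately by fictitious play rather than the LP-based compact representation that CS-DO uses.

```python
# Toy double oracle on rock-paper-scissors (not CS-DO itself; the restricted
# game is solved approximately by fictitious play instead of an LP).

PAYOFF = [  # row player's payoff: rock, paper, scissors
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

def fictitious_play(rows, cols, iters=3000):
    # Approximate equilibrium of the restricted game spanned by rows x cols.
    row_counts = [0] * len(rows)
    col_counts = [0] * len(cols)
    row_counts[0] = col_counts[0] = 1
    for _ in range(iters):
        # Each side best-responds to the opponent's empirical mixture.
        r = max(range(len(rows)), key=lambda i: sum(
            PAYOFF[rows[i]][cols[j]] * col_counts[j] for j in range(len(cols))))
        c = min(range(len(cols)), key=lambda j: sum(
            PAYOFF[rows[i]][cols[j]] * row_counts[i] for i in range(len(rows))))
        row_counts[r] += 1
        col_counts[c] += 1
    return ([n / sum(row_counts) for n in row_counts],
            [n / sum(col_counts) for n in col_counts])

def double_oracle():
    rows, cols = [0], [0]  # start with a single pure strategy each
    while True:
        x, y = fictitious_play(rows, cols)
        # Exact best responses over the FULL strategy sets.
        br_row = max(range(3), key=lambda i: sum(
            PAYOFF[i][cols[j]] * y[j] for j in range(len(cols))))
        br_col = min(range(3), key=lambda j: sum(
            PAYOFF[rows[i]][j] * x[i] for i in range(len(rows))))
        grew = False
        if br_row not in rows:
            rows.append(br_row); grew = True
        if br_col not in cols:
            cols.append(br_col); grew = True
        if not grew:
            return rows, cols, x, y

rows, cols, x, y = double_oracle()
```

On rock-paper-scissors the oracle loop grows both supports to all three strategies and then stops, since every best response is already present; CS-DO's insight is to run this kind of loop over compactly represented sequential strategies instead of explicit matrix entries.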

AAAI Conference 2015 Conference Paper

Exploring Information Asymmetry in Two-Stage Security Games

  • Haifeng Xu
  • Zinovi Rabinovich
  • Shaddin Dughmi
  • Milind Tambe

Stackelberg security games have been widely deployed to protect real-world assets. The main solution concept there is the Strong Stackelberg Equilibrium (SSE), which optimizes the defender’s random allocation of limited security resources. However, solely deploying the SSE mixed strategy has limitations. In the extreme case, there are security games in which the defender is able to defend all the assets “almost perfectly” at the SSE, but she still sustains significant loss. In this paper, we propose an approach for improving the defender’s utility in such scenarios. Perhaps surprisingly, our approach is to strategically reveal to the attacker information about the sampled pure strategy. Specifically, we propose a two-stage security game model, where in the first stage the defender allocates resources and the attacker selects a target to attack, and in the second stage the defender strategically reveals local information about that target, potentially deterring the attacker’s attack plan. We then study how the defender can play optimally in both stages. We show, theoretically and experimentally, that the two-stage security game model allows the defender to achieve strictly better utility than SSE.

IJCAI Conference 2015 Conference Paper

Security Games with Information Leakage: Modeling and Computation

  • Haifeng Xu
  • Albert Xing Jiang
  • Arunesh Sinha
  • Zinovi Rabinovich
  • Shaddin Dughmi
  • Milind Tambe

Most models of Stackelberg security games assume that the attacker only knows the defender's mixed strategy, but is not able to observe (even partially) the instantiated pure strategy. Such partial observation of the deployed pure strategy – an issue we refer to as information leakage – is a significant concern in practical applications. While previous research on patrolling games has considered the attacker's real-time surveillance, our setting, and therefore our models and techniques, are fundamentally different. More specifically, after describing the information leakage model, we start with an LP formulation to compute the defender's optimal strategy in the presence of leakage. Perhaps surprisingly, we show that a key subproblem in solving this LP (more precisely, the defender oracle) is NP-hard even for the simplest of security game models. We then approach the problem from three possible directions: efficient algorithms for restricted cases, approximation algorithms, and heuristic sampling algorithms that improve upon the status quo. Our experiments confirm the necessity of handling information leakage and the advantage of our algorithms.

IJCAI Conference 2015 Conference Paper

When Security Games Go Green: Designing Defender Strategies to Prevent Poaching and Illegal Fishing

  • Fei Fang
  • Peter Stone
  • Milind Tambe

Building on the successful applications of Stackelberg Security Games (SSGs) to protect infrastructure, researchers have begun focusing on applying game theory to green security domains such as protection of endangered animals and fish stocks. Previous efforts in these domains optimize defender strategies based on the standard Stackelberg assumption that the adversaries become fully aware of the defender’s strategy before taking action. Unfortunately, this assumption is inappropriate since adversaries in green security domains often lack the resources to fully track the defender strategy. This paper (i) introduces Green Security Games (GSGs), a novel game model for green security domains with a generalized Stackelberg assumption; (ii) provides algorithms to plan effective sequential defender strategies — such planning was absent in previous work; (iii) proposes a novel approach to learn adversary models that further improves defender performance; and (iv) provides detailed experimental analysis of proposed approaches.

ICAPS Conference 2014 Conference Paper

Computing Solutions in Infinite-Horizon Discounted Adversarial Patrolling Games

  • Yevgeniy Vorobeychik
  • Bo An 0001
  • Milind Tambe
  • Satinder Singh 0001

Stackelberg games form the core of a number of tools deployed for computing optimal patrolling strategies in adversarial domains, such as the US Federal Air Marshall Service and the US Coast Guard. In traditional Stackelberg security game models the attacker knows only the probability that each target is covered by the defender, but is oblivious to the detailed timing of the coverage schedule. In many real-world situations, however, the attacker can observe the current location of the defender and can exploit this knowledge to reason about the defender's future moves. We show that this general modeling framework can be captured using adversarial patrolling games (APGs) in which the defender sequentially moves between targets, with moves constrained by a graph, while the attacker can observe the defender's current location and his (stochastic) policy concerning future moves. We offer a very general model of infinite-horizon discounted adversarial patrolling games. Our first contribution is to show that defender policies that condition only on the previous defense move (i.e., Markov stationary policies) can be arbitrarily suboptimal for general APGs. We then offer a mixed-integer non-linear programming (MINLP) formulation for computing optimal randomized policies for the defender that can condition on history of bounded, but arbitrary, length, as well as a mixed-integer linear programming (MILP) formulation to approximate these, with provable quality guarantees. Additionally, we present a non-linear programming (NLP) formulation for solving zero-sum APGs. We show experimentally that MILP significantly outperforms the MINLP formulation, and is, in turn, significantly outperformed by the NLP specialized to zero-sum games.

NeurIPS Conference 2014 Conference Paper

Diverse Randomized Agents Vote to Win

  • Albert Jiang
  • Leandro Soriano Marcolino
  • Ariel Procaccia
  • Tuomas Sandholm
  • Nisarg Shah
  • Milind Tambe

We investigate the power of voting among diverse, randomized software agents. With teams of computer Go agents in mind, we develop a novel theoretical model of two-stage noisy voting that builds on recent work in machine learning. This model allows us to reason about a collection of agents with different biases (determined by the first-stage noise models), which, furthermore, apply randomized algorithms to evaluate alternatives and produce votes (captured by the second-stage noise models). We analytically demonstrate that a uniform team, consisting of multiple instances of any single agent, must make a significant number of mistakes, whereas a diverse team converges to perfection as the number of agents grows. Our experiments, which pit teams of computer Go agents against strong agents, provide evidence for the effectiveness of voting when agents are diverse.
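The uniform-versus-diverse contrast in this abstract echoes a classic Condorcet-jury calculation: independent (diverse) voters compound their accuracy under majority rule, while identical copies of one agent cast perfectly correlated votes and gain nothing. A minimal sketch for binary alternatives and independent voters, using textbook probability rather than the paper's two-stage noise model:

```python
from math import comb

def majority_accuracy(p, n):
    """Probability that a strict majority of n independent voters,
    each correct with probability p, picks the correct binary alternative.
    Using odd n avoids ties."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))
```

With p = 0.6, a single agent is right 60% of the time, but a majority of 101 independent such voters is right over 95% of the time; the independence assumption is precisely what a uniform team of identical agents lacks.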

AAAI Conference 2014 Conference Paper

Give a Hard Problem to a Diverse Team: Exploring Large Action Spaces

  • Leandro Soriano Marcolino
  • Haifeng Xu
  • Albert Xin Jiang
  • Milind Tambe
  • Emma Bowring

Recent work has shown that diverse teams can outperform a uniform team made of copies of the best agent. However, there are fundamental questions that were not asked before. When should we use diverse or uniform teams? How does the performance change as the action space or the teams get larger? Hence, we present a new model of diversity for teams that is more general than previous models. We prove that the performance of a diverse team improves as the size of the action space gets larger. Concerning the size of the diverse team, we show that the performance converges exponentially fast to the optimal one as we increase the number of agents. We present synthetic experiments that allow us to gain further insights: even though a diverse team outperforms a uniform team when the size of the action space increases, the uniform team will eventually again play better than the diverse team for a large enough action space. We verify our predictions in a system of Go playing agents, where we show a diverse team that improves in performance as the board size increases, and eventually overcomes a uniform team.

AAAI Conference 2014 Conference Paper

Regret-Based Optimization and Preference Elicitation for Stackelberg Security Games with Uncertainty

  • Thanh Nguyen
  • Amulya Yadav
  • Bo An
  • Milind Tambe
  • Craig Boutilier

Stackelberg security games (SSGs) have been deployed in a number of real-world domains. One key challenge in these applications is the assessment of attacker payoffs, which may not be perfectly known. Previous work has studied SSGs with uncertain payoffs modeled by interval uncertainty and provided maximin-based robust solutions. In contrast, in this work we propose the use of the less conservative minimax regret decision criterion for such payoff-uncertain SSGs and present the first algorithms for computing minimax regret for SSGs. We also address the challenge of preference elicitation, using minimax regret to develop the first elicitation strategies for SSGs. Experimental results validate the effectiveness of our approaches.
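The minimax-regret criterion this abstract advocates can be illustrated by brute force: the regret of a strategy in a payoff scenario is its gap to that scenario's best strategy, and we pick the strategy minimizing the worst-case gap. A toy sketch over finite strategy and scenario sets (the paper's algorithms work over continuous mixed-strategy spaces and interval uncertainty; the names below are illustrative):

```python
def minimax_regret(utility, strategies, scenarios):
    """utility(s, w) -> defender payoff of strategy s under payoff scenario w.
    Returns (chosen strategy, its max regret) under the minimax-regret criterion."""
    # best achievable payoff in each scenario
    best = {w: max(utility(s, w) for s in strategies) for w in scenarios}
    def max_regret(s):
        return max(best[w] - utility(s, w) for w in scenarios)
    s_star = min(strategies, key=max_regret)
    return s_star, max_regret(s_star)
```

For a table where strategy a yields (10, 0) across two scenarios and b yields (4, 3), maximin picks b (worst case 3 vs. 0), while minimax regret picks a (worst regret 3 vs. 6), illustrating why the criterion is less conservative.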

AAAI Conference 2014 Conference Paper

Solving Zero-Sum Security Games in Discretized Spatio-Temporal Domains

  • Haifeng Xu
  • Fei Fang
  • Albert Jiang
  • Vincent Conitzer
  • Shaddin Dughmi
  • Milind Tambe

Among the many deployment areas of Stackelberg Security games, a major area involves games played out in space and time, which includes applications in multiple mobile defender resources protecting multiple mobile targets. Previous algorithms for such spatio-temporal security games fail to scale up, and little is known about the computational complexity properties of these problems. This paper provides a novel oracle-based algorithmic framework for a systematic study of different problem variants of computing optimal (minimax) strategies in spatio-temporal security games. Our framework enables efficient computation of a minimax strategy when the problem admits a polynomial-time oracle. Furthermore, for the cases in which efficient oracles are difficult to find, we propose approximations or prove hardness results.

ECAI Conference 2014 Conference Paper

Unleashing Dec-MDPs in Security Games: Enabling Effective Defender Teamwork

  • Eric Anyung Shieh
  • Albert Xin Jiang
  • Amulya Yadav
  • Pradeep Varakantham
  • Milind Tambe

Multiagent teamwork and defender-attacker security games are two areas that are currently receiving significant attention within multiagent systems research. Unfortunately, despite the need for effective teamwork among multiple defenders, little has been done to harness the teamwork research in security games. This paper is the first to remedy this situation by integrating the powerful teamwork mechanisms offered by Dec-MDPs into security games. We offer the following novel contributions in this paper: (i) New models of security games where a defender team's pure strategy is defined as a Dec-MDP policy for addressing coordination under uncertainty; (ii) New algorithms based on column generation that enable efficient generation of mixed strategies given this new model; (iii) Handling global events during defender execution for effective teamwork; (iv) Exploration of the robustness of randomized pure strategies. The paper opens the door to a potentially new area combining computational game theory and multiagent teamwork.

AAAI Conference 2013 Conference Paper

Analyzing the Effectiveness of Adversary Modeling in Security Games

  • Thanh Nguyen
  • Rong Yang
  • Amos Azaria
  • Sarit Kraus
  • Milind Tambe

Recent deployments of Stackelberg security games (SSG) have led to two competing approaches to handle boundedly rational human adversaries: (1) integrating models of human (adversary) decision-making into the game-theoretic algorithms, and (2) applying robust optimization techniques that avoid adversary modeling. A recent algorithm (MATCH) based on the second approach was shown to outperform the leading modeling-based algorithm even in the presence of a significant amount of data. Is there then any value in using human behavior models in solving SSGs? Through extensive experiments with 547 human subjects playing 11102 games in total, we emphatically answer the question in the affirmative, while providing the following key contributions: (i) we show that our algorithm, SU-BRQR, based on a novel integration of a human behavior model with the subjective utility function, significantly outperforms both MATCH and its improvements; (ii) we are the first to present experimental results with security intelligence experts, and find that even though the experts are more rational than the Amazon Turk workers, SU-BRQR still outperforms an approach assuming perfect rationality (and to a more limited extent MATCH); (iii) we show the advantage of SU-BRQR in a new, large game setting and demonstrate that sufficient data enables it to improve its performance over MATCH.

IJCAI Conference 2013 Conference Paper

Defender (Mis)Coordination in Security Games

  • Albert Xin Jiang
  • Ariel D. Procaccia
  • Yundi Qian
  • Nisarg Shah
  • Milind Tambe

We study security games with multiple defenders. To achieve maximum security, defenders must perfectly synchronize their randomized allocations of resources. However, in real-life scenarios (such as protection of the port of Boston) this is not the case. Our goal is to quantify the loss incurred by miscoordination between defenders, both theoretically and empirically. We introduce two notions that capture this loss under different assumptions: the price of miscoordination, and the price of sequential commitment. Generally speaking, our theoretical bounds indicate that the loss may be extremely high in the worst case, while our simulations establish a smaller yet significant loss in practice.

AAMAS Conference 2013 Conference Paper

Diversity Beats Strength? A Hands-on Experience with 9X9 Go

  • Leandro Soriano Marcolino
  • Douglass Chen
  • Albert Xin Jiang
  • Milind Tambe

Team formation is a critical step in deploying a multi-agent team. In some scenarios, agents coordinate by voting continuously. When forming such teams, should we focus on the diversity of the team or on the strength of each member? Can a team of diverse (and weak) agents outperform a uniform team of strong agents? In this demo, the user will be able to explore these questions by playing one of the most challenging board games: Go.

IJCAI Conference 2013 Conference Paper

Efficiently Solving Joint Activity Based Security Games

  • Eric Shieh
  • Manish Jain
  • Albert Xin Jiang
  • Milind Tambe

Despite recent successful real-world deployments of Stackelberg Security Games (SSGs), scale-up remains a fundamental challenge in this field. The latest techniques do not scale up to domains where multiple defenders must coordinate time-dependent joint activities. To address this challenge, this paper presents two branch-and-price algorithms for solving SSGs, SMARTO and SMARTH, with three novel features: (i) a column-generation approach that uses an ordered network of nodes (determined by solving the traveling salesman problem) to generate individual defender strategies; (ii) exploitation of iterative reward shaping of multiple coordinating defender units to generate coordinated strategies; (iii) generation of tighter upper-bounds for pruning by solving security games that only abide by key scheduling constraints. We provide extensive experimental results and formal analyses.

JAAMAS Journal 2013 Journal Article

Empirical evaluation of computational fear contagion models in crowd dispersions

  • Jason Tsai
  • Emma Bowring
  • Milind Tambe

Abstract In social psychology, emotional contagion describes the widely observed phenomenon of one person's emotions being influenced by surrounding people's emotions. While the overall effect is agreed upon, the underlying mechanism of the spread of emotions has seen little quantification and application to computational agents despite extensive evidence of its impacts in everyday life. In this paper, we examine computational models of emotional contagion by implementing two models (Bosse et al., European Council on Modeling and Simulation, pp. 212–218, 2009; Durupinar, From Audiences to Mobs: Crowd Simulation with Psychological Factors, PhD dissertation, Bilkent University, 2010) that draw from two separate lines of contagion research: thermodynamics-based and epidemiological-based. We first perform sensitivity tests on each model in an evacuation simulation, ESCAPES, showing both models to be reasonably robust to parameter variations, with certain exceptions. We then compare their ability to reproduce a real crowd panic scene in simulation, showing that the thermodynamics-style model (Bosse et al., 2009) produces superior results due to the ill-suited contagion mechanism at the core of epidemiological models. We also identify that a graduated effect of fear and proximity-based contagion effects are key to producing the superior results. We then reproduce the methodology on a second video, showing that the same results hold, implying generality of the conclusions reached in the first scene.

AAMAS Conference 2013 Conference Paper

Modeling Human Adversary Decision Making in Security Games: An Initial Report

  • Thanh H. Nguyen
  • James Pita
  • Rajiv Maheswaran
  • Milind Tambe
  • Amos Azaria
  • Sarit Kraus

Motivated by recent deployments of Stackelberg security games (SSGs), two competing approaches have emerged which either integrate models of human decision making into game-theoretic algorithms or apply robust optimization techniques that avoid adversary modeling. Recently, a robust technique (MATCH) has been shown to significantly outperform the leading modeling-based algorithms (e.g., Quantal Response (QR)) even in the presence of significant amounts of subject data. As a result, the effectiveness of using human behaviors in solving SSGs remains in question. We study this question in this paper.

IJCAI Conference 2013 Conference Paper

Multi-Agent Team Formation: Diversity Beats Strength?

  • Leandro Soriano Marcolino
  • Albert Xin Jiang
  • Milind Tambe

Team formation is a critical step in deploying a multi-agent team. In some scenarios, agents coordinate by voting continuously. When forming such teams, should we focus on the diversity of the team or on the strength of each member? Can a team of diverse (and weak) agents outperform a uniform team of strong agents? We propose a new model to address these questions. Our key contributions include: (i) we show that a diverse team can overcome a uniform team and we give the necessary conditions for it to happen; (ii) we present optimal voting rules for a diverse team; (iii) we perform synthetic experiments that demonstrate that both diversity and strength contribute to the performance of a team; (iv) we show experiments that demonstrate the usefulness of our model in one of the most difficult challenges for Artificial Intelligence: Computer Go.

IJCAI Conference 2013 Conference Paper

Scaling-Up Security Games with Boundedly Rational Adversaries: A Cutting-Plane Approach

  • Rong Yang
  • Albert Xin Jiang
  • Milind Tambe
  • Fernando Ordóñez

To improve the current real-world deployments of Stackelberg security games (SSGs), it is critical now to efficiently incorporate models of adversary bounded rationality in large-scale SSGs. Unfortunately, previously proposed branch-and-price approaches fail to scale-up given the non-convexity of such models, as we show with a realization called COCOMO. Therefore, we next present a novel cutting-plane algorithm called BLADE to scale-up SSGs with complex adversary models, with three key novelties: (i) an efficient scalable separation oracle to generate deep cuts; (ii) a heuristic that uses gradient to further improve the cuts; (iii) techniques for quality-efficiency tradeoff.

AAMAS Conference 2013 Conference Paper

Security Games with Contagion: Handling Asymmetric Information

  • Jason Tsai
  • Yundi Qian
  • Yevgeniy Vorobeychik
  • Christopher Kiekintveld
  • Milind Tambe

Counterinsurgency, which is the effort to mitigate support for an opposing organization, is a domain that has been studied recently; past work has modeled the problem as an influence blocking maximization game that features an influencer and a mitigator. While past work has introduced scalable heuristic techniques for generating effective strategies using a double-oracle algorithm, it has not addressed the issue of uncertainty and asymmetric information, which is the topic of this paper.

AAMAS Conference 2012 Conference Paper

A Robust Approach to Addressing Human Adversaries in Security Games

  • James Pita
  • Richard John
  • Rajiv Maheswaran
  • Milind Tambe
  • Rong Yang
  • Sarit Kraus

While game-theoretic approaches have been proposed for addressing complex security resource allocation problems, many of the standard game-theoretic assumptions fail to address the human adversaries whom security forces will likely face. To that end, approaches have been proposed that attempt to incorporate better models of human decision-making in these security settings. We take a new approach where, instead of trying to create a model of human decision-making, we leverage ideas from robust optimization techniques. In addition, we extend our approach and the previous best-performing approach to also address human anchoring biases under limited observation conditions. To evaluate our approach, we perform a comprehensive examination comparing the performance of our new approach against the current leading approaches to addressing human adversaries. Finally, in our experiments we present the first-ever analysis of some demographic information and personality measures that may influence decision making in security games.

ECAI Conference 2012 Conference Paper

A Robust Approach to Addressing Human Adversaries in Security Games

  • James Pita
  • Richard John
  • Rajiv T. Maheswaran
  • Milind Tambe
  • Sarit Kraus

Game-theoretic approaches have been proposed for addressing the complex problem of assigning limited security resources to protect a critical set of targets. However, many of the standard assumptions fail to address the human adversaries whom security forces will likely face. To address this challenge, previous research has attempted to integrate models of human decision-making into the game-theoretic algorithms for security settings. The current leading approach, based on experimental evaluation, is derived from a well-founded solution concept known as quantal response and is known as BRQR. One critical difficulty with opponent modeling in general is that, in security domains, information about potential adversaries is often sparse or noisy and, furthermore, the games themselves are highly complex and large in scale. Thus, we chose to examine a completely new approach to addressing human adversaries that avoids the complex task of modeling human decision-making. We leverage and modify robust optimization techniques to create a new type of optimization where the defender's loss for a potential deviation by the attacker is bounded by the distance of that deviation from the expected-value-maximizing strategy. To demonstrate the advantages of our approach, we introduce a systematic way to generate meaningful reward structures and compare our approach with BRQR in the most comprehensive investigation to date involving 104 security settings, where previous work has tested only up to 10 security settings. Our experimental analysis reveals our approach performing as well as or outperforming BRQR in over 90% of the security settings tested, and we demonstrate significant runtime benefits. These results are in favor of utilizing an approach based on robust optimization in these complex domains to avoid the difficulties of opponent modeling.

AAMAS Conference 2012 Conference Paper

A Unified Method for Handling Discrete and Continuous Uncertainty in Bayesian Stackelberg Games

  • Zhengyu Yin
  • Milind Tambe

Given their existing and potential real-world security applications, Bayesian Stackelberg games have received significant research interest. In these games, the defender acts as a leader, and the many different follower types model the uncertainty over discrete attacker types. Unfortunately, since solving such games is an NP-hard problem, scale-up has remained a difficult challenge. This paper scales up Bayesian Stackelberg games, providing a novel unified approach to handle uncertainty not only over discrete follower types but also other key continuously distributed real-world uncertainty, due to the leader's execution error, the follower's observation error, and continuous payoff uncertainty. To that end, this paper provides contributions in two parts. First, we present a new algorithm for Bayesian Stackelberg games, called HUNTER, to scale up the number of types. HUNTER combines the following five key features: i) efficient pruning via a best-first search of the leader's strategy space; ii) a novel linear program for computing tight upper bounds for this search; iii) using Benders decomposition for solving the upper-bound linear program efficiently; iv) efficient inheritance of Benders cuts from parent to child; v) an efficient heuristic branching rule. Our experiments show that HUNTER provides orders-of-magnitude speedups over the best existing methods to handle discrete follower types. In the second part, we show HUNTER's efficiency for Bayesian Stackelberg games can be exploited to also handle the continuous uncertainty using sample average approximation. We experimentally show that our HUNTER-based approach also outperforms the latest robust solution methods under continuously distributed uncertainty.

AAMAS Conference 2012 Conference Paper

Adversarial Patrolling Games

  • Yevgeniy Vorobeychik
  • Bo An
  • Milind Tambe

Defender-Attacker Stackelberg games are the foundations of tools deployed for computing optimal patrolling strategies in adversarial domains such as the United States Federal Air Marshals Service and the United States Coast Guard, among others. In Stackelberg game models of these systems the attacker knows only the probability that each target is covered by the defender, but is oblivious to the detailed timing of the coverage schedule. In many real-world situations, however, the attacker can observe the current location of the defender and can exploit this knowledge to reason about the defender's future moves. We study Stackelberg security games in which the defender sequentially moves between targets, with moves constrained by an exogenously specified graph, while the attacker can observe the defender's current location and his (stochastic) policy concerning future moves.

AAMAS Conference 2012 Conference Paper

AgentPolis: Towards a Platform for Fully Agent-based Modeling of Multi-Modal Transportation

  • Michal Jakob
  • Zbyněk Moler
  • Antonín Komenda
  • Zhengyu Yin
  • Albert Xin Jiang
  • Matthew Johnson
  • Michal Pěchouček

AgentPolis is a fully agent-based platform for modeling multi-modal transportation systems. It comprises a high-performance discrete-event simulation core, a cohesive set of high-level abstractions for building extensible agent-based models and a library of predefined components frequently used in transportation and mobility models. Together with a suite of supporting tools, AgentPolis enables rapid prototyping and execution of data-driven simulations of a wide range of mobility and transportation phenomena. We illustrate the capabilities of the platform on a model of fare inspection in public transportation networks.

JAAMAS Journal 2012 Journal Article

An extended study on multi-objective security games

  • Matthew Brown
  • Bo An
  • Milind Tambe

Abstract The burgeoning area of security games has focused on real-world domains where security agencies protect critical infrastructure from a diverse set of adaptive adversaries. In such domains, decision makers have multiple competing objectives they must consider, which may take different forms that are not readily comparable, including safety, cost, and public perception. Thus, it can be difficult to know how to weigh the different objectives when deciding on a security strategy. To address the challenges of these domains, we propose a fundamentally different solution concept, multi-objective security games (MOSGs). Instead of a single optimal solution, MOSGs have a set of Pareto-optimal (non-dominated) solutions referred to as the Pareto frontier, which can be generated by solving a sequence of constrained single-objective optimization problems (CSOPs). The Pareto frontier allows the decision maker to analyze the tradeoffs that exist between the multiple objectives. Our contributions include: (i) an algorithm, Iterative-ε-Constraints, for generating the sequence of CSOPs; (ii) an exact approach for solving a mixed-integer linear program (MILP) formulation of a CSOP; (iii) heuristics that achieve speedup by exploiting the structure of security games to further constrain the MILP; (iv) an approximate approach for solving a CSOP built off those same heuristics, increasing the scalability of our approach with quality guarantees. Additional contributions of this paper include proofs on the level of approximation, detailed experimental evaluation of the proposed approaches and heuristics, as well as a discussion of techniques for visualizing the Pareto frontier.

AAMAS Conference 2012 Conference Paper

Computing Optimal Strategy against Quantal Response in Security Games

  • Rong Yang
  • Fernando Ordóñez
  • Milind Tambe

To step beyond the first-generation deployments of attacker-defender security games -- for LAX Police, US FAMS and others -- it is critical that we relax the assumption of perfect rationality of the human adversary. Indeed, this assumption is a well-accepted limitation of classical game theory, and modeling human adversaries' bounded rationality is critical. To this end, quantal response (QR) has provided very promising results for modeling human bounded rationality. However, in computing optimal defender strategies in real-world security games against a QR model of attackers, we face difficulties including (1) solving a nonlinear non-convex optimization problem efficiently for massive real-world security games; and (2) addressing constraints on assigning security resources, which adds to the complexity of computing the optimal defender strategy. This paper presents two new algorithms to address these difficulties: GOSAQ can compute the globally optimal defender strategy against a QR model of attackers when there are no resource constraints and gives an efficient heuristic otherwise; PASAQ in turn provides an efficient approximation of the optimal defender strategy with or without resource constraints. These two novel algorithms are based on three key ideas: (i) use of a binary search method to solve the fractional optimization problem efficiently, (ii) construction of a convex optimization problem through a non-linear transformation, (iii) building a piecewise linear approximation of the non-linear terms in the problem. Additional contributions of this paper include proofs of approximation bounds, and detailed experimental results showing the advantages of GOSAQ and PASAQ in solution quality over the benchmark algorithm (BRQR) and the efficiency of PASAQ.
Given these results, PASAQ is at the heart of the PROTECT system, which is deployed for the US Coast Guard in the port of Boston, and is now headed to other ports.
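The binary-search idea in (i) can be sketched generically: a fractional objective max over x of N(x)/D(x) with D(x) > 0 exceeds a threshold t exactly when max over x of N(x) - t*D(x) is non-negative, so bisecting on t reduces the fractional problem to a sequence of simpler maximizations. A toy sketch where a finite candidate set stands in for the inner subproblem (in the papers' setting that subproblem is a linear or piecewise-linear program, not an enumeration):

```python
def maximize_ratio(N, D, xs, iters=60):
    """Bisection sketch for max_x N(x)/D(x) over a finite candidate set xs,
    assuming D(x) > 0 for all candidates."""
    ratios = [N(x) / D(x) for x in xs]
    lo, hi = min(ratios), max(ratios)  # attainable lower and upper bounds
    for _ in range(iters):
        t = (lo + hi) / 2
        # the ratio t is achievable iff some x satisfies N(x) - t*D(x) >= 0
        if max(N(x) - t * D(x) for x in xs) >= 0:
            lo = t
        else:
            hi = t
    return lo
```

For a finite set one could of course take the maximum ratio directly; the point is that the feasibility check `N(x) - t*D(x) >= 0` stays tractable even when `xs` is replaced by a continuous feasible region over which only linear objectives can be maximized.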

AAMAS Conference 2012 Conference Paper

Designing Better Strategies against Human Adversaries in Network Security Games

  • Rong Yang
  • Fei Fang
  • Albert Xin Jiang
  • Karthik Rajagopal
  • Milind Tambe
  • Rajiv Maheswaran

In a Network Security Game (NSG), security agencies must allocate limited resources to protect targets embedded in a network, such as important buildings in a city road network. A recent line of work relaxed the perfect-rationality assumption of the human adversary and showed significant advantages of incorporating bounded-rationality adversary models in non-networked security domains. Given that real-world NSGs are often extremely complex and hence very difficult for humans to solve, it is critical that we address human bounded rationality when designing defender strategies. To that end, the key contributions of this paper include: (i) comprehensive experiments with human subjects using a web-based game that we designed to simulate NSGs; (ii) new behavioral models of the human adversary in NSGs, which we train with the data collected from human experiments; (iii) new algorithms for computing the defender's optimal strategy against the new models.

AAMAS Conference 2012 Conference Paper

Detection of Suspicious Behavior from a Sparse Set of Multiagent Interactions

  • Boštjan Kaluža
  • Gal Kaminka
  • Milind Tambe

In many multiagent domains, no single observation event is sufficient to determine that the behavior of individuals is suspicious. Instead, suspiciousness must be inferred from a combination of multiple events, where events refer to the individual's interactions with other individuals. Hence, a detection system must employ a detector that combines evidence from multiple events, in contrast to most previous work, which focuses on the detection of a single, clearly suspicious event. This paper proposes a two-step detection system, where it first detects trigger events from multiagent interactions, and then combines the evidence to provide a degree of suspicion. The paper provides three key contributions: (i) proposes a novel detector that generalizes utility-based plan recognition with arbitrary utility functions, (ii) specifies conditions that any reasonable detector should satisfy, and (iii) analyzes three detectors and compares them with the proposed approach. The results on a simulated airport domain and a dangerous-driver domain show that our new algorithm outperforms other approaches in several settings.
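One generic way to combine evidence from several trigger events, in the spirit of the detector described above, is to sum log-likelihood ratios across observed events. This is an illustrative evidence-accumulation scheme (the function name and ratio values are assumptions), not the paper's exact utility-based detector:

```python
import math

def suspicion_score(event_likelihood_ratios):
    """Combine evidence from several trigger events into a single degree
    of suspicion by summing log-likelihood ratios. A positive score
    means the observed interactions are, in aggregate, more likely
    under the 'suspicious' hypothesis than under the 'normal' one."""
    return sum(math.log(r) for r in event_likelihood_ratios)

# Three trigger events, each with P(event | suspicious) / P(event | normal):
score = suspicion_score([3.0, 1.5, 2.0])
```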

AAMAS Conference 2012 Conference Paper

Emotional Contagion with Virtual Characters

  • Jason Tsai
  • Emma Bowring
  • Stacy Marsella
  • Milind Tambe

In social psychology, emotional contagion describes the widely observed phenomenon of one person’s emotions mimicking surrounding people’s emotions [8]. While it has been observed in human-human interactions, no known studies have examined its existence in agent-human interactions. As virtual characters make their way into high-risk, high-impact applications such as psychotherapy and military training with increasing frequency, the emotional impact of the agents’ expressions must be accurately understood to avoid undesirable repercussions.

AAMAS Conference 2012 Conference Paper

Game-theoretic Resource Allocation for Malicious Packet Detection in Computer Networks

  • Ondřej Vanĕk
  • Zhengyu Yin
  • Manish Jain
  • Branislav Bošanský
  • Milind Tambe
  • Michal Pĕchouček

We study the problem of optimal resource allocation for packet selection and inspection to detect potential threats in large computer networks with multiple computers of differing importance. An attacker tries to harm these targets by sending malicious packets from multiple entry points of the network; the defender thus needs to optimally allocate her resources to maximize the probability of malicious packet detection under network latency constraints. We formulate the problem as a graph-based security game with multiple resources of heterogeneous capabilities and propose a mathematical program for finding optimal solutions. We also propose \textsc{Grande}, a novel polynomial time algorithm that uses an approximated utility function to circumvent the limited scalability caused by the attacker's large strategy space and the non-linearity of the aforementioned mathematical program. \textsc{Grande} computes solutions with bounded error and scales up to problems of realistic sizes.
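As a toy illustration of the defender's objective above, the probability of catching a malicious packet at least once along its route grows with the per-link sampling rates; the latency constraints and the approximated utility function of GRANDE are not modeled in this sketch:

```python
def detection_probability(inspection_rates):
    """Probability that a malicious packet is inspected at least once
    along its route, given independent per-link packet-sampling rates.
    The defender wants to allocate inspection effort so this quantity
    is high on routes to important targets."""
    p_miss = 1.0
    for p in inspection_rates:
        p_miss *= (1.0 - p)  # packet evades every inspection point
    return 1.0 - p_miss
```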

AAMAS Conference 2012 Conference Paper

Multi-Objective Optimization for Security Games

  • Matthew Brown
  • Bo An
  • Christopher Kiekintveld
  • Fernando Ordóñez
  • Milind Tambe

The burgeoning area of security games has focused on real-world domains where security agencies protect critical infrastructure from a diverse set of adaptive adversaries. There are security domains where the payoffs for preventing the different types of adversaries may take different forms (seized money, reduced crime, saved lives, etc) which are not readily comparable. Thus, it can be difficult to know how to weigh the different payoffs when deciding on a security strategy. To address the challenges of these domains, we propose a fundamentally different solution concept, multi-objective security games (MOSG), which combines security games and multi-objective optimization. Instead of a single optimal solution, MOSGs have a set of Pareto optimal (non-dominated) solutions referred to as the Pareto frontier. The Pareto frontier can be generated by solving a sequence of constrained single-objective optimization problems (CSOP), where one objective is selected to be maximized while lower bounds are specified for the other objectives. Our contributions include: (i) an algorithm, Iterative $\epsilon$-Constraints, for generating the sequence of CSOPs; (ii) an exact approach for solving an MILP formulation of a CSOP (which also applies to multi-objective optimization in more general Stackelberg games); (iii) heuristics that achieve speedup by exploiting the structure of security games to further constrain a CSOP; (iv) an approximate approach for solving an algorithmic formulation of a CSOP, increasing the scalability of our approach with quality guarantees. Additional contributions of this paper include proofs on the level of approximation and detailed experimental evaluation of the proposed approaches.
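The Pareto frontier that the CSOP sequence enumerates is, by definition, the non-dominated subset of achievable payoff vectors. A minimal sketch of that dominance filter over an explicit candidate list (the iterative ε-constraint generation of candidates itself is not reproduced here):

```python
def pareto_frontier(solutions):
    """Return the non-dominated (Pareto optimal) payoff vectors from a
    list of candidates. A vector is dominated if some other candidate
    is at least as good in every objective and is not identical."""
    frontier = []
    for s in solutions:
        dominated = any(
            all(o >= v for o, v in zip(other, s)) and other != s
            for other in solutions
        )
        if not dominated:
            frontier.append(s)
    return frontier

# Two incomparable solutions and one dominated one:
frontier = pareto_frontier([(1, 2), (2, 1), (0, 0)])
```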

AAAI Conference 2012 Conference Paper

Patrol Strategies to Maximize Pristine Forest Area

  • Matthew Johnson
  • Fei Fang
  • Milind Tambe

Illegal extraction of forest resources is fought, in many developing countries, by patrols that try to make this activity less profitable, using the threat of confiscation. With a limited budget, officials will try to distribute the patrols throughout the forest intelligently, in order to most effectively limit extraction. Prior work in forest economics has formalized this as a Stackelberg game, one very different in character from the discrete Stackelberg problem settings previously studied in the multiagent literature. Specifically, the leader wishes to minimize the distance by which a profit-maximizing extractor will trespass into the forest—or to maximize the radius of the remaining “pristine” forest area. The follower’s costbenefit analysis of potential trespass distances is affected by the likelihood of being caught and suffering confiscation. In this paper, we give a near-optimal patrol allocation algorithm and a 1/2-approximation algorithm, the latter of which is more efficient and yields simpler, more practical patrol allocations. Our simulations indicate that these algorithms substantially outperform existing heuristic allocations.

AAMAS Conference 2012 Conference Paper

PROTECT: A Deployed Game Theoretic System to Protect the Ports of the United States

  • Eric Shieh
  • Bo An
  • Rong Yang
  • Milind Tambe
  • Craig Baldwin
  • Joseph DiRenzo
  • Ben Maule
  • Garrett Meyer

While three deployed applications of game theory for security have recently been reported at AAMAS, we as a community remain in the early stages of these deployments; there is a continuing need to understand the core principles for innovative security applications of game theory. Towards that end, this paper presents PROTECT, a game-theoretic system deployed by the United States Coast Guard (USCG) in the port of Boston for scheduling their patrols. USCG has termed the deployment of PROTECT in Boston a success, and efforts are underway to test it in the port of New York, with the potential for nationwide deployment. PROTECT is premised on an attacker-defender Stackelberg game model and offers five key innovations. First, this system is a departure from the assumption of perfect adversary rationality noted in previous work, relying instead on a quantal response (QR) model of the adversary's behavior --- to the best of our knowledge, this is the first real-world deployment of the QR model. Second, to improve PROTECT's efficiency, we generate a compact representation of the defender's strategy space, exploiting equivalence and dominance. Third, we show how to practically model a real maritime patrolling problem as a Stackelberg game. Fourth, our experimental results illustrate that PROTECT's QR model more robustly handles real-world uncertainties than a perfect rationality model. Finally, in evaluating PROTECT, this paper for the first time provides real-world data: (i) comparison of human-generated vs PROTECT security schedules, and (ii) results from an Adversarial Perspective Team's (human mock attackers) analysis.

AAAI Conference 2012 Conference Paper

PROTECT: An Application of Computational Game Theory for the Security of the Ports of the United States

  • Eric Shieh
  • Bo An
  • Rong Yang
  • Milind Tambe
  • Craig Baldwin
  • Joseph DiRenzo
  • Ben Maule
  • Garrett Meyer

Building upon previous security applications of computational game theory, this paper presents PROTECT, a game-theoretic system deployed by the United States Coast Guard (USCG) in the port of Boston for scheduling their patrols. USCG has termed the deployment of PROTECT in Boston a success, and efforts are underway to test it in the port of New York, with the potential for nationwide deployment. PROTECT is premised on an attacker-defender Stackelberg game model and offers five key innovations. First, this system is a departure from the assumption of perfect adversary rationality noted in previous work, relying instead on a quantal response (QR) model of the adversary’s behavior — to the best of our knowledge, this is the first real-world deployment of the QR model. Second, to improve PROTECT’s efficiency, we generate a compact representation of the defender’s strategy space, exploiting equivalence and dominance. Third, we show how to practically model a real maritime patrolling problem as a Stackelberg game. Fourth, our experimental results illustrate that PROTECT’s QR model more robustly handles real-world uncertainties than a perfect rationality model. Finally, in evaluating PROTECT, this paper provides real-world data: (i) comparison of human-generated vs PROTECT security schedules, and (ii) results from an Adversarial Perspective Team’s (human mock attackers) analysis.

AAMAS Conference 2012 Conference Paper

SAVES: A Sustainable Multiagent Application to Conserve Building Energy Considering Occupants

  • Jun-young Kwak
  • Pradeep Varakantham
  • Rajiv Maheswaran
  • Milind Tambe
  • Farrokh Jazizadeh
  • Geoffrey Kavulya
  • Laura Klein
  • Burcin Becerik-Gerber

This paper describes an innovative multiagent system called SAVES with the goal of conserving energy in commercial buildings. We specifically focus on an application to be deployed in an existing university building that provides several key novelties: (i) jointly performed with the university facility management team, SAVES is based on actual occupant preferences and schedules, actual energy consumption and loss data, real sensors and hand-held devices, etc.; (ii) it addresses novel scenarios that require negotiations with groups of building occupants to conserve energy; (iii) it focuses on a non-residential building, where human occupants do not have a direct financial incentive in saving energy and thus requires a different mechanism to effectively motivate occupants; and (iv) SAVES uses a novel algorithm for generating optimal MDP policies that explicitly consider multiple criteria optimization (energy and personal comfort) as well as uncertainty over occupant preferences when negotiating energy reduction - this combination of challenges has not been considered in previous MDP algorithms. In a validated simulation testbed, we show that SAVES substantially reduces the overall energy consumption compared to the existing control method while achieving comparable average satisfaction levels for occupants. As a real-world test, we provide results of a trial study where SAVES is shown to lead occupants to conserve energy in real buildings.

AAAI Conference 2012 Conference Paper

Security Games for Controlling Contagion

  • Jason Tsai
  • Thanh Nguyen
  • Milind Tambe

Many strategic actions carry a ‘contagious’ component beyond the immediate locale of the effort itself. Viral marketing and peacekeeping operations have both been observed to have a spreading effect. In this work, we use counterinsurgency as our illustrative domain. Defined as the effort to block the spread of support for an insurgency, such operations lack the manpower to defend the entire population and must focus on the opinions of a subset of local leaders. As past researchers of security resource allocation have done, we propose using game theory to develop such policies and model the interconnected network of leaders as a graph. Unlike this past work in security games, actions in these domains possess a probabilistic, non-local impact. To address this new class of security games, we combine recent research in influence blocking maximization with a double oracle approach and create novel heuristic oracles to generate mixed strategies for a real-world leadership network from Afghanistan, synthetic leadership networks, and a real social network. We find that leadership networks that exhibit highly interconnected clusters can be solved equally well by our heuristic methods, but our more sophisticated heuristics outperform simpler ones in less interconnected social networks.

AAAI Conference 2012 Conference Paper

Security Games with Limited Surveillance

  • Bo An
  • David Kempe
  • Christopher Kiekintveld
  • Eric Shieh
  • Satinder Singh
  • Milind Tambe
  • Yevgeniy Vorobeychik

Randomized first-mover strategies of Stackelberg games are used in several deployed applications to allocate limited resources for the protection of critical infrastructure. Stackelberg games model the fact that a strategic attacker can surveil and exploit the defender’s strategy, and randomization guards against the worst effects by making the defender less predictable. In accordance with the standard game-theoretic model of Stackelberg games, past work has typically assumed that the attacker has perfect knowledge of the defender’s randomized strategy and will react correspondingly. In light of the fact that surveillance is costly, risky, and delays an attack, this assumption is clearly simplistic: attackers will usually act on partial knowledge of the defender’s strategies. The attacker’s imperfect estimate could present opportunities and possibly also threats to a strategic defender. In this paper, we therefore begin a systematic study of security games with limited surveillance. We propose a natural model wherein an attacker forms or updates a belief based on observed actions, and chooses an optimal response. We investigate the model both theoretically and experimentally. In particular, we give mathematical programs to compute optimal attacker and defender strategies for a fixed observation duration, and show how to use them to estimate the attacker’s observation durations. Our experimental results show that the defender can achieve significant improvement in expected utility by taking the attacker’s limited surveillance into account, validating the motivation of our work.
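The simplest belief an attacker with limited surveillance can form is the empirical frequency of observed defender deployments. The sketch below uses that plain frequency estimate; the paper's actual belief-update model and the optimal-response computation are richer than this:

```python
from collections import Counter

def estimate_coverage(observations, targets):
    """An attacker who has watched only a few defender deployments
    estimates the defender's mixed strategy by empirical frequencies.
    Unobserved targets get probability zero, which is exactly the kind
    of estimation error a strategic defender can anticipate."""
    counts = Counter(observations)
    n = len(observations)
    return {t: counts[t] / n for t in targets}

# Three observed patrols over targets A, B, C:
belief = estimate_coverage(["A", "A", "B"], ["A", "B", "C"])
```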

AAAI Conference 2012 Conference Paper

The Deployment-to-Saturation Ratio in Security Games

  • Manish Jain
  • Kevin Leyton-Brown
  • Milind Tambe

Stackelberg security games form the backbone of systems like ARMOR, IRIS and PROTECT, which are in regular use by the Los Angeles International Airport Police, US Federal Air Marshal Service and the US Coast Guard respectively. An understanding of the runtime required by algorithms that power such systems is critical to furthering the application of game theory to other real-world domains. This paper identifies the concept of the deployment-to-saturation ratio in random Stackelberg security games, and shows that problem instances for which this ratio is 0.5 are computationally harder than instances with other deployment-to-saturation ratios for a wide range of different equilibrium computation methods, including (i) different previously published MIP algorithms, and (ii) different underlying solvers and solution mechanisms. This finding has at least two important implications. First, it is important for new algorithms to be evaluated on the hardest problem instances. We show that this has often not been done in the past, and introduce a publicly available benchmark suite to facilitate such comparisons. Second, we provide evidence that this computationally hard region is also one where optimization would be of most benefit to security agencies, and thus requires significant attention from researchers in this area. Furthermore, we use the concept of phase transitions to better understand this computationally hard region. We define a decision problem related to security games, and show that the probability that this problem has a solution exhibits a phase transition as the deployment-to-saturation ratio crosses 0.5. We also demonstrate that this phase transition is invariant to changes both in the domain and the domain representation, and that the phase transition point corresponds to the computationally hardest instances.

AAMAS Conference 2011 Conference Paper

A Double Oracle Algorithm for Zero-Sum Security Games on Graphs

  • Manish Jain
  • Dmytro Korzhyk
  • Ondřej Vanĕk
  • Vincent Conitzer
  • Michal Pĕchouček
  • Milind Tambe

In response to the Mumbai attacks of 2008, the Mumbai police have started to schedule a limited number of inspection checkpoints on the road network throughout the city. Algorithms for similar security-related scheduling problems have been proposed in recent literature, but security scheduling in networked domains when targets have varying importance remains an open problem at large. In this paper, we cast the network security problem as an attacker-defender zero-sum game. The strategy spaces for both players are exponentially large, so this requires the development of novel, scalable techniques. We first show that existing algorithms for approximate solutions can be arbitrarily bad in general settings. We present RUGGED (Randomization in Urban Graphs by Generating strategies for Enemy and Defender), the first scalable optimal solution technique for such network security games. Our technique is based on a double oracle approach and thus does not require the enumeration of the entire strategy space for either of the players. It scales up to realistic problem sizes, as is shown by our evaluation of maps of southern Mumbai obtained from GIS data.
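The double oracle loop can be sketched on a tiny zero-sum matrix game: restricted strategy sets grow only when a best-response oracle, searching the full strategy space, finds an improving strategy. In this illustrative sketch the restricted game is solved only approximately, by fictitious play, rather than by the LP-based method a real implementation such as RUGGED would use; all names and parameters are assumptions:

```python
def best_response(payoff, opponent_dist, strategies, maximize=True):
    """Pure best response in a zero-sum matrix game (payoff[row][col]
    is the row player's utility); the column player minimizes it."""
    def value(s):
        if maximize:
            return sum(p * payoff[s][c] for c, p in opponent_dist.items())
        return sum(p * payoff[r][s] for r, p in opponent_dist.items())
    return (max if maximize else min)(strategies, key=value)

def double_oracle(payoff, rows, cols, iters=200):
    """Double-oracle skeleton: keep small restricted strategy sets,
    solve the restricted game (approximately, via fictitious play),
    then ask oracles over the FULL strategy spaces for improving
    strategies; stop when neither oracle finds one."""
    R, C = {rows[0]}, {cols[0]}          # restricted strategy sets
    while True:
        # Approximately solve the restricted game by fictitious play.
        row_counts = {r: 0 for r in R}
        col_counts = {c: 0 for c in C}
        r_cur, c_cur = next(iter(R)), next(iter(C))
        for _ in range(iters):
            row_counts[r_cur] += 1
            col_counts[c_cur] += 1
            n = sum(row_counts.values())
            r_dist = {r: k / n for r, k in row_counts.items()}
            c_dist = {c: k / n for c, k in col_counts.items()}
            r_cur = best_response(payoff, c_dist, sorted(R), maximize=True)
            c_cur = best_response(payoff, r_dist, sorted(C), maximize=False)
        # Oracle step: best responses over the full strategy spaces.
        new_r = best_response(payoff, c_dist, rows, maximize=True)
        new_c = best_response(payoff, r_dist, cols, maximize=False)
        if new_r in R and new_c in C:    # no improving strategy exists
            return r_dist, c_dist
        R.add(new_r)
        C.add(new_c)

# Matching pennies: the unique equilibrium mixes roughly 50/50 on each side.
payoff = {"H": {"H": 1, "T": -1}, "T": {"H": -1, "T": 1}}
r_dist, c_dist = double_oracle(payoff, ["H", "T"], ["H", "T"])
```

The payoff of this structure is that, in large games, only a small fraction of the exponentially many strategies ever enters the restricted sets.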

AAMAS Conference 2011 Conference Paper

Approximation Methods for Infinite Bayesian Stackelberg Games: Modeling Distributional Payoff Uncertainty

  • Christopher Kiekintveld
  • Janusz Marecki
  • Milind Tambe

Game theory is fast becoming a vital tool for reasoning about complex real-world security problems, including critical infrastructure protection. The game models for these applications are constructed using expert analysis and historical data to estimate the values of key parameters, including the preferences and capabilities of terrorists. In many cases, it would be natural to represent uncertainty over these parameters using continuous distributions (such as uniform intervals or Gaussians). However, existing solution algorithms are limited to considering a small, finite number of possible attacker types with different payoffs. We introduce a general model of infinite Bayesian Stackelberg security games that allows payoffs to be represented using continuous payoff distributions. We then develop several techniques for finding approximate solutions for this class of games, and show empirically that our methods offer dramatic improvements over the current state of the art, providing new ways to improve the robustness of security game models.
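One natural bridge from continuous payoff distributions to existing finite-type solvers, consistent with the approximation theme of this paper, is to sample a finite set of attacker types from the distribution; the Gaussian parameters and function name below are illustrative assumptions:

```python
import random

def sample_attacker_types(mean_payoffs, stddev, n_samples, seed=0):
    """Approximate a continuous (here Gaussian) attacker payoff
    distribution by a finite set of sampled types, so a finite-type
    Bayesian Stackelberg solver can be applied to the sampled game.
    Each sampled type is one payoff value per target."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [[rng.gauss(m, stddev) for m in mean_payoffs]
            for _ in range(n_samples)]

# Four sampled attacker types over two targets:
samples = sample_attacker_types([5.0, 2.0], stddev=1.0, n_samples=4)
```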

IJCAI Conference 2011 Conference Paper

Continuous Time Planning for Multiagent Teams with Temporal Constraints

  • Zhengyu Yin
  • Milind Tambe

Continuous state DEC-MDPs are critical for agent teams in domains involving resources such as time, but scaling them up is a significant challenge. To meet this challenge, we first introduce a novel continuous-time DEC-MDP model that exploits transition independence in domains with temporal constraints. More importantly, we present a new locally optimal algorithm called SPAC. Compared to the best previous algorithm, SPAC finds solutions of comparable quality substantially faster; SPAC also scales to larger teams of agents.

AAMAS Conference 2011 Conference Paper

ESCAPES - Evacuation Simulation with Children, Authorities, Parents, Emotions, and Social comparison

  • Jason Tsai
  • Natalie Fridman
  • Emma Bowring
  • Matthew Brown
  • Shira Epstein
  • Gal A. Kaminka
  • Stacy Marsella
  • Andrew Ogden

In creating an evacuation simulation for training and planning, realistic agents that reproduce known phenomena are required. Evacuation simulation in the airport domain requires additional features beyond most simulations, including the unique behaviors of first-time visitors who have incomplete knowledge of the area and families that do not necessarily adhere to often-assumed pedestrian behaviors. Evacuation simulations not customized for the airport domain do not incorporate the factors important to it, leading to inaccuracies when applied to it. In this paper, we describe ESCAPES, a multiagent evacuation simulation tool that incorporates four key features: (i) different agent types; (ii) emotional interactions; (iii) informational interactions; (iv) behavioral interactions. Our simulator reproduces phenomena observed in existing studies on evacuation scenarios and the features we incorporate substantially impact escape time. We use ESCAPES to model the International Terminal at Los Angeles International Airport (LAX) and receive high praise from security officials.

EUMAS Conference 2011 Conference Paper

Game Theory for Security: An Important Challenge for Multiagent Systems

  • Bo An 0001
  • Milind Tambe

The goal of this paper is to introduce a real-world challenge problem for researchers in multiagent systems and beyond, where our collective efforts may have a significant impact on activities in the real-world. The challenge is in applying game theory for security: Our goal is not only to introduce the problem, but also to provide exemplars of initial successes of deployed systems in this challenge problem arena, some key open research challenges and pointers to getting started in this research.

AAMAS Conference 2011 Conference Paper

GUARDS - Game Theoretic Security Allocation on a National Scale

  • James Pita
  • Milind Tambe
  • Christopher Kiekintveld
  • Shane Cullen
  • Erin Steigerwald

Building on research previously reported at AAMAS conferences, this paper describes an innovative application of a novel game-theoretic approach for a national scale security deployment. Working with the United States Transportation Security Administration (TSA), we have developed a new application called GUARDS to assist in resource allocation tasks for airport protection at over 400 United States airports. In contrast with previous efforts such as ARMOR and IRIS, which focused on one-off tailored applications and one security activity (e.g., canine patrol or checkpoints) per application, GUARDS faces three key issues: (i) reasoning about hundreds of heterogeneous security activities; (ii) reasoning over diverse potential threats; (iii) developing a system designed for hundreds of end-users. Since a national deployment precludes tailoring to specific airports, our key ideas are: (i) creating a new game-theoretic framework that allows for heterogeneous defender activities and compact modeling of a large number of threats; (ii) developing an efficient solution technique based on general purpose Stackelberg game solvers; (iii) taking a partially centralized approach for knowledge acquisition and development of the system. In doing so we develop a software scheduling assistant, GUARDS, designed to reason over two agents - the TSA and a potential adversary - and allocate the TSA's limited resources across hundreds of security activities in order to provide protection within airports. The scheduling assistant has been delivered to the TSA and is currently under evaluation and testing for scheduling practices at an undisclosed airport. If successful, the TSA intends to incorporate the system into their unpredictable scheduling practices nationwide. In this paper we discuss the design choices and challenges encountered during the implementation of GUARDS. GUARDS represents promising potential for transitioning years of academic research into a nationally deployed system.

IJCAI Conference 2011 Conference Paper

GUARDS - Innovative Application of Game Theory for National Airport Security

  • James Pita
  • Milind Tambe
  • Christopher Kiekintveld
  • Shane Cullen
  • Erin Steigerwald

We describe an innovative application of a novel game-theoretic approach for a \textit{national scale} security deployment. Working with the United States Transportation Security Administration (TSA), we have developed a new application called GUARDS to allocate the TSA's limited resources across hundreds of security activities to provide protection at over 400 United States airports. Similar security applications (e.g., ARMOR and IRIS) have focused on one-off tailored applications and one security activity (e.g., checkpoints) per application; GUARDS, on the other hand, faces three new key issues: (i) reasoning about hundreds of heterogeneous security activities; (ii) reasoning over diverse potential threats; (iii) developing a system designed for hundreds of end-users. Since a national deployment precludes tailoring to specific airports, our key ideas are: (i) creating a new game-theoretic framework that allows for heterogeneous defender activities and compact modeling of a large number of threats; (ii) developing an efficient solution technique based on general purpose Stackelberg game solvers; (iii) taking a partially centralized approach for knowledge acquisition. The scheduling assistant has been delivered to the TSA and is currently undergoing evaluation for scheduling practices at an undisclosed airport. If successful, the TSA intends to incorporate the system into their unpredictable scheduling practices nationwide.

AAMAS Conference 2011 Conference Paper

Improved Computational Models of Human Behavior in Security Games

  • Rong Yang
  • Christopher Kiekintveld
  • Fernando Ordonez
  • Milind Tambe
  • Richard John

It becomes critical to address human adversaries' bounded rationality in security games as the real-world deployment of such games spreads. To that end, the key contributions of this paper include: (i) new efficient algorithms for computing optimal strategic solutions using Prospect Theory and Quantal Response Equilibrium; (ii) the most comprehensive experiment to date studying the effectiveness of different models against human subjects for security games. Our new techniques outperform the leading contender for modeling human behavior in security games in experiments with human subjects.

IJCAI Conference 2011 Conference Paper

Improving Resource Allocation Strategy against Human Adversaries in Security Games

  • Rong Yang
  • Christopher Kiekintveld
  • Fernando Ordonez
  • Milind Tambe
  • Richard John

Recent real-world deployments of Stackelberg security games make it critical that we address human adversaries' bounded rationality in computing optimal strategies. To that end, this paper provides three key contributions: (i) new efficient algorithms for computing optimal strategic solutions using Prospect Theory and Quantal Response Equilibrium; (ii) the most comprehensive experiment to date studying the effectiveness of different models against human subjects for security games; and (iii) new techniques for generating representative payoff structures for behavioral experiments in generic classes of games. Our results with human subjects show that our new techniques outperform the leading contender for modeling human behavior in security games.

AAMAS Conference 2011 Conference Paper

Quality Guarantees for Region Optimal DCOP Algorithms

  • Meritxell Vinyals
  • Eric Shieh
  • Jesus Cerquides
  • Juan Antonio Rodriguez-Aguilar
  • Zhengyu Yin
  • Milind Tambe
  • Emma Bowring

k- and t-optimality algorithms provide solutions to DCOPs that are optimal in regions characterized by their size and distance, respectively. Moreover, they provide quality guarantees on their solutions. Here we generalise the k- and t-optimal framework to introduce C-optimality, a flexible framework that provides reward-independent quality guarantees for optima in regions characterised by any arbitrary criterion. Therefore, C-optimality allows us to explore the space of criteria (beyond size and distance) looking for those that lead to better solution qualities. We benefit from this larger space of criteria to propose a new criterion, the so-called size-bounded-distance criterion, which outperforms k- and t-optimality.

AAMAS Conference 2011 Conference Paper

Quality-bounded Solutions for Finite Bayesian Stackelberg Games: Scaling up

  • Manish Jain
  • Christopher Kiekintveld
  • Milind Tambe

The fastest known algorithm for solving general Bayesian Stackelberg games with a finite set of follower (adversary) types has seen direct practical use at the LAX airport for over 3 years; and currently, an (albeit non-Bayesian) algorithm for solving these games is also being used for scheduling air marshals on limited sectors of international flights by the US Federal Air Marshals Service. These algorithms find optimal randomized security schedules to allocate limited security resources to protect targets. As we scale up to larger domains, including the full set of flights covered by the Federal Air Marshals, it is critical to develop newer algorithms that scale up significantly beyond the limits of the current state-of-the-art of Bayesian Stackelberg solvers. In this paper, we present a novel technique based on a hierarchical decomposition and branch and bound search over the follower type space, which may be applied to different Stackelberg game solvers. We have applied this technique to different solvers, resulting in: (i) A new exact algorithm called HBGS that is orders of magnitude faster than the best known previous Bayesian solver for general Stackelberg games; (ii) A new exact algorithm called HBSA which extends the fastest known previous security game solver towards the Bayesian case; and (iii) Approximation versions of HBGS and HBSA that show significant improvements over these newer algorithms with only a 12% sacrifice in the practical solution quality.

AAAI Conference 2011 Conference Paper

Refinement of Strong Stackelberg Equilibria in Security Games

  • Bo An
  • Milind Tambe
  • Fernando Ordonez
  • Eric Shieh
  • Christopher Kiekintveld

Given the real-world deployments of attacker-defender Stackelberg security games, robustness to deviations from expected attacker behaviors has now emerged as a critically important issue. This paper provides four key contributions in this context. First, it identifies a fundamentally problematic aspect of current algorithms for security games. It shows that there are many situations where these algorithms face multiple equilibria, and they arbitrarily select one that may hand the defender a significant disadvantage, particularly if the attacker deviates from its equilibrium strategies due to unknown constraints. Second, for important subclasses of security games, it identifies situations where we will face such multiple equilibria. Third, to address these problematic situations, it presents two equilibrium refinement algorithms that can optimize the defender’s utility if the attacker deviates from equilibrium strategies. Finally, it experimentally illustrates that the refinement approach achieved significant robustness in consideration of attackers’ deviation due to unknown constraints.

AAAI Conference 2011 Conference Paper

Risk-Averse Strategies for Security Games with Execution and Observational Uncertainty

  • Zhengyu Yin
  • Manish Jain
  • Milind Tambe
  • Fernando Ordóñez

Attacker-defender Stackelberg games have become a popular game-theoretic approach for security, with deployments for the LAX Police, the FAMS and the TSA. Unfortunately, most of the existing solution approaches do not model two key uncertainties of the real world: there may be noise in the defender's execution of the suggested mixed strategy and/or the observations made by an attacker can be noisy. In this paper, we provide a framework to model these uncertainties, and demonstrate that previous strategies perform poorly in such uncertain settings. We also provide RECON, a novel algorithm that computes strategies for the defender that are robust to such uncertainties, and provide heuristics that further improve RECON's efficiency.

AAMAS Conference 2011 Conference Paper

Teamwork in Distributed POMDPs: Execution-time Coordination Under Model Uncertainty

  • Jun-young Kwak
  • Rong Yang
  • Zhengyu Yin
  • Matthew E. Taylor
  • Milind Tambe

Despite their worst-case NEXP-complete planning complexity, DEC-POMDPs remain a popular framework for multiagent teamwork. This paper introduces effective teamwork under model uncertainty (i.e., potentially inaccurate transition and observation functions) as a novel challenge for DEC-POMDPs and presents MODERN, the first execution-centric framework for DEC-POMDPs explicitly motivated by addressing such model uncertainty. MODERN's shift of coordination reasoning from planning-time to execution-time avoids the high cost of computing optimal plans whose promised quality may not be realized in practice. There are three key ideas in MODERN: (i) it maintains an exponentially smaller model of other agents' beliefs and actions than in previous work and then further reduces the computation-time and space expense of this model via bounded pruning; (ii) it reduces execution-time computation by exploiting BDI theories of teamwork, and limits communication to key trigger points; and (iii) it limits its decision-theoretic reasoning about communication to trigger points and uses a systematic markup to encourage extra communication at these points, thus reducing uncertainty among team members at trigger points.

AAMAS Conference 2010 Conference Paper

Asynchronous Algorithms for Approximate Distributed Constraint Optimization with Quality Bounds

  • Christopher Kiekintveld
  • Zhengyu Yin
  • Atul Kumar
  • Milind Tambe

Distributed Constraint Optimization (DCOP) is a popular framework for cooperative multi-agent decision making. DCOP is NP-hard, so an important line of work focuses on developing fast incomplete solution algorithms for large-scale applications. One of the few incomplete algorithms to provide bounds on solution quality is k-size optimality, which defines a local optimality criterion based on the size of the group of deviating agents. Unfortunately, the lack of a general-purpose algorithm and the commitment to forming groups based solely on group size has limited the use of k-size optimality. This paper introduces t-distance optimality, which departs from k-size optimality by using graph distance as an alternative criterion for selecting groups of deviating agents. This throws open a new research direction into the tradeoffs between different group selection and coordination mechanisms for incomplete DCOP algorithms. We derive theoretical quality bounds for t-distance optimality that improve known bounds for k-size optimality. In addition, we develop a new efficient asynchronous local search algorithm for finding both k-size and t-distance optimal solutions, allowing these concepts to be deployed in real applications. Indeed, empirical results show that this algorithm significantly outperforms the only existing algorithm for finding general k-size optimal solutions, which is also synchronous. Finally, we compare the algorithmic performance of k-size and t-distance optimality using this algorithm. We find that t-distance consistently converges to higher-quality solutions in the long run, but results are mixed on convergence speed; we identify cases where each converges faster.

AAMAS Conference 2010 Conference Paper

Robust Bayesian Methods for Stackelberg Security Games

  • Christopher Kiekintveld
  • Janusz Marecki
  • Milind Tambe

Recent work has applied game-theoretic models to real-world security problems at the Los Angeles International Airport (LAX) and Federal Air Marshals Service (FAMS). The analysis of these domains is based on input from domain experts intended to capture the best available intelligence information about potential terrorist activities and possible security countermeasures. Nevertheless, these models are subject to significant uncertainty, especially in security domains where intelligence about adversary capabilities and preferences is very difficult to gather. This uncertainty presents significant challenges for applying game-theoretic analysis in these domains. Our experimental results show that standard solution methods based on perfect-information assumptions are very sensitive to payoff uncertainty, resulting in low payoffs for the defender. We describe a model of Bayesian Stackelberg games that allows for general distributional uncertainty over the attacker's payoffs. We conduct an experimental analysis of two algorithms for approximating equilibria of these games, and show that the resulting solutions give much better results than the standard approach when there is payoff uncertainty.

AAAI Conference 2010 Conference Paper

Security Games with Arbitrary Schedules: A Branch and Price Approach

  • Manish Jain
  • Erim Kardes
  • Christopher Kiekintveld
  • Fernando Ordonez
  • Milind Tambe

Security games, an important class of Stackelberg games, are used in deployed decision-support tools by the LAX police and the Federal Air Marshals Service. The algorithms used to solve these games find optimal randomized schedules to allocate security resources for infrastructure protection. Unfortunately, state-of-the-art algorithms either fail to scale or fail to provide a correct solution for large problems with arbitrary scheduling constraints. We introduce ASPEN, a branch-and-price approach that overcomes these limitations based on two key contributions: (i) A column-generation approach that exploits a novel network flow representation, avoiding a combinatorial explosion of schedule allocations; (ii) A branch-and-bound algorithm that generates bounds via a fast algorithm for solving security games with relaxed scheduling constraints. ASPEN is the first known method for efficiently solving massive security games with arbitrary schedules.

AAMAS Conference 2010 Conference Paper

Stackelberg vs. Nash in Security Games: Interchangeability, Equivalence, and Uniqueness

  • Zhengyu Yin
  • Dmytro Korzhyk
  • Christopher Kiekintveld
  • Vincent Conitzer
  • Milind Tambe

There has been significant recent interest in game-theoretic approaches to security, with much of the recent research focused on utilizing the leader-follower Stackelberg game model; for example, these games are at the heart of major applications such as the ARMOR program deployed for security at the LAX airport since 2007 and the IRIS program in use by the US Federal Air Marshals (FAMS). The foundational assumption for using Stackelberg games is that security forces (leaders), acting first, commit to a randomized strategy, while their adversaries (followers) choose their best response after surveillance of this randomized strategy. Yet, in many situations, the followers may act without observation of the leader's strategy, essentially converting the game into a simultaneous-move game model. Previous work fails to address how a leader should compute her strategy given this fundamental uncertainty about the type of game faced. Focusing on the complex games that are directly inspired by real-world security applications, the paper provides four contributions in the context of a general class of security games. First, exploiting the structure of these security games, the paper shows that the Nash equilibria in security games are interchangeable, thus alleviating the equilibrium selection problem. Second, resolving the leader's dilemma, it shows that under a natural restriction on security games, any Stackelberg strategy is also a Nash equilibrium strategy; furthermore, the solution is unique in a class of real-world security games of which ARMOR is a key exemplar. Third, when faced with a follower that can attack multiple targets, many of these properties no longer hold. Fourth, our experimental results emphasize positive properties of games that do not fit our restrictions. Our contributions have major implications for the real-world applications.

AAAI Conference 2010 Conference Paper

Urban Security: Game-Theoretic Resource Allocation in Networked Domains

  • Jason Tsai
  • Zhengyu Yin
  • Jun-young Kwak
  • David Kempe
  • Christopher Kiekintveld
  • Milind Tambe

Law enforcement agencies frequently must allocate limited resources to protect targets embedded in a network, such as important buildings in a city road network. Since intelligent attackers may observe and exploit patterns in the allocation, it is crucial that the allocations be randomized. We cast this problem as an attacker-defender Stackelberg game: the defender’s goal is to obtain an optimal mixed strategy for allocating resources. The defender’s strategy space is exponential in the number of resources, and the attacker’s exponential in the network size. Existing algorithms are therefore useless for all but the smallest networks. We present a solution approach based on two key ideas: (i) A polynomial-sized game model obtained via an approximation of the strategy space, solved efficiently using a linear program; (ii) Two efficient techniques that map solutions from the approximate game to the original, with proofs of correctness under certain assumptions. We present in-depth experimental results, including an evaluation on part of the Mumbai road network.

AAMAS Conference 2010 Conference Paper

When Should There be a "Me" in "Team"? Distributed Multi-Agent Optimization Under Uncertainty

  • Matthew Taylor
  • Manish Jain
  • Yanqin Jin
  • Makoto Yokoo
  • Milind Tambe

Increasing teamwork between agents typically increases the performance of a multi-agent system, at the cost of increased communication and higher computational complexity. This work examines joint actions in the context of a multi-agent optimization problem where agents must cooperate to balance exploration and exploitation. Surprisingly, results show that increased teamwork can hurt agent performance, even when communication and computation costs are ignored; we term this the team uncertainty penalty. This paper introduces this phenomenon, analyzes it, and presents algorithms to reduce the effect of the penalty in our problem setting.

IJCAI Conference 2009 Conference Paper

  • Manish Jain
  • Matthew Taylor
  • Milind Tambe
  • Makoto Yokoo

Buoyed by recent successes in the area of distributed constraint optimization problems (DCOPs), this paper addresses challenges faced when applying DCOPs to real-world domains. Three fundamental challenges must be addressed for a class of real-world domains, requiring novel DCOP algorithms. First, agents may not know the payoff matrix and must explore the environment to determine rewards associated with variable settings. Second, agents may need to maximize total accumulated reward rather than instantaneous final reward. Third, limited time horizons disallow exhaustive exploration of the environment. We propose and implement a set of novel algorithms that combine decision-theoretic exploration approaches with DCOP-mandated coordination. In addition to simulation results, we implement these algorithms on robots, deploying DCOPs on a distributed mobile sensor network.

AAMAS Conference 2009 Conference Paper

Computing Optimal Randomized Resource Allocations for Massive Security Games

  • Christopher Kiekintveld
  • Manish Jain
  • Jason Tsai
  • James Pita
  • Fernando Ordóñez
  • Milind Tambe

Predictable allocations of security resources such as police officers, canine units, or checkpoints are vulnerable to exploitation by attackers. Recent work has applied game-theoretic methods to find optimal randomized security policies, including a fielded application at the Los Angeles International Airport (LAX). This approach has promising applications in many similar domains, including police patrolling for subway and bus systems, randomized baggage screening, and scheduling for the Federal Air Marshal Service (FAMS) on commercial flights. However, the existing methods scale poorly when the security policy requires coordination of many resources, which is central to many of these potential applications. We develop new models and algorithms that scale to much more complex instances of security games. The key idea is to use a compact model of security games, which allows exponential improvements in both memory and runtime relative to the best known algorithms for solving general Stackelberg games. We develop even faster algorithms for security games under payoff restrictions that are natural in many security domains. Finally, we introduce additional realistic scheduling constraints while retaining comparable performance improvements. The empirical evaluation comprises both random data and realistic instances of the FAMS and LAX problems. Our new methods scale to problems several orders of magnitude larger than the fastest known algorithm.
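The compact model mentioned in this abstract can be illustrated with per-target marginal coverage probabilities: the attacker best-responds to coverage levels rather than to an exponential set of joint resource allocations. A minimal sketch, assuming a hypothetical three-target instance (the payoff numbers are invented for illustration, not taken from the paper):

```python
def attacker_best_target(coverage, attacker_payoffs):
    """Best-responding attacker under a compact (marginal-coverage) model.

    coverage[t] is the probability that target t is defended;
    attacker_payoffs[t] = (payoff if t is uncovered, payoff if covered).
    """
    def expected_utility(t):
        c = coverage[t]
        uncovered, covered = attacker_payoffs[t]
        return (1 - c) * uncovered + c * covered
    return max(range(len(coverage)), key=expected_utility)

# Hypothetical instance: heavy coverage on the most attractive target
# pushes a rational attacker toward a lower-value one.
coverage = [0.8, 0.3, 0.1]
attacker_payoffs = [(10, -5), (6, -5), (4, -5)]
```

With these numbers the attacker's expected utilities are -2, 2.7, and 3.1, so the best response is the least valuable but least covered target; this deflection effect is what the defender's randomized allocation is optimized to exploit.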

AAMAS Conference 2009 Conference Paper

Effective Solutions for Real-World Stackelberg Games: When Agents Must Deal with Human Uncertainties

  • James Pita
  • Manish Jain
  • Fernando Ordóñez
  • Milind Tambe
  • Sarit Kraus
  • Reuma Magori-Cohen

How do we build multiagent algorithms for agent interactions with human adversaries? Stackelberg games are natural models for many important applications that involve human interaction, such as oligopolistic markets and security domains. In Stackelberg games, one player, the leader, commits to a strategy and the follower makes their decision with knowledge of the leader's commitment. Existing algorithms for Stackelberg games efficiently find optimal solutions (leader strategy), but they critically assume that the follower plays optimally. Unfortunately, in real-world applications, agents face human followers (adversaries) who, because of their bounded rationality and limited observation of the leader strategy, may deviate from their expected optimal response. Not taking into account these likely deviations when dealing with human adversaries can cause an unacceptable degradation in the leader's reward, particularly in security applications where these algorithms have seen real-world deployment. To address this crucial problem, this paper introduces three new mixed-integer linear programs (MILPs) for Stackelberg games to consider human adversaries, incorporating: (i) novel anchoring theories on human perception of probability distributions and (ii) robustness approaches for MILPs to address human imprecision. Since these new approaches consider human adversaries, traditional proofs of correctness or optimality are insufficient; instead, it is necessary to rely on empirical validation. To that end, this paper considers two settings based on real deployed security systems, and compares six different approaches (three new and three previous), in four different observability conditions, involving 98 human subjects playing 1360 games in total. The final conclusion was that a model incorporating both robustness and anchoring achieves statistically significantly better rewards while maintaining equivalent or faster solution speeds compared to existing approaches.

ICAPS Conference 2009 Conference Paper

Exploiting Coordination Locales in Distributed POMDPs via Social Model Shaping

  • Pradeep Varakantham
  • Jun-young Kwak
  • Matthew E. Taylor
  • Janusz Marecki
  • Paul Scerri
  • Milind Tambe

Distributed POMDPs provide an expressive framework for modeling multiagent collaboration problems, but NEXP-Complete complexity hinders their scalability and application in real-world domains. This paper introduces a subclass of distributed POMDPs, and TREMOR, an algorithm to solve such distributed POMDPs. The primary novelty of TREMOR is that agents plan individually with a single agent POMDP solver and use social model shaping to implicitly coordinate with other agents. Experiments demonstrate that TREMOR can provide solutions orders of magnitude faster than existing algorithms while achieving comparable, or even superior, solution quality.

AAMAS Conference 2009 Conference Paper

Planning with Continuous Resources for Agent Teams

  • Janusz Marecki
  • Milind Tambe

Many problems of multiagent planning under uncertainty require distributed reasoning with continuous resources and resource limits. Decentralized Markov Decision Problems (Dec-MDPs) are well-suited to address such problems, but unfortunately, prior Dec-MDP approaches either discretize resources at the expense of speed and quality guarantees, or avoid discretization only by limiting agents' action choices or interactions (e.g., the assumption of transition independence). To address these shortcomings, this paper proposes M-DPFP, a novel algorithm for planning with continuous resources for agent teams, with three key features: (i) it maintains the agent team interaction graph to identify and prune the suboptimal policies and to allow the agents to be transition dependent, (ii) it operates in a continuous space of probability functions to provide the error bound on the solution quality and finally (iii) it focuses the search for policies on the most relevant parts of this search space to allow for a systematic trade-off of solution quality for speed. Our experiments show that M-DPFP finds high quality solutions and exhibits superior performance when compared with a discretization-based approach. We also show that M-DPFP is applicable to solving problems that are beyond the scope of existing approaches.

AAMAS Conference 2009 Conference Paper

Sensitivity Analysis for Distributed Optimization with Resource Constraints

  • Emma Bowring
  • Zhengyu Yin
  • Rob Zinkov
  • Milind Tambe

Previous work in multiagent coordination has addressed the challenge of planning in domains where agents must optimize a global goal while satisfying local resource constraints. However, the imposition of resource constraints naturally raises the question of whether the agents could significantly improve their team performance if a few more resources were made available. Sensitivity analysis aims to answer that question. This paper focuses on sensitivity analysis in the context of the distributed coordination framework Multiply-Constrained DCOP (MC-DCOP). There are three main challenges in performing sensitivity analysis: (i) to perform it in a distributed fashion, (ii) to avoid re-solving an NP-hard MC-DCOP optimization from scratch, and (iii) to avoid considering unproductive uses for extra resources. To meet these challenges, this paper presents three types of locally optimal algorithms: link analysis, local reoptimization and local constraint propagation. These algorithms are distributed and avoid redundant computation by ascertaining just the effects of local perturbations on the original problem. Deploying our algorithms on a large number of MC-DCOP problems revealed several results. While our cheapest algorithm successfully identified quality improvements for a few problems, our more complex techniques were necessary to identify the best uses for additional resources. Furthermore, we identified two heuristics that can help identify a priori which agents might benefit most from additional resources: density rank, which works well when nodes received identical resources, and remaining resource rank, which works well when nodes received resources based on the number of neighbors they had.

AAAI Conference 2008 System Paper

ARMOR Security for Los Angeles International Airport

  • James Pita
  • Fernando Ordóñez
  • Milind Tambe
  • Praveen Paruchuri

Security at major locations of economic or political importance is a key concern around the world, particularly given the threat of terrorism. Limited security resources prevent full security coverage at all times, which allows adversaries to observe and exploit patterns in selective patrolling or monitoring, e.g., they can plan an attack avoiding existing patrols. Hence, randomized patrolling or monitoring is important, but randomization must provide distinct weights to different actions based on their complex costs and benefits. To this end, this demonstration showcases a promising transition of the latest in multi-agent algorithms into a deployed application. In particular, it exhibits a software assistant agent called ARMOR (Assistant for Randomized Monitoring over Routes) that casts this patrolling/monitoring problem as a Bayesian Stackelberg game, allowing the agent to appropriately weigh the different actions in randomization, as well as uncertainty over adversary types. ARMOR combines two key features: (i) It uses the fastest known solver for Bayesian Stackelberg games called DOBSS, where the dominant mixed strategies enable randomization; (ii) Its mixed-initiative based interface allows users to occasionally adjust or override the automated schedule based on their local constraints. ARMOR has been successfully deployed since August 2007 at the Los Angeles International Airport (LAX) to randomize checkpoints on the roadways entering the airport and canine patrol routes within the airport terminals.

AAMAS Conference 2008 Conference Paper

Deployed ARMOR Protection: The Application of a Game Theoretic Model for Security at the Los Angeles International Airport

  • James Pita
  • Manish Jain
  • Janusz Marecki
  • Fernando Ordóñez
  • Christopher Portway
  • Milind Tambe

Security at major locations of economic or political importance is a key concern around the world, particularly given the threat of terrorism. Limited security resources prevent full security coverage at all times, which allows adversaries to observe and exploit patterns in selective patrolling or monitoring, e.g., they can plan an attack avoiding existing patrols. Hence, randomized patrolling or monitoring is important, but randomization must provide distinct weights to different actions based on their complex costs and benefits. To this end, this paper describes a promising transition of the latest in multi-agent algorithms – in fact, an algorithm that represents a culmination of research presented at AAMAS – into a deployed application. In particular, it describes a software assistant agent called ARMOR (Assistant for Randomized Monitoring over Routes) that casts this patrolling/monitoring problem as a Bayesian Stackelberg game, allowing the agent to appropriately weigh the different actions in randomization, as well as uncertainty over adversary types. ARMOR combines three key features: (i) It uses the fastest known solver for Bayesian Stackelberg games called DOBSS, where the dominant mixed strategies enable randomization; (ii) Its mixed-initiative based interface allows users to occasionally adjust or override the automated schedule based on their local constraints; (iii) It alerts the users if mixed-initiative overrides appear to degrade the overall desired randomization. ARMOR has been successfully deployed since August 2007 at the Los Angeles International Airport (LAX) to randomize checkpoints on the roadways entering the airport and canine patrol routes within the airport terminals. This paper examines the information, design choices, challenges, and evaluation that went into designing ARMOR.

AAMAS Conference 2008 Conference Paper

Not All Agents Are Equal: Scaling up Distributed POMDPs for Agent Networks

  • Janusz Marecki
  • Tapana Gupta
  • Pradeep Varakantham
  • Milind Tambe
  • Makoto Yokoo

Many applications of networks of agents, including mobile sensor networks, unmanned air vehicles, and autonomous underwater vehicles, involve hundreds of agents acting collaboratively under uncertainty. Distributed Partially Observable Markov Decision Problems (Distributed POMDPs) are well-suited to address such applications, but so far, only limited scale-ups of up to five agents have been demonstrated. This paper escalates the scale-up, presenting an algorithm called FANS that increases the number of agents in distributed POMDPs for the first time into double digits. FANS is founded on finite state machines (FSMs) for policy representation and exploits these FSMs to provide three key contributions: (i) Not all agents within an agent network need the same expressivity of policy representation; FANS introduces novel heuristics to automatically vary the FSM size in different agents for scale-up; (ii) FANS illustrates efficient integration of its FSM-based policy search within algorithms that exploit agent network structure; (iii) FANS provides significant speedups in policy evaluation and heuristic computations within the network algorithms by exploiting the FSMs for dynamic programming. Experimental results show not only orders of magnitude improvements over the previous best known algorithms for smaller-scale domains (with similar solution quality), but also a scale-up into double digits in terms of the number of agents.

AAMAS Conference 2008 Conference Paper

On K-Optimal Distributed Constraint Optimization Algorithms: New Bounds and Algorithms

  • Emma Bowring
  • Jonathan Pearce
  • Christopher Portway
  • Manish Jain
  • Milind Tambe

Distributed constraint optimization (DCOP) is a promising approach to coordination, scheduling and task allocation in multiagent networks. In large-scale or low-bandwidth networks, finding the global optimum is often impractical. K-optimality is a promising new approach: for the first time it provides a set of locally optimal algorithms with quality guarantees as a fraction of the global optimum. Unfortunately, previous work in k-optimality did not address domains where we may have prior knowledge of reward structure, and it failed to provide quality guarantees or algorithms for domains with hard constraints (such as agents' local resource constraints). This paper addresses these shortcomings with three key contributions. It provides: (i) improved lower bounds on k-optima quality incorporating available prior knowledge of reward structure; (ii) lower bounds on k-optima quality for problems with hard constraints; and (iii) k-optimal algorithms for solving DCOPs with hard constraints and detailed experimental results on large-scale networks.

AAMAS Conference 2008 Conference Paper

Playing Games for Security: An Efficient Exact Algorithm for Solving Bayesian Stackelberg Games

  • Praveen Paruchuri
  • Jonathan Pearce
  • Janusz Marecki
  • Milind Tambe
  • Fernando Ordonez
  • Sarit Kraus

In a class of games known as Stackelberg games, one agent (the leader) must commit to a strategy that can be observed by the other agent (the follower or adversary) before the adversary chooses its own strategy. We consider Bayesian Stackelberg games, in which the leader is uncertain about the types of adversary it may face. Such games are important in security domains, where, for example, a security agent (leader) must commit to a strategy of patrolling certain areas, and a robber (follower) has a chance to observe this strategy over time before choosing its own strategy of where to attack. This paper presents an efficient exact algorithm for finding the optimal strategy for the leader to commit to in these games. This algorithm, DOBSS, is based on a novel and compact mixed-integer linear programming formulation. Compared to the most efficient algorithm known previously for this problem, DOBSS is not only faster, but also leads to higher quality solutions, and does not suffer from problems of infeasibility that were faced by this previous algorithm. Note that DOBSS is at the heart of the ARMOR system that is currently being tested for security scheduling at the Los Angeles International Airport.
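The commitment advantage this abstract describes can be seen even in a tiny game. The sketch below is a brute-force grid search over leader mixed strategies in a hypothetical 2x2 single-type game, not DOBSS's mixed-integer linear program; all payoff matrices are invented for illustration:

```python
def follower_best_response(p, F):
    """Follower's best action when the leader plays row 0 with probability p."""
    values = [p * F[0][a] + (1 - p) * F[1][a] for a in range(2)]
    return max(range(2), key=lambda a: values[a])

def optimal_commitment(L, F, steps=1000):
    """Grid-search the leader's mixed strategies; the follower best-responds."""
    best_p, best_u = 0.0, float("-inf")
    for i in range(steps + 1):
        p = i / steps
        a = follower_best_response(p, F)
        u = p * L[0][a] + (1 - p) * L[1][a]
        if u > best_u:
            best_p, best_u = p, u
    return best_p, best_u

# Hypothetical payoffs: the unique Nash equilibrium gives the leader 1,
# but committing to a mixed strategy yields nearly 2.5.
L = [[1, 3], [0, 2]]   # leader payoffs
F = [[1, 0], [0, 1]]   # follower payoffs
```

Here the grid search finds a leader strategy near p = 0.5 worth almost 2.5; an exact Stackelberg solver would return exactly 2.5, since in a strong Stackelberg equilibrium the follower breaks ties in the leader's favor.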

IJCAI Conference 2007 Conference Paper

  • Jonathan P. Pearce
  • Milind Tambe

A distributed constraint optimization problem (DCOP) is a formalism that captures the rewards and costs of local interactions within a team of agents. Because complete algorithms to solve DCOPs are unsuitable for some dynamic or anytime domains, researchers have explored incomplete DCOP algorithms that result in locally optimal solutions. One type of categorization of such algorithms, and the solutions they produce, is k-optimality; a k-optimal solution is one that cannot be improved by any deviation by k or fewer agents. This paper presents the first known guarantees on solution quality for k-optimal solutions. The guarantees are independent of the costs and rewards in the DCOP, and once computed can be used for any DCOP of a given constraint graph structure.
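The definition of k-optimality used in this line of work is easy to operationalize on small instances by enumerating every deviation of at most k agents. A minimal sketch, assuming an invented three-agent chain DCOP (the reward tables are hypothetical, chosen only to separate 2-optimality from 3-optimality):

```python
from itertools import combinations, product

def total_reward(assign, constraints):
    """Sum binary-constraint rewards; constraints maps (i, j) -> value table."""
    return sum(table[assign[i]][assign[j]] for (i, j), table in constraints.items())

def is_k_optimal(assign, domains, constraints, k):
    """True iff no group of at most k agents can deviate and strictly improve."""
    base = total_reward(assign, constraints)
    agents = range(len(assign))
    for size in range(1, k + 1):
        for group in combinations(agents, size):
            for values in product(*(domains[i] for i in group)):
                candidate = list(assign)
                for i, v in zip(group, values):
                    candidate[i] = v
                if total_reward(candidate, constraints) > base:
                    return False
    return True

# Hypothetical chain: all-zeros scores 10, all-ones scores 20, and any
# mixed assignment loses reward on a mismatched constraint.
domains = [[0, 1]] * 3
constraints = {(0, 1): [[5, 0], [0, 10]], (1, 2): [[5, 0], [0, 10]]}
```

On this instance the all-zeros assignment survives every deviation by one or two agents, so it is 2-optimal, yet all three agents jointly switching to ones strictly improves the reward, so it is not 3-optimal; the paper's guarantees bound how far such locally optimal solutions can fall below the global optimum.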

IJCAI Conference 2007 Conference Paper

  • Janusz Marecki
  • Sven Koenig
  • Milind Tambe

Agents often have to construct plans that obey deadlines or, more generally, resource limits for real-valued resources whose consumption can only be characterized by probability distributions, such as execution time or battery power. These planning problems can be modeled with continuous state Markov decision processes (MDPs) but existing solution methods are either inefficient or provide no guarantee on the quality of the resulting policy. We therefore present CPH, a novel solution method that solves the planning problems by first approximating with any desired accuracy the probability distributions over the resource consumptions with phase-type distributions, which use exponential distributions as building blocks. It then uses value iteration to solve the resulting MDPs by exploiting properties of exponential distributions to calculate the necessary convolutions accurately and efficiently while providing strong guarantees on the quality of the resulting policy. Our experimental feasibility study in a Mars rover domain demonstrates a substantial speedup over Lazy Approximation, which is currently the leading algorithm for solving continuous state MDPs with quality guarantees.

IJCAI Conference 2007 Conference Paper

  • Pradeep Varakantham
  • Rajiv Maheswaran
  • Tapana Gupta
  • Milind Tambe

While POMDPs (partially observable Markov decision problems) are a popular computational model with wide-ranging applications, the computational cost for optimal policy generation is prohibitive. Researchers are investigating ever-more efficient algorithms, yet many applications demand that such algorithms bound any loss in policy quality when chasing efficiency. To address this challenge, we present two new techniques. The first approximates in the value space to obtain solutions efficiently for a pre-specified error bound. Unlike existing techniques, our technique guarantees the resulting policy will meet this bound. Furthermore, it does not require costly computations to determine the quality loss of the policy. Our second technique prunes large tracts of belief space that are unreachable, allowing faster policy computation without any sacrifice in optimality. The combination of the two techniques, which are complementary to existing optimal policy generation algorithms, provides solutions with tight error bounds efficiently in domains where competing algorithms fail to provide such tight bounds.

AAMAS Conference 2007 Conference Paper

An Efficient Heuristic Approach for Security Against Multiple Adversaries

  • Praveen Paruchuri
  • Jonathan P. Pearce
  • Milind Tambe
  • Fernando Ordonez
  • Sarit Kraus

In adversarial multiagent domains, security, commonly defined as the ability to deal with intentional threats from other agents, is a critical issue. This paper focuses on domains where these threats come from unknown adversaries. These domains can be modeled as Bayesian games; much work has been done on finding equilibria for such games. However, it is often the case in multiagent security domains that one agent can commit to a mixed strategy which its adversaries observe before choosing their own strategies. In this case, the agent can maximize reward by finding an optimal strategy, without requiring equilibrium. Previous work has shown this problem of optimal strategy selection to be NP-hard. Therefore, we present a heuristic called ASAP, with three key advantages to address the problem. First, ASAP searches for the highest-reward strategy, rather than a Bayes-Nash equilibrium, allowing it to find feasible strategies that exploit the natural first-mover advantage of the game. Second, it provides strategies which are simple to understand, represent, and implement. Third, it operates directly on the compact, Bayesian game representation, without requiring conversion to normal form. We provide an efficient Mixed Integer Linear Program (MILP) implementation for ASAP, along with experimental results illustrating significant speedups and higher rewards over other approaches.
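
The commitment problem ASAP addresses can be seen in a two-player toy game: the leader commits to a mixed strategy, the follower observes it and best-responds, and ties are broken in the leader's favor. The brute-force grid search below (with illustrative payoffs of our own choosing, not the paper's MILP) finds the optimal commitment for a single follower type:

```python
# Row player (leader) commits to a mixed strategy; column player
# (follower) observes it and best-responds. Payoff matrices are
# hypothetical, chosen so that committing to a mixed strategy beats
# committing to either pure strategy.
LEADER   = [[2.0, 4.0],
            [1.0, 3.0]]
FOLLOWER = [[1.0, 0.0],
            [0.0, 1.0]]

def best_commitment(steps=100):
    """Grid search over the leader's mixed strategies p = P(row 0)."""
    best_p, best_value = None, float("-inf")
    for i in range(steps + 1):
        p = i / steps
        # Follower's expected payoff for each column.
        fol = [p * FOLLOWER[0][c] + (1 - p) * FOLLOWER[1][c] for c in (0, 1)]
        # Best responses, with ties broken in the leader's favor.
        top = max(fol)
        responses = [c for c in (0, 1) if abs(fol[c] - top) < 1e-12]
        value = max(p * LEADER[0][c] + (1 - p) * LEADER[1][c] for c in responses)
        if value > best_value:
            best_p, best_value = p, value
    return best_p, best_value

p, v = best_commitment()   # p = 0.5, v = 3.5 for these payoffs
```

The toy exhibits the first-mover advantage the abstract mentions: the optimal mixed commitment earns 3.5, while committing to either pure strategy earns only 3.0 or 2.0. ASAP replaces this enumeration with a MILP over the compact Bayesian-game representation, handling multiple follower types at once.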

AAMAS Conference 2007 Conference Paper

Letting loose a SPIDER on a network of POMDPs: Generating quality guaranteed policies

  • Pradeep Varakantham
  • Janusz Marecki
  • Yuichi Yabu
  • Milind Tambe
  • Makoto Yokoo

Distributed Partially Observable Markov Decision Problems (Distributed POMDPs) are a popular approach for modeling multi-agent systems acting in uncertain domains. Given the significant complexity of solving distributed POMDPs, particularly as we scale up the numbers of agents, one popular approach has focused on approximate solutions. Though this approach is efficient, the algorithms within this approach do not provide any guarantees on solution quality. A second less popular approach focuses on global optimality, but typical results are available only for two agents, and also at considerable computational cost. This paper overcomes the limitations of both these approaches by providing SPIDER, a novel combination of three key features for policy generation in distributed POMDPs: (i) it exploits agent interaction structure given a network of agents (i.e., allowing easier scale-up to larger numbers of agents); (ii) it uses a combination of heuristics to speed up policy search; and (iii) it allows quality guaranteed approximations, allowing a systematic tradeoff of solution quality for time. Experimental results show orders of magnitude improvement in performance when compared with previous global optimal algorithms.

AAMAS Conference 2007 Conference Paper

On Opportunistic Techniques for Solving Decentralized Markov Decision Processes with Temporal Constraints

  • Janusz Marecki
  • Milind Tambe

Decentralized Markov Decision Processes (DEC-MDPs) are a popular model of agent-coordination problems in domains with uncertainty and time constraints, but they are very difficult to solve. In this paper, we improve a state-of-the-art heuristic solution method for DEC-MDPs, called OC-DEC-MDP, that has recently been shown to scale up to larger DEC-MDPs. Our heuristic solution method, called Value Function Propagation (VFP), combines two orthogonal improvements of OC-DEC-MDP. First, it speeds up OC-DEC-MDP by an order of magnitude by maintaining and manipulating a value function for each state (as a function of time) rather than a separate value for each pair of state and time interval. Furthermore, it achieves better solution qualities than OC-DEC-MDP because, as our analytical results show, it does not overestimate the expected total reward like OC-DEC-MDP. We test both improvements independently in a crisis-management domain as well as for other types of domains. Our experimental results demonstrate a significant speedup of VFP over OC-DEC-MDP as well as higher solution qualities in a variety of situations.

JAAMAS Journal 2006 Journal Article

Privacy Loss in Distributed Constraint Reasoning: A Quantitative Framework for Analysis and its Applications

  • Rajiv T. Maheswaran
  • Jonathan P. Pearce
  • Milind Tambe

Abstract It is critical that agents deployed in real-world settings, such as businesses, offices, universities and research laboratories, protect their individual users’ privacy when interacting with other entities. Indeed, privacy is recognized as a key motivating factor in the design of several multiagent algorithms, such as in distributed constraint reasoning (including both algorithms for distributed constraint optimization (DCOP) and distributed constraint satisfaction (DisCSPs)), and researchers have begun to propose metrics for analysis of privacy loss in such multiagent algorithms. Unfortunately, a general quantitative framework to compare these existing metrics for privacy loss or to identify dimensions along which to construct new metrics is currently lacking. This paper presents three key contributions to address this shortcoming. First, the paper presents VPS (Valuations of Possible States), a general quantitative framework to express, analyze and compare existing metrics of privacy loss. Based on a state-space model, VPS is shown to capture various existing measures of privacy created for specific domains of DisCSPs. The utility of VPS is further illustrated through analysis of privacy loss in DCOP algorithms, when such algorithms are used by personal assistant agents to schedule meetings among users. In addition, VPS helps identify dimensions along which to classify and construct new privacy metrics and it also supports their quantitative comparison. Second, the article presents key inference rules that may be used in analysis of privacy loss in DCOP algorithms under different assumptions. Third, detailed experiments based on the VPS-driven analysis lead to the following key results: (i) decentralization by itself does not provide superior protection of privacy in DisCSP/DCOP algorithms when compared with centralization; instead, privacy protection also requires the presence of uncertainty about agents’ knowledge of the constraint graph. 
(ii) one needs to carefully examine the metrics chosen to measure privacy loss; the qualitative properties of privacy loss and hence the conclusions that can be drawn about an algorithm can vary widely based on the metric chosen. This paper should thus serve as a call to arms for further privacy research, particularly within the DisCSP/DCOP arena.

IJCAI Conference 2005 Conference Paper

Networked Distributed POMDPs: A Synergy of Distributed Constraint Optimization and POMDPs

  • Ranjit Nair
  • Pradeep Varakantham
  • Milind Tambe
  • Makoto Yokoo

In many real-world multiagent applications such as distributed sensor nets, a network of agents is formed based on each agent’s limited interactions with a small number of neighbors. While distributed POMDPs capture the real-world uncertainty in multiagent domains, they fail to exploit such locality of interaction. Distributed constraint optimization (DCOP) captures the locality of interaction but fails to capture planning under uncertainty. This paper presents a new model synthesized from distributed POMDPs and DCOPs, called Networked Distributed POMDPs (ND-POMDPs). Exploiting network structure enables us to present a distributed policy generation algorithm that performs local search.

AAAI Conference 2005 Conference Paper

Networked Distributed POMDPs: A Synthesis of Distributed Constraint Optimization and POMDPs

  • Ranjit Nair
  • Milind Tambe

In many real-world multiagent applications such as distributed sensor nets, a network of agents is formed based on each agent’s limited interactions with a small number of neighbors. While distributed POMDPs capture the real-world uncertainty in multiagent domains, they fail to exploit such locality of interaction. Distributed constraint optimization (DCOP) captures the locality of interaction but fails to capture planning under uncertainty. This paper presents a new model synthesized from distributed POMDPs and DCOPs, called Networked Distributed POMDPs (ND-POMDPs). Exploiting network structure enables us to present two novel algorithms for ND-POMDPs: a distributed policy generation algorithm that performs local search and a systematic policy search that is guaranteed to reach the global optimal.

AAAI Conference 1999 Conference Paper

Automated Team Analysis

  • Taylor Raines
  • Milind Tambe
  • Stacy Marsella

We have created an agent for analyzing and improving synthetic teams. The agent is built in a bottom-up fashion using little specific domain knowledge. In lieu of extensive domain knowledge, data mining and inductive learning techniques are used in an attempt to isolate the key issues determining the successes or failures of these teams. This approach has been applied to the RoboCup domain, with a current focus on analyzing shots on goal and with future plans for assists, passing, and general teamwork.

IJCAI Conference 1999 Conference Paper

Two Fielded Teams and Two Experts: A RoboCup Challenge Response from the Trenches

  • Milind Tambe
  • Gal A. Kaminka
  • Stacy Marsella
  • Ion Muslea
  • Taylor Raines

The RoboCup (robot world-cup soccer) effort, initiated to stimulate research in multi-agents and robotics, has blossomed into a significant effort of international proportions. RoboCup is simultaneously a fundamental research effort and a set of competitions for testing research ideas. At IJCAI'97, a broad research challenge was issued for the RoboCup synthetic agents, covering areas of multi-agent learning, teamwork and agent modeling. This paper outlines our attack on the entire breadth of the RoboCup research challenge, on all of its categories, in the form of two fielded, contrasting RoboCup teams, and two off-line soccer analysis agents. We compare the teams and the agents to generalize the lessons learned in learning, teamwork and agent modeling.

AAAI Conference 1997 Conference Paper

Agent Architectures for Flexible, Practical Teamwork

  • Milind Tambe

Teamwork in complex, dynamic, multi-agent domains mandates highly flexible coordination and communication. Simply fitting individual agents with precomputed coordination plans will not do, for their inflexibility can cause severe failures in teamwork, and their domain-specificity hinders reusability. Our central hypothesis is that the key to such flexibility and reusability is agent architectures with integrated teamwork capabilities. This fundamental shift in agent architectures is illustrated via an implemented candidate: STEAM. While STEAM is founded on the joint intentions theory, practical operationalization has required it to integrate several key novel concepts: (i) team synchronization to establish joint intentions; (ii) constructs for monitoring joint intentions and repair; and (iii) decision-theoretic communication selectivity (to pragmatically extend the joint intentions theory). Applications in three different complex domains, with empirical results, are presented.

IJCAI Conference 1997 Conference Paper

The RoboCup Synthetic Agent Challenge

  • Hiroaki Kitano
  • Milind Tambe
  • Peter Stone
  • Manuela Veloso
  • Silvia Coradeschi
  • Eiichi Osawa
  • Hitoshi Matsubara
  • Itsuki Noda

RoboCup Challenge offers a set of challenges for intelligent agent researchers using a friendly competition in a dynamic, real-time, multiagent domain. While RoboCup in general envisions longer range challenges over the next few decades, RoboCup Challenge presents three specific challenges for the next two years: (i) learning of individual agents and teams; (ii) multi-agent team planning and plan-execution in service of teamwork; and (iii) opponent modeling. RoboCup Challenge provides a novel opportunity for machine learning, planning, and multi-agent researchers: it not only supplies a concrete domain to evaluate their techniques, but also challenges researchers to evolve these techniques to face key constraints fundamental to this domain: real-time, uncertainty, and teamwork.

AAAI Conference 1996 Conference Paper

Tracking Dynamic Team Activity

  • Milind Tambe

AI researchers are striving to build complex multi-agent worlds with intended applications ranging from the RoboCup robotic soccer tournaments, to interactive virtual theatre, to large-scale real-world battlefield simulations. Agent tracking - monitoring other agents' actions and inferring their higher-level goals and intentions - is a central requirement in such worlds. While previous work has mostly focused on tracking individual agents, this paper goes beyond by focusing on agent teams. Team tracking poses the challenge of tracking a team's joint goals and plans. Dynamic, real-time environments add to the challenge, as ambiguities have to be resolved in real-time. The central hypothesis underlying the present work is that an explicit team-oriented perspective enables effective team tracking. This hypothesis is instantiated using the model tracing technology employed in tracking individual agents. Thus, to track team activities, team models are put to service. Team models are a concrete application of the joint intentions framework and enable an agent to track team activities, regardless of whether the agent is a collaborative participant or a non-participant in the team. To facilitate real-time ambiguity resolution with team models: (i) aspects of tracking are cast as constraint satisfaction problems to exploit constraint propagation techniques; and (ii) a cost minimality criterion is applied to constrain tracking search. Empirical results from two separate tasks in real-world, dynamic environments, one collaborative and one competitive, are provided.

IJCAI Conference 1995 Conference Paper

RESC: An Approach for Real-time, Dynamic Agent Tracking

  • Milind Tambe
  • Paul S. Rosenbloom

Agent tracking involves monitoring the observable actions of other agents as well as inferring their unobserved actions, plans, goals and behaviors. In a dynamic, real-time environment, an intelligent agent faces the challenge of tracking other agents' flexible mix of goal-driven and reactive behaviors, and doing so in real-time, despite ambiguities. This paper presents RESC (REal-time Situated Commitments), an approach that enables an intelligent agent to meet this challenge. RESC's situatedness derives from its constant uninterrupted attention to the current world situation: it always tracks other agents' on-going actions in the context of this situation. Despite ambiguities, RESC quickly commits to a single interpretation of the on-going actions (without an extensive examination of the alternatives), and uses that in service of interpretation of future actions. However, should its commitments lead to inconsistencies in tracking, it uses single-state backtracking to undo some of the commitments and repair the inconsistencies. Together, RESC's situatedness, immediate commitment, and single-state backtracking conspire in providing RESC its real-time character. RESC is implemented in the context of intelligent pilot agents participating in a real-world synthetic air-combat environment. Experimental results illustrating RESC's effectiveness are presented.

TIME Conference 1994 Conference Paper

Event Tracking for an Intelligent Automated Agent

  • Milind Tambe
  • Paul S. Rosenbloom

In a dynamic, multiagent environment, an automated intelligent agent is often faced with the possibility that other agents may instigate events that hinder or help the achievement of its own goals. To act intelligently in such an environment, an automated agent needs an event tracking capability to continually monitor the occurrence of such events and the temporal relationships among them. This capability enables an agent to infer the occurrence of important unobserved events as well as to obtain a better understanding of the interaction among events. This article focuses on event tracking in one complex and dynamic multiagent environment: the air‐combat simulation environment. It analyzes the challenges that an automated pilot agent must face when tracking events in this environment. This analysis reveals three new issues that have not been addressed in previous work in this area: (i) tracking events generated by agents’ flexible and reactive behaviors, (ii) tracking events in the context of continuous agent interactions, and (iii) tracking events in real time. This article proposes one solution to address these issues. One key idea in this solution is that the (architectural) mechanisms that an agent employs in generating its own flexible and reactive behaviors can be used to track other agents’ flexible and reactive behaviors in real time. A second key idea is the use of a world‐centered representation for modeling agent interactions. The solution is demonstrated using an implementation of an automated pilot agent.

AAAI Conference 1993 Conference Paper

On the Masking Effect

  • Milind Tambe

Machine learning approaches to knowledge compilation seek to improve the performance of problem-solvers by storing solutions to previously solved problems in an efficient, generalized form. The problem-solver retrieves these learned solutions in appropriate later situations to obtain results more efficiently. However, by relying on its learned knowledge to provide a solution, the problem-solver may miss an alternative solution of higher quality - one that could have been generated using the original (non-learned) problem-solving knowledge. This phenomenon is referred to as the masking effect of learning. In this paper, we examine a sequence of possible solutions for the masking effect. Each solution refines and builds on the previous one. The final solution is based on cascaded filters. When learned knowledge is retrieved, these filters alert the system about the inappropriateness of this knowledge so that the system can then derive a better alternative solution. We analyze conditions under which this solution will perform better than the others, and present experimental data supportive of the analysis. This investigation is based on a simulated robot domain called Groundworld.