Arrow Research search

Author name cluster

Janusz Marecki

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

22 papers
2 author rows

Possible papers


AAMAS Conference 2017 Conference Paper

Multi-agent Reinforcement Learning in Sequential Social Dilemmas

  • Joel Z. Leibo
  • Vinicius Zambaldi
  • Marc Lanctot
  • Janusz Marecki
  • Thore Graepel

Matrix games like the Prisoner’s Dilemma have guided research on social dilemmas for decades. However, they necessarily treat the choice to cooperate or defect as an atomic action. In real-world social dilemmas these choices are temporally extended. Cooperativeness is a property that applies to policies, not elementary actions. We introduce sequential social dilemmas that share the mixed incentive structure of matrix game social dilemmas but also require agents to learn policies that implement their strategic intentions. We analyze the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Q-network, on two Markov games we introduce here: (1) a fruit Gathering game and (2) a Wolfpack hunting game. We characterize how learned behavior in each domain changes as a function of environmental factors including resource abundance. Our experiments show how conflict can emerge from competition over shared resources and shed light on how the sequential nature of real-world social dilemmas affects cooperation.
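The one-shot matrix-game setting the abstract contrasts with sequential dilemmas can be made concrete in a few lines. This is an illustrative sketch, not code from the paper; the payoff values follow the standard Prisoner's Dilemma convention (T > R > P > S):

```python
# Toy illustration (not from the paper): the classic Prisoner's Dilemma
# payoff matrix, showing why defection dominates when cooperation is a
# single atomic action rather than a temporally extended policy.
# R = mutual cooperation, P = mutual defection,
# T = temptation to defect, S = sucker's payoff, with T > R > P > S.
R, P, T, S = 3, 1, 4, 0

# payoff[my_action][opponent_action] for actions 0=cooperate, 1=defect
payoff = [[R, S],
          [T, P]]

def best_response(opponent_action):
    """Return the action maximizing my payoff against a fixed opponent action."""
    return max((0, 1), key=lambda a: payoff[a][opponent_action])

# Defection is a best response to both cooperation and defection,
# so it is a dominant strategy in the one-shot matrix game.
assert best_response(0) == 1 and best_response(1) == 1
```

In the sequential setting the paper studies, the analogous "best response" is an entire learned policy, which is why cooperativeness becomes a property of policies rather than of single actions.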

UAI Conference 2013 Conference Paper

Solution Methods for Constrained Markov Decision Process with Continuous Probability Modulation

  • Marek Petrik
  • Dharmashankar Subramanian
  • Janusz Marecki

We propose solution methods for previously unsolved constrained MDPs in which actions can continuously modify the transition probabilities within some acceptable sets. While many methods have been proposed to solve regular MDPs with large state sets, there are few practical approaches for solving constrained MDPs with large action sets. In particular, we show that the continuous action sets can be replaced by their extreme points when the rewards are linear in the modulation. We also develop a tractable optimization formulation for concave reward functions and, surprisingly, also extend it to nonconcave reward functions by using their concave envelopes. We evaluate the effectiveness of the approach on the problem of managing delinquencies in a portfolio of loans.

AAMAS Conference 2012 Conference Paper

Delayed Observation Planning in Partially Observable Domains

  • Pradeep Varakantham
  • Janusz Marecki

Traditional models for planning under uncertainty, such as Markov Decision Processes (MDPs) or Partially Observable MDPs (POMDPs), assume that observations about the results of agent actions are instantly available to the agent. As a result, they are not applicable to domains where observations are received with delays caused by temporary unavailability of information (e.g. the delayed response of the market to a new product). To that end, we make the following key contributions towards solving Delayed observation POMDPs (D-POMDPs): (i) we first provide a parameterized approximate algorithm for solving D-POMDPs efficiently, with desired accuracy; and (ii) we then propose a policy execution technique that adjusts the policy at runtime to account for the actual realization of observations. We show the performance of our techniques on POMDP benchmark problems with delayed observations, where explicit modeling of delayed observations leads to solutions of superior quality.

AAMAS Conference 2012 Conference Paper

Playing Repeated Stackelberg Games with Unknown Opponents

  • Janusz Marecki
  • Gerry Tesauro
  • Richard Segal

In Stackelberg games, a "leader" player first chooses a mixed strategy to commit to, then a "follower" player responds based on the observed leader strategy. Notable strides have been made in scaling up the algorithms for such games, but the problem of finding optimal leader strategies spanning multiple rounds of the game, with a Bayesian prior over unknown follower preferences, has been left unaddressed. Towards remedying this shortcoming we propose a first-of-a-kind tractable method to compute an optimal plan of leader actions in a repeated game against an unknown follower, assuming that the follower plays myopic best-response in every round. Our approach combines Monte Carlo Tree Search, dealing with leader exploration/exploitation tradeoffs, with a novel technique for the identification and pruning of dominated leader strategies. The method provably finds asymptotically optimal solutions and scales up to real-world security games spanning a double-digit number of rounds.

AAMAS Conference 2011 Conference Paper

Approximation Methods for Infinite Bayesian Stackelberg Games: Modeling Distributional Payoff Uncertainty

  • Christopher Kiekintveld
  • Janusz Marecki
  • Milind Tambe

Game theory is fast becoming a vital tool for reasoning about complex real-world security problems, including critical infrastructure protection. The game models for these applications are constructed using expert analysis and historical data to estimate the values of key parameters, including the preferences and capabilities of terrorists. In many cases, it would be natural to represent uncertainty over these parameters using continuous distributions (such as uniform intervals or Gaussians). However, existing solution algorithms are limited to considering a small, finite number of possible attacker types with different payoffs. We introduce a general model of infinite Bayesian Stackelberg security games that allows payoffs to be represented using continuous payoff distributions. We then develop several techniques for finding approximate solutions for this class of games, and show empirically that our methods offer dramatic improvements over the current state of the art, providing new ways to improve the robustness of security game models.
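The modeling idea of continuous payoff distributions can be illustrated with a small Monte Carlo sketch. This is an assumption-laden toy, not one of the paper's algorithms: sample attacker payoff vectors from a Gaussian, let each sampled attacker best-respond to the leader's coverage, and average the leader's utility. The two-target setup and all payoff values here are invented for illustration.

```python
# Toy sketch of distributional payoff uncertainty in a Stackelberg
# security game (illustrative only; not the paper's solution method).
import random

random.seed(0)

# Two targets; the leader covers target i with probability x[i].
# Leader utility: U_COV if the attacked target is covered, U_UNC otherwise.
U_COV, U_UNC = 1.0, -1.0

def attacker_best_target(x, attacker_payoff):
    """The attacker picks the target maximizing its expected payoff:
    it gains attacker_payoff[i] only when target i is left uncovered."""
    return max(range(len(x)), key=lambda i: (1 - x[i]) * attacker_payoff[i])

def leader_expected_utility(x, mean, std, n_samples=10_000):
    """Average leader utility over attacker payoff vectors sampled
    from independent Gaussians (one continuous 'type' per sample)."""
    total = 0.0
    for _ in range(n_samples):
        sample = [random.gauss(m, std) for m in mean]
        i = attacker_best_target(x, sample)
        total += x[i] * U_COV + (1 - x[i]) * U_UNC
    return total / n_samples

# Covering the higher-valued target more often yields higher leader utility.
u_skewed = leader_expected_utility([0.8, 0.2], mean=[5.0, 1.0], std=1.0)
u_uniform = leader_expected_utility([0.5, 0.5], mean=[5.0, 1.0], std=1.0)
assert u_skewed > u_uniform
```

Sampling attacker types like this is only a baseline; the paper develops approximation techniques specifically designed for such infinite Bayesian Stackelberg games.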

UAI Conference 2010 Conference Paper

ALARMS: Alerting and Reasoning Management System for Next Generation Aircraft Hazards

  • Alan Carlin
  • Nathan Schurr
  • Janusz Marecki

The Next Generation Air Transportation System will introduce new, advanced sensor technologies into the cockpit. With the introduction of such systems, the responsibilities of the pilot are expected to dramatically increase. In the ALARMS (ALerting And Reasoning Management System) project for NASA, we focus on a key challenge of this environment, the quick and efficient handling of aircraft sensor alerts. It is infeasible to alert the pilot on the state of all subsystems at all times. Furthermore, there is uncertainty as to the true hazard state despite the evidence of the alerts, and there is uncertainty as to the effect and duration of actions taken to address these alerts. This paper reports on the first steps in the construction of an application designed to handle Next Generation alerts. In ALARMS, we have identified 60 different aircraft subsystems and 20 different underlying hazards. In this paper, we show how a Bayesian network can be used to derive the state of the underlying hazards, based on the sensor input. Then, we propose a framework whereby an automated system can plan to address these hazards in cooperation with the pilot, using a Time-Dependent Markov Decision Process (TMDP). Different hazards and pilot states will call for different alerting automation plans. We demonstrate this emerging application of Bayesian networks and TMDPs to cockpit automation, for a use case where a small number of hazards are present, and analyze the resulting alerting automation policies.

AAMAS Conference 2010 Conference Paper

Function Allocation for NextGen Airspace via Agents

  • Nathan Schurr
  • Paul Picciano
  • Janusz Marecki

Commercial aviation transportation is on the rise and has become a necessity in our increasingly global world. There is a societal demand for more options, more traffic, and more efficiency, while still maintaining safety in the airspace. To meet these demands, the Next Generation Air Transportation System (NextGen) concept from NASA calls for technologies and systems offering increasing support from automated decision-aiding and optimization tools. Such systems must coordinate with the human operator to take advantage of the functions each can best perform: the automated tools must be designed to support the optimal allocation of tasks (functions) between the system and the human operators using these systems. Preliminary function allocation methods must be developed (and evaluated) that focus on the NextGen Airportal challenges, given a flexible, changing Concept of Operations (ConOps). We have begun making steps toward this by leveraging work in agents research (namely Adjustable Autonomy) in order to allow function allocation to become more dynamic and adjust to the goals, demands, and constraints of the current situation as it unfolds. In this paper we introduce Dynamic Function Allocation Strategies (DFAS) that are not static and singular, but rather are represented by allocation policies that vary over time and circumstances. The NextGen aviation domain is a natural fit for agent-based systems because of its inherently distributed nature, and the need for automated systems to coordinate on tasks maps well to the adjustable autonomy problem. While current adjustable autonomy methods are applicable in this context, crucial extensions are needed to push the existing models to larger numbers of human players while maintaining critical timing. To this end, we have created an air traffic control system that includes: (1) a simulation environment, (2) a DFAS algorithm for providing adjustable autonomy strategies, and (3) the agents for executing the strategies and measuring system efficiency. We believe that our system is the first step towards showing the efficacy of an agent-supported approach to driving the dynamic roles across human operators and automated systems in the NextGen environment. We present some initial results from a pilot study using this system.

AAMAS Conference 2010 Conference Paper

Risk-Sensitive Planning in Partially Observable Environments

  • Janusz Marecki
  • Pradeep Varakantham

Partially Observable Markov Decision Process (POMDP) is a popular framework for planning under uncertainty in partially observable domains. Yet, the POMDP model is risk-neutral in that it assumes that the agent is maximizing the expected reward of its actions. In contrast, in domains like financial planning, it is often required that agent decisions are risk-sensitive (maximize the utility of agent actions, for non-linear utility functions). Unfortunately, existing POMDP solvers cannot solve such planning problems exactly. By considering piecewise linear approximations of utility functions, this paper addresses this shortcoming with three contributions: (i) it defines the Risk-Sensitive POMDP model; (ii) it derives the fundamental properties of the underlying value functions and provides a functional value iteration technique to compute them exactly; and (iii) it proposes an efficient procedure to determine the dominated value functions, to speed up the algorithm. Our experiments show that the proposed approach is feasible and applicable to realistic financial planning domains.

AAMAS Conference 2010 Conference Paper

Robust Bayesian Methods for Stackelberg Security Games

  • Christopher Kiekintveld
  • Janusz Marecki
  • Milind Tambe

Recent work has applied game-theoretic models to real-world security problems at the Los Angeles International Airport (LAX) and the Federal Air Marshals Service (FAMS). The analysis of these domains is based on input from domain experts intended to capture the best available intelligence information about potential terrorist activities and possible security countermeasures. Nevertheless, these models are subject to significant uncertainty, especially in security domains where intelligence about adversary capabilities and preferences is very difficult to gather. This uncertainty presents significant challenges for applying game-theoretic analysis in these domains. Our experimental results show that standard solution methods based on perfect-information assumptions are very sensitive to payoff uncertainty, resulting in low payoffs for the defender. We describe a model of Bayesian Stackelberg games that allows for general distributional uncertainty over the attacker's payoffs. We conduct an experimental analysis of two algorithms for approximating equilibria of these games, and show that the resulting solutions give much better results than the standard approach when there is payoff uncertainty.

ICAPS Conference 2009 Conference Paper

Exploiting Coordination Locales in Distributed POMDPs via Social Model Shaping

  • Pradeep Varakantham
  • Jun-young Kwak
  • Matthew E. Taylor
  • Janusz Marecki
  • Paul Scerri
  • Milind Tambe

Distributed POMDPs provide an expressive framework for modeling multiagent collaboration problems, but NEXP-Complete complexity hinders their scalability and application in real-world domains. This paper introduces a subclass of distributed POMDPs, and TREMOR, an algorithm to solve such distributed POMDPs. The primary novelty of TREMOR is that agents plan individually with a single agent POMDP solver and use social model shaping to implicitly coordinate with other agents. Experiments demonstrate that TREMOR can provide solutions orders of magnitude faster than existing algorithms while achieving comparable, or even superior, solution quality.

AAMAS Conference 2009 Conference Paper

Planning with Continuous Resources for Agent Teams

  • Janusz Marecki
  • Milind Tambe

Many problems of multiagent planning under uncertainty require distributed reasoning with continuous resources and resource limits. Decentralized Markov Decision Problems (Dec-MDPs) are well-suited to address such problems, but unfortunately, prior Dec-MDP approaches either discretize resources at the expense of speed and quality guarantees, or avoid discretization only by limiting agents’ action choices or interactions (e.g. assumption of transition independence). To address these shortcomings, this paper proposes M-DPFP, a novel algorithm for planning with continuous resources for agent teams, with three key features: (i) it maintains the agent team interaction graph to identify and prune the suboptimal policies and to allow the agents to be transition dependent, (ii) it operates in a continuous space of probability functions to provide the error bound on the solution quality and finally (iii) it focuses the search for policies on the most relevant parts of this search space to allow for a systematic trade-off of solution quality for speed. Our experiments show that M-DPFP finds high quality solutions and exhibits superior performance when compared with a discretization-based approach. We also show that M-DPFP is applicable to solving problems that are beyond the scope of existing approaches.

AAMAS Conference 2008 Conference Paper

Deployed ARMOR Protection: The Application of a Game Theoretic Model for Security at the Los Angeles International Airport

  • James Pita
  • Manish Jain
  • Janusz Marecki
  • Fernando Ordóñez
  • Christopher Portway
  • Milind Tambe

Security at major locations of economic or political importance is a key concern around the world, particularly given the threat of terrorism. Limited security resources prevent full security coverage at all times, which allows adversaries to observe and exploit patterns in selective patrolling or monitoring, e.g. they can plan an attack avoiding existing patrols. Hence, randomized patrolling or monitoring is important, but randomization must provide distinct weights to different actions based on their complex costs and benefits. To this end, this paper describes a promising transition of the latest in multi-agent algorithms – in fact, an algorithm that represents a culmination of research presented at AAMAS – into a deployed application. In particular, it describes a software assistant agent called ARMOR (Assistant for Randomized Monitoring over Routes) that casts this patrolling/monitoring problem as a Bayesian Stackelberg game, allowing the agent to appropriately weigh the different actions in randomization, as well as uncertainty over adversary types. ARMOR combines three key features: (i) it uses the fastest known solver for Bayesian Stackelberg games, called DOBSS, where the dominant mixed strategies enable randomization; (ii) its mixed-initiative based interface allows users to occasionally adjust or override the automated schedule based on their local constraints; (iii) it alerts the users if mixed-initiative overrides appear to degrade the overall desired randomization. ARMOR has been successfully deployed since August 2007 at the Los Angeles International Airport (LAX) to randomize checkpoints on the roadways entering the airport and canine patrol routes within the airport terminals. This paper examines the information, design choices, challenges, and evaluation that went into designing ARMOR.

AAAI Conference 2008 Conference Paper

Efficient Algorithms to Solve Bayesian Stackelberg Games for Security Applications

  • Praveen Paruchuri
  • Janusz Marecki
  • Fernando Ordonez

In a class of games known as Stackelberg games, one agent (the leader) must commit to a strategy that can be observed by the other agent (the adversary/follower) before the adversary chooses its own strategy. We consider Bayesian Stackelberg games, in which the leader is uncertain about the type of the adversary it may face. Such games are important in security domains, where, for example, a security agent (leader) must commit to a strategy of patrolling certain areas, and an adversary (follower) can observe this strategy over time before choosing where to attack. We present here two different MIP formulations for Bayesian Stackelberg games: ASAP (providing approximate policies with controlled randomization) and DOBSS (providing optimal policies). DOBSS is currently the fastest optimal procedure for Bayesian Stackelberg games and is in use by police at the Los Angeles International Airport (LAX) to schedule their activities.

AAMAS Conference 2008 Conference Paper

Not All Agents Are Equal: Scaling up Distributed POMDPs for Agent Networks

  • Janusz Marecki
  • Tapana Gupta
  • Pradeep Varakantham
  • Milind Tambe
  • Makoto Yokoo

Many applications of networks of agents, including mobile sensor networks, unmanned air vehicles, and autonomous underwater vehicles, involve hundreds of agents acting collaboratively under uncertainty. Distributed Partially Observable Markov Decision Problems (Distributed POMDPs) are well-suited to address such applications, but so far, only limited scale-ups of up to five agents have been demonstrated. This paper escalates the scale-up, presenting an algorithm called FANS, increasing the number of agents in distributed POMDPs for the first time into double digits. FANS is founded on finite state machines (FSMs) for policy representation and exploits these FSMs to provide three key contributions: (i) not all agents within an agent network need the same expressivity of policy representation; FANS introduces novel heuristics to automatically vary the FSM size in different agents for scale-up; (ii) FANS illustrates efficient integration of its FSM-based policy search within algorithms that exploit agent network structure; (iii) FANS provides significant speedups in policy evaluation and heuristic computations within the network algorithms by exploiting the FSMs for dynamic programming. Experimental results show not only orders of magnitude improvements over previous best known algorithms for smaller-scale domains (with similar solution quality), but also a scale-up into double digits in terms of numbers of agents.

AAMAS Conference 2008 Conference Paper

Playing Games for Security: An Efficient Exact Algorithm for Solving Bayesian Stackelberg Games

  • Praveen Paruchuri
  • Jonathan Pearce
  • Janusz Marecki
  • Milind Tambe
  • Fernando Ordonez
  • Sarit Kraus

In a class of games known as Stackelberg games, one agent (the leader) must commit to a strategy that can be observed by the other agent (the follower or adversary) before the adversary chooses its own strategy. We consider Bayesian Stackelberg games, in which the leader is uncertain about the types of adversary it may face. Such games are important in security domains, where, for example, a security agent (leader) must commit to a strategy of patrolling certain areas, and a robber (follower) has a chance to observe this strategy over time before choosing its own strategy of where to attack. This paper presents an efficient exact algorithm for finding the optimal strategy for the leader to commit to in these games. This algorithm, DOBSS, is based on a novel and compact mixed-integer linear programming formulation. Compared to the most efficient algorithm known previously for this problem, DOBSS is not only faster, but also leads to higher quality solutions, and does not suffer from problems of infeasibility that were faced by this previous algorithm. Note that DOBSS is at the heart of the ARMOR system that is currently being tested for security scheduling at the Los Angeles International Airport.
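The value of commitment that motivates these Stackelberg models can be seen in a toy two-action game. The brute-force grid search below is purely illustrative (DOBSS itself is a mixed-integer linear program handling multiple Bayesian follower types); the payoffs are invented for this sketch:

```python
# Illustrative brute force (not DOBSS): Stackelberg commitment with a
# single follower type. The leader searches a grid of mixed strategies
# over two actions; the follower observes the mixture and best-responds;
# ties are broken in the leader's favor, the standard strong-Stackelberg
# convention.

# PAYOFFS[(leader_action, follower_action)] = (leader_utility, follower_utility)
# A small game where committing to a mixture beats any pure strategy.
PAYOFFS = {
    (0, 0): (2.0, 1.0), (0, 1): (4.0, 0.0),
    (1, 0): (1.0, 0.0), (1, 1): (3.0, 2.0),
}

def expected(p, follower_action, who):
    """Expected utility of `who` (0=leader, 1=follower) when the leader
    plays action 0 with probability p against a fixed follower action."""
    return (p * PAYOFFS[(0, follower_action)][who]
            + (1 - p) * PAYOFFS[(1, follower_action)][who])

def solve(grid=1001):
    """Return (leader_utility, p, follower_action) for the best grid point."""
    best = None
    for k in range(grid):
        p = k / (grid - 1)
        # Follower best-responds; ties favor the leader.
        f = max((0, 1), key=lambda a: (expected(p, a, 1), expected(p, a, 0)))
        u = expected(p, f, 0)
        if best is None or u > best[0]:
            best = (u, p, f)
    return best

u, p, f = solve()
```

In this game the optimal commitment is the mixture p ≈ 2/3, which yields leader utility ≈ 3.67, strictly better than either pure strategy (at most 3.0) and an illustration of why committing to a randomized strategy can pay.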

AAAI Conference 2008 Conference Paper

Towards Faster Planning with Continuous Resources in Stochastic Domains

  • Janusz Marecki

Agents often have to construct plans that obey resource limits for continuous resources whose consumption can only be characterized by probability distributions. While Markov Decision Processes (MDPs) with a state space of continuous and discrete variables are popular for modeling these domains, current algorithms for such MDPs can exhibit poor performance with a scale-up in their state space. To remedy that we propose an algorithm called DPFP. DPFP’s key contribution is its exploitation of the dual space of cumulative distribution functions. This dual formulation is key to DPFP’s novel combination of three features. First, it enables DPFP’s membership in a class of algorithms that perform forward search in a large (possibly infinite) policy space. Second, it provides a new and efficient approach for varying the policy generation effort based on the likelihood of reaching different regions of the MDP state space. Third, it yields a bound on the error produced by such approximations. These three features conspire to allow DPFP’s superior performance and systematic trade-off of optimality for speed. Our experimental evaluation shows that, when run stand-alone, DPFP outperforms other algorithms in terms of its any-time performance, whereas when run as a hybrid, it allows for a significant speedup of a leading continuous resource MDP solver.

IJCAI Conference 2007 Conference Paper

  • Janusz Marecki
  • Sven Koenig
  • Milind Tambe

Agents often have to construct plans that obey deadlines or, more generally, resource limits for real-valued resources whose consumption can only be characterized by probability distributions, such as execution time or battery power. These planning problems can be modeled with continuous state Markov decision processes (MDPs) but existing solution methods are either inefficient or provide no guarantee on the quality of the resulting policy. We therefore present CPH, a novel solution method that solves the planning problems by first approximating with any desired accuracy the probability distributions over the resource consumptions with phase-type distributions, which use exponential distributions as building blocks. It then uses value iteration to solve the resulting MDPs by exploiting properties of exponential distributions to calculate the necessary convolutions accurately and efficiently while providing strong guarantees on the quality of the resulting policy. Our experimental feasibility study in a Mars rover domain demonstrates a substantial speedup over Lazy Approximation, which is currently the leading algorithm for solving continuous state MDPs with quality guarantees.
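The phase-type idea this abstract relies on, approximating arbitrary duration distributions by chains of exponential phases, can be sketched numerically. This toy is not CPH; it only shows that an Erlang-k chain (k exponential phases, each with mean mu/k) concentrates around a target duration mu as k grows, since its standard deviation is mu/√k:

```python
# Toy sketch of the phase-type building block (not the CPH algorithm):
# chains of exponential phases approximate less dispersed durations.
import random

random.seed(1)

def erlang_sample(k, mu):
    """One draw from an Erlang-k distribution with mean mu:
    the sum of k exponential phases, each with mean mu/k (rate k/mu)."""
    return sum(random.expovariate(k / mu) for _ in range(k))

def sample_std(k, mu, n=20_000):
    """Empirical standard deviation of n Erlang-k draws."""
    xs = [erlang_sample(k, mu) for _ in range(n)]
    m = sum(xs) / n
    return (sum((x - m) ** 2 for x in xs) / n) ** 0.5

# More phases -> tighter concentration around mu (std = mu / sqrt(k)),
# so a sharp deadline can be approximated to any desired accuracy.
assert sample_std(1, 10.0) > sample_std(16, 10.0)
```

The appeal of exponential building blocks, as the abstract notes, is that the convolutions arising in value iteration can then be computed in closed form.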

AAMAS Conference 2007 Conference Paper

Letting loose a SPIDER on a network of POMDPs: Generating quality guaranteed policies

  • Pradeep Varakantham
  • Janusz Marecki
  • Yuichi Yabu
  • Milind Tambe
  • Makoto Yokoo

Distributed Partially Observable Markov Decision Problems (Distributed POMDPs) are a popular approach for modeling multi-agent systems acting in uncertain domains. Given the significant complexity of solving distributed POMDPs, particularly as we scale up the numbers of agents, one popular approach has focused on approximate solutions. Though this approach is efficient, the algorithms within this approach do not provide any guarantees on solution quality. A second, less popular approach focuses on global optimality, but typical results are available only for two agents, and also at considerable computational cost. This paper overcomes the limitations of both these approaches by providing SPIDER, a novel combination of three key features for policy generation in distributed POMDPs: (i) it exploits agent interaction structure given a network of agents (i.e. allowing easier scale-up to larger numbers of agents); (ii) it uses a combination of heuristics to speed up policy search; and (iii) it allows quality guaranteed approximations, allowing a systematic tradeoff of solution quality for time. Experimental results show orders of magnitude improvement in performance when compared with previous global optimal algorithms.

AAMAS Conference 2007 Conference Paper

On Opportunistic Techniques for Solving Decentralized Markov Decision Processes with Temporal Constraints

  • Janusz Marecki
  • Milind Tambe

Decentralized Markov Decision Processes (DEC-MDPs) are a popular model of agent-coordination problems in domains with uncertainty and time constraints, but are very difficult to solve. In this paper, we improve a state-of-the-art heuristic solution method for DEC-MDPs, called OC-DEC-MDP, that has recently been shown to scale up to larger DEC-MDPs. Our heuristic solution method, called Value Function Propagation (VFP), combines two orthogonal improvements of OC-DEC-MDP. First, it speeds up OC-DEC-MDP by an order of magnitude by maintaining and manipulating a value function for each state (as a function of time) rather than a separate value for each pair of state and time interval. Furthermore, it achieves better solution qualities than OC-DEC-MDP because, as our analytical results show, it does not overestimate the expected total reward like OC-DEC-MDP. We test both improvements independently in a crisis-management domain as well as for other types of domains. Our experimental results demonstrate a significant speedup of VFP over OC-DEC-MDP as well as higher solution qualities in a variety of situations.