Arrow Research search

Author name cluster

Michael P. Wellman

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

84 papers
2 author rows

Possible papers

84

IJCAI Conference 2025 Conference Paper

Combining Deep Reinforcement Learning and Search with Generative Models for Game-Theoretic Opponent Modeling

  • Zun Li
  • Marc Lanctot
  • Kevin R. McKee
  • Luke Marris
  • Ian Gemp
  • Daniel Hennes
  • Paul Muller
  • Kate Larson

Opponent modeling methods typically involve two crucial steps: building a belief distribution over opponents' strategies, and exploiting this opponent model by playing a best response. However, existing approaches typically require domain-specific heurstics to come up with such a model, and algorithms for approximating best responses are hard to scale in large, imperfect information domains. In this work, we introduce a scalable and generic multiagent training regime for opponent modeling using deep game-theoretic reinforcement learning. We first propose Generative Best Respoonse (GenBR), a best response algorithm based on Monte-Carlo Tree Search (MCTS) with a learned deep generative model that samples world states during planning. This new method scales to large imperfect information domains and can be plug and play in a variety of multiagent algorithms. We use this new method under the framework of Policy Space Response Oracles (PSRO), to automate the generation of an offline opponent model via iterative game-theoretic reasoning and population-based training. We propose using solution concepts based on bargaining theory to build up an opponent mixture, which we find identifying profiles that are near the Pareto frontier. Then GenBR keeps updating an online opponent model and reacts against it during gameplay. We conduct behavioral studies where human participants negotiate with our agents in Deal-or-No-Deal, a class of bilateral bargaining games. Search with generative modeling finds stronger policies during both training time and test time, enables online Bayesian co-player prediction, and can produce agents that achieve comparable social welfare and Nash bargaining score negotiating with humans as humans trading among themselves.

JAIR Journal 2025 Journal Article

Empirical Game Theoretic Analysis: A Survey

  • Michael P. Wellman
  • Karl Tuyls
  • Amy Greenwald

In the empirical approach to game-theoretic analysis (EGTA), the model of the game comes not from declarative representation, but is derived by interrogation of a procedural description of the game environment. The motivation for developing this approach was to enable game-theoretic reasoning about strategic situations too complex for analytic specification and solution. Since its introduction over twenty years ago, EGTA has been applied to a wide range of multiagent domains, from auctions and markets to recreational games to cyber-security. We survey the extensive methodology developed for EGTA over the years, organized by the elemental subproblems comprising the EGTA process. We describe key EGTA concepts and techniques, and the questions at the frontier of EGTA research. Recent advances in machine learning are accelerating progress in EGTA, and promise to significantly expand our capacities for reasoning about complex game situations.

ICML Conference 2025 Conference Paper

Explicit Exploration for High-Welfare Equilibria in Game-Theoretic Multiagent Reinforcement Learning

  • Austin A. Nguyen
  • Anri Gu
  • Michael P. Wellman

Iterative extension of empirical game models through deep reinforcement learning (RL) has proved an effective approach for finding equilibria in complex games. When multiple equilibria exist, we may also be interested in finding solutions with particular characteristics. We address this issue of equilibrium selection in the context of Policy Space Response Oracles (PSRO), a flexible game-solving framework based on deep RL, by skewing the strategy exploration process towards higher-welfare solutions. At each iteration, we create an exploration policy that imitates high welfare-yielding behavior and train a response to the current solution, regularized to be similar to the exploration policy. With no additional simulation expense, our approach, named Ex$^2$PSRO, tends to find higher welfare equilibria than vanilla PSRO in two benchmarks: a sequential bargaining game and a social dilemma game. Further experiments demonstrate Ex$^2$PSRO’s composability with other PSRO variants and illuminate the relationship between exploration policy choice and algorithmic performance.

AAMAS Conference 2025 Conference Paper

Learning Bayesian Game Families, with Application to Mechanism Design

  • Madelyn Gatchel
  • Michael P. Wellman

We demonstrate the advantages of learning an interim model for Bayesian game families through an in-depth study of empirical mechanism design for a dynamic sponsored search auction scenario. A full version of this paper, with additional background, methods, and results, is available at: https: //arxiv. org/pdf/2502. 14078.

AAMAS Conference 2025 Conference Paper

Navigating in a Space of Game Views (extended abstract)

  • Michael P. Wellman
  • Katherine Mayo

We motivate and summarize a perspective on game-theoretic reasoning as navigating in a space of game views. The full journal version appears in Autonomous Agents and Multi-Agent Systems [5].

AAMAS Conference 2025 Conference Paper

Policy Abstraction and Nash Refinement in Tree-Exploiting PSRO

  • Christine Konicki
  • Mithun Chakraborty
  • Michael P. Wellman

Policy Space Response Oracles (PSRO) interleaves empirical gametheoretic analysis with deep reinforcement learning (DRL) to solve games too complex for traditional analytic methods. Tree-exploiting PSRO (TE-PSRO) is a variant of this approach that iteratively builds a coarsened empirical game model in extensive form using data obtained from querying a simulator that represents a detailed description of the game. We make two main methodological advances to TE-PSRO that enhance its applicability to complex games of imperfect information. First, we introduce a scalable representation for the empirical game tree where edges correspond to implicit policies learned through DRL. These policies cover conditions in the underlying game abstracted in the game model, supporting sustainable growth of the tree over epochs. Second, we leverage extensive form in the empirical model by employing refined Nash equilibria to direct strategy exploration. To enable this, we give a modular and scalable algorithm based on generalized backward induction for computing a subgame perfect equilibrium (SPE) in an imperfect-information game. We experimentally evaluate our approach on a suite of games including an alternating-offer bargaining game with outside offers; our results demonstrate that TE-PSRO converges toward equilibrium faster when new strategies are generated based on SPE rather than Nash equilibrium, and with reasonable time/memory requirements for the growing empirical model.

IJCAI Conference 2024 Conference Paper

A Meta-Game Evaluation Framework for Deep Multiagent Reinforcement Learning

  • Zun Li
  • Michael P. Wellman

Evaluating deep multiagent reinforcement learning (MARL) algorithms is complicated by stochasticity in training and sensitivity of agent performance to the behavior of other agents. We propose a meta-game evaluation framework for deep MARL, by framing each MARL algorithm as a meta-strategy, and repeatedly sampling normal-form empirical games over combinations of meta-strategies resulting from different random seeds. Each empirical game captures both self-play and cross-play factors across seeds. These empirical games provide the basis for constructing a sampling distribution, using bootstrapping, over a variety of game analysis statistics. We use this approach to evaluate state-of-the-art deep MARL algorithms on a class of negotiation games. From statistics on individual payoffs, social welfare, and empirical best-response graphs, we uncover strategic relationships among self-play, population-based, model-free, and model-based MARL methods. We also investigate the effect of run-time search as a meta-strategy operator, and find via meta-game analysis that the search version of a meta-strategy generally leads to improved performance.

RLC Conference 2024 Conference Paper

Co-Learning Empirical Games & World Models

  • Max Olan Smith
  • Michael P. Wellman

Game-based decision-making involves reasoning over both world dynamics and strategic interactions among the agents. Typically, models capturing these respective aspects are learned and used separately. We investigate the potential gain from co-learning these elements: a world model for dynamics and an empirical game for strategic interactions. Empirical games drive world models toward a broader consideration of possible game dynamics induced by a diversity of strategy profiles. Conversely, world models guide empirical games to efficiently discover new strategies through planning. We demonstrate these benefits first independently, then in combination as a new algorithm, Dyna-PSRO, that co-learns an empirical game and a world model. When compared to PSRO---a baseline empirical-game building algorithm, Dyna-PSRO is found to compute lower regret solutions on partially observable general-sum games. In our experiments, Dyna-PSRO also requires substantially fewer experiences than PSRO, a key algorithmic advantage for settings where collecting player-game interaction data is a cost-limiting factor.

RLJ Journal 2024 Journal Article

Co-Learning Empirical Games & World Models

  • Max Olan Smith
  • Michael P. Wellman

Game-based decision-making involves reasoning over both world dynamics and strategic interactions among the agents. Typically, models capturing these respective aspects are learned and used separately. We investigate the potential gain from co-learning these elements: a world model for dynamics and an empirical game for strategic interactions. Empirical games drive world models toward a broader consideration of possible game dynamics induced by a diversity of strategy profiles. Conversely, world models guide empirical games to efficiently discover new strategies through planning. We demonstrate these benefits first independently, then in combination as a new algorithm, Dyna-PSRO, that co-learns an empirical game and a world model. When compared to PSRO---a baseline empirical-game building algorithm, Dyna-PSRO is found to compute lower regret solutions on partially observable general-sum games. In our experiments, Dyna-PSRO also requires substantially fewer experiences than PSRO, a key algorithmic advantage for settings where collecting player-game interaction data is a cost-limiting factor.

IJCAI Conference 2024 Conference Paper

Fraud Risk Mitigation in Real-Time Payments: A Strategic Agent-Based Analysis

  • Katherine Mayo
  • Nicholas Grabill
  • Michael P. Wellman

Whereas standard financial mechanisms for payment may take days to finalize, real-time payments (RTPs) provide immediate processing and final receipt of funds. The speed of settlement benefits customers, but raises vulnerability to fraud. We seek to understand how bank nodes may strategically mitigate fraud risk in RTPs, through investment in fraud detection and restricting payments eligible for real-time processing. To study this, we introduce an agent-based model of the payment network supporting both real-time and standard payments, and define a game among banks and fraudsters. Using empirical game-theoretic analysis, we identify Nash equilibria in nine game configurations defined by network attributes. Our analysis finds that as banks become more liable for fraud, they continue to allow RTPs but are more likely to employ both restrictions and a high level of fraud detection. Fraudsters, in response, switch from targeting only RTPs to attempting fraud with any type of payment and tend to exploit banks where they have historically been most successful. We also conduct a strategic feature gains assessment to further understand the benefit offered by each of the bank's risk mitigation measures, which confirms the importance of selective RTP restrictions. Finally, we find that in equilibrium bank strategic decisions negatively affect fraudsters while minimally impacting customers.

AAMAS Conference 2024 Conference Paper

Generalized Response Objectives for Strategy Exploration in Empirical Game-Theoretic Analysis

  • Yongzhao Wang
  • Michael P. Wellman

In the policy-space response oracle (PSRO) framework, strategy sets de�ning an empirical game are iteratively extended by computing each player’s best response to a target pro�le. The method for selecting a target pro�le is called a meta-strategy solver (MSS), and a variety of MSSs have been proposed and analyzed for their e�ectiveness in exploring the strategy space. Here we investigate an alternative means to control strategy exploration: setting the response objective (RO) employed in deriving a strategy for a given target pro�le. In evaluating e�ectiveness of strategy exploration, we consider not only rate of convergence to a solution, but also the quality of solution(s) captured by the evolving empirical game. We perform our study� rst in the domain of sequential bargaining games, comparing the standard RO based on own payo�with others that incorporate other players’ payo�s. We� nd that otherregarding ROs can lead to� nding equilibrium outcomes with significantly higher social welfare than the standard objective. For other proposed ROs, experiments demonstrate that they can di�erentially a�ect the makeup and value of solutions for di�erent players. We further test PSRO with generalized ROs in large attack-graph games. We observe a similar impact and e�ectiveness of our ROs on strategy exploration. Finally, we establish a theoretical relationship between PSRO with generalized ROs and generalized weakened �ctitious play in particular settings, and a connection between the social welfare related RO and Berge equilibrium.

JAAMAS Journal 2024 Journal Article

Navigating in a space of game views

  • Michael P. Wellman
  • Katherine Mayo

Abstract Game-theoretic modeling entails selecting the particular elements of a complex strategic situation deemed most salient for strategic analysis. Recognizing that any game model is one of many possible views of the situation, we term this a game view, and propose that sophisticated game reasoning would naturally consider multiple views. We introduce a conceptual framework, game view navigation, for game-theoretic reasoning through a process of constructing and analyzing a series of game views. The approach is illustrated using a variety of existing methods, which can be cast in terms of navigation patterns within this framework. By formally defining these as well as recently introduced ideas as navigating in a space of game views, we recognize common themes and opportunities for generalization. Game view navigation thus provides a unifying perspective that sheds light on connections between disparate reasoning methods, and defines a design space for creation of new techniques. We further apply the framework by defining and exploring new techniques based on modulating player aggregation in equilibrium search.

AAMAS Conference 2023 Conference Paper

Empirical Game-Theoretic Analysis for Mean Field Games

  • Yongzhao Wang
  • Michael P. Wellman

We present a simulation-based approach for solution of mean field games (MFGs), using the framework of empirical game-theoretical analysis (EGTA). Our primary method employs a version of the double oracle, iteratively adding strategies based on best response to the equilibrium of the empirical MFG among strategies considered so far. We present Fictitious Play (FP) and Replicator Dynamics as two subroutines for computing the empirical game equilibrium. Each subroutine is implemented with a query-based method rather than maintaining an explicit payoff matrix as in typical EGTA methods due to a representation issue we highlight for MFGs. We test the performance of our primary method in various games and show that it outperforms directly applying FP to MFGs with either subroutine. By introducing game model learning and regularization, we significantly improve the sample efficiency of the primary method without sacrificing the overall learning performance. Theoretically, we prove that a Nash equilibrium (NE) exists in the empirical MFG and show the convergence of iterative EGTA to NE of the full MFG.

AAMAS Conference 2023 Conference Paper

Game Model Learning for Mean Field Games

  • Yongzhao Wang
  • Michael P. Wellman

We present an approach to learning models for mean field games from simulation data with a coarse coding scheme that abstracts away the time-dependent complexity and dramatically simplifies the input representation.

AAMAS Conference 2023 Conference Paper

Search-Improved Game-Theoretic Multiagent Reinforcement Learning in General and Negotiation Games

  • Zun Li
  • Marc Lanctot
  • Kevin R. McKee
  • Luke Marris
  • Ian Gemp
  • Daniel Hennes
  • Kate Larson
  • Yoram Bachrach

Multiagent reinforcement learning (MARL) has benefited significantly from population-based and game-theoretic training regimes. One approach, Policy-Space Response Oracles (PSRO), employs standard reinforcement learning to compute response policies via approximate best responses and combines them via meta-strategy selection. We augment PSRO by adding a novel search procedure with generative sampling of world states, and introduce two new meta-strategy solvers based on the Nash bargaining solution. We evaluate PSRO’s ability to compute approximate Nash equilibrium, and its performance in negotiation games: Colored Trails and Dealor-no-Deal. We conduct behavioral studies where human participants negotiate with our agents (𝑁 = 346). Search with generative modeling finds stronger policies during both training time and test time, enables online Bayesian co-player prediction, and can produce agents that achieve comparable social welfare negotiating with humans as humans trading among themselves.

JMLR Journal 2023 Journal Article

Strategic Knowledge Transfer

  • Max Olan Smith
  • Thomas Anthony
  • Michael P. Wellman

In the course of playing or solving a game, it is common to face a series of changing other-agent strategies. These strategies often share elements: the set of possible policies to play has overlap, and the policies are sampled at the beginning of play by possibly differing distributions. As it faces the series of strategies, therefore, an agent has the opportunity to transfer its learned play against the previously encountered other-agent policies. We tackle two problems: (1) how can learned responses transfer across changing opponent strategies, and (2) how can this transfer be used to reduced the cumulative cost of learning in game solving. The first problem we characterize as the strategic knowledge transfer problem. For value-based response policies, we demonstrate that Q-Mixing approximately solves this problem by appropriately averaging the component Q-values. Solutions to the first problem can be applied to reduce the computational cost of learning-based game solving algorithms. We offer two algorithms that operate within the Policy-Space Response Oracles (PSRO) framework. Mixed-Oracles reduces the per-policy construction cost by transferring responses from previously encountered opponents. Mixed-Opponents performs strategic knowledge transfer by combining the previously encountered opponents into a single novel policy. Experimental evaluation of these methods on general-sum grid-world games provide evidence about their advantages and limitations in comparison to standard PSRO. [abs] [ pdf ][ bib ] &copy JMLR 2023. ( edit, beta )

AAMAS Conference 2022 Conference Paper

Evaluating Strategy Exploration in Empirical Game-Theoretic Analysis

  • Yongzhao Wang
  • Qiurui Ma
  • Michael P. Wellman

In empirical game-theoretic analysis (EGTA), game models are extended iteratively through a process of generating new strategies based on experience with prior strategies. The strategy exploration problem in EGTA is how to direct this process so to construct effective models with minimal iteration. A variety of approaches have been proposed in the literature, including methods based on classic techniques and novel concepts. Comparing the performance of these alternatives can depend sensitively on criteria adopted and measures employed. We investigate some of the methodological considerations in evaluating strategy exploration, proposing and justifying new evaluation methods based on examples and experimental observations. In particular, we emphasize the fact that empirical games create a space of strategies and evaluation should reflect how well it covers the strategically relevant space. Based on this fact, we suggest that the minimum regret constrained profile (MRCP) provides a particularly robust basis for evaluating a space of strategies, and propose a local search method for computing MRCP. However, MRCP computation is not always feasible especially in large games. To evaluate strategy exploration in large games, we propose a new evaluation scheme that measures the strategic coverage of an empirical game. Specifically, we highlight consistency considerations for comparing across different approaches. We show that violation of the consistency considerations could yield misleading conclusions on the performance of different approaches. In accord with consistency considerations, we propose a profileselection method, which effectively discovers the profile that can represent the strategic coverage of an empirical game through its regret information. We show that our evaluation scheme reveals the authentic learning performance of different approaches compared to previous evaluation methods.

AAMAS Conference 2021 Conference Paper

A Strategic Analysis of Portfolio Compression

  • Katherine Mayo
  • Michael P. Wellman

We analyze portfolio compression, the netting of cycles in a financial network, as a strategic decision made by firms within a debt network. We define a network game in which firms have only local information and ask what criteria the firms should consider in their decision to compress. We propose a variety of heuristic strategies and evaluate them using agent-based simulation and empirical game-theoretic analysis. Our results show that some simple strategies based on local information perform better than the unconditional strategies of always agreeing or disagreeing to a compression and that when the decision is made strategically, the price of anarchy is always high.

AAAI Conference 2021 Conference Paper

Evolution Strategies for Approximate Solution of Bayesian Games

  • Zun Li
  • Michael P. Wellman

We address the problem of solving complex Bayesian games, characterized by high-dimensional type and action spaces, many (> 2) players, and general-sum payoffs. Our approach applies to symmetric one-shot Bayesian games, with no given analytic structure. We represent agent strategies in parametric form as neural networks, and apply natural evolution strategies (NES) (Wierstra et al. 2014) for deep model optimization. For pure equilibrium computation, we formulate the problem as bi-level optimization, and employ NES in an iterative algorithm to implement both inner-loop best response optimization and outer-loop regret minimization. In simple games including first- and second-price auctions, it is capable of recovering known analytic solutions. For mixed equilibrium computation, we adopt an incremental strategy generation framework, with NES as strategy generator producing a finite sequence of approximate best-response strategies. We then calculate equilibria over this finite strategy set via a model-based optimization process. Both our pure and mixed equilibrium computation methods employ NES to efficiently search for strategies over the function space, given only black-box simulation access to noisy payoff samples. We experimentally demonstrate the efficacy of all methods on two simultaneous sealed-bid auction games with distinct type distributions, and observe that the solutions exhibit qualitatively different behavior in these two environments.

ICLR Conference 2021 Conference Paper

Iterative Empirical Game Solving via Single Policy Best Response

  • Max Olan Smith
  • Thomas W. Anthony 0001
  • Michael P. Wellman

Policy-Space Response Oracles (PSRO) is a general algorithmic framework for learning policies in multiagent systems by interleaving empirical game analysis with deep reinforcement learning (DRL). At each iteration, DRL is invoked to train a best response to a mixture of opponent policies. The repeated application of DRL poses an expensive computational burden as we look to apply this algorithm to more complex domains. We introduce two variations of PSRO designed to reduce the amount of simulation required during DRL training. Both algorithms modify how PSRO adds new policies to the empirical game, based on learned responses to a single opponent policy. The first, Mixed-Oracles, transfers knowledge from previous iterations of DRL, requiring training only against the opponent's newest policy. The second, Mixed-Opponents, constructs a pure-strategy opponent by mixing existing strategy's action-value estimates, instead of their policies. Learning against a single policy mitigates conflicting experiences on behalf of a learner facing an unobserved distribution of opponents. We empirically demonstrate that these algorithms substantially reduce the amount of simulation during training required by PSRO, while producing equivalent or better solutions to the game.

IJCAI Conference 2020 Conference Paper

Market Manipulation: An Adversarial Learning Framework for Detection and Evasion

  • Xintong Wang
  • Michael P. Wellman

We propose an adversarial learning framework to capture the evolving game between a regulator who develops tools to detect market manipulation and a manipulator who obfuscates actions to evade detection. The model includes three main parts: (1) a generator that learns to adapt original manipulation order streams to resemble trading patterns of a normal trader while preserving the manipulation intent; (2) a discriminator that differentiates the adversarially adapted manipulation order streams from normal trading activities; and (3) an agent-based simulator that evaluates the manipulation effect of adapted outputs. We conduct experiments on simulated order streams associated with a manipulator and a market-making agent respectively. We show examples of adapted manipulation order streams that mimic a specified market maker's quoting patterns and appear qualitatively different from the original manipulation strategy we implemented in the simulator. These results demonstrate the possibility of automatically generating a diverse set of (unseen) manipulation strategies that can facilitate the training of more robust detection algorithms.

IJCAI Conference 2019 Conference Paper

Cap-and-Trade Emissions Regulation: A Strategic Analysis

  • Frank Cheng
  • Yagil Engel
  • Michael P. Wellman

Cap-and-trade schemes are designed to achieve target levels of regulated emissions in a socially efficient manner. These schemes work by issuing regulatory credits and allowing firms to buy and sell them according to their relative compliance costs. Analyzing the efficacy of such schemes in concentrated industries is complicated by the strategic interactions among firms producing heterogeneous products. We tackle this complexity via an agent-based microeconomic model of the US market for personal vehicles. We calculate Nash equilibria among credits-trading strategies in a variety of scenarios and regulatory models. We find that while cap-and-trade results improves efficiency overall, consumers bear a disproportionate share of regulation cost, as firms use credit trading to segment the vehicle market. Credits trading volume decreases when firms behave more strategically, which weakens the segmentation effect.

AAAI Conference 2019 Conference Paper

Deception in Finitely Repeated Security Games

  • Thanh H. Nguyen
  • Yongzhao Wang
  • Arunesh Sinha
  • Michael P. Wellman

Allocating resources to defend targets from attack is often complicated by uncertainty about the attacker’s capabilities, objectives, or other underlying characteristics. In a repeated interaction setting, the defender can collect attack data over time to reduce this uncertainty and learn an effective defense. However, a clever attacker can manipulate the attack data to mislead the defender, influencing the learning process toward its own benefit. We investigate strategic deception on the part of an attacker with private type information, who interacts repeatedly with a defender. We present a detailed computation and analysis of both players’ optimal strategies given the attacker may play deceptively. Computational experiments illuminate conditions conducive to strategic deception, and quantify benefits to the attacker. By taking into account the attacker’s deception capacity, the defender can significantly mitigate loss from misleading attack actions.

AAMAS Conference 2019 Conference Paper

Incentivizing Collaboration in a Competition

  • Arunesh Sinha
  • Michael P. Wellman

Research and design competitions aim to promote innovation or creative production, which are often best achieved through collaboration. The nature of a competition, however, typically necessitates sorting by individual performance. This presents tradeoffs for the competition designer, between incentivizing global performance and distinguishing individual capability. We model this situation in terms of an abstract collaboration game, where individual effort also benefits neighboring agents. We propose a scoring mechanism called LSWM that rewards agents based on localized social welfare. We show that LSWM promotes global performance, in that social optima are equilibria of the mechanism. Moreover, we establish conditions under which the mechanism leads to increased collaboration, and under which it ensures a formally defined distinguishability property. Through experiments, we evaluate the degree of distinguishability achieved whether or not the theoretical conditions identified hold.

IJCAI Conference 2018 Conference Paper

A Cloaking Mechanism to Mitigate Market Manipulation

  • Xintong Wang
  • Yevgeniy Vorobeychik
  • Michael P. Wellman

We propose a cloaking mechanism to deter spoofing, a form of manipulation in financial markets. The mechanism works by symmetrically concealing a specified number of price levels from the inside of the order book. To study the effectiveness of cloaking, we simulate markets populated with background traders and an exploiter, who strategically spoofs to profit. The traders follow two representative bidding strategies: the non-spoofable zero intelligence and the manipulable heuristic belief learning. Through empirical game-theoretic analysis across parametrically different environments, we evaluate surplus accrued by traders, and characterize the conditions under which cloaking mitigates manipulation and benefits market welfare. We further design sophisticated spoofing strategies that probe to reveal cloaked information, and find that the effort and risk exceed the gains.

AAMAS Conference 2018 Conference Paper

Evaluating the Stability of Non-Adaptive Trading in Continuous Double Auctions

  • Mason Wright
  • Michael P. Wellman

The continuous double auction (CDA) is the predominant mechanism in modern securities markets. Many agent-based analyses of CDA environments rely on simple non-adaptive trading strategies like Zero Intelligence (ZI), which (as their name suggests) are quite limited. We examine the viability of this reliance through empirical game-theoretic analysis in a plausible market environment. Specifically, we evaluate the strategic stability of equilibria defined over a small set of ZI traders with respect to strategies found by reinforcement learning (RL) applied over a much larger policy space. RL can indeed find beneficial deviations from equilibria of ZI traders, by conditioning on signals of the likelihood a trade will execute or the favorability of the current bid and ask. Nevertheless, the surplus earned by well-calibrated ZI policies is empirically observed to be nearly as great as what the adaptive strategies can earn, despite their much more expressive policy space. Our findings generally support the use of equilibrated ZI traders in CDA studies.

AAMAS Conference 2017 Conference Paper

Spoofing the Limit Order Book: An Agent-Based Model

  • Xintong Wang
  • Michael P. Wellman

We present an agent-based model of manipulating prices in financial markets through spoofing: submitting spurious orders to mislead traders who observe the order book. Built around the standard limit-order mechanism, our model captures a complex market environment with combined private and common values, the latter represented by noisy observations upon a dynamic fundamental time series. We consider background agents following two types of trading strategies: zero intelligence (ZI) that ignores the order book and heuristic belief learning (HBL) that exploits the order book to predict price outcomes. By employing an empirical gametheoretic analysis to derive approximate strategic equilibria, we demonstrate the e↵ectiveness of HBL and the usefulness of order book information in a range of non-spoofing environments. We further show that a market with HBL traders is spoofable, in that a spoofer can qualitatively manipulate prices towards its desired direction. After re-equilibrating games with spoofing, we find spoofing generally hurts market surplus and decreases the proportion of HBL. However, HBL’s persistence in most environments with spoofing indicates a consistently spoofable market. Our model provides a way to quantify the e↵ect of spoofing on trading behavior and efficiency, and thus measures the profitability and cost of an important form of market manipulation. CCS Concepts •Computing methodologies! Multi-agent systems;

JAIR Journal 2017 Journal Article

Welfare Effects of Market Making in Continuous Double Auctions

  • Elaine Wah
  • Mason Wright
  • Michael P. Wellman

We investigate the effects of market making on market performance, focusing on allocative efficiency as well as gains from trade accrued by background traders. We employ empirical simulation-based methods to evaluate heuristic strategies for market makers as well as background investors in a variety of complex trading environments. Our market model incorporates private and common valuation elements, with dynamic fundamental value and asymmetric information. In this context, we compare the surplus achieved by background traders in strategic equilibrium, with and without a market maker. Our findings indicate that the presence of the market maker strongly tends to increase total welfare across various environments. Market-maker profit may or may not exceed the welfare gain, thus the effect on background-investor surplus is ambiguous. We find that market making tends to benefit investors in relatively thin markets, and situations where background traders are impatient, due to limited trading opportunities. The presence of additional market makers increases these benefits, as competition drives the market makers to provide liquidity at lower price spreads. A thorough sensitivity analysis indicates that these results are robust to reasonable changes in model parameters.

UAI Conference 2016 Conference Paper

Gradient Methods for Stackelberg Games

  • Kareem Amin 0002
  • Michael P. Wellman
  • Satinder Singh 0001

Stackelberg games are two-stage games in which the first player (called the leader) commits to a strategy, after which the other player (the follower) selects a best-response. These types of games have seen numerous practical application in security settings, where the leader (in this case, a defender) must allocate resources to protect various targets. Real world applications include the scheduling of US federal air marshals to international flights, and resource allocation at LAX airport. However, the best known algorithm for solving general Stackelberg games requires solving Integer Programs, and fails to scale beyond a few (significantly smaller than 100) number of leader actions, or follower types. In this paper, we present a new gradient-based approach for solving large Stackelberg games in security settings. Large-scale control problems are often solved by restricting the controller to a rich parameterized class of policies; the optimal control can then be computed using Monte Carlo gradient methods. We demonstrate that the same approach can be taken in a strategic setting. We evaluate our approach empirically, demonstrating that it can have negligible regret against the leader’s true equilibrium strategy, while scaling to large games.

JAAMAS Journal 2016 Journal Article

Putting the agent in agent-based modeling

  • Michael P. Wellman

Abstract One of the perquisites of a talk like this is that I get to expound on broad themes. AAMAS is a conference about agents and multiples of agents, so I probably ought to say something about agents. Of course, my position on agents is that I am all for them. Today I’d like to make a case for actually putting agents in agent-based models. I hope that by the end of the talk you have some idea about what I mean by this.

IJCAI Conference 2016 Conference Paper

Welfare Effects of Market Making in Continuous Double Auctions: Extended Abstract

  • Elaine Wah
  • Mason Wright
  • Michael P. Wellman

We investigate the effects of market making on market performance, focusing on allocative efficiency as well as gains from trade accrued by background traders. We employ empirical simulation-based methods to evaluate heuristic strategies for market makers as well as background investors in a variety of complex trading environments. Our market model incorporates private and common valuation elements, with dynamic fundamental value and asymmetric information. In this context, we compare the surplus achieved by background traders in strategic equilibrium, with and without a market maker. Our findings indicate that the presence of the market maker strongly tends to increase total welfare across a variety of environments. Market-maker profit may or may not exceed the welfare gain, thus the effect on background-investor surplus is ambiguous. We find that market making tends to benefit investors in relatively thin markets, and situations where background traders are impatient, due to limited trading opportunities. Introducing additional market makers increases these benefits, as competition drives market makers to provide liquidity at lower price spreads.

UAI Conference 2012 Conference Paper

Self-Confirming Price Prediction Strategies for Simultaneous One-Shot Auctions

  • Michael P. Wellman
  • Eric Sodomka
  • Amy Greenwald

Bidding in simultaneous auctions is challenging because an agent’s value for a good in one auction may depend on the uncertain outcome of other auctions: the so-called exposure problem. Given the gap in understanding of general simultaneous auction games, previous works have tackled this problem with heuristic strategies that employ probabilistic price predictions. We define a concept of self-confirming prices, and show that within an independent private value model, Bayes-Nash equilibrium can be fully characterized as a profile of optimal priceprediction strategies with self-confirming predictions. We exhibit practical procedures to compute approximately optimal bids given a probabilistic price prediction, and near self-confirming price predictions given a price-prediction strategy. An extensive empirical game-theoretic analysis demonstrates that self-confirming priceprediction strategies are effective in simultaneous auction games with both complementary and substitutable preference structures.

JAAMAS Journal 2011 Journal Article

Constrained automated mechanism design for infinite games of incomplete information

  • Yevgeniy Vorobeychik
  • Daniel M. Reeves
  • Michael P. Wellman

Abstract We present a functional framework for automated Bayesian and worst-case mechanism design, based on a two-stage game model of strategic interaction between the designer and the mechanism participants. At the core of our framework is a black-box optimization algorithm which guides the process of evaluating candidate mechanisms. We apply the approach to several classes of two-player infinite games of incomplete information, producing optimal or nearly optimal mechanisms using various objective functions. By comparing our results with known optimal mechanisms, and in some cases improving on the best known mechanisms, we provide evidence that ours is a promising approach to parametrized mechanism design for infinite Bayesian games.

UAI Conference 2011 Conference Paper

The Structure of Signals: Causal Interdependence Models for Games of Incomplete Information

  • Michael P. Wellman
  • Lu Hong
  • Scott E. Page

Traditional economic models typically treat private information, or signals, as generated from some underlying state. Recent work has explicated alternative models, where signals correspond to interpretations of available information. We show that the difference between these formulations can be sharply cast in terms of causal dependence structure, and employ graphical models to illustrate the distinguishing characteristics. The graphical representation supports inferences about signal patterns in the interpreted framework, and suggests how results based on the generated model can be extended to more general situations. Specific insights about bidding games in classical auction mechanisms derive from qualitative graphical models.

IJCAI Conference 2009 Conference Paper

  • Quang Duong
  • Yevgeniy Vorobeychik
  • Satinder Singh
  • Michael P. Wellman

Graphical games provide compact representation of a multiagent interaction when agents’ payoffs depend only on actions of agents in their local neighborhood. We formally describe the problem of learning a graphical game model from limited observation of the payoff function, define three performance metrics for evaluating learned games, and investigate several learning algorithms based on minimizing empirical loss. Our first algorithm is a branch-and-bound search, which takes advantage of the structure of the empirical loss function to derive upper and lower bounds on loss at every node of the search tree. We also examine a greedy heuristic and local search algorithms. Our experiments with directed graphical games show that (i) when only a small sample of profile payoffs is available, branch-and-bound significantly outperforms other methods, and has competitive running time, but (ii) when many profiles are observed, greedy is nearly optimal and considerably better than other methods, at a fraction of branch-andbound’s running time. The results are comparable for undirected graphical games and when payoffs are sampled with noise.

AAMAS Conference 2009 Conference Paper

Generalization Risk Minimization in Empirical Game Models

  • Patrick R. Jordan
  • Michael P. Wellman

Experimental analysis of agent strategies in multiagent systems presents a tradeoff between granularity and statistical confidence. Collecting a large amount of data about each strategy profile improves confidence, but restricts the range of strategies and profiles that can be explored. We propose a flexible approach, where multiple game-theoretic formulations can be constructed to model the same underlying scenario (observation dataset). The prospect of incorrectly selecting an empirical model is termed generalization risk, and the generalization risk framework we describe provides a general criterion for empirical modeling choices, such as adoption of factored strategies or other structured representations of a game model. We propose a principled method of managing generalization risk to derive the optimal game-theoretic model for the observed data in a restricted class of models. Application to a large dataset generated from a trading agent scenario validates the method. General Terms Economics, Experimentation

AAMAS Conference 2009 Conference Paper

Stronger CDA Strategies through Empirical Game-Theoretic Analysis and Reinforcement Learning

  • L. Julian Schvartzman
  • Michael P. Wellman

We present a general methodology to automate the search for equilibrium strategies in games derived from computational experimentation. Our approach interleaves empirical game-theoretic analysis with reinforcement learning. We apply this methodology to the classic Continuous Double Auction game, conducting the most comprehensive CDA strategic study published to date. Empirical game analysis confirms prior findings about the relative performance of known strategies. Reinforcement learning derives new bidding strategies from the empirical equilibrium environment. Iterative application of this approach yields strategies stronger than any other published CDA bidding policy, culminating in a new Nash equilibrium supported exclusively by our learned strategies.

UAI Conference 2008 Conference Paper

Knowledge Combination in Graphical Multiagent Models

  • Quang Duong 0001
  • Michael P. Wellman
  • Satinder Singh 0001

A graphical multiagent model (GMM) represents a joint distribution over the behavior of a set of agents. One source of knowledge aboutagents'behaviormaycomefromgametheoretic analysis, as captured by several graphical game representations developed in recentyears. GMMsgeneralizethisapproach to express arbitrary distributions, based on game descriptions or other sources of knowledgebearingonbeliefsaboutagentbehavior. To illustrate the exibility of GMMs, we exhibit game-derived models that allow probabilistic deviation fromequilibrium, aswellas models based on heuristic action choice. We investigate three di erent methods of integrating these models into a single model representing the combined knowledge sources. To evaluate the predictive performance of the combined model, we treat as actual outcome the behavior produced by a reinforcement learning process. We nd that combining the two knowledge sources, using any of the methods, provides better predictions than either source alone. Among the combination methods, mixing data outperforms the opinion pool and direct update methods investigated in this empirical trial.

IJCAI Conference 2007 Conference Paper

  • Shih-Fen Cheng
  • Michael P. Wellman

We introduce a weakening of standard game-theoretic dominance conditions, called δ -dominance, which enables more aggressive pruning of candidate strategies at the cost of solution accuracy. Equilibria of a game obtained by eliminating a δ -dominated strategy are guaranteed to be approximate equilibria of the original game, with degree of approximation bounded by the dominance parameter, δ . We can apply elimination of δ -dominated strategies iteratively, but the δ for which a strategy may be eliminated depends on prior eliminations. We discuss implications of this order independence, and propose greedy heuristics for determining a sequence of eliminations to reduce the game as far as possible while keeping down costs. A case study analysis of an empirical 2-player game serves to illustrate the technique, and demonstrate the utility of weaker-than-weak dominance pruning.

UAI Conference 2007 Conference Paper

Constrained Automated Mechanism Design for Infinite Games of Incomplete Information

  • Yevgeniy Vorobeychik
  • Daniel M. Reeves
  • Michael P. Wellman

We present a functional framework for automated mechanism design based on a two-stage game model of strategic interaction between the designer and the mechanism participants, and apply it to several classes of two-player infinite games of incomplete information. At the core of our framework is a black-box optimization algorithm which guides the selection process of candidate mechanisms. Our approach yields optimal or nearly optimal mechanisms in several application domains using various objective functions. By comparing our results with known optimal mechanisms, and in some cases improving on the best known mechanisms, we provide evidence that ours is a promising approach to parametric design of indirect mechanisms.

AAMAS Conference 2007 Conference Paper

Constraint Satisfaction Algorithms for Graphical Games

  • Vishal Soni
  • Satinder Singh
  • Michael P. Wellman

We formulate the problem of computing equilibria in multiplayer games represented by arbitrary undirected graphs as a constraint satisfaction problem and present two algorithms. The first is PureProp: an algorithm for computing approximate Nash equilibria in complete information one-shot games and approximate Bayes-Nash equilibria in one-shot games of incomplete information. PureProp unifies existing message-passing based algorithms for solving these classes of games. We also address repeated graphical games, and present a second algorithm, PureProp-R, for computing approximate Nash equilibria in these games.

AAMAS Conference 2007 Conference Paper

Empirical Game-Theoretic Analysis of the TAC Supply Chain Game

  • Patrick R. Jordan
  • Christopher Kiekintveld
  • Michael P. Wellman

The TAC Supply Chain Management (TAC/SCM) game presents a challenging dynamic environment for autonomous decision-making in a salient application domain. Strategic interactions complicate the analysis of games such as TAC/SCM, since the effectiveness of a given strategy depends on the strategies played by other agents on the supply chain. The TAC tournament generates results from one particular path of combinations, and success in the tournament is rightly regarded as evidence for agent quality. Such results along with post-competition controlled experiments provide useful evaluations of novel techniques employed in the game. We argue that a broader game-theoretic analysis framework can provide a firmer foundation for choice of experimental contexts. Exploiting a repository of agents from the 2005 and 2006 TAC/SCM tournaments, we demonstrate an empirical game-theoretic methodology based on extensive simulation and careful measurement. Our analysis of agents from TAC-05 reveals interesting interactions not seen in the tournament. Extending the analysis to TAC-06 enables us to measure progress from year-to-year, and generates a candidate empirical equilibrium among the best known strategies. We use this equilibrium as a stable background population for comparing relative performance of the 2006 agents, yielding insights complementing the tournament results.

AAMAS Conference 2007 Conference Paper

Forecasting Market Prices in a Supply Chain Game

  • Christopher Kiekintveld
  • Jason Miller
  • Patrick R. Jordan
  • Michael P. Wellman

Future market conditions can be a pivotal factor in making business decisions. We present and evaluate methods used by our agent, Deep Maize, to forecast market prices in the Trading Agent Competition Supply Chain Management Game. As a guiding principle we seek to exploit as many sources of available information as possible to inform predictions. Since information comes in several different forms, we integrate well-known methods in a novel way to make predictions. The core of our predictor is a nearest-neighbors machine learning algorithm that identifies historical instances with similar economic indicators. We augment this with an online learning procedure that transforms the predictions by optimizing a scoring rule. This allows us to select more relevant historical contexts using additional information available during individual games. We also explore the advantages of two different representations for predicting price distributions. One uses absolute prices, and the other uses statistics of price time series to exploit local stability. We evaluate these methods using both data from the 2005 tournament final round and additional simulations. We compare several variations of our predictor to one another and a baseline predictor similar to those used by many other tournament agents. We show substantial improvements over the baseline predictor, and demonstrate that each element of our predictor contributes to improved performance.

AAAI Conference 2005 Conference Paper

Approximate Strategic Reasoning through Hierarchical Reduction of Large Symmetric Games

  • Michael P. Wellman
  • Kevin M. Lochner

To deal with exponential growth in the size of a game with the number of agents, we propose an approximation based on a hierarchy of reduced games. The reduced game achieves savings by restricting the number of agents playing any strategy to fixed multiples. We validate the idea through experiments on randomly generated local-effect games. An extended application to strategic reasoning about a complex trading scenario motivates the approach, and demonstrates methods for game-theoretic reasoning over incompletely-specified games at multiple levels of granularity.

IJCAI Conference 2005 Conference Paper

Learning Payoff Functions in Infinite Games

  • Yevgeniy Vorobeychik
  • Michael P. Wellman
  • Satinder

We consider a class of games with real-valued strategies and payoff information available only in the form of data from a given sample of strategy profiles. Solving such games with respect to the underlying strategy space requires generalizing from the data to a complete payoff-function representation. We address payoff-functionlearning as a standard regression problem, with provision for capturing known structure (symmetry) in the multiagent environment. To measure learning performance, we consider the relative utility of prescribed strategies, rather than the accuracy of payoff functions per se. We demonstrate our approach and evaluate its effectiveness on two examples: a two-player version of the first-price sealed-bid auction (with known analytical form), and a five-player marketbased scheduling game (with no known solution).

UAI Conference 2005 Conference Paper

Self-Confirming Price Prediction for Bidding in Simultaneous Ascending Auctions

  • Anna Osepayshvili
  • Michael P. Wellman
  • Daniel M. Reeves
  • Jeffrey K. MacKie-Mason

Simultaneous ascending auctions present agents with the exposure problem: bidding to acquire a bundle risks the possibility of obtaining an undesired subset of the goods. Auction theory provides little guidance for dealing with this problem. We present a new family of decisiontheoretic bidding strategies that use probabilistic predictions of final prices. We focus on selfconfirming price distribution predictions, which by definition turn out to be correct when all agents bid decision-theoretically based on them. Bidding based on these is provably not optimal in general, but our experimental evidence indicates the strategy can be quite effective compared to other known methods.

UAI Conference 2004 Conference Paper

Computing Best-Response Strategies in Infinite Games of Incomplete Information

  • Daniel M. Reeves
  • Michael P. Wellman

We describe an algorithm for computing best response strategies in a class of two-player infinite games of incomplete information, defined by payoffs piecewise linear in agents' types and actions, conditional on linear comparisons of agents' actions. We show that this class includes many well-known games including a variety of auctions and a novel allocation game. In some cases, the best-response algorithm can be iterated to compute Bayes-Nash equilibria. We demonstrate the efficiency of our approach on existing and new games.

ICAPS Conference 2004 Conference Paper

Distributed Feedback Control for Decision Making on Supply Chains

  • Christopher Kiekintveld
  • Michael P. Wellman
  • Satinder Singh 0001
  • Joshua Estelle
  • Yevgeniy Vorobeychik
  • Vishal Soni
  • Matthew R. Rudary

Decision makers on supply chains face an uncertain, dynamic, and strategic multiagent environment. We report on Deep Maize, an agent we designed to participate in the 2003 Trading Agent Competition Supply Chain Management (TAC/SCM) game. Our design employs an idealized equilibrium analysis of the SCM game to factor out the strategic aspects of the environment and to define an expected profitable zone of operation. Deep Maize applies distributed feedback control to coordinate its separate functional modules and maintain its environment in the desired zone, despite the uncertainty and dynamism. We evaluate our design with results from the TAC/SCM tournament as well as from controlled experiments conducted after the competition.

ICAPS Conference 2004 Conference Paper

Price Prediction Strategies for Market-Based Scheduling

  • Jeffrey K. MacKie-Mason
  • Anna Osepayshvili
  • Daniel M. Reeves
  • Michael P. Wellman

In a market-based scheduling mechanism, the allocation of time-specific resources to tasks is governed by a competitive bidding process. Agents bidding for multiple, separately allocated time slots face the risk that they will succeed in obtaining only part of their requirement, incurring expenses for potentially worthless slots. We investigate the use of price prediction strategies to manage such risk. Given an uncertain price forecast, agents follow simple rules for choosing whether and on which time slots to bid. We find that employing price predictions can indeed improve performance over a straightforward baseline in some settings. Using an empirical game-theoretic methodology, we establish Nash equilibrium profiles for restricted strategy sets. This allows us to con- firm the stability of price-predicting strategies, and measure overall efficiency. We further experiment with variant strategies to analyze the source of prediction’s power, demonstrate the existence of self-confirming predictions, and compare the performance of alternative prediction methods.

JMLR Journal 2003 Journal Article

Nash Q-Learning for General-Sum Stochastic Games

  • Junling Hu
  • Michael P. Wellman

We extend Q-learning to a noncooperative multiagent context, using the framework of general-sum stochastic games. A learning agent maintains Q-functions over joint actions, and performs updates based on assuming Nash equilibrium behavior over the current Q-values. This learning protocol provably converges given certain restrictions on the stage games (defined by Q-values) that arise during learning. Experiments with a pair of two-player grid games suggest that such restrictions on the game structure are not necessarily required. Stage games encountered during learning in both grid environments violate the conditions. However, learning consistently converges in the first grid game, which has a unique equilibrium Q-function, but sometimes fails to converge in the second, which has three different equilibrium Q-functions. In a comparison of offline learning performance in both games, we find agents are more likely to reach a joint optimal path with Nash Q-learning than with a single-agent Q-learning method. When at least one agent adopts Nash Q-learning, the performance of both agents is better than using single-agent Q-learning. We have also implemented an online version of Nash Q-learning that balances exploration with exploitation, yielding improved performance. [abs] [ pdf ][ ps.gz ][ ps ]

AAAI Conference 2002 Conference Paper

The 2001 Trading Agent Competition

  • Michael P. Wellman
  • Brown University; Peter Stone

The 2001 Trading Agent Competition was the second in a series of events aiming to shed light on research issues in automating trading strategies. Based on a challenging market scenario in the domain of travel shopping, the competition presents agents with difficult issues in bidding strategy, market prediction, and resource allocation. Entrants in 2001 demonstrated substantial progress over the prior year, with the overall level of competence exhibited suggesting that trading in online markets is a viable domain for highly autonomous agents.

UAI Conference 2000 Conference Paper

Compact Securities Markets for Pareto Optimal Reallocation of Risk

  • David M. Pennock
  • Michael P. Wellman

The emph{securities market} is the fundamental theoretical framework in economics and finance for resource allocation under uncertainty. Securities serve both to reallocate risk and to disseminate probabilistic information. emph{Complete} securities markets---which contain one security for every possible state of nature---support Pareto optimal allocations of risk. Complete markets suffer from the same exponential dependence on the number of underlying events as do joint probability distributions. We examine whether markets can be structured and ``compacted'' in the same manner as Bayesian network representations of joint distributions. We show that, if all agents' risk-neutral independencies agree with the independencies encoded in the market structure, then the market is emph{operationally complete}: risk is still Pareto optimally allocated, yet the number of securities can be exponentially smaller. For collections of agents of a certain type, agreement on Markov independencies is sufficient to admit compact and operationally complete markets.

UAI Conference 2000 Conference Paper

Probabilistic State-Dependent Grammars for Plan Recognition

  • David V. Pynadath
  • Michael P. Wellman

Techniques for plan recognition under uncertainty require a stochastic model of the plan-generation process. We introduce Probabilistic State-Dependent Grammars (PSDGs) to represent an agent's plan-generation process. The PSDG language model extends probabilistic context-free grammars (PCFGs) by allowing production probabilities to depend on an explicit model of the planning agent's internal and external state. Given a PSDG description of the plan-generation process, we can then use inference algorithms that exploit the particular independence properties of the PSDG language to efficiently answer plan-recognition queries. The combination of the PSDG language model and inference algorithms extends the range of plan-recognition domains for which practical probabilistic inference is possible, as illustrated by applications in traffic monitoring and air combat.

UAI Conference 1999 Conference Paper

Graphical Representations of Consensus Belief

  • David M. Pennock
  • Michael P. Wellman

Graphical models based on conditional independence support concise encodings of the subjective belief of a single agent. A natural question is whether the consensus belief of a group of agents can be represented with equal parsimony. We prove, under relatively mild assumptions, that even if everyone agrees on a common graph topology, no method of combining beliefs can maintain that structure. Even weaker conditions rule out local aggregation within conditional probability tables. On a more positive note, we show that if probabilities are combined with the logarithmic opinion pool (LogOP), then commonly held Markov independencies are maintained. This suggests a straightforward procedure for constructing a consensus Markov network. We describe an algorithm for computing the LogOP with time complexity comparable to that of exact Bayesian inference.

UAI Conference 1998 Conference Paper

Incremental Tradeoff Resolution in Qualitative Probabilistic Networks

  • Chao-Lin Liu
  • Michael P. Wellman

Qualitative probabilistic reasoning in a Bayesian network often reveals tradeoffs: relationships that are ambiguous due to competing qualitative influences. We present two techniques that combine qualitative and numeric probabilistic reasoning to resolve such tradeoffs, inferring the qualitative relationship between nodes in a Bayesian network. The first approach incrementally marginalizes nodes that contribute to the ambiguous qualitative relationships. The second approach evaluates approximate Bayesian networks for bounds of probability distributions, and uses these bounds to determinate qualitative relationships in question. This approach is also incremental in that the algorithm refines the state spaces of random variables for tighter bounds until the qualitative relationships are resolved. Both approaches provide systematic methods for tradeoff resolution at potentially lower computational cost than application of purely numeric methods.

UAI Conference 1998 Conference Paper

Using Qualitative Relationships for Bounding Probability Distributions

  • Chao-Lin Liu
  • Michael P. Wellman

We exploit qualitative probabilistic relationships among variables for computing bounds of conditional probability distributions of interest in Bayesian networks. Using the signs of qualitative relationships, we can implement abstraction operations that are guaranteed to bound the distributions of interest in the desired direction. By evaluating incrementally improved approximate networks, our algorithm obtains monotonically tightening bounds that converge to exact distributions. For supermodular utility functions, the tightening bounds monotonically reduce the set of admissible decision alternatives as well.

UAI Conference 1997 Conference Paper

Representing Aggregate Belief through the Competitive Equilibrium of a Securities Market

  • David M. Pennock
  • Michael P. Wellman

We consider the problem of belief aggregation: given a group of individual agents with probabilistic beliefs over a set of uncertain events, formulate a sensible consensus or aggregate probability distribution over these events. Researchers have proposed many aggregation methods, although on the question of which is best the general consensus is that there is no consensus. We develop a market-based approach to this problem, where agents bet on uncertain events by buying or selling securities contingent on their outcomes. Each agent acts in the market so as to maximize expected utility at given securities prices, limited in its activity only by its own risk aversion. The equilibrium prices of goods in this market represent aggregate beliefs. For agents with constant risk aversion, we demonstrate that the aggregate probability exhibits several desirable properties, and is related to independently motivated techniques. We argue that the market-based approach provides a plausible mechanism for belief aggregation in multiagent systems, as it directly addresses self-motivated agent incentives for participation and for truthfulness, and can provide a decision-theoretic foundation for the "expert weights" often employed in centralized pooling techniques.

UAI Conference 1996 Conference Paper

Optimal Factory Scheduling using Stochastic Dominance A*

  • Peter R. Wurman
  • Michael P. Wellman

We examine a standard factory scheduling problem with stochastic processing and setup times, minimizing the expectation of the weighted number of tardy jobs. Because the costs of operators in the schedule are stochastic and sequence dependent, standard dynamic programming algorithms such as A* may fail to find the optimal schedule. The SDA* (Stochastic Dominance A*) algorithm remedies this difficulty by relaxing the pruning condition. We present an improved state-space search formulation for these problems and discuss the conditions under which stochastic scheduling problems can be solved optimally using SDA*. In empirical testing on randomly generated problems, we found that in 70%, the expected cost of the optimal stochastic solution is lower than that of the solution derived using a deterministic approximation, with comparable search effort.

UAI Conference 1996 Conference Paper

Toward a Market Model for Bayesian Inference

  • David M. Pennock
  • Michael P. Wellman

We present a methodology for representing probabilistic relationships in a general-equilibrium economic model. Specifically, we define a precise mapping from a Bayesian network with binary nodes to a market price system where consumers and producers trade in uncertain propositions. We demonstrate the correspondence between the equilibrium prices of goods in this economy and the probabilities represented by the Bayesian network. A computational market model such as this may provide a useful framework for investigations of belief aggregation, distributed probabilistic inference, resource allocation under uncertainty, and other problems of decentralized uncertainty.

UAI Conference 1995 Conference Paper

Accounting for Context in Plan Recognition, with Application to Traffic Monitoring

  • David V. Pynadath
  • Michael P. Wellman

Typical approaches to plan recognition start from a representation of an agent's possible plans, and reason evidentially from observations of the agent's actions to assess the plausibility of the various candidates. A more expansive view of the task (consistent with some prior work) accounts for the context in which the plan was generated, the mental state and planning process of the agent, and consequences of the agent's actions in the world. We present a general Bayesian framework encompassing this view, and focus on how context can be exploited in plan recognition. We demonstrate the approach on a problem in traffic monitoring, where the objective is to induce the plan of the driver from observation of vehicle movements. Starting from a model of how the driver generates plans, we show how the highway context can appropriately influence the recognizer's interpretation of observed driver behavior.

UAI Conference 1995 Conference Paper

Path Planning under Time-Dependent Uncertainty

  • Michael P. Wellman
  • Matthew Ford
  • Kenneth Larson

Standard algorithms for finding the shortest path in a graph require that the cost of a path be additive in edge costs, and typically assume that costs are deterministic. We consider the problem of uncertain edge costs, with potential probabilistic dependencies among the costs. Although these dependencies violate the standard dynamic-programming decomposition, we identify a weaker stochastic consistency condition that justifies a generalized dynamic-programming approach based on stochastic dominance. We present a revised path-planning algorithm and prove that it produces optimal paths under time-dependent uncertain costs. We test the algorithm by applying it to a model of stochastic bus networks, and present empirical performance results comparing it to some alternatives. Finally, we consider extensions of these concepts to a more general class of problems of heuristic search under uncertainty.

AAAI Conference 1994 Conference Paper

A Computational Market Model for Distributed Configuration Design

  • Michael P. Wellman

This paper1 presents a precise market model for a well-defined class of distributed configuration design problems. Given a design problem, the model defines a computational economy to allocate basic resources to agents participating in the design. The result of running these “design economies” constitutes the market solution to the original problem. After defining the configuration design framework, I describe the mapping to computational economies and our results to date. For some simple examples, the system can produce good designs relatively quickly. However, analysis shows that the design economies are not guaranteed to find optimal designs, and we identify and discuss some of the major pitfalls. Despite known shortcomings and limited explorations thus far, the market model offers a useful conceptual viewpoint for analyzing distributed design problems.

UAI Conference 1994 Conference Paper

State-Space Abstraction for Anytime Evaluation of Probabilistic Networks

  • Michael P. Wellman
  • Chao-Lin Liu

One important factor determining the computational complexity of evaluating a probabilistic network is the cardinality of the state spaces of the nodes. By varying the granularity of the state spaces, one can trade off accuracy in the result for computational efficiency. We present an anytime procedure for approximate evaluation of probabilistic networks based on this idea. On application to some simple networks, the procedure exhibits a smooth improvement in approximation quality as computation time increases. This suggests that state-space abstraction is one more useful control parameter for designing real-time probabilistic reasoners.

UAI Conference 1994 Conference Paper

The Automated Mapping of Plans for Plan Recognition

  • Marcus J. Huber
  • Edmund H. Durfee
  • Michael P. Wellman

To coordinate with other agents in its environment, an agent needs models of what the other agents are trying to do. When communication is impossible or expensive, this information must be acquired indirectly via plan recognition. Typical approaches to plan recognition start with a specification of the possible plans the other agents may be following, and develop special techniques for discriminating among the possibilities. Perhaps more desirable would be a uniform procedure for mapping plans to general structures supporting inference based on uncertain and incomplete observations. In this paper, we describe a set of methods for converting plans represented in a flexible procedural language to observation models represented as probabilistic belief networks.

AAAI Conference 1992 Conference Paper

A General-Equilibrium Approach to Distributed Transportation Planning

  • Michael P. Wellman

Market price mechanisms from economics constitute a well-understood framework for coordinating decentralized decision processes with minimal communication. WALRAS is a general "market-oriented programming" environment for the construction and analysis of distributed planning systems, based on general-equilibrium theory. The environment provides basic constructs for defining computational market structures, and a procedure for deriving their corresponding competitive equilibria. In a particular realization of this approach for a simplified form of distributed transportation planning, we see that careful construction of the decision process according to economic principles can lead to effective decentralization, and that the behavior of the system can be meaningfully analyzed in economic terms.

KER Journal 1992 Journal Article

From knowledge bases to decision models

  • Michael P. Wellman
  • John S. Breese
  • Robert P. Goldman

Abstract In recent years there has been a growing interest among AI researchers in probabilistic and decision modelling, spurred by significant advances in representation and computation with network modelling formalisms. In applying these techniques to decision support tasks, fixed network models have proven to be inadequately expressive when a broad range of situations must be handled. Hence many researchers have sought to combine the strengths of flexible knowledge representation languages with the normative status and well-understood computational properties of decision-modelling formalisms and algorithms. One approach is to encode general knowledge in an expressive language, then dynamically construct a decision model for each particular situation or problem instance. We have developed several systems adopting this approach, which illustrate a variety of interesting techniques and design issues.

AIJ Journal 1991 Journal Article

Impediments to universal preference-based default theories

  • Jon Doyle
  • Michael P. Wellman

Research on nonmonotonic and default reasoning has identified several important criteria for preferring alternative default inferences. The theories of reasoning based on each of these criteria may uniformly be viewed as theories of rational inference, in which the reasoner selects maximally preferred states of belief. Though researchers have noted some cases of apparent conflict between the preferences supported by different theories, it has been hoped that these special theories of reasoning may be combined into a universal logic of nonmonotonic reasoning. We show that the different categories of preferences conflict more than has been realized, and adapt formal results from social choice theory to prove that every universal theory of default reasoning will violate at least one reasonable principle of rational reasoning. Our results can be interpreted as demonstrating that, within the preferential framework, we cannot expect much improvement on the rigid lexicographic priority mechanisms that have been proposed for conflict resolution.

UAI Conference 1990 Conference Paper

Exploiting functional dependencies in qualitative probabilistic reasoning

  • Michael P. Wellman

Functional dependencies restrict the potential interactions among variables connected in a probabilistic network. This restriction can be exploited in qualitative probabilistic reasoning by introducing deterministic variables and modifying the inference rules to produce stronger conclusions in the presence of functional relations. I describe how to accomplish these modifications in qualitative probabilistic networks by exhibiting the update procedures for graphical transformations involving probabilistic and deterministic variables and combinations. A simple example demonstrates that the augmented scheme can reduce qualitative ambiguity that would arise without the special treatment of functional dependency. Analysis of qualitative synergy reveals that new higher-order relations are required to reason effectively about synergistic interactions among deterministic variables.

AIJ Journal 1990 Journal Article

Fundamental concepts of qualitative probabilistic networks

  • Michael P. Wellman

Graphical representations for probabilistic relationships have recently received considerable attention in AI. Qualitative probabilistic networks abstract from the usual numeric representations by encoding only qualitative relationships, which are inequality constraints on the joint probability distribution over the variables. Although these constraints are insufficient to determine probabilities uniquely, they are designed to justify the deduction of a class of relative likelihood conclusions that imply useful decision-making properties. Two types of qualitative relationship are defined, each a probabilistic form of monotonicity constraint over a group of variables. Qualitative influences describe the direction of the relationship between two variables. Qualitative synergies describe interactions among influences. The probabilistic definitions chosen justify sound and efficient inference procedures based on graphical manipulations of the network. These procedures answer queries about qualitative relationships among variables separated in the network and determine structural properties of optimal assignments to decision variables.

AAAI Conference 1988 Conference Paper

Mechanisms for Reasoning about Sets

  • Michael P. Wellman

The SEt Reasoning Facility (SERF) integrates mechanisms for propagating membership propositions, deriving relations between sets, and reasoning about closure and cardinality into an efficient utility package for reasoning about sets. Assertions about relations between sets are compiled into a constraint network defined entirely in terms of union, complement, and emptiness constraints. The constraint network supports multiple modes of inference, such as local propagation of membership propositions and graph search for set relations using a transitivity table. SERFpermits closure assertions of the form “all members of set S are known” and utilizes this capability to permit selective applications of closed-world assumptions. Cardinality constraints are handled by a general quantity reasoner. An example from geologic interpretation illustrates the value of mutually constraining sources of information in a typical application of reasoning about sets in commonsense problem-solving.

IJCAI Conference 1987 Conference Paper

Dominance and Subsumption in Constraint-Posting Planning

  • Michael P. Wellman

By integrating a dominance prover into the plan search process, the traditional constraint-posting planning paradigm can be generalised to permit partially satisfiable goals. In this approach, the view of planning as theorem proving is retained, although the emphasis is on proving theorems about dominance relations on classes of candidate plans. Plan classes are generated by posting constraints at various levels of abstraction, then classified within a plan lattice that manages inheritance of properties and dominance characteristics. An analysis of TWEAK demonstrates how to recast existing planning ideas in this framework, providing insight into the planner's capabilities with a dominance-proving interpretation. The plan lattice representation highlights the role of plan subsumption in optimizing search.

AAAI Conference 1987 Conference Paper

Probabilistic Semantics for Qualitative Influences

  • Michael P. Wellman

What’s in an infiuence link? To answer this foundational question, I propose a semantics for qualitative influences: a positively influences b if and only if the posterior distribution for b given o increases with o in the sense of first-order stochastic dominance. By requiring that this condition hold in all contexts, we gain the ability to perform inference across chains of qualitative influences. Under sets of basic desiderata, the proposed definition is necessary as well as sufficient for this desirable computational property.