Arrow Research

Author name cluster

James R. Wright

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
2 author rows

Possible papers (8)

AAAI Conference 2026 · Conference Paper

ElementaryNet: A Non-Strategic Neural Network for Predicting Human Behavior in Normal-Form Games

  • Greg d'Eon
  • Hala Murad
  • Kevin Leyton-Brown
  • James R. Wright

Behavioral game theory models serve two purposes: yielding insights into how human decision-making works, and predicting how people would behave in novel strategic settings. A system called GameNet represents the state of the art for predicting human behavior in the setting of unrepeated simultaneous-move games, combining a simple "level-k" model of strategic reasoning with a complex neural network model of non-strategic "level-0" behavior. Although this reliance on well-established ideas from cognitive science ought to make GameNet interpretable, the flexibility of its level-0 model raises the possibility that it is able to emulate strategic reasoning. In this work, we prove that GameNet's level-0 model is indeed too general. We then introduce ElementaryNet, a novel neural network that is provably incapable of expressing strategic behavior. We show that these additional restrictions are empirically harmless, with ElementaryNet and GameNet having statistically indistinguishable performance. We then show how it is possible to derive insights about human behavior by varying ElementaryNet's features and interpreting its parameters, finding evidence of iterative reasoning, learning about the depth of this reasoning process, and showing the value of a rich level-0 specification.
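To make the level-k idea referenced above concrete, here is a minimal sketch (this is not the paper's ElementaryNet architecture; the function names, the precision parameter, and the example game are all illustrative): each level quantally best-responds to the distribution played by the level below, starting from a level-0 specification.

```python
import numpy as np

def quantal_response(payoffs, opponent_dist, precision=1.0):
    """Logit response to an opponent's mixed strategy.

    payoffs: (n_row, n_col) payoff matrix for the responding player.
    opponent_dist: distribution over the opponent's n_col actions.
    """
    expected = payoffs @ opponent_dist   # expected utility of each action
    logits = precision * expected
    logits -= logits.max()               # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

def level_k_prediction(payoffs, level0, k=2, precision=1.0):
    """Iterate quantal responses k times, starting from a level-0 distribution."""
    dist = level0
    for _ in range(k):
        dist = quantal_response(payoffs, dist, precision)
    return dist

# Illustrative 3x3 symmetric game with a uniform level-0 specification.
payoffs = np.array([[3.0, 0.0, 5.0],
                    [1.0, 4.0, 1.0],
                    [0.0, 2.0, 2.0]])
level0 = np.full(3, 1.0 / 3.0)
print(level_k_prediction(payoffs, level0, k=2))
```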

AAAI Conference 2024 · Conference Paper

Exploiting Action Impact Regularity and Exogenous State Variables for Offline Reinforcement Learning (Abstract Reprint)

  • Vincent Liu
  • James R. Wright
  • Martha White

Offline reinforcement learning—learning a policy from a batch of data—is known to be hard for general MDPs. These results motivate the need to look at specific classes of MDPs where offline reinforcement learning might be feasible. In this work, we explore a restricted class of MDPs to obtain guarantees for offline reinforcement learning. The key property, which we call Action Impact Regularity (AIR), is that actions primarily impact a part of the state (an endogenous component) and have limited impact on the remaining part of the state (an exogenous component). AIR is a strong assumption, but it nonetheless holds in a number of real-world domains including financial markets. We discuss algorithms that exploit the AIR property, and provide a theoretical analysis for an algorithm based on Fitted-Q Iteration. Finally, we demonstrate that the algorithm outperforms existing offline reinforcement learning algorithms across different data collection policies in simulated and real-world environments where the regularity holds.

AAAI Conference 2024 · Conference Paper

How to Evaluate Behavioral Models

  • Greg d'Eon
  • Sophie Greenwood
  • Kevin Leyton-Brown
  • James R. Wright

Researchers building behavioral models, such as behavioral game theorists, use experimental data to evaluate predictive models of human behavior. However, there is little agreement about which loss function should be used in evaluations, with error rate, negative log-likelihood, cross-entropy, Brier score, and squared L2 error all being common choices. We attempt to offer a principled answer to the question of which loss functions should be used for this task, formalizing axioms that we argue loss functions should satisfy. We construct a family of loss functions, which we dub "diagonal bounded Bregman divergences", that satisfy all of these axioms. These rule out many loss functions used in practice, but notably include squared L2 error; we thus recommend its use for evaluating behavioral models.
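The recommended loss is straightforward to compute. Below is a minimal sketch, shown against negative log-likelihood for contrast (only squared L2 error is the paper's recommendation; the function names are illustrative):

```python
import numpy as np

def squared_l2_error(predicted, observed):
    """Squared L2 error between a predicted distribution and observed play.

    predicted: model's distribution over actions.
    observed: empirical distribution of human play (or a one-hot action).
    """
    return float(np.sum((predicted - observed) ** 2))

def negative_log_likelihood(predicted, observed, eps=1e-12):
    """NLL of the observed play under the prediction (a common alternative)."""
    return float(-np.sum(observed * np.log(predicted + eps)))

predicted = np.array([0.6, 0.3, 0.1])
observed = np.array([0.5, 0.4, 0.1])   # empirical frequencies from an experiment
print(squared_l2_error(predicted, observed))
print(negative_log_likelihood(predicted, observed))
```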

JAIR Journal 2023 · Journal Article

Exploiting Action Impact Regularity and Exogenous State Variables for Offline Reinforcement Learning

  • Vincent Liu
  • James R. Wright
  • Martha White

Offline reinforcement learning—learning a policy from a batch of data—is known to be hard for general MDPs. These results motivate the need to look at specific classes of MDPs where offline reinforcement learning might be feasible. In this work, we explore a restricted class of MDPs to obtain guarantees for offline reinforcement learning. The key property, which we call Action Impact Regularity (AIR), is that actions primarily impact a part of the state (an endogenous component) and have limited impact on the remaining part of the state (an exogenous component). AIR is a strong assumption, but it nonetheless holds in a number of real-world domains including financial markets. We discuss algorithms that exploit the AIR property, and provide a theoretical analysis for an algorithm based on Fitted-Q Iteration. Finally, we demonstrate that the algorithm outperforms existing offline reinforcement learning algorithms across different data collection policies in simulated and real-world environments where the regularity holds.
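As a rough illustration of the Fitted-Q Iteration baseline the analysis builds on, here is a tabular sketch over a fixed batch. It is not the paper's AIR-specific algorithm, and all names are illustrative; the comment notes where the endogenous/exogenous split would enter.

```python
import numpy as np

def fitted_q_iteration(batch, n_states, n_actions, gamma=0.99, iters=100):
    """Tabular Fitted-Q Iteration over a fixed batch of transitions.

    batch: list of (state, action, reward, next_state) tuples.
    Under Action Impact Regularity, the state would factor into an
    endogenous part (affected by actions) and an exogenous part (e.g.,
    market prices), letting the learner evaluate counterfactual actions
    against the same exogenous trajectory; this sketch treats the state
    as a single index for brevity.
    """
    q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        targets = np.zeros_like(q)
        counts = np.zeros_like(q)
        for s, a, r, s_next in batch:
            targets[s, a] += r + gamma * q[s_next].max()
            counts[s, a] += 1
        # Tabular "regression": average target per (s, a); keep old values
        # for state-action pairs never observed in the batch.
        q = np.where(counts > 0, targets / np.maximum(counts, 1), q)
    return q

batch = [(0, 0, 1.0, 1), (1, 1, 0.0, 0), (0, 1, 0.5, 0)]
print(fitted_q_iteration(batch, n_states=2, n_actions=2))
```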

AAMAS Conference 2023 · Conference Paper

Non-strategic Econometrics (for Initial Play)

  • Daniel Chui
  • Jason Hartline
  • James R. Wright

Modelling agent preferences has applications in a range of fields, including economics and, increasingly, artificial intelligence. These preferences are not always known and thus may need to be estimated from observed behavior, in which case a model is required to map agent preferences to behavior; this approach is known as structural estimation. Traditional models assume that agents are perfectly rational: that is, they optimize perfectly and behave in accordance with their own interests. Work in behavioral game theory has shown, however, that human agents often make imperfectly rational decisions, and the field has developed models that relax the perfect rationality assumption. We apply models developed for predicting behavior to the task of estimating preferences, and show that they outperform both traditional and commonly used benchmark models on data collected from human subjects. In fact, Nash equilibrium and its relaxation, quantal response equilibrium (QRE), can induce inaccurate estimates of agent preferences when compared against ground truth. A key finding is that modelling non-strategic behavior, conventionally treated as uniform noise, is important for estimating preferences. To this end, we introduce quantal-linear4, a rich non-strategic model. We also propose augmenting the popular quantal response equilibrium with a non-strategic component; we call this augmented model QRE+L0, and find that it improves value estimates over standard QRE. QRE+L0 also accommodates alternative models of non-strategic behavior besides quantal-linear4.
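A minimal sketch of the augmentation idea, assuming a simple mixture of a non-strategic distribution with a one-shot logit response (a true QRE is a fixed point, which this deliberately omits; all names, weights, and numbers are illustrative, not the paper's QRE+L0 estimator):

```python
import numpy as np

def logit_response(utilities, precision=1.0):
    """Logit (quantal) choice probabilities for a vector of action utilities."""
    z = precision * utilities
    z -= z.max()   # numerical stability
    p = np.exp(z)
    return p / p.sum()

def augmented_prediction(utilities, level0_dist, weight=0.3, precision=1.0):
    """Mixture of a non-strategic level-0 distribution and a quantal response.

    weight: the estimated fraction of non-strategic play (illustrative).
    """
    return weight * level0_dist + (1.0 - weight) * logit_response(utilities, precision)

utilities = np.array([2.0, 1.0, 0.5])
level0 = np.array([0.5, 0.3, 0.2])   # e.g., from some rich non-strategic model
print(augmented_prediction(utilities, level0))
```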

ICML Conference 2021 · Conference Paper

Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games

  • Dustin Morrill
  • Ryan D'Orazio
  • Marc Lanctot
  • James R. Wright
  • Michael H. Bowling
  • Amy Greenwald

Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly rational behavior among multiple agents with mediated equilibria. To develop hindsight rational learning in sequential decision-making settings, we formalize behavioral deviations as a general class of deviations that respect the structure of extensive-form games. Integrating the idea of time selection into counterfactual regret minimization (CFR), we introduce the extensive-form regret minimization (EFR) algorithm that achieves hindsight rationality for any given set of behavioral deviations with computation that scales closely with the complexity of the set. We identify behavioral deviation subsets, the partial sequence deviation types, that subsume previously studied types and lead to efficient EFR instances in games with moderate lengths. In addition, we present a thorough empirical analysis of EFR instantiated with different deviation types in benchmark games, where we find that stronger types typically induce better performance.
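For background, here is a minimal regret-matching sketch at a single decision point, the style of no-regret update that CFR and EFR build on (external regret only, far simpler than the behavioral deviation sets EFR handles; names are illustrative):

```python
import numpy as np

def regret_matching(payoff_fn, n_actions, rounds=1000):
    """Regret matching at a single decision point.

    payoff_fn(t) returns the payoff vector for every action at round t.
    The time-averaged strategy has vanishing external regret; EFR
    generalizes this style of update to rich sets of behavioral deviations.
    """
    cum_regret = np.zeros(n_actions)
    strategy_sum = np.zeros(n_actions)
    for t in range(rounds):
        positive = np.maximum(cum_regret, 0.0)
        if positive.sum() > 0:
            strategy = positive / positive.sum()
        else:
            strategy = np.full(n_actions, 1.0 / n_actions)
        strategy_sum += strategy
        payoffs = payoff_fn(t)
        cum_regret += payoffs - strategy @ payoffs   # regret vs. the mixture played
    return strategy_sum / rounds

# With a fixed payoff vector, the average strategy concentrates on action 0.
print(regret_matching(lambda t: np.array([1.0, 0.5, 0.0]), n_actions=3))
```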

AIJ Journal 2019 · Journal Article

Incentivizing evaluation with peer prediction and limited access to ground truth

  • Xi Alice Gao
  • James R. Wright
  • Kevin Leyton-Brown

In many settings, an effective way of evaluating objects of interest is to collect evaluations from dispersed individuals and to aggregate these evaluations together. Some examples are categorizing online content and evaluating student assignments via peer grading. For this data science problem, one challenge is to motivate participants to conduct such evaluations carefully and to report them honestly, particularly when doing so is costly. Existing approaches, notably peer-prediction mechanisms, can incentivize truth telling in equilibrium. However, they also give rise to equilibria in which agents do not pay the costs required to evaluate accurately, and hence fail to elicit useful information. We show that this problem is unavoidable whenever agents are able to coordinate using low-cost signals about the items being evaluated (e.g., text labels or pictures). We then consider ways of circumventing this problem by comparing agents' reports to ground truth, which is available in practice when there exist trusted evaluators—such as teaching assistants in the peer grading scenario—who can perform a limited number of unbiased (but noisy) evaluations. Of course, when such ground truth is available, a simpler approach is also possible: rewarding each agent based on agreement with ground truth with some probability, and unconditionally rewarding the agent otherwise. Surprisingly, we show that the simpler mechanism achieves stronger incentive guarantees given less access to ground truth than a large set of peer-prediction mechanisms.
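The simpler mechanism described above is easy to sketch, assuming hypothetical helper callables for the trusted evaluation and the agreement test (all names and defaults here are illustrative, not the paper's notation):

```python
import random

def spot_check_reward(report, get_trusted_evaluation, agree,
                      audit_prob=0.1, base_reward=1.0, rng=random):
    """With probability audit_prob, pay based on agreement with a trusted
    (noisy but unbiased) evaluation; otherwise pay unconditionally.
    """
    if rng.random() < audit_prob:
        truth = get_trusted_evaluation()   # e.g., a TA regrade in peer grading
        return base_reward if agree(report, truth) else 0.0
    return base_reward

# Example: a peer grade counts as agreeing if within one point of the TA's grade.
print(spot_check_reward(report=8,
                        get_trusted_evaluation=lambda: 9,
                        agree=lambda r, t: abs(r - t) <= 1))
```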

JAIR Journal 2019 · Journal Article

Level-0 Models for Predicting Human Behavior in Games

  • James R. Wright
  • Kevin Leyton-Brown

Behavioral game theory seeks to describe the way actual people (as compared to idealized, "rational" agents) act in strategic situations. Our own recent work has identified iterative models, such as quantal cognitive hierarchy, as the state of the art for predicting human play in unrepeated, simultaneous-move games. Iterative models predict that agents reason iteratively about their opponents, building up from a specification of nonstrategic behavior called level-0. A modeler is in principle free to choose any description of level-0 behavior that makes sense for a given setting. However, in practice almost all existing work specifies this behavior as a uniform distribution over actions. In most games it is not plausible that even nonstrategic agents would choose an action uniformly at random, nor that other agents would expect them to do so. A more accurate model for level-0 behavior has the potential to dramatically improve predictions of human behavior, since a substantial fraction of agents may play level-0 strategies directly, and furthermore since iterative models ground all higher-level strategies in responses to the level-0 strategy. Our work considers models of the way in which level-0 agents construct a probability distribution over actions, given an arbitrary game. We considered a large space of alternatives and, in the end, recommend a model that achieved excellent performance across the board: a linear weighting of four binary features, each of which is general in the sense that it can be computed from any normal-form game. Adding real-valued variants of the same four features yielded further improvements in performance, albeit with a corresponding increase in the number of parameters needing to be estimated. We evaluated the effects of combining these new level-0 models with several iterative models and observed large improvements in predictive accuracy.
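A minimal sketch of a linearly weighted level-0 model, assuming two illustrative binary features (maxmax and maxmin actions) in place of the paper's four; the feature choice, weights, and normalization here are illustrative, not the paper's fitted model:

```python
import numpy as np

def level0_distribution(payoffs, weights):
    """Linear weighting of binary features over the row player's actions.

    payoffs: (n_row, n_col) payoff matrix for the row player.
    weights: dict of feature weights, plus a 'noise' weight for uniform play
             (assumed to sum to something positive).
    """
    n = payoffs.shape[0]
    features = {
        # Actions whose best-case payoff attains the game's maximum payoff.
        "maxmax": (payoffs.max(axis=1) == payoffs.max()).astype(float),
        # Actions whose worst-case payoff is maximal (the maxmin actions).
        "maxmin": (payoffs.min(axis=1) == payoffs.min(axis=1).max()).astype(float),
    }
    score = weights.get("noise", 0.0) * np.full(n, 1.0 / n)
    for name, feat in features.items():
        if feat.sum() > 0:
            score += weights.get(name, 0.0) * feat / feat.sum()
    return score / score.sum()

payoffs = np.array([[3.0, 0.0],
                    [2.0, 2.0]])
print(level0_distribution(payoffs, {"maxmax": 0.4, "maxmin": 0.4, "noise": 0.2}))
```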