Arrow Research search

Author name cluster

Neil Burch

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

30 papers
2 author rows

Possible papers

30

IJCAI Conference 2024 Conference Paper

Look-ahead Search on Top of Policy Networks in Imperfect Information Games

  • Ondřej Kubíček
  • Neil Burch
  • Viliam Lisý

Search at test time is often used to improve the performance of reinforcement learning algorithms. Performing theoretically sound search in fully adversarial two-player games with imperfect information is notoriously difficult and requires a complicated training process. We present a method for adding test-time search to an arbitrary policy-gradient algorithm that learns from sampled trajectories. Besides the policy network, the algorithm trains an additional critic network, which estimates the expected values of players following various transformations of the policies given by the policy network. These values are then used for depth-limited search. We show how the values from this critic can create a value function for imperfect information games. Moreover, they can be used to compute the summary statistics necessary to start the search from an arbitrary decision point in the game. The presented algorithm is scalable to very large games since it does not require any search during training. We evaluate the algorithm's performance when trained with Regularized Nash Dynamics, and we evaluate the benefit of using the search in the standard benchmark game of Leduc hold'em, multiple variants of imperfect information Goofspiel, and Battleships.
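
As a rough illustration of the search step, the sketch below backs up a learned critic's estimates at the frontier of a depth-limited lookahead. It is a perfect-information-style skeleton with hypothetical state and critic interfaces (`legal_actions`, `apply`, `to_play`, `critic_value`); the paper's actual machinery for imperfect information (policy transformations and the summary statistics needed to start search mid-game) is omitted.

```python
# Illustrative skeleton only (hypothetical state/critic interfaces): a
# depth-limited lookahead that backs up the critic's value estimates at the
# frontier instead of rolling out to the end of the game.

def depth_limited_value(state, depth, player, critic_value):
    """Alternating max/min backup with a learned value at the depth limit."""
    if state.is_terminal():
        return state.returns(player)
    if depth == 0:
        return critic_value(state, player)   # learned estimate, no rollout
    values = [depth_limited_value(state.apply(a), depth - 1, player, critic_value)
              for a in state.legal_actions()]
    return max(values) if state.to_play() == player else min(values)
```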

TMLR Journal 2023 Journal Article

Population-based Evaluation in Repeated Rock-Paper-Scissors as a Benchmark for Multiagent Reinforcement Learning

  • Marc Lanctot
  • John Schultz
  • Neil Burch
  • Max Olan Smith
  • Daniel Hennes
  • Thomas Anthony
  • Julien Perolat

Progress in fields of machine learning and adversarial planning has benefited significantly from benchmark domains, from checkers and the classic UCI data sets to Go and Diplomacy. In sequential decision-making, agent evaluation has largely been restricted to a few interactions against experts, with the aim of reaching some desired level of performance (e.g. beating a human professional player). We propose a benchmark for multiagent learning based on repeated play of the simple game Rock, Paper, Scissors along with a population of forty-three tournament entries, some of which are intentionally sub-optimal. We describe metrics to measure the quality of agents based both on average returns and exploitability. We then show that several RL, online learning, and language model approaches can learn good counter-strategies and generalize well, but ultimately lose to the top-performing bots, creating an opportunity for research in multiagent learning.
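
A minimal sketch of the average-return metric described above: a candidate agent plays repeated Rock-Paper-Scissors against a fixed population of bots. Agents and bots here are policy functions from the opponent's history to a move; the paper's exploitability metric and its actual tournament entries are not reproduced.

```python
import random

# Evaluate a candidate agent's average return against a population of bots
# in repeated Rock-Paper-Scissors. Each policy sees the opponent's history.

BEATS = {("R", "S"), ("S", "P"), ("P", "R")}   # winner's move listed first

def payoff(a, b):
    return 0 if a == b else (1 if (a, b) in BEATS else -1)

def average_return(agent, bots, rounds=1000):
    total, games = 0.0, 0
    for bot in bots:
        agent_hist, bot_hist = [], []
        for _ in range(rounds):
            a, b = agent(bot_hist), bot(agent_hist)   # each sees the other's past
            total += payoff(a, b)
            games += 1
            agent_hist.append(a)
            bot_hist.append(b)
    return total / games

uniform = lambda hist: random.choice("RPS")
print(average_return(uniform, [lambda hist: "R", uniform]))
```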

IJCAI Conference 2023 Conference Paper

Rethinking Formal Models of Partially Observable Multiagent Decision Making (Extended Abstract)

  • Vojtěch Kovařík
  • Martin Schmid
  • Neil Burch
  • Michael Bowling
  • Viliam Lisý

Multiagent decision-making in partially observable environments is usually modelled as either an extensive-form game (EFG) in game theory or a partially observable stochastic game (POSG) in multiagent reinforcement learning (MARL). One issue with the current situation is that while most practical problems can be modelled in both formalisms, the relationship of the two models is unclear, which hinders the transfer of ideas between the two communities. A second issue is that while EFGs have recently seen significant algorithmic progress, their classical formalization is unsuitable for efficient presentation of the underlying ideas, such as those around decomposition. To solve the first issue, we introduce factored-observation stochastic games (FOSGs), a minor modification of the POSG formalism which distinguishes between private and public observation and thereby greatly simplifies decomposition. To remedy the second issue, we show that FOSGs and POSGs are naturally connected to EFGs: by "unrolling" a FOSG into its tree form, we obtain an EFG. Conversely, any perfect-recall timeable EFG corresponds to some underlying FOSG in this manner. Moreover, this relationship justifies several minor modifications to the classical EFG formalization that recently appeared as an implicit response to the model's issues with decomposition. Finally, we illustrate the transfer of ideas between EFGs and MARL by presenting three key EFG techniques -- counterfactual regret minimization, sequence form, and decomposition -- in the FOSG framework.
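
To make the key distinction concrete, here is a hypothetical transition record in the FOSG spirit: every step yields one public observation, visible to all players, plus one private observation per player. The field names are illustrative, not the paper's formal notation.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Hypothetical FOSG transition record. Keeping the public observation
# explicit, rather than folding it into per-player observations as in a
# POSG, is what makes decomposition by public state straightforward.

@dataclass
class FOSGStep:
    next_state: Any
    public_obs: Any                 # observed by every player
    private_obs: Dict[int, Any]     # player id -> that player's observation
    rewards: Dict[int, float]       # player id -> reward
```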

IJCAI Conference 2022 Conference Paper

Approximate Exploitability: Learning a Best Response

  • Finbarr Timbers
  • Nolan Bard
  • Edward Lockhart
  • Marc Lanctot
  • Martin Schmid
  • Neil Burch
  • Julian Schrittwieser
  • Thomas Hubert

Researchers have shown that neural networks are vulnerable to adversarial examples and subtle environment changes. The resulting errors can look like blunders to humans, eroding trust in these agents. In prior games research, agent evaluation often focused on in-practice game outcomes. Such evaluation typically fails to assess robustness to worst-case outcomes. Computer poker research has examined how to assess such worst-case performance. Unfortunately, exact computation is infeasible with larger domains, and existing approximations are poker-specific. We introduce ISMCTS-BR, a scalable search-based deep reinforcement learning algorithm for learning a best response to an agent, approximating worst-case performance. We demonstrate the technique in several games against a variety of agents, including several AlphaZero-based agents. Supplementary material is available at https://arxiv.org/abs/2004.09677.
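
Stripped of the paper's search and network details, the outer recipe looks roughly like the sketch below; `train_best_response` and `play_match` are hypothetical placeholders for any RL trainer and evaluation harness. Because a learned responder is generally imperfect, the resulting estimate is (in expectation) a lower bound on true exploitability.

```python
# Sketch of the outer evaluation recipe with hypothetical placeholders.
# The paper's actual algorithm, ISMCTS-BR, combines search with deep RL
# inside the training step.

def approximate_exploitability(frozen_agent, train_best_response, play_match,
                               train_episodes=100_000, eval_games=10_000):
    responder = train_best_response(frozen_agent, train_episodes)
    # An imperfect responder understates the true best-response value, so
    # this averages to a lower bound on the frozen agent's exploitability.
    returns = [play_match(responder, frozen_agent) for _ in range(eval_games)]
    return sum(returns) / len(returns)
```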

ICML Conference 2021 Conference Paper

From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization

  • Julien Pérolat
  • Rémi Munos
  • Jean-Baptiste Lespiau
  • Shayegan Omidshafiei
  • Mark Rowland 0001
  • Pedro A. Ortega
  • Neil Burch
  • Thomas W. Anthony 0001

In this paper we investigate the Follow the Regularized Leader dynamics in sequential imperfect information games (IIG). We generalize existing results of Poincaré recurrence from normal-form games to zero-sum two-player imperfect information games and other sequential game settings. We then investigate how adapting the reward (by adding a regularization term) of the game can give strong convergence guarantees in monotone games. We continue by showing how this reward adaptation technique can be leveraged to build algorithms that converge exactly to the Nash equilibrium. Finally, we show how these insights can be directly used to build state-of-the-art model-free algorithms for zero-sum two-player imperfect information games (IIG).
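
To see the dynamics in the simplest case, the sketch below runs FTRL with an entropy regularizer (which reduces to multiplicative weights over cumulative payoffs) on Rock-Paper-Scissors. The iterates orbit the uniform equilibrium rather than converging to it, which is the recurrence phenomenon the paper starts from; the paper's reward-adaptation step, which turns this cycling into convergence, is not shown. The learning rate and demo game are illustrative choices.

```python
import numpy as np

# FTRL with entropy regularization on Rock-Paper-Scissors: with an entropy
# regularizer the iterate has the closed form of a softmax over cumulative
# payoff gradients.

A = np.array([[0, -1, 1],
              [1, 0, -1],
              [-1, 1, 0]])            # row player's payoffs for R, P, S
eta = 0.1
gx, gy = np.zeros(3), np.zeros(3)     # cumulative payoff gradients

for t in range(10_000):
    x = np.exp(eta * gx); x /= x.sum()   # row player's FTRL iterate
    y = np.exp(eta * gy); y /= y.sum()   # column player's FTRL iterate
    gx += A @ y                          # gradient of x^T A y in x
    gy -= A.T @ x                        # zero-sum: column player sees -A^T x

# The strategies cycle around the uniform equilibrium instead of converging.
print(x, y)
```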

AAAI Conference 2021 Conference Paper

Solving Common-Payoff Games with Approximate Policy Iteration

  • Samuel Sokota
  • Edward Lockhart
  • Finbarr Timbers
  • Elnaz Davoodi
  • Ryan D'Orazio
  • Neil Burch
  • Martin Schmid
  • Michael Bowling

For artificially intelligent learning systems to have widespread applicability in real-world settings, it is important that they be able to operate decentrally. Unfortunately, decentralized control is difficult—computing even an epsilon-optimal joint policy is an NEXP-complete problem. Nevertheless, a recently rediscovered insight—that a team of agents can coordinate via common knowledge—has given rise to algorithms capable of finding optimal joint policies in small common-payoff games. The Bayesian action decoder (BAD) leverages this insight and deep reinforcement learning to scale to games as large as two-player Hanabi. However, the approximations it uses to do so prevent it from discovering optimal joint policies even in games small enough to brute force optimal solutions. This work proposes CAPI, a novel algorithm which, like BAD, combines common knowledge with deep reinforcement learning. However, unlike BAD, CAPI prioritizes the propensity to discover optimal joint policies over scalability. While this choice precludes CAPI from scaling to games as large as Hanabi, empirical results demonstrate that, on the games to which CAPI does scale, it is capable of discovering optimal joint policies even when other modern multi-agent reinforcement learning algorithms are unable to do so.

AAMAS Conference 2021 Conference Paper

Sound Algorithms in Imperfect Information Games

  • Michal Šustr
  • Martin Schmid
  • Matej Moravčík
  • Neil Burch
  • Marc Lanctot
  • Michael Bowling

Search has played a fundamental role in computer game research since the very beginning. And while online search has been commonly used in perfect information games such as Chess and Go, online search methods for imperfect information games have only been introduced relatively recently. This paper addresses the question: what is a sound online algorithm in an imperfect information setting of two-player zero-sum games? We argue that the fixed-strategy definitions of exploitability and epsilon-Nash equilibria are ill-suited to measure the worst-case performance of an online algorithm. We thus formalize epsilon-soundness, a concept that connects the worst-case performance of an online algorithm to the performance of an epsilon-Nash equilibrium. Our definition of soundness and the consistency hierarchy finally provide appropriate tools to analyze online algorithms in repeated imperfect information games. We thus inspect some of the previous online algorithms in a new light, bringing new insights into their worst-case performance guarantees.

ICML Conference 2019 Conference Paper

Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning

  • Jakob N. Foerster
  • H. Francis Song
  • Edward Hughes 0001
  • Neil Burch
  • Iain Dunning
  • Shimon Whiteson
  • Matthew M. Botvinick
  • Michael H. Bowling

When observing the actions of others, humans make inferences about why they acted as they did, and what this implies about the world; humans also use the fact that their actions will be interpreted in this manner, allowing them to act informatively and thereby communicate efficiently with others. Although learning algorithms have recently achieved superhuman performance in a number of two-player, zero-sum games, scalable multi-agent reinforcement learning algorithms that can discover effective strategies and conventions in complex, partially observable settings have proven elusive. We present the Bayesian action decoder (BAD), a new multi-agent learning method that uses an approximate Bayesian update to obtain a public belief that conditions on the actions taken by all agents in the environment. BAD introduces a new Markov decision process, the public belief MDP, in which the action space consists of all deterministic partial policies, and exploits the fact that an agent acting only on this public belief state can still learn to use its private information if the action space is augmented to be over all partial policies mapping private information into environment actions. The Bayesian update is closely related to the theory of mind reasoning that humans carry out when observing others’ actions. We first validate BAD on a proof-of-principle two-step matrix game, where it outperforms policy gradient methods; we then evaluate BAD on the challenging, cooperative partial-information card game Hanabi, where, in the two-player setting, it surpasses all previously published learning and hand-coded approaches, establishing a new state of the art.
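
The core Bayesian step is easy to state for a deterministic partial policy: the public belief over private states is filtered to those states in which the policy would have produced the observed action, then renormalized. The sketch below uses illustrative names; BAD's approximate, factorized update over neural policies is much richer.

```python
# Exact public-belief update for a deterministic partial policy: keep only
# private states consistent with the observed action, then renormalize.

def update_public_belief(belief, partial_policy, observed_action):
    """belief: {private_state: prob}; partial_policy: private_state -> action."""
    filtered = {s: p for s, p in belief.items()
                if partial_policy(s) == observed_action}
    total = sum(filtered.values())
    # If the observation has zero probability under the model, fall back to
    # the prior rather than dividing by zero (a practical hedge).
    return {s: p / total for s, p in filtered.items()} if total else belief

print(update_public_belief({"weak": 0.5, "strong": 0.5},
                           lambda s: "bet" if s == "strong" else "check",
                           "bet"))
```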

JAIR Journal 2019 Journal Article

Revisiting CFR+ and Alternating Updates

  • Neil Burch
  • Matej Moravcik
  • Martin Schmid

The CFR+ algorithm for solving imperfect information games is a variant of the popular CFR algorithm, with faster empirical performance on a range of problems. It was introduced with a theoretical upper bound on solution error, but subsequent work showed an error in one step of the proof. We provide updated proofs to recover the original bound.
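
For reference, the component update whose bound is being repaired is regret-matching+, which differs from ordinary regret matching only in flooring cumulative regrets at zero after every update. A minimal sketch:

```python
import numpy as np

# Regret-matching+: the "+" is the clipping step, so negative regret is
# forgotten immediately; the strategy is the normalized positive regret vector.

def regret_matching_plus(Q, instant_regret):
    """Q: nonnegative cumulative regrets; returns (updated Q, strategy)."""
    Q = np.maximum(Q + instant_regret, 0.0)
    total = Q.sum()
    strategy = Q / total if total > 0 else np.full(len(Q), 1.0 / len(Q))
    return Q, strategy

Q = np.zeros(3)
Q, sigma = regret_matching_plus(Q, np.array([0.5, -1.0, 0.2]))
print(Q, sigma)   # the -1.0 regret leaves no trace in Q
```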

AAAI Conference 2019 Conference Paper

Variance Reduction in Monte Carlo Counterfactual Regret Minimization (VR-MCCFR) for Extensive Form Games Using Baselines

  • Martin Schmid
  • Neil Burch
  • Marc Lanctot
  • Matej Moravcik
  • Rudolf Kadlec
  • Michael Bowling

Learning strategies for imperfect information games from samples of interaction is a challenging problem. A common method for this setting, Monte Carlo Counterfactual Regret Minimization (MCCFR), can have slow long-term convergence rates due to high variance. In this paper, we introduce a variance reduction technique (VR-MCCFR) that applies to any sampling variant of MCCFR. Using this technique, per-iteration estimated values and updates are reformulated as a function of sampled values and state-action baselines, similar to their use in policy gradient reinforcement learning. The new formulation allows estimates to be bootstrapped from other estimates within the same episode, propagating the benefits of baselines along the sampled trajectory; the estimates remain unbiased even when bootstrapping from other estimates. Finally, we show that given a perfect baseline, the variance of the value estimates can be reduced to zero. Experimental evaluation shows that VR-MCCFR brings an order of magnitude speedup, while the empirical variance decreases by three orders of magnitude. The decreased variance allows CFR+ to be used with sampling for the first time, increasing the speedup to two orders of magnitude.
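
The baseline construction can be sketched in a few lines: the sampled action's estimate is its baseline plus the importance-weighted error of the sampled value, while unsampled actions receive their baselines unchanged. The estimator is unbiased for any baseline, and a baseline equal to the true value makes the correction term (and hence its variance) vanish. Names here are illustrative:

```python
# One step of the state-action baseline idea: unbiased per-action value
# estimates from a single sampled action.

def baseline_corrected_estimates(actions, sampled_action, sampled_value,
                                 sample_prob, baseline):
    """sample_prob: probability with which sampled_action was chosen."""
    est = {a: baseline(a) for a in actions}
    # Importance-weighted correction keeps the expectation equal to the
    # true value regardless of the baseline used.
    est[sampled_action] += (sampled_value - baseline(sampled_action)) / sample_prob
    return est

print(baseline_corrected_estimates(["fold", "call"], "call",
                                   sampled_value=1.0, sample_prob=0.5,
                                   baseline=lambda a: 0.4))
```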

AAAI Conference 2018 Conference Paper

AIVAT: A New Variance Reduction Technique for Agent Evaluation in Imperfect Information Games

  • Neil Burch
  • Martin Schmid
  • Matej Moravcik
  • Dustin Morrill
  • Michael Bowling

Evaluating agent performance when outcomes are stochastic and agents use randomized strategies can be challenging when there is limited data available. The variance of sampled outcomes may make the simple approach of Monte Carlo sampling inadequate. This is the case for agents playing heads-up no-limit Texas hold’em poker, where man-machine competitions typically involve multiple days of consistent play by multiple players, but still can (and sometimes did) result in statistically insignificant conclusions. In this paper, we introduce AIVAT, a low variance, provably unbiased value assessment tool that exploits an arbitrary heuristic estimate of state value, as well as the explicit strategy of a subset of the agents. Unlike existing techniques which reduce the variance from chance events, or only consider game ending actions, AIVAT reduces the variance both from choices by nature and by players with a known strategy. The resulting estimator produces results that significantly outperform previous state-of-the-art techniques. It was able to reduce the standard deviation of a Texas hold’em poker man-machine match by 85% and consequently requires 44 times fewer games to draw the same statistical conclusion. AIVAT enabled the first statistically significant AI victory against professional poker players in no-limit hold’em. Furthermore, the technique was powerful enough to produce statistically significant results versus individual players, not just an aggregate pool of the players. We also used AIVAT to analyze a short series of AI vs human poker tournaments, producing statistically significant results with as few as 28 matches.
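
A heavily simplified sketch of the underlying control-variate idea (not the full AIVAT estimator, which also adds corrections at the decisions of players with known strategies): at each chance event with known probabilities, subtract the heuristic value of the realized outcome and add back its expectation. The correction has zero mean, so unbiasedness is preserved while the heuristic absorbs the luck. Helper names are hypothetical.

```python
# Zero-mean chance-event correction for a single sampled game.

def corrected_value(actual_result, chance_events, value_estimate):
    """chance_events: list of ({outcome: prob}, realized_outcome) pairs."""
    correction = 0.0
    for probs, realized in chance_events:
        expected = sum(p * value_estimate(o) for o, p in probs.items())
        # E[expected - value(realized)] = 0, so the estimate stays unbiased.
        correction += expected - value_estimate(realized)
    return actual_result + correction

print(corrected_value(1.0, [({"ace": 0.2, "blank": 0.8}, "ace")],
                      {"ace": 0.9, "blank": -0.1}.get))
```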

IJCAI Conference 2015 Conference Paper

Solving Heads-Up Limit Texas Hold'em

  • Oskari Tammelin
  • Neil Burch
  • Michael Johanson
  • Michael Bowling

Cepheus is the first computer program to essentially solve a game of imperfect information that is played competitively by humans. The game it plays is heads-up limit Texas hold’em poker, a game with over 10^14 information sets, and a challenge problem for artificial intelligence for over 10 years. Cepheus was trained using a new variant of Counterfactual Regret Minimization (CFR), called CFR+, using 4800 CPUs running for 68 days. In this paper we describe the engineering details required to make this computation a reality. We also prove the theoretical soundness of CFR+ and its component algorithm, regret-matching+. We further give a hint towards understanding the success of CFR+ by proving a tracking regret bound for this new regret-matching algorithm. We present results showing the contribution of the algorithmic components and the engineering choices to the success of CFR+.

AAAI Conference 2014 Conference Paper

Solving Imperfect Information Games Using Decomposition

  • Neil Burch
  • Michael Johanson
  • Michael Bowling

Decomposition, i.e., independently analyzing possible subgames, has proven to be an essential principle for effective decision-making in perfect information games. However, in imperfect information games, decomposition has proven to be problematic. To date, all proposed techniques for decomposition in imperfect information games have abandoned theoretical guarantees. This work presents the first technique for decomposing an imperfect information game into subgames that can be solved independently, while retaining optimality guarantees on the full-game solution. We can use this technique to construct theoretically justified algorithms that make better use of information available at run-time, overcome memory or disk limitations at run-time, or make a time/space trade-off to overcome memory or disk limitations while solving a game. In particular, we present an algorithm for subgame solving which guarantees performance in the whole game, in contrast to existing methods which may have unbounded error. In addition, we present an offline game solving algorithm, CFR-D, which can produce a Nash equilibrium for a game that is larger than available storage.

AAAI Conference 2014 Conference Paper

Using Response Functions to Measure Strategy Strength

  • Trevor Davis
  • Neil Burch
  • Michael Bowling

Extensive-form games are a powerful tool for representing complex multi-agent interactions. Nash equilibrium strategies are commonly used as a solution concept for extensive-form games, but many games are too large for the computation of Nash equilibria to be tractable. In these large games, exploitability has traditionally been used to measure deviation from Nash equilibrium, and thus strategies aim to achieve minimal exploitability. However, while exploitability measures a strategy’s worst-case performance, it fails to capture how likely that worst case is to be observed in practice. In fact, empirical evidence has shown that a less exploitable strategy can perform worse than a more exploitable strategy in one-on-one play against a variety of opponents. In this work, we propose a class of response functions that can be used to measure the strength of a strategy. We prove that standard no-regret algorithms can be used to learn optimal strategies for a scenario where the opponent uses one of these response functions. We demonstrate the effectiveness of this technique in Leduc Hold’em against opponents that use the UCT Monte Carlo tree search algorithm.

AAMAS Conference 2013 Conference Paper

Rating Players in Games with Real-Valued Outcomes

  • Christopher Archibald
  • Neil Burch
  • Michael Bowling
  • Matthew Rutherford

Game-theoretic models typically associate outcomes with real-valued utilities, and rational agents are expected to maximize their expected utility. Currently fielded agent rating systems, which aim to order a population of agents by strength, focus exclusively on games with discrete outcomes, e.g., win-loss in two-agent settings or an ordering in the multi-agent setting. These rating systems are not well-suited for domains where the absolute magnitude of utility rather than just the relative value is important. We introduce the problem of rating agents in games with real-valued outcomes and survey applicable existing techniques for rating agents in this setting. We then propose a novel rating system and an extension for all of these rating systems to games with more than two agents, showing experimentally the advantages of our proposed system.
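
As a point of comparison only (not the paper's proposed system), the simplest rating scheme for real-valued outcomes fits one rating per agent by least squares, so that rating differences predict average payoffs. A sketch, with a gauge constraint pinning the ratings' sum to zero:

```python
import numpy as np

# Least-squares ratings: choose r so that r_i - r_j predicts the payoff
# agent i receives against agent j.

def least_squares_ratings(results, n_agents):
    """results: list of (i, j, payoff_to_i); ratings constrained to sum to 0."""
    A = np.zeros((len(results) + 1, n_agents))
    b = np.zeros(len(results) + 1)
    for row, (i, j, payoff) in enumerate(results):
        A[row, i], A[row, j], b[row] = 1.0, -1.0, payoff
    A[-1, :] = 1.0                       # gauge constraint: sum of ratings = 0
    return np.linalg.lstsq(A, b, rcond=None)[0]

print(least_squares_ratings([(0, 1, 0.9), (1, 2, 0.1), (0, 2, 1.1)], 3))
```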

SoCS Conference 2012 Conference Paper

Automatic Move Pruning Revisited

  • Neil Burch
  • Robert C. Holte

In this paper we show that the move pruning method we presented at SoCS last year sometimes prunes all the least-cost paths from one state to another. We present two examples exhibiting this erroneous behaviour: a simple, artificial example and a slightly more complex example that arose in last year's paper.

NeurIPS Conference 2012 Conference Paper

Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions

  • Neil Burch
  • Marc Lanctot
  • Duane Szafron
  • Richard Gibson

Counterfactual Regret Minimization (CFR) is a popular, iterative algorithm for computing strategies in extensive-form games. The Monte Carlo CFR (MCCFR) variants reduce the per-iteration time cost of CFR by traversing a sampled portion of the tree. The previous most effective instances of MCCFR can still be very slow in games with many player actions since they sample every action for a given player. In this paper, we present a new MCCFR algorithm, Average Strategy Sampling (AS), that samples a subset of the player's actions according to the player's average strategy. Our new algorithm is inspired by a new, tighter bound on the number of iterations required by CFR to converge to a given solution quality. In addition, we prove a similar, tighter bound for AS and other popular MCCFR variants. Finally, we validate our work by demonstrating that AS converges faster than previous MCCFR algorithms in both no-limit poker and Bluff.
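
The sampling rule suggested by the abstract can be sketched as follows: each of the player's actions is walked independently with a probability that grows with its share of the cumulative average strategy, floored by an exploration rate. The parameters (epsilon, beta, tau) follow common MCCFR conventions, but the exact scheme here is illustrative:

```python
import random

# Average-strategy sampling sketch: actions that the average strategy favours
# are sampled more often, while the epsilon floor keeps every action alive.

def sample_player_actions(cumulative, epsilon=0.05, beta=1000.0, tau=1.0):
    """cumulative: {action: cumulative strategy weight}; returns sampled actions."""
    total = sum(cumulative.values())
    picked = [a for a, s in cumulative.items()
              if random.random() < min(1.0, max(epsilon,
                                                (beta + tau * s) / (beta + total)))]
    return picked or [random.choice(list(cumulative))]   # never prune everything

print(sample_player_actions({"fold": 10.0, "call": 500.0, "raise": 2.0}))
```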

AAAI Conference 2012 Conference Paper

Finding Optimal Abstract Strategies in Extensive-Form Games

  • Michael Johanson
  • Nolan Bard
  • Neil Burch
  • Michael Bowling

Extensive-form games are a powerful model for representing interactions between agents. Nash equilibrium strategies are a common solution concept for extensive-form games and, in two-player zero-sum games, there are efficient algorithms for calculating such strategies. In large games, this computation may require too much memory and time to be tractable. A standard approach in such cases is to apply a lossy state-space abstraction technique to produce a smaller abstract game that can be tractably solved, while hoping that the resulting abstract game equilibrium is close to an equilibrium strategy in the unabstracted game. Recent work has shown that this assumption is unreliable, and an arbitrary Nash equilibrium in the abstract game is unlikely to be even near the least suboptimal strategy that can be represented in that space. In this work, we present for the first time an algorithm which efficiently finds optimal abstract strategies — strategies with minimal exploitability in the unabstracted game. We use this technique to find the least exploitable strategy ever reported for two-player limit Texas hold’em.

AAAI Conference 2012 Conference Paper

Generalized Sampling and Variance in Counterfactual Regret Minimization

  • Richard Gibson
  • Marc Lanctot
  • Neil Burch
  • Duane Szafron
  • Michael Bowling

In large extensive form games with imperfect information, Counterfactual Regret Minimization (CFR) is a popular, iterative algorithm for computing approximate Nash equilibria. While the base algorithm performs a full tree traversal on each iteration, Monte Carlo CFR (MCCFR) reduces the per-iteration time cost by traversing just a sampled portion of the tree. On the other hand, MCCFR’s sampled values introduce variance, and the effects of this variance were previously unknown. In this paper, we generalize MCCFR by considering any generic estimator of the sought values. We show that any choice of an estimator can be used to probabilistically minimize regret, provided the estimator is bounded and unbiased. In addition, we relate the variance of the estimator to the convergence rate of an algorithm that calculates regret directly from the estimator. We demonstrate the application of our analysis by defining a new bounded, unbiased estimator with empirically lower variance than MCCFR estimates. Finally, we use this estimator in a new sampling algorithm to compute approximate equilibria in Goofspiel, Bluff, and Texas hold’em poker. Under each of our selected sampling schemes, our new algorithm converges faster than MCCFR.

SoCS Conference 2011 Conference Paper

Abstract: Block A* and Any-Angle Path-Planning

  • Peter Kai Yue Yap
  • Neil Burch
  • Robert C. Holte
  • Jonathan Schaeffer 0001

We present three new ideas for grid-based path-planning algorithms that improve the search speed and quality of the paths found. First, we introduce a new type of database, the Local Distance Database (LDDB), that contains distances between boundary points of a local neighborhood. Second, an LDDB-based algorithm is introduced, called Block A*, that calculates the optimal path between start and goal locations given the local distances stored in the LDDB. Third, our experimental results for any-angle path planning in a wide variety of test domains, including real game maps, show that Block A* is faster than both A* and the previously best grid-based any-angle search algorithm, Theta*.

SoCS Conference 2011 Conference Paper

Automatic Move Pruning in General Single-Player Games

  • Neil Burch
  • Robert C. Holte

Move pruning is a low-overhead technique for reducing the size of a depth-first search tree. The existing algorithm for automatically discovering move pruning information is restricted to games where all moves can be applied to every state. This paper demonstrates an algorithm which handles a general class of single-player games. We give experimental results for our technique, demonstrating both its applicability to a range of games and the reduction in search tree size. We also provide some conditions under which move pruning is safe, and when it may interfere with other search reduction techniques.

AAAI Conference 2011 Conference Paper

Block A*: Database-Driven Search with Applications in Any-Angle Path-Planning

  • Peter Yap
  • Neil Burch
  • Robert Holte
  • Jonathan Schaeffer

We present three new ideas for grid-based path-planning algorithms that improve the search speed and quality of the paths found. First, we introduce a new type of database, the Local Distance Database (LDDB), that contains distances between boundary points of a local neighborhood. Second, an LDDB-based algorithm is introduced, called Block A*, that calculates the optimal path between start and goal locations given the local distances stored in the LDDB. Third, our experimental results for any-angle path planning in a wide variety of test domains, including real game maps, show that Block A* is faster than both A* and the previously best grid-based any-angle search algorithm, Theta*.
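
The first idea is the easiest to make concrete: for each k x k block of the grid, precompute shortest distances between its boundary cells, so that search can later cross an entire block with a single lookup. The sketch below builds such a table with unit-cost, 4-connected Dijkstra; Block A* itself uses finer-grained local distances suited to any-angle paths, which this simplification omits.

```python
import heapq
from itertools import product

# Build a Local Distance Database for one k x k block: Dijkstra from every
# boundary cell to every other boundary cell within the block.

def block_lddb(blocked, k):
    """blocked: set of impassable (x, y) cells; returns {(src, dst): distance}."""
    free = {c for c in product(range(k), repeat=2) if c not in blocked}
    boundary = [c for c in free if 0 in c or k - 1 in c]   # cells on the block edge
    lddb = {}
    for src in boundary:
        dist, heap = {src: 0}, [(0, src)]
        while heap:
            d, (x, y) = heapq.heappop(heap)
            if d > dist[(x, y)]:
                continue                 # stale heap entry
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nxt = (x + dx, y + dy)
                if nxt in free and d + 1 < dist.get(nxt, float("inf")):
                    dist[nxt] = d + 1
                    heapq.heappush(heap, (d + 1, nxt))
        for dst in boundary:
            if dst in dist:
                lddb[(src, dst)] = dist[dst]
    return lddb

print(len(block_lddb({(1, 1)}, 3)))   # all 8 boundary cells stay connected: 64 pairs
```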

ECAI Conference 2010 Conference Paper

Automating Layouts of Sewers in Subdivisions

  • Neil Burch
  • Robert C. Holte
  • Martin Müller 0003
  • David O'Connell
  • Jonathan Schaeffer 0001

An important part of the creation of a housing subdivision is the design and layout of sewers underneath the road. This is a challenging cost optimization problem in a continuous three-dimensional space. In this paper, heuristic-search-based techniques are proposed for tackling this problem. The result is new algorithms that can quickly find near-optimal solutions that offer important reductions in the cost of design and construction.

IJCAI Conference 2009 Conference Paper

Memory-Based Heuristics for Explicit State Spaces

  • Nathan R. Sturtevant
  • Ariel Felner
  • Max Barrer
  • Jonathan Schaeffer
  • Neil Burch

AAAI Conference 2008 Conference Paper

Predicting the Performance of IDA* with Conditional Distributions

  • Uzi Zahavi
  • Neil Burch

(Korf, Reid, and Edelkamp 2001) introduced a formula to predict the number of nodes IDA* will expand given the static distribution of heuristic values. Their formula proved to be very accurate, but only under the following limitations: (1) the heuristic must be consistent; (2) the prediction is for a large random sample of start states (or for large thresholds). In this paper we generalize the static distribution to a conditional distribution of heuristic values. We then propose a new formula for predicting the performance of IDA* that works well for inconsistent heuristics (Zahavi et al. 2007) and for any set of start states, not just a random sample. We also show how the formula can be enhanced to work well for single start states. Experimental results demonstrate the accuracy of our method in all these situations.
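
The static formula being generalized predicts the nodes IDA* expands at threshold d as the number of depth-i nodes whose heuristic value is at most d - i, summed over depths. A worked toy version, assuming a uniform branching factor b and treating the static heuristic distribution as a plain CDF:

```python
# Toy version of the static prediction: sum over depths i of the count of
# depth-i nodes (b**i under a uniform branching factor) whose heuristic is
# small enough that IDA* will not prune them at threshold d.

def kre_prediction(b, d, heuristic_cdf):
    """heuristic_cdf(v): fraction of states with heuristic value <= v."""
    return sum((b ** i) * heuristic_cdf(d - i) for i in range(d + 1))

# Example with a toy heuristic distributed uniformly on {0, ..., 5}:
uniform_cdf = lambda v: min(1.0, max(0.0, (v + 1) / 6))
print(kre_prediction(b=3, d=8, heuristic_cdf=uniform_cdf))
```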

UAI Conference 2005 Conference Paper

Bayes' Bluff: Opponent Modelling in Poker

  • Finnegan Southey
  • Michael H. Bowling
  • Bryce Larson
  • Carmelo Piccione
  • Neil Burch
  • Darse Billings
  • D. Chris Rayner

Poker is a challenging problem for artificial intelligence, with non-deterministic dynamics, partial observability, and the added difficulty of unknown adversaries. Modelling all of the uncertainties in this domain is not an easy task. In this paper we present a Bayesian probabilistic model for a broad class of poker games, separating the uncertainty in the game dynamics from the uncertainty of the opponent's strategy. We then describe approaches to two key subproblems: (i) inferring a posterior over opponent strategies given a prior distribution and observations of their play, and (ii) playing an appropriate response to that distribution. We demonstrate the overall approach on a reduced version of poker using Dirichlet priors and then on the full game of Texas hold'em using a more informed prior. We demonstrate methods for playing effective responses to the opponent, based on the posterior.
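
Subproblem (i) has a particularly clean form at a single decision point: a Dirichlet prior over the opponent's action probabilities is conjugate to observed action counts, so the posterior simply adds counts to the prior's pseudo-counts. A minimal sketch (not the paper's full sequential model):

```python
# Conjugate Dirichlet update at one decision point: posterior pseudo-counts
# are prior pseudo-counts plus observed action counts; the posterior mean is
# the normalized count vector.

def dirichlet_posterior_mean(prior_counts, observed_actions):
    """prior_counts: {action: pseudo-count}; returns posterior mean strategy."""
    post = dict(prior_counts)
    for a in observed_actions:
        post[a] = post.get(a, 0.0) + 1.0
    total = sum(post.values())
    return {a: c / total for a, c in post.items()}

print(dirichlet_posterior_mean({"fold": 1.0, "call": 1.0, "raise": 1.0},
                               ["raise", "raise", "call"]))
```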