
Author name cluster

Rahul Savani

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

27 papers
2 author rows

Possible papers (27)

AAMAS 2025 · Conference Paper

From Natural Language to Extensive-Form Game Representations

  • Shilong Deng
  • Yongzhao Wang
  • Rahul Savani

We introduce a framework for translating game descriptions in natural language into game-theoretic extensive-form representations, leveraging Large Language Models (LLMs) and in-context learning. We find that a naive application of in-context learning struggles on this problem, in particular with imperfect information. To address this, we introduce GameInterpreter, a two-stage framework with specialized modules to enhance in-context learning, enabling it to divide and conquer the problem effectively. In the first stage, we tackle the challenge of imperfect information by developing a module that identifies information sets and the corresponding partial tree structure. With this information, the second stage leverages in-context learning alongside a self-debugging module to produce a complete extensive-form game tree represented using pygambit, the Python API of a recognized game-theoretic analysis tool called Gambit. Using this Python representation enables the automation of tasks such as computing Nash equilibria directly from natural language descriptions. We evaluate the performance of the full framework, as well as its individual components, using various LLMs on games with different levels of strategic complexity. Our experimental results show that the framework significantly outperforms baseline approaches in generating accurate extensive-form games, with each module playing a critical role in its success.
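
As a flavour of the target output, here is a minimal hand-written sketch (not the paper's generated code) that builds a toy imperfect-information game with pygambit's tree-building API and solves it. The game, labels, and payoffs are illustrative assumptions, and the calls follow the style of the pygambit documentation (version 16.1 or later assumed).

```python
import pygambit as gbt

# Toy game: chance deals P1 a High or Low card; P1 bets or folds; if P1
# bets, P2 calls or folds without seeing the card, so P2's two decision
# nodes sit in a single information set.
g = gbt.Game.new_tree(players=["P1", "P2"], title="Toy card game")
g.append_move(g.root, g.players.chance, ["High", "Low"])
for node in g.root.children:
    g.append_move(node, "P1", ["Bet", "Fold"])

bet_high = g.root.children[0].children[0]
bet_low = g.root.children[1].children[0]
g.append_move(bet_high, "P2", ["Call", "Fold"])
g.append_infoset(bet_low, bet_high.infoset)  # P2 cannot tell the nodes apart

# Illustrative zero-sum payoffs.
g.set_outcome(bet_high.children[0], g.add_outcome([2, -2]))
g.set_outcome(bet_high.children[1], g.add_outcome([1, -1]))
g.set_outcome(bet_low.children[0], g.add_outcome([-2, 2]))
g.set_outcome(bet_low.children[1], g.add_outcome([1, -1]))
g.set_outcome(g.root.children[0].children[1], g.add_outcome([-1, 1]))
g.set_outcome(g.root.children[1].children[1], g.add_outcome([-1, 1]))

# Nash equilibria computed straight from the tree, as in the paper's pipeline.
print(gbt.nash.lcp_solve(g).equilibria)
```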

NeurIPS 2025 · Conference Paper

MACS: Multi-Agent Reinforcement Learning for Optimization of Crystal Structures

  • Elena Zamaraeva
  • Christopher Collins
  • George Darling
  • Matthew S Dyer
  • Bei Peng
  • Rahul Savani
  • Dmytro Antypov
  • Vladimir Gusev

Geometry optimization of atomic structures is a common and crucial task in computational chemistry and materials design. Following the learning to optimize paradigm, we propose a new multi-agent reinforcement learning method called Multi-Agent Crystal Structure optimization (MACS) to address the problem of periodic crystal structure optimization. MACS treats geometry optimization as a partially observable Markov game in which atoms are agents that adjust their positions to collectively discover a stable configuration. We train MACS across various compositions of reported crystalline materials to obtain a policy that successfully optimizes structures from the training compositions as well as structures of larger sizes and unseen compositions, confirming its excellent scalability and zero-shot transferability. We benchmark our approach against a broad range of state-of-the-art optimization methods and demonstrate that MACS optimizes periodic crystal structures significantly faster, with fewer energy calculations, and the lowest failure rate.

IJCAI 2024 · Conference Paper

A Strategic Analysis of Prepayments in Financial Credit Networks

  • Hao Zhou
  • Yongzhao Wang
  • Konstantinos Varsos
  • Nicholas Bishop
  • Rahul Savani
  • Anisoara Calinescu
  • Michael Wooldridge

In financial credit networks, prepayments enable a firm to settle its debt obligations ahead of an agreed-upon due date. Prepayments have a transformative impact on the structure of networks, influencing the financial well-being (utility) of individual firms. This study investigates prepayments from both theoretical and empirical perspectives. We first establish the computational complexity of finding prepayments that maximize welfare, assuming global coordination among firms in the financial network. Subsequently, our focus shifts to understanding the strategic behavior of individual firms in the presence of prepayments. We introduce a prepayment game where firms strategically make prepayments, delineating the existence of pure strategy Nash equilibria and analyzing the price of anarchy (stability) within this game. Recognizing the computational challenges associated with determining Nash equilibria in prepayment games, we use a simulation-based approach, known as empirical game-theoretic analysis (EGTA). Through EGTA, we are able to find Nash equilibria among a carefully-chosen set of heuristic strategies. By scrutinizing the equilibrium behavior of firms, we outline the characteristics of high-performing strategies for strategic prepayments and establish connections between our empirical and theoretical findings.
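
For readers unfamiliar with EGTA, the workflow the abstract relies on can be sketched in a few lines. This is a hedged illustration: the three-strategy simulator is made up and stands in for the credit-network simulation, and replicator dynamics stands in for the equilibrium solver.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(i, j):
    """Hypothetical stand-in for the credit-network simulator: noisy payoff
    to heuristic strategy i when matched against strategy j."""
    true_payoff = [[1.0, 3.0, 0.5],
                   [0.5, 2.0, 4.0],
                   [2.0, 1.0, 3.0]]
    return true_payoff[i][j] + rng.normal(scale=0.1)

# EGTA step 1: estimate an empirical payoff matrix by repeated simulation.
n, samples = 3, 200
payoffs = np.array([[np.mean([simulate(i, j) for _ in range(samples)])
                     for j in range(n)] for i in range(n)])

# EGTA step 2: approximate a symmetric equilibrium of the empirical game
# (here with discrete-time replicator dynamics; any Nash solver works).
x = np.full(n, 1.0 / n)
for _ in range(5000):
    fitness = payoffs @ x
    x = x * fitness / (x @ fitness)
print("empirical equilibrium mix:", np.round(x, 3))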

IJCAI 2024 · Conference Paper

Policy Space Response Oracles: A Survey

  • Ariyan Bighashdel
  • Yongzhao Wang
  • Stephen McAleer
  • Rahul Savani
  • Frans A. Oliehoek

Game theory provides a mathematical way to study the interaction between multiple decision makers. However, classical game-theoretic analysis is limited in scalability due to the large number of strategies, precluding direct application to more complex scenarios. This survey provides a comprehensive overview of a framework for large games, known as Policy Space Response Oracles (PSRO), which holds promise to improve scalability by focusing attention on sufficient subsets of strategies. We first motivate PSRO and provide historical context. We then focus on the strategy exploration problem for PSRO: the challenge of assembling effective subsets of strategies that still represent the original game well with minimum computational cost. We survey current research directions for enhancing the efficiency of PSRO, and explore the applications of PSRO across various domains. We conclude by discussing open questions and future research.
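
The skeleton of PSRO itself is compact. The schematic sketch below is our illustration, not code from the survey: `best_response`, `evaluate`, and `meta_solve` are assumed callables standing in for an RL oracle, a game simulator, and a meta-strategy solver respectively.

```python
import numpy as np

def psro(initial_policy, best_response, evaluate, meta_solve, iterations=10):
    """Schematic PSRO loop for a two-player symmetric game.

    best_response(sigma, population): oracle returning a new policy that
        approximately best-responds to the meta-strategy sigma.
    evaluate(pi, pj): simulated expected payoff of policy pi against pj.
    meta_solve(payoffs): mixed strategy over the current population.
    """
    population = [initial_policy]
    payoffs = np.array([[evaluate(initial_policy, initial_policy)]])
    for _ in range(iterations):
        sigma = meta_solve(payoffs)              # solve restricted meta-game
        new = best_response(sigma, population)   # expand the strategy set
        population.append(new)
        k = len(population)                      # grow the empirical matrix
        grown = np.zeros((k, k))
        grown[:k - 1, :k - 1] = payoffs
        for i, p in enumerate(population):
            grown[i, k - 1] = evaluate(p, new)
            grown[k - 1, i] = evaluate(new, p)
        payoffs = grown
    return population, meta_solve(payoffs)
```

The strategy exploration problem the survey focuses on lives in the `best_response` and `meta_solve` choices: different oracles and meta-solvers yield the different PSRO variants it covers.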

JAIR 2024 · Journal Article

Selfishly Prepaying in Financial Credit Networks

  • Hao Zhou
  • Yongzhao Wang
  • Konstantinos Varsos
  • Nicholas Bishop
  • Rahul Savani
  • Anisoara Calinescu
  • Michael Wooldridge

In financial credit networks, prepayments enable a firm to settle its debt obligations ahead of an agreed-upon due date. Prepayments have a transformative impact on the structure of networks, influencing the financial well-being (utility) of individual firms. This study investigates prepayments from both theoretical and empirical perspectives. We first establish the computational complexity of finding prepayments that maximize welfare, assuming global coordination among firms in the financial network. Subsequently, our focus shifts to understanding the strategic behavior of individual firms in the presence of prepayments. We introduce a prepayment game where firms strategically make prepayments, delineating the existence of pure strategy Nash equilibria and analyzing the price of anarchy (stability) within this game. Recognizing the computational challenges associated with determining Nash equilibria in prepayment games, we use a simulation-based approach, known as empirical game-theoretic analysis (EGTA). Through EGTA, we are able to find Nash equilibria among a carefully-chosen set of heuristic strategies. By examining the equilibrium behavior of firms, we outline the characteristics of high-performing strategies for strategic prepayments and establish connections between our empirical and theoretical findings.

ICML 2022 · Conference Paper

Consensus Multiplicative Weights Update: Learning to Learn using Projector-based Game Signatures

  • Nelson Vadori
  • Rahul Savani
  • Thomas Spooner
  • Sumitra Ganesh

Cheung and Piliouras (2020) recently showed that two variants of the Multiplicative Weights Update method, OMWU and MWU, display opposite convergence properties depending on whether the game is zero-sum or cooperative. Inspired by this work and the recent literature on learning to optimize for single functions, we introduce a new framework for learning last-iterate convergence to Nash Equilibria in games, where the update rule's coefficients (learning rates) along a trajectory are learnt by a reinforcement learning policy that is conditioned on the nature of the game: the game signature. We construct the latter using a new decomposition of two-player games into eight components corresponding to commutative projection operators, generalizing and unifying recent game concepts studied in the literature. We compare the performance of various update rules when their coefficients are learnt, and show that the RL policy is able to exploit the game signature across a wide range of game types. In doing so, we introduce CMWU, a new algorithm that extends consensus optimization to the constrained case and has local convergence guarantees for zero-sum bimatrix games, and we show that it enjoys competitive performance both on zero-sum games with constant coefficients and across a spectrum of games when its coefficients are learnt.
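
As background, vanilla MWU with a constant learning rate is a few lines of numpy. The sketch below is illustrative (not the paper's learnt-coefficient setup): it runs MWU on rock-paper-scissors and shows the behaviour the paper starts from, namely that in a zero-sum game the last iterate cycles while the time-average approaches equilibrium.

```python
import numpy as np

# Row player's payoffs for rock-paper-scissors; the game is zero-sum.
A = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])

eta = 0.1                          # constant learning rate (the paper
x = np.array([0.5, 0.3, 0.2])      # learns these along the trajectory)
y = np.array([0.2, 0.5, 0.3])
x_sum, steps = np.zeros(3), 5000

for _ in range(steps):
    x_sum += x
    gx = A @ y                     # row player's payoff per action
    gy = -(A.T @ x)                # column player's payoff per action
    x = x * np.exp(eta * gx); x /= x.sum()   # simultaneous MWU step
    y = y * np.exp(eta * gy); y /= y.sum()

print("last iterate:", np.round(x, 3))           # keeps cycling off-centre
print("time average:", np.round(x_sum / steps, 3))  # near (1/3, 1/3, 1/3)
```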

AAMAS 2022 · Conference Paper

Sample-based Approximation of Nash in Large Many-Player Games via Gradient Descent

  • Ian Gemp
  • Rahul Savani
  • Marc Lanctot
  • Yoram Bachrach
  • Thomas Anthony
  • Richard Everett
  • Andrea Tacchetti
  • Tom Eccles

Nash equilibrium is a central concept in game theory. Several Nash solvers exist, yet none scale to normal-form games with many actions and many players, especially those with payoff tensors too big to be stored in memory. In this work, we propose an approach that iteratively improves an approximation to a Nash equilibrium through joint play. It accomplishes this by tracing a previously established homotopy that defines a continuum of equilibria for the game regularized with decaying levels of entropy. This continuum asymptotically approaches the limiting logit equilibrium, proven by McKelvey and Palfrey (1995) to be unique in almost all games, thereby partially circumventing the well-known equilibrium selection problem of many-player games. To encourage iterates to remain near this path, we efficiently minimize average deviation incentive via stochastic gradient descent, intelligently sampling entries in the payoff tensor as needed. Monte Carlo estimates of the stochastic gradient from joint play are biased due to the appearance of a nonlinear max operator in the objective, so we introduce additional innovations to the algorithm to alleviate gradient bias. The descent process can also be viewed as repeatedly constructing and reacting to a polymatrix approximation to the game. In these ways, our proposed approach, average deviation incentive descent with adaptive sampling (ADIDAS), is most similar to three classical approaches, namely homotopy-type, Lyapunov, and iterative polymatrix solvers. The lack of local convergence guarantees for biased gradient descent prevents guaranteed convergence to Nash; however, we demonstrate through extensive experiments the ability of this approach to approximate a unique Nash equilibrium in normal-form games with as many as seven players and twenty-one actions (several billion outcomes) that are orders of magnitude larger than those possible with prior algorithms.
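
To make "average deviation incentive" concrete, here is a hedged exact computation for a tiny game. This is our illustration only; ADIDAS samples tensor entries precisely because this full contraction is infeasible at the scale it targets.

```python
import numpy as np

def average_deviation_incentive(payoff_tensors, strategies):
    """Exact ADI of a strategy profile in a small normal-form game.

    payoff_tensors[i]: player i's payoff tensor (one axis per player).
    strategies[i]: player i's mixed strategy. Returns the mean, over
    players, of (best-response value - current value); zero iff Nash.
    """
    n = len(strategies)
    total = 0.0
    for i in range(n):
        u = payoff_tensors[i]
        # Contract every other player's axis, highest index first so the
        # remaining axis positions stay valid.
        for j in sorted((j for j in range(n) if j != i), reverse=True):
            u = np.tensordot(u, strategies[j], axes=(j, 0))
        total += u.max() - strategies[i] @ u
    return total / n

# Matching pennies: ADI vanishes at the unique (uniform) equilibrium.
A = np.array([[1., -1.], [-1., 1.]])
print(average_deviation_incentive([A, -A], [np.array([.5, .5])] * 2))  # ~0
print(average_deviation_incentive([A, -A], [np.array([.9, .1])] * 2))  # 0.8
```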

AAMAS 2021 · Conference Paper

Difference Rewards Policy Gradients

  • Jacopo Castellini
  • Sam Devlin
  • Frans A. Oliehoek
  • Rahul Savani

Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning. A key challenge, however, that is not addressed by many of these methods is multi-agent credit assignment: assessing an agent's contribution to the overall performance, which is crucial for learning good policies. We propose a novel algorithm called Dr. Reinforce that explicitly tackles this by combining difference rewards with policy gradients to allow for learning decentralized policies when the reward function is known. By differencing the reward function directly, Dr. Reinforce avoids difficulties associated with learning the Q-function as done by Counterfactual Multiagent Policy Gradients (COMA), a state-of-the-art difference rewards method. For applications where the reward function is unknown, we show the effectiveness of a version of Dr. Reinforce that learns a reward network that is used to estimate the difference rewards.
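
The differencing idea is easy to state in code. Below is a hedged sketch of the general difference-rewards construction for a known reward function (our illustration, not the paper's exact estimator).

```python
def difference_rewards(reward_fn, state, joint_action, default_action=0):
    """Per-agent difference rewards D_i = r(s, a) - r(s, (a_-i, c_i)).

    Each agent is credited the global reward minus the reward that would
    have resulted had it played a fixed default action c_i instead, which
    isolates its own contribution for the policy-gradient update.
    """
    base = reward_fn(state, joint_action)
    rewards = []
    for i in range(len(joint_action)):
        counterfactual = list(joint_action)
        counterfactual[i] = default_action
        rewards.append(base - reward_fn(state, counterfactual))
    return rewards

# Toy global reward: agents are paid only when all of them pull the lever (1).
r = lambda s, a: float(all(a))
print(difference_rewards(r, None, [1, 1, 1]))  # [1.0, 1.0, 1.0]
print(difference_rewards(r, None, [1, 0, 1]))  # [0.0, 0.0, 0.0]
```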

IJCAI 2020 · Conference Paper

Robust Market Making via Adversarial Reinforcement Learning

  • Thomas Spooner
  • Rahul Savani

We show that adversarial reinforcement learning (ARL) can be used to produce market making agents that are robust to adversarial and adaptively-chosen market conditions. To apply ARL, we turn the well-studied single-agent model of Avellaneda and Stoikov [2008] into a discrete-time zero-sum game between a market maker and adversary. The adversary acts as a proxy for other market participants that would like to profit at the market maker's expense. We empirically compare two conventional single-agent RL agents with ARL, and show that our ARL approach leads to: 1) the emergence of risk-averse behaviour without constraints or domain-specific penalties; 2) significant improvements in performance across a set of standard metrics, evaluated with or without an adversary in the test environment; and 3) improved robustness to model uncertainty. We empirically demonstrate that our ARL method consistently converges, and we prove for several special cases that the profiles that we converge to correspond to Nash equilibria in a simplified single-stage game.

ECAI 2020 · Conference Paper

The Automated Inspection of Opaque Liquid Vaccines

  • Gregory Palmer
  • Benjamin Schnieders
  • Rahul Savani
  • Karl Tuyls
  • Joscha-David Fossel
  • Harry Flore

In the pharmaceutical industry the screening of opaque vaccines containing suspensions is currently a manual task carried out by trained human visual inspectors. We show that deep learning can be used to effectively automate this process. A moving contrast is required to distinguish anomalies from other particles, reflections and dust resting on a vial's surface. We train 3D-ConvNets to predict the likelihood of 20-frame video samples containing anomalies. Our unaugmented dataset consists of hand-labelled samples, recorded using vials provided by the HAL Allergy Group, a pharmaceutical company. We trained ten randomly initialized 3D-ConvNets to provide a benchmark, observing mean AUROC scores of 0.94 and 0.93 for positive samples (containing anomalies) and negative (anomaly-free) samples, respectively. Using Frame-Completion Generative Adversarial Networks we: (i) introduce an algorithm for computing saliency maps, which we use to verify that the 3D-ConvNets are indeed identifying anomalies; (ii) propose a novel self-training approach using the saliency maps to determine if multiple networks agree on the location of anomalies. Our self-training approach allows us to augment our dataset by labelling 217,888 additional samples. 3D-ConvNets trained with our augmented dataset improve on the results we get when we train only on the unaugmented dataset.
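
As a rough picture of the model class, here is a minimal PyTorch 3D-ConvNet that scores a 20-frame grayscale clip for the likelihood of containing an anomaly. The layer sizes are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class Anomaly3DConvNet(nn.Module):
    """Binary classifier over 20-frame video clips (illustrative layout)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1),  # (N, 1, 20, H, W)
            nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),                     # global pooling
        )
        self.head = nn.Linear(32, 1)

    def forward(self, clip):
        z = self.features(clip).flatten(1)
        return torch.sigmoid(self.head(z))               # anomaly likelihood

model = Anomaly3DConvNet()
clip = torch.randn(4, 1, 20, 64, 64)   # batch of 4 clips, 20 frames each
print(model(clip).shape)               # torch.Size([4, 1])
```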

AAMAS 2019 · Conference Paper

Negative Update Intervals in Deep Multi-Agent Reinforcement Learning

  • Gregory Palmer
  • Rahul Savani
  • Karl Tuyls

In Multi-Agent Reinforcement Learning (MA-RL), independent cooperative learners must overcome a number of pathologies to learn optimal joint policies. Addressing one pathology often leaves approaches vulnerable towards others. For instance, hysteretic Q-learning [15] addresses miscoordination while leaving agents vulnerable towards misleading stochastic rewards. Other methods, such as leniency, have proven more robust when dealing with multiple pathologies simultaneously [29]. However, leniency has predominantly been studied within the context of strategic form games (bimatrix games) and fully observable Markov games consisting of a small number of probabilistic state transitions. This raises the question of whether these findings scale to more complex domains. For this purpose we implement a temporally extended version of the Climb Game [3], within which agents must overcome multiple pathologies simultaneously, including relative overgeneralisation, stochasticity, the alter-exploration and moving target problems, while learning from a large observation space. We find that existing lenient and hysteretic approaches fail to consistently learn near optimal joint-policies in this environment. To address these pathologies we introduce Negative Update Intervals-DDQN (NUI-DDQN), a Deep MA-RL algorithm which discards episodes yielding cumulative rewards outside the range of expanding intervals. NUI-DDQN consistently gravitates towards optimal joint-policies in our environment, overcoming the outlined pathologies.
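
Loosely, the interval mechanism can be sketched as an episode filter like the following. This is a simplified illustration of the idea, not the paper's exact rule: per context, returns below a lower bound are rejected, and the bound is slowly relaxed so near-optimal but stochastic outcomes are eventually kept.

```python
class NegativeUpdateInterval:
    """Simplified sketch of NUI-style episode filtering (illustrative)."""

    def __init__(self, relax=0.01):
        self.lower = {}     # context -> lowest acceptable episode return
        self.relax = relax  # how fast the interval expands downwards

    def accept(self, context, episode_return):
        bound = self.lower.get(context)
        if bound is None or episode_return > bound:
            self.lower[context] = episode_return   # new best: tighten bound
            return True
        self.lower[context] = bound - self.relax   # expand interval slightly
        return episode_return >= self.lower[context]

nui = NegativeUpdateInterval()
nui.accept("climb", 7.0)          # True: first episode sets the bound
print(nui.accept("climb", 2.0))   # False: well below the interval
```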

AAMAS 2019 · Conference Paper

The Representational Capacity of Action-Value Networks for Multi-Agent Reinforcement Learning

  • Jacopo Castellini
  • Frans A. Oliehoek
  • Rahul Savani
  • Shimon Whiteson

Recent years have seen the application of deep reinforcement learning techniques to cooperative multi-agent systems, with great empirical success. In this work, we empirically investigate the representational power of various network architectures on a series of one-shot games. Despite their simplicity, these games capture many of the crucial problems that arise in the multi-agent setting, such as an exponential number of joint actions or the lack of an explicit coordination mechanism. Our results quantify how well various approaches can represent the requisite value functions, and help us identify issues that can impede good performance.

AAMAS 2018 · Conference Paper

Lenient Multi-Agent Deep Reinforcement Learning

  • Gregory Palmer
  • Karl Tuyls
  • Daan Bloembergen
  • Rahul Savani

Much of the success of single-agent deep reinforcement learning (DRL) in recent years can be attributed to the use of experience replay memories (ERM), which allow Deep Q-Networks (DQNs) to be trained efficiently through sampling stored state transitions. However, care is required when using ERMs for multi-agent deep reinforcement learning (MA-DRL), as stored transitions can become outdated when agents update their policies in parallel [9]. In this work we apply leniency [22] to MA-DRL. Lenient agents map state-action pairs to decaying temperature values that control the amount of leniency applied towards negative policy updates that are sampled from the ERM. This introduces optimism in the value-function update, and has been shown to facilitate cooperation in tabular fully-cooperative multi-agent reinforcement learning problems. We evaluate our Lenient-DQN (LDQN) empirically against the related Hysteretic-DQN (HDQN) algorithm [20] as well as a modified version we call scheduled-HDQN, that uses average reward learning near terminal states. Evaluations take place in extended variations of the Coordinated Multi-Agent Object Transportation Problem (CMOTP) [6]. We find that LDQN agents are more likely to converge to the optimal policy in a stochastic reward CMOTP compared to standard and scheduled-HDQN agents.
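
The lenient update itself is compact in the tabular case. Here is a hedged sketch of tabular leniency in the spirit the abstract describes (not the LDQN network version; all parameters are illustrative).

```python
import math
import random
from collections import defaultdict

class LenientQ:
    """Tabular sketch of lenient Q-learning (illustrative parameters)."""

    def __init__(self, alpha=0.1, gamma=0.95, t0=50.0, decay=0.995, k=1.0):
        self.q = defaultdict(float)
        self.temp = defaultdict(lambda: t0)   # per state-action temperature
        self.alpha, self.gamma, self.decay, self.k = alpha, gamma, decay, k

    def update(self, s, a, r, next_qmax):
        delta = r + self.gamma * next_qmax - self.q[(s, a)]
        # High temperature -> high leniency -> negative updates often ignored.
        leniency = 1.0 - math.exp(-self.k * self.temp[(s, a)])
        self.temp[(s, a)] *= self.decay       # cool the pair as it is visited
        if delta >= 0 or random.random() > leniency:
            self.q[(s, a)] += self.alpha * delta
```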

AAMAS 2018 · Conference Paper

Market Making via Reinforcement Learning

  • Thomas Spooner
  • John Fearnley
  • Rahul Savani
  • Andreas Koukorinis

Market making is a fundamental trading problem in which an agent provides liquidity by continually offering to buy and sell a security. The problem is challenging due to inventory risk, the risk of accumulating an unfavourable position and ultimately losing money. In this paper, we develop a high-fidelity simulation of limit order book markets, and use it to design a market making agent using temporal-difference reinforcement learning. We use a linear combination of tile codings as a value function approximator, and design a custom reward function that controls inventory risk. We demonstrate the effectiveness of our approach by showing that our agent outperforms both simple benchmark strategies and a recent online learning approach from the literature.
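
Tile coding is worth a concrete look. Here is a hedged one-dimensional sketch of a linear value function over shifted tilings; the paper tiles a richer market state, and all details here are illustrative.

```python
import numpy as np

class TileCoder1D:
    """Linear value function over several shifted 1-D tilings (sketch)."""

    def __init__(self, n_tilings=8, n_tiles=10, low=0.0, high=1.0):
        self.n_tilings, self.n_tiles = n_tilings, n_tiles
        self.low = low
        self.width = (high - low) / n_tiles
        self.w = np.zeros(n_tilings * (n_tiles + 1))   # one weight per tile

    def active_tiles(self, x):
        tiles = []
        for t in range(self.n_tilings):
            offset = t * self.width / self.n_tilings   # each tiling shifted
            tile = int((x - self.low + offset) / self.width)
            tiles.append(t * (self.n_tiles + 1) + tile)
        return tiles

    def value(self, x):
        return self.w[self.active_tiles(x)].sum()

    def td_update(self, x, target, alpha=0.1):
        delta = target - self.value(x)                 # TD error
        for i in self.active_tiles(x):
            self.w[i] += (alpha / self.n_tilings) * delta

tc = TileCoder1D()
for _ in range(100):
    tc.td_update(0.42, target=1.0)
print(round(tc.value(0.42), 3))   # approaches 1.0
```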

AAMAS 2016 · Conference Paper

An Empirical Study on Computing Equilibria in Polymatrix Games

  • Argyrios Deligkas
  • John Fearnley
  • Tobenna Peter Igwe
  • Rahul Savani

The Nash equilibrium is an important benchmark for behaviour in systems of strategic autonomous agents. Polymatrix games are a succinct and expressive representation of multiplayer games that model pairwise interactions between players. The empirical performance of algorithms to solve these games has received little attention, despite their wide-ranging applications. In this paper we carry out a comprehensive empirical study of two prominent algorithms for computing a sample equilibrium in these games, Lemke's algorithm that computes an exact equilibrium, and a gradient descent method that computes an approximate equilibrium. Our study covers games arising from a number of interesting applications. We find that Lemke's algorithm can compute exact equilibria in relatively large games in a reasonable amount of time. If we are willing to accept (high-quality) approximate equilibria, then we can deal with much larger games using the descent method. We also report on which games are most challenging for each of the algorithms.

ECAI 2016 · Conference Paper

Space Debris Removal: A Game Theoretic Analysis

  • Richard Klíma
  • Daan Bloembergen
  • Rahul Savani
  • Karl Tuyls
  • Daniel Hennes
  • Dario Izzo

We analyse active space debris removal efforts from a strategic, game-theoretic perspective. An active debris removal mission is a costly endeavour that has a positive effect (or risk reduction) for all satellites in the same orbital band. This leads to a dilemma: each actor (space agency, private stakeholder, etc.) has an incentive to delay its actions and wait for others to respond. The risk of the latter action is that, if everyone waits, the joint outcome will be catastrophic, leading to what in game theory is referred to as the 'tragedy of the commons'. We introduce and thoroughly analyse this dilemma using simulation and empirical game theory in a two-player setting.

SODA 2016 · Conference Paper

The Complexity of All-switches Strategy Improvement

  • John Fearnley
  • Rahul Savani

Strategy improvement is a widely-used and well-studied class of algorithms for solving graph-based infinite games. These algorithms are parametrized by a switching rule, and one of the most natural rules is “all switches” which switches as many edges as possible in each iteration. Continuing a recent line of work, we study all-switches strategy improvement from the perspective of computational complexity. We consider two natural decision problems, both of which have as input a game G, a starting strategy s, and an edge e. The problems are: 1. The edge switch problem, namely, is the edge e ever switched by all-switches strategy improvement when it is started from s on game G? 2. The optimal strategy problem, namely, is the edge e used in the final strategy that is found by strategy improvement when it is started from s on game G? We show PSPACE-completeness of the edge switch problem and optimal strategy problem for the following settings: Parity games with the discrete strategy improvement algorithm of Vöge and Jurdziński; mean-payoff games with the gain-bias algorithm [11, 33]; and discounted-payoff games and simple stochastic games with their standard strategy improvement algorithms. We also show PSPACE-completeness of an analogous problem to edge switch for the bottom-antipodal algorithm for Acyclic Unique Sink Orientations on Cubes.

JMLR 2015 · Journal Article

Learning Equilibria of Games via Payoff Queries

  • John Fearnley
  • Martin Gairing
  • Paul W. Goldberg
  • Rahul Savani

A recent body of experimental literature has studied empirical game-theoretical analysis, in which we have partial knowledge of a game, consisting of observations of a subset of the pure-strategy profiles and their associated payoffs to players. The aim is to find an exact or approximate Nash equilibrium of the game, based on these observations. It is usually assumed that the strategy profiles may be chosen in an on-line manner by the algorithm. We study a corresponding computational learning model, and the query complexity of learning equilibria for various classes of games. We give basic results for exact equilibria of bimatrix and graphical games. We then study the query complexity of approximate equilibria in bimatrix games. Finally, we study the query complexity of exact equilibria in symmetric network congestion games. For directed acyclic networks, we can learn the cost functions (and hence compute an equilibrium) while querying just a small fraction of pure-strategy profiles. For the special case of parallel links, we have the stronger result that an equilibrium can be identified while only learning a small fraction of the cost values.

AAAI 2014 · Conference Paper

Increasing VCG Revenue by Decreasing the Quality of Items

  • Mingyu Guo
  • Argyrios Deligkas
  • Rahul Savani

The VCG mechanism is the standard method to incentivize bidders in combinatorial auctions to bid truthfully. Under the VCG mechanism, the auctioneer can sometimes increase revenue by “burning” items. We study this phenomenon in a setting where items are described by a number of attributes. The value of an attribute corresponds to a quality level, and bidders’ valuations are non-decreasing in the quality levels. In addition to burning items, we allow the auctioneer to present some of the attributes as lower quality than they actually are. We consider the following two revenue maximization problems under VCG: finding an optimal way to mark down items by reducing their quality levels, and finding an optimal set of items to burn. We study the effect of the following parameters on the computational complexity of these two problems: the number of attributes, the number of quality levels per attribute, and the complexity of the bidders’ valuation functions. Bidders have unit demand, so VCG’s outcome can be computed in polynomial time, and the valuation functions we consider are step functions that are non-decreasing with the quality levels. We prove that both problems are NP-hard even in the following three simple settings: a) four attributes, arbitrarily many quality levels per attribute, and single-step valuation functions, b) arbitrarily many attributes, two quality levels per attribute, and single-step valuation functions, and c) one attribute, arbitrarily many quality levels, and multi-step valuation functions. For the case where items have only one attribute, and every bidder has a single-step valuation (zero below some quality threshold), we show that both problems can be solved in polynomial time using a dynamic programming approach. For this case, we also quantify how much better marking down is than item burning, and we compare the revenue of both approaches with computational experiments.
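
To see why the unit-demand case is tractable, and why burning can pay, consider this hedged sketch: with unit demand, the welfare-optimal VCG outcome reduces to an assignment problem, so allocation and payments are polynomial-time computable. The valuations below are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def vcg_unit_demand(valuations):
    """VCG allocation and payments for unit-demand bidders (sketch)."""
    v = np.asarray(valuations, dtype=float)
    rows, cols = linear_sum_assignment(v, maximize=True)  # optimal welfare
    welfare = v[rows, cols].sum()
    payments = np.zeros(v.shape[0])
    for i, j in zip(rows, cols):
        others = np.delete(v, i, axis=0)                  # world without i
        r2, c2 = linear_sum_assignment(others, maximize=True)
        # Externality: others' best welfare without i, minus their welfare
        # under the chosen allocation.
        payments[i] = others[r2, c2].sum() - (welfare - v[i, j])
    return dict(zip(rows.tolist(), cols.tolist())), payments

v = [[10, 6], [9, 8]]                 # two bidders, two items
alloc, pay = vcg_unit_demand(v)
print(alloc, pay.sum())               # both items sold, revenue 1.0
# Burn item 1: VCG on item 0 alone is a second-price auction, revenue 9.0.
alloc, pay = vcg_unit_demand([[10], [9]])
print(alloc, pay.sum())
```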

IS 2012 · Journal Article

High-Frequency Trading: The Faster, the Better?

  • Rahul Savani

The floor of the New York Stock Exchange, filled with traders in a frenzy of activity, is perhaps the iconic image of financial markets. But today, the TV pictures we see of the trading floor are largely symbolic. The frenzy of activity instead takes place in server rooms, with human market makers replaced by the algorithms of high-frequency trading firms. They make their money with razor-thin margins and huge volumes. To thrive, they must master both technology and strategy. The debate now rages as to how beneficial they are to markets and how they should be regulated.

AAMAS 2011 · Conference Paper

Computing Stable Outcomes in Hedonic Games with Voting-Based Deviations

  • Martin Gairing
  • Rahul Savani

We study the computational complexity of finding stable outcomes in hedonic games, which are a class of coalition formation games. We restrict our attention to a nontrivial subclass of such games, which are guaranteed to possess stable outcomes, i.e., the set of symmetric additively-separable hedonic games. These games are specified by an undirected edge-weighted graph: nodes are players, an outcome of the game is a partition of the nodes into coalitions, and the utility of a node is the sum of incident edge weights in the same coalition. We consider several stability requirements defined in the literature. These are based on restricting feasible player deviations, for example, by giving existing coalition members veto power. We extend these restrictions by considering more general forms of preference aggregation for coalition members. In particular, we consider voting schemes to decide if coalition members will allow a player to enter or leave their coalition. For all of the stability requirements we consider, the existence of a stable outcome is guaranteed by a potential function argument, and local improvements will converge to a stable outcome. We provide an almost complete characterization of these games in terms of the tractability of computing such stable outcomes. Our findings comprise positive results in the form of polynomial-time algorithms, and negative (PLS-completeness) results. The negative results extend to more general hedonic games.
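
The potential-function argument in the abstract also yields a simple local-search procedure. Here is a hedged sketch for plain Nash-style deviations; the paper's voting-based deviations restrict which of these moves are feasible.

```python
def local_improvement(weights, max_steps=10_000):
    """Local improvements in a symmetric additively-separable hedonic game.

    weights: symmetric matrix; a player's utility is the summed weight to
    members of its coalition. Each move by a single player who strictly
    improves raises the potential (total intra-coalition weight), so the
    process terminates at a stable outcome. Sketch with unrestricted
    (Nash-style) deviations; max_steps is only a safety bound.
    """
    n = len(weights)
    coalition = list(range(n))                     # start from singletons
    for _ in range(max_steps):
        moved = False
        for i in range(n):
            def utility(c):
                return sum(weights[i][j] for j in range(n)
                           if j != i and coalition[j] == c)
            options = set(coalition) | {max(coalition) + 1}  # incl. leaving
            best = max(options, key=utility)
            if utility(best) > utility(coalition[i]):
                coalition[i] = best
                moved = True
        if not moved:
            return coalition                       # stable partition
    return coalition

print(local_improvement([[0, 5, -2], [5, 0, 1], [-2, 1, 0]]))  # [1, 1, 2]
```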

FOCS 2011 · Conference Paper

The Complexity of the Homotopy Method, Equilibrium Selection, and Lemke-Howson Solutions

  • Paul W. Goldberg
  • Christos H. Papadimitriou
  • Rahul Savani

We show that the widely used homotopy method for solving fixed point problems, as well as the Harsanyi-Selten equilibrium selection process for games, are PSPACE-complete to implement. Extending our result for the Harsanyi-Selten process, we show that several other homotopy-based algorithms for finding equilibria of games are also PSPACE-complete to implement. A further application of our techniques yields the result that it is PSPACE-complete to compute any of the equilibria that could be found via the classical Lemke-Howson algorithm, a complexity-theoretic strengthening of the result in [24]. These results show that our techniques can be widely applied and suggest that the PSPACE-completeness of implementing homotopy methods is a general principle.

FOCS 2004 · Conference Paper

Exponentially Many Steps for Finding a Nash Equilibrium in a Bimatrix Game

  • Rahul Savani
  • Bernhard von Stengel

The Lemke-Howson algorithm is the classical algorithm for the problem NASH of finding one Nash equilibrium of a bimatrix game. It provides a constructive and elementary proof of existence of an equilibrium, by a typical "directed parity argument", which puts NASH into the complexity class PPAD. This paper presents a class of bimatrix games for which the Lemke-Howson algorithm takes, even in the best case, exponential time in the dimension d of the game, requiring Ω((θ^{3/4})^d) many steps, where θ is the golden ratio. The "parity argument" for NASH is thus explicitly shown to be inefficient. The games are constructed using pairs of dual cyclic polytopes with 2d suitably labeled facets in d-space.