Author name cluster

Nicholas Bishop

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers

2 author rows

NeurIPS Conference 2025 Conference Paper

Emergent Risk Awareness in Rational Agents under Resource Constraints

Daniel Jarne Ornia
Nicholas Bishop
Joel Dyer
Wei-Chen Lee
Anisoara Calinescu
Doyne Farmer
Michael Wooldridge

Advanced reasoning models with agentic capabilities (AI agents) are deployed to interact with humans and to solve sequential decision‑making problems under (often approximate) utility functions and internal models. When such problems have resource or failure constraints where action sequences may be forcibly terminated once resources are exhausted, agents face implicit trade‑offs that reshape their utility-driven (rational) behaviour. Additionally, since these agents are typically commissioned by a human principal to act on their behalf, asymmetries in constraint exposure can give rise to previously unanticipated misalignment between human objectives and agent incentives. We formalise this setting through a survival bandit framework, provide theoretical and empirical results that quantify the impact of survival‑driven preference shifts, identify conditions under which misalignment emerges and propose mechanisms to mitigate the emergence of risk-seeking or risk-averse behaviours. As a result, this work aims to increase understanding and interpretability of emergent behaviours of AI agents operating under such survival pressure, and offer guidelines for safely deploying such AI systems in critical resource‑limited environments.

PDF Details

ICML Conference 2025 Conference Paper

Learning Likelihood-Free Reference Priors

Nicholas Bishop
Daniel Jarne Ornia
Joel Dyer
Anisoara Calinescu
Michael J. Wooldridge

Simulation modeling offers a flexible approach to constructing high-fidelity synthetic representations of complex real-world systems. However, the increased complexity of such models introduces additional complications, for example when carrying out statistical inference procedures. This has motivated a large and growing literature on likelihood-free or simulation-based inference methods, which approximate (e. g. , Bayesian) inference without assuming access to the simulator’s intractable likelihood function. A hitherto neglected problem in the simulation-based Bayesian inference literature is the challenge of constructing minimally informative reference priors for complex simulation models. Such priors maximise an expected Kullback-Leibler distance from the prior to the posterior, thereby influencing posterior inferences minimally and enabling an “objective” approach to Bayesian inference that does not necessitate the incorporation of strong subjective prior beliefs. In this paper, we propose and test a selection of likelihood-free methods for learning reference priors for simulation models, using variational approximations to these priors and a variety of mutual information estimators. Our experiments demonstrate that good approximations to reference priors for simulation models are in this way attainable, providing a first step towards the development of likelihood-free objective Bayesian inference procedures.

Details

IJCAI Conference 2024 Conference Paper

A Strategic Analysis of Prepayments in Financial Credit Networks

Hao Zhou
Yongzhao Wang
Konstantinos Varsos
Nicholas Bishop
Rahul Savani
Anisoara Calinescu
Michael Wooldridge

In financial credit networks, prepayments enable a firm to settle its debt obligations ahead of an agreed-upon due date. Prepayments have a transformative impact on the structure of networks, influencing the financial well-being (utility) of individual firms. This study investigates prepayments from both theoretical and empirical perspectives. We first establish the computational complexity of finding prepayments that maximize welfare, assuming global coordination among firms in the financial network. Subsequently, our focus shifts to understanding the strategic behavior of individual firms in the presence of prepayments. We introduce a prepayment game where firms strategically make prepayments, delineating the existence of pure strategy Nash equilibria and analyzing the price of anarchy (stability) within this game. Recognizing the computational challenges associated with determining Nash equilibria in prepayment games, we use a simulation-based approach, known as empirical game-theoretic analysis (EGTA). Through EGTA, we are able to find Nash equilibria among a carefully-chosen set of heuristic strategies. By scrutinizing the equilibrium behavior of firms, we outline the characteristics of high-performing strategies for strategic prepayments and establish connections between our empirical and theoretical findings.

PDF Details DOI

UAI Conference 2024 Conference Paper

Causally Abstracted Multi-armed Bandits

Fabio Massimo Zennaro
Nicholas Bishop
Joel Dyer
Yorgos Felekis
Anisoara Calinescu
Michael J. Wooldridge
Theodoros Damoulas

Multi-armed bandits (MAB) and causal MABs (CMAB) are established frameworks for decision-making problems. The majority of prior work typically studies and solves individual MAB and CMAB in isolation for a given problem and associated data. However, decision-makers are often faced with multiple related problems and multi-scale observations where joint formulations are needed in order to efficiently exploit the problem structures and data dependencies. Transfer learning for CMABs addresses the situation where models are defined on identical variables, although causal connections may differ. In this work, we extend transfer learning to setups involving CMABs defined on potentially different variables, with varying degrees of granularity, and related via an abstraction map. Formally, we introduce the problem of causally abstracted MABs (CAMABs) by relying on the theory of causal abstraction in order to express a rigorous abstraction map. We propose algorithms to learn in a CAMAB, and study their regret. We illustrate the limitations and the strengths of our algorithms on a real-world scenario related to online advertising.

Details

NeurIPS Conference 2024 Conference Paper

Interventionally Consistent Surrogates for Complex Simulation Models

Joel Dyer
Nicholas Bishop
Yorgos Felekis
Fabio Massimo Zennaro
Anisoara Calinescu
Theodoros Damoulas
Michael Wooldridge

Large-scale simulation models of complex socio-technical systems provide decision-makers with high-fidelity testbeds in which policy interventions can be evaluated and what-if scenarios explored. Unfortunately, the high computational cost of such models inhibits their widespread use in policy-making settings. Surrogate models can address these computational limitations, but to do so they must behave consistently with the simulator under interventions of interest. In this paper, we build upon recent developments in causal abstractions to develop a framework for learning interventionally consistent surrogate models for large-scale, complex simulation models. We provide theoretical results showing that our proposed approach induces surrogates to behave consistently with high probability with respect to the simulator across interventions of interest, facilitating rapid experimentation with policy interventions in complex systems. We further demonstrate with empirical studies that conventionally trained surrogates can misjudge the effect of interventions and misguide decision-makers towards suboptimal interventions, while surrogates trained for interventional consistency with our method closely mimic the behaviour of the original simulator under interventions of interest.

PDF Details DOI

AAMAS Conference 2024 Conference Paper

Population Synthesis as Scenario Generation for Simulation-based Planning under Uncertainty

Joel Dyer
Arnau Quera-Bofarull
Nicholas Bishop
J. Doyne Farmer
Anisoara Calinescu
Michael Wooldridge

Agent-based models have the potential to become instrumental tools in real-world decision-making, equipping policy-makers with the ability to experiment with high-fidelity representations of complex systems. Such models often rely crucially on the generation of synthetic populations with which the model is simulated, and their behaviour can depend strongly on the population’s composition. Existing approaches to synthesising populations attempt to model distributions over agent-level attributes on the basis of data collected from a real-world population. Unfortunately, these approaches are of limited utility when data is incomplete or altogether absent – such as during novel, unprecedented circumstances – so that considerable uncertainty regarding the characteristics of the population being modelled remains, even after accounting for any such data. What is therefore needed in these cases are tools to simulate and plan for the possible future behaviours of the complex system that can be generated by populations that are consistent with this remaining uncertainty. To this end, we frame the problem of synthesising populations in agent-based models as a problem of scenario generation. The framework that we present is designed to generate synthetic populations that are on the one hand consistent with any persisting uncertainty, while on the other hand matching closely a target, user-specified scenario that the decision-maker would like to explore and plan for. We propose and compare two generic approaches to generating synthetic populations that produce target scenarios, and demonstrate through simulation studies that these approaches are able to automatically generate synthetic populations whose behaviours match the target scenario, thereby facilitating simulation-based planning under uncertainty.

PDF

JAIR Journal 2024 Journal Article

Selfishly Prepaying in Financial Credit Networks

Hao Zhou
Yongzhao Wang
Konstantinos Varsos
Nicholas Bishop
Rahul Savani
Anisoara Calinescu
Michael Wooldridge

In financial credit networks, prepayments enable a firm to settle its debt obligations ahead of an agreed-upon due date. Prepayments have a transformative impact on the structure of networks, influencing the financial well-being (utility) of individual firms. This study investigates prepayments from both theoretical and empirical perspectives. We first establish the computational complexity of finding prepayments that maximize welfare, assuming global coordination among firms in the financial network. Subsequently, our focus shifts to understanding the strategic behavior of individual firms in the presence of prepayments. We introduce a prepayment game where firms strategically make prepayments, delineating the existence of pure strategy Nash equilibria and analyzing the price of anarchy (stability) within this game. Recognizing the computational challenges associated with determining Nash equilibria in prepayment games, we use a simulation-based approach, known as empirical game-theoretic analysis (EGTA). Through EGTA, we are able to find Nash equilibria among a carefully-chosen set of heuristic strategies. By examining the equilibrium behavior of firms, we outline the characteristics of high-performing strategies for strategic prepayments and establish connections between our empirical and theoretical findings.

PDF Details DOI

AAMAS Conference 2022 Conference Paper

Manipulation of Machine Learning Algoirhtms

Nicholas Bishop

As data becomes increasingly available, individuals, organisations and companies are increasingly applying machine learning algorithms to make decisions. In many cases, those decisions have a direct effect on those who provided the data to the decision maker. In other words, data providers often have a vested interest in the decisions made based on the data provided. Therefore, decision makers should anticipate that data providers may alter or change the data they provide in order to achieve a preferential outcome. Such strategic behaviour is not adequately modelled by classical machine learning settings in the literature. As a result, new machine learning algorithms are required, which take into the account the incentives and capabilities of data providers when making decisions. This paper summarises a PhD project which attempts to address this problem in a number of contexts.

PDF

AAAI Conference 2022 Conference Paper

Sequential Blocked Matching

Nicholas Bishop
Hau Chan
Debmalya Mandal
Long Tran-Thanh

We consider a sequential blocked matching (SBM) model where strategic agents repeatedly report ordinal preferences over a set of services to a central planner. The planner’s goal is to elicit agents’ true preferences and design a policy that matches services to agents in order to maximize the expected social welfare with the added constraint that each matched service can be blocked or unavailable for a number of time periods. Naturally, SBM models the repeated allocation of reusable services to a set of agents where each allocated service becomes unavailable for a fixed duration. We first consider the offline SBM setting, where the strategic agents are aware of their true preferences. We measure the performance of any policy by distortion, the worst-case multiplicative approximation guaranteed by any policy. For the setting with s services, we establish lower bounds of Ω(s) and Ω( √ s) on the distortions of any deterministic and randomised mechanisms, respectively. We complement these results by providing approximately truthful, measured by incentive ratio, deterministic and randomised policies based on random serial dictatorship which match our lower bounds. Our results show that there is a significant improvement if one considers the class of randomised policies. Finally, we consider the online SBM setting with bandit feedback where each agent is initially unaware of her true preferences, and the planner must facilitate each agent in the learning of their preferences through the matching of services over time. We design an approximately truthful mechanism based on the explore-then-commit paradigm, which achieves logarithmic dynamic approximate regret.

PDF Details

AAMAS Conference 2021 Conference Paper

How to Guide a Non-Cooperative Learner to Cooperate: Exploiting No-Regret Algorithms in System Design

Nicholas Bishop
Le Cong Dinh
Long Tran-Thanh

We investigate a repeated two-player game setting where the column player is also a designer of the system, and has full control over payoff matrices. In addition, we assume that the row player uses a no-regret algorithm to efficiently learn how to adapt their strategy to the column player’s behaviour over time. The goal of the column player is to guide her opponent into picking a mixed strategy which is preferred by the system designer. Therefore, she needs to: (i) design appropriate payoffs for both players; and (ii) strategically interact with the row player during a sequence of plays in order to guide her opponent to converge to the desired mixed strategy. To design appropriate payoffs, we propose a novel zero-sum game construction whose unique minimax solution contains the desired behaviour. We also propose another construction in which only the minimax strategy of the row player is unique. Finally, we propose a new game playing algorithm for the system designer and show that it can guide the row player to its minimax strategy, under the assumption that the row player adopts a stable no-regret algorithm.

PDF

NeurIPS Conference 2020 Conference Paper

Adversarial Blocking Bandits

Nicholas Bishop
Hau Chan
Debmalya Mandal
Long Tran-Thanh

We consider a general adversarial multi-armed blocking bandit setting where each played arm can be blocked (unavailable) for some time periods and the reward per arm is given at each time period adversarially without obeying any distribution. The setting models scenarios of allocating scarce limited supplies (e. g. , arms) where the supplies replenish and can be reused only after certain time periods. We first show that, in the optimization setting, when the blocking durations and rewards are known in advance, finding an optimal policy (e. g. , determining which arm per round) that maximises the cumulative reward is strongly NP-hard, eliminating the possibility of a fully polynomial-time approximation scheme (FPTAS) for the problem unless P = NP. To complement our result, we show that a greedy algorithm that plays the best available arm at each round provides an approximation guarantee that depends on the blocking durations and the path variance of the rewards. In the bandit setting, when the blocking durations and rewards are not known, we design two algorithms, RGA and RGA-META, for the case of bounded duration an path variation. In particular, when the variation budget B T is known in advance, RGA can achieve O(\sqrt{T(2\tilde{D}+K)B {T}}) dynamic approximate regret. On the other hand, when B_T is not known, we show that the dynamic approximate regret of RGA-META is at most O((K+\tilde{D})^{1/4}\tilde{B}^{1/2}T^{3/4}) where \tilde{B} is the maximal path variation budget within each batch of RGA-META (which is provably in order of o(\sqrt{T}). We also prove that if either the variation budget or the maximal blocking duration is unbounded, the approximate regret will be at least Theta(T). We also show that the regret upper bound of RGA is tight if the blocking durations are bounded above by an order of O(1).

PDF Details

NeurIPS Conference 2020 Conference Paper

Optimal Learning from Verified Training Data

Nicholas Bishop
Long Tran-Thanh
Enrico Gerding

Standard machine learning algorithms typically assume that data is sampled independently from the distribution of interest. In attempts to relax this assumption, fields such as adversarial learning typically assume that data is provided by an adversary, whose sole objective is to fool a learning algorithm. However, in reality, it is often the case that data comes from self-interested agents, with less malicious goals and intentions which lie somewhere between the two settings described above. To tackle this problem, we present a Stackelberg competition model for least squares regression, in which data is provided by agents who wish to achieve specific predictions for their data. Although the resulting optimisation problem is nonconvex, we derive an algorithm which converges globally, outperforming current approaches which only guarantee convergence to local optima. We also provide empirical results on two real-world datasets, the medical personal costs dataset and the red wine dataset, showcasing the performance of our algorithm relative to algorithms which are optimal under adversarial assumptions, outperforming the state of the art.

PDF Details