Author name cluster

Anthony DiGiovanni

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers

2 author rows

AAMAS Conference 2025 Conference Paper

Safe Pareto Improvements for Expected Utility Maximizers in Program Games

Anthony DiGiovanni
Jesse Clifton
Nicolas Macé

Agents in mixed-motive coordination problems such as Chicken may fail to coordinate on a Pareto-efficient outcome. Safe Pareto improvements (SPIs) were originally proposed to mitigate miscoordination in cases where players lack probabilistic beliefs as to how their agents will play a game; agents are instructed to behave so as to guarantee a Pareto improvement on how they would play by default. More generally, SPIs may be defined as transformations of strategy profiles such that all players are necessarily better off under the transformed profile. In this work, we investigate the extent to which SPIs can reduce downsides of miscoordination between expected utility-maximizing agents. We consider games in which players submit computer programs that can condition their decisions on each other’s code, and use this property to construct SPIs using programs capable of renegotiation. We first show that under mild conditions on players’ beliefs, each player always prefers to use renegotiation. Next, we show that under similar assumptions, each player always prefers to be willing to renegotiate at least to the point at which they receive the lowest payoff they can attain in any efficient outcome. Thus subjectively optimal play guarantees players at least these payoffs, without the need for coordination on specific Pareto improvements.
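To make the miscoordination problem in this abstract concrete, the sketch below enumerates a small version of Chicken. It is not taken from the paper. The payoff numbers, the action names, and the helper functions `pure_nash_equilibria`, `pareto_efficient`, and `min_efficient_payoffs` are assumptions made for illustration, and only pure-strategy outcomes are considered, which simplifies the paper's setting.

```python
from itertools import product

# Hypothetical payoffs for Chicken: (row player's payoff, column player's payoff).
ACTIONS = ("swerve", "dare")
PAYOFFS = {
    ("swerve", "swerve"): (3, 3),
    ("swerve", "dare"): (1, 4),
    ("dare", "swerve"): (4, 1),
    ("dare", "dare"): (0, 0),  # the crash: both players insist on their preferred outcome
}


def pure_nash_equilibria(payoffs, actions):
    """Return pure profiles where neither player gains by deviating alone."""
    equilibria = []
    for a, b in product(actions, repeat=2):
        u1, u2 = payoffs[(a, b)]
        row_is_best = all(payoffs[(alt, b)][0] <= u1 for alt in actions)
        col_is_best = all(payoffs[(a, alt)][1] <= u2 for alt in actions)
        if row_is_best and col_is_best:
            equilibria.append((a, b))
    return equilibria


def pareto_efficient(payoffs):
    """Return pure profiles that no other pure profile Pareto-dominates."""
    efficient = []
    for profile, (u1, u2) in payoffs.items():
        dominated = any(
            v1 >= u1 and v2 >= u2 and (v1 > u1 or v2 > u2)
            for v1, v2 in payoffs.values()
        )
        if not dominated:
            efficient.append(profile)
    return efficient


def min_efficient_payoffs(payoffs):
    """Lowest payoff each player receives across the efficient pure outcomes."""
    efficient = pareto_efficient(payoffs)
    return (
        min(payoffs[p][0] for p in efficient),
        min(payoffs[p][1] for p in efficient),
    )


if __name__ == "__main__":
    print("Pure Nash equilibria:", pure_nash_equilibria(PAYOFFS, ACTIONS))
    print("Pareto-efficient outcomes:", pareto_efficient(PAYOFFS))
    print("Lowest efficient payoffs:", min_efficient_payoffs(PAYOFFS))
```

With these illustrative numbers there are two pure equilibria, each favoring a different player. If each player aims for the equilibrium they prefer, both dare and land on the one outcome that every other outcome Pareto-dominates. The last helper computes, for this toy game only, the quantity the abstract refers to as the lowest payoff a player can attain in any efficient outcome.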


AAAI Conference 2023 Conference Paper

Commitment Games with Conditional Information Disclosure

Anthony DiGiovanni
Jesse Clifton

The conditional commitment abilities of mutually transparent computer agents have been studied in previous work on commitment games and program equilibrium. This literature has shown how these abilities can help resolve Prisoner’s Dilemmas and other failures of cooperation in complete information settings. But inefficiencies due to private information have been neglected thus far in this literature, despite the fact that these problems are pervasive and might also be addressed by greater mutual transparency. In this work, we introduce a framework for commitment games with a new kind of conditional commitment device, which agents can use to conditionally disclose private information. We prove a folk theorem for this setting that provides sufficient conditions for ex post efficiency, and thus represents a model of ideal cooperation between agents without a third-party mediator. Further, extending previous work on program equilibrium, we develop an implementation of conditional information disclosure. We show that this implementation forms program ε-Bayesian Nash equilibria corresponding to the Bayesian Nash equilibria of these commitment games.
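The complete-information background mentioned at the start of this abstract can be illustrated with a toy program game. The sketch below follows a well-known style of construction from the program equilibrium literature, in which a program cooperates only with an exact copy of itself. It is not the paper's conditional information disclosure implementation. The payoff numbers and the names `CLIQUE_BOT`, `DEFECT_BOT`, `run_program`, and `play` are assumptions made for illustration.

```python
# Hypothetical Prisoner's Dilemma payoffs: (player 1's payoff, player 2's payoff).
PD = {
    ("C", "C"): (2, 2),
    ("C", "D"): (0, 3),
    ("D", "C"): (3, 0),
    ("D", "D"): (1, 1),
}

# Each submitted program is source text defining decide(my_src, opp_src).
# CLIQUE_BOT cooperates only when the opponent's source is identical to its own.
CLIQUE_BOT = (
    "def decide(my_src, opp_src):\n"
    "    return 'C' if opp_src == my_src else 'D'\n"
)

# DEFECT_BOT ignores the opponent's source and always defects.
DEFECT_BOT = (
    "def decide(my_src, opp_src):\n"
    "    return 'D'\n"
)


def run_program(src, opp_src):
    """Execute a submitted program, giving it its own source and the opponent's."""
    namespace = {}
    exec(src, namespace)  # acceptable for trusted toy programs only
    return namespace["decide"](src, opp_src)


def play(src1, src2):
    """Run both programs against each other and return the payoff pair."""
    action1 = run_program(src1, src2)
    action2 = run_program(src2, src1)
    return PD[(action1, action2)]


if __name__ == "__main__":
    print("CliqueBot vs CliqueBot:", play(CLIQUE_BOT, CLIQUE_BOT))
    print("DefectBot vs CliqueBot:", play(DEFECT_BOT, CLIQUE_BOT))
```

When both players submit `CLIQUE_BOT`, each receives the mutual-cooperation payoff. A player who switches to `DEFECT_BOT` is met with defection and does worse, so under these assumed payoffs mutual cooperation is stable against that deviation. Private information, which is the paper's focus, is not modeled here.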


UAI Conference 2022 Conference Paper

Balancing adaptability and non-exploitability in repeated games

Anthony DiGiovanni
Ambuj Tewari

We study the problem of adaptability in repeated games: simultaneously guaranteeing low regret for several classes of opponents. We add the constraint that our algorithm is non-exploitable, in that the opponent lacks an incentive to use an algorithm against which we cannot achieve rewards exceeding some “fair” value. Our solution is an expert algorithm (LAFF), which searches within a set of sub-algorithms that are optimal for each opponent class, and punishes evidence of exploitation by switching to a policy that enforces a fair solution. With benchmarks that depend on the opponent class, we first show that LAFF has sublinear regret uniformly over these classes. Second, we show that LAFF discourages exploitation, because exploitative opponents have linear regret. To our knowledge, this work is the first to provide guarantees for both regret and non-exploitability in multi-agent learning.


UAI Conference 2021 Conference Paper

Thompson sampling for Markov games with piecewise stationary opponent policies

Anthony DiGiovanni
Ambuj Tewari

Reinforcement learning problems with multiple agents pose the challenge of efficiently adapting to nonstationary dynamics arising from other agents’ strategic behavior. Although several algorithms exist for these problems with promising empirical results, regret analysis and efficient use of other-agent models in general-sum games are very limited. We propose an algorithm (TSMG) for general-sum Markov games against agents that switch between several stationary policies, combining change detection with Thompson sampling to learn parametric models of these policies. Under standard assumptions for parametric Markov decision process (MDP) learning, the expected regret of TSMG in the worst case over policy parameters and switch schedules is near-optimal in time and number of switches, up to logarithmic factors. Our experiments on simulated games show that TSMG can outperform standard Thompson sampling and a version of Thompson sampling with a static reset schedule, despite the violation of an assumption that the MDPs induced by the other player are ergodic.
