Arrow Research search

Author name cluster

Brian Logan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

69 papers
1 author row

Possible papers

69

JAIR Journal 2026 Journal Article

Synthesising Reward Machines for Cooperative Multi-Agent Reinforcement Learning

  • Giovanni Varricchione
  • Natasha Alechina
  • Mehdi Dastani
  • Brian Logan

Reward machines have recently been proposed as a means of encoding team tasks in cooperative multi-agent reinforcement learning. The resulting multi-agent reward machine is then decomposed into individual reward machines, one for each member of the team, allowing agents to learn in a decentralised manner while still achieving the team task. In this paper, we show how multi-agent reward machines for team tasks can be synthesised automatically from an abstraction of the environment in which the agents act and a high-level specification of the desired team behaviour expressed in a fragment of Alternating-time Temporal Logic. We present results from a number of benchmarks which suggest that our automated approach performs as well as or better than reward machines in the literature.
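The reward-machine idea underlying this line of work can be sketched in a few lines: a finite-state automaton whose transitions fire on observed events and emit rewards. The class, state names, and the two-step task below are illustrative, not taken from the paper.

```python
# Minimal sketch of a reward machine: a finite-state automaton whose
# transitions fire on labelled events and emit a (possibly non-Markovian)
# reward. All names and the example task are illustrative.

class RewardMachine:
    def __init__(self, transitions, initial_state):
        # transitions: {(state, event): (next_state, reward)}
        self.transitions = transitions
        self.state = initial_state

    def step(self, event):
        """Advance on an observed event; return the emitted reward."""
        self.state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0))  # unmatched events: stay, reward 0
        return reward

# Example task: reward 1.0 once the team reaches "a" and then "b", in order.
rm = RewardMachine(
    transitions={
        ("u0", "a"): ("u1", 0.0),
        ("u1", "b"): ("u2", 1.0),
    },
    initial_state="u0",
)
assert rm.step("b") == 0.0   # "b" before "a" earns nothing
assert rm.step("a") == 0.0
assert rm.step("b") == 1.0   # ordered task complete
```

Because the machine's state summarises the relevant history, pairing it with the environment state restores the Markov property for the learner.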

AAAI Conference 2025 Conference Paper

Probabilistic Strategy Logic with Degrees of Observability

  • Chunyan Mu
  • Nima Motamed
  • Natasha Alechina
  • Brian Logan

There has been considerable work on reasoning about the strategic ability of agents under imperfect information. However, existing logics such as Probabilistic Strategy Logic are unable to express properties relating to information transparency. Information transparency concerns the extent to which agents' behaviours and actions are observable by other agents. Reasoning about information transparency is useful in many domains including security, privacy, and decision-making. In this paper, we present a formal framework for reasoning about information transparency properties in stochastic multi-agent systems. We extend Probabilistic Strategy Logic with new observability operators that capture the degree of observability of temporal properties by agents. We show that the model checking problem for the resulting logic is decidable.

KR Conference 2025 Conference Paper

Pushdown Reward Machines for Reinforcement Learning

  • Giovanni Varricchione
  • Toryn Q. Klassen
  • Natasha Alechina
  • Mehdi Dastani
  • Brian Logan
  • Sheila A. McIlraith

Reward machines (RMs) are automata structures that encode (non-Markovian) reward functions for reinforcement learning (RL). RMs can reward any behaviour representable in regular languages and, when paired with RL algorithms that exploit RM structure, have been shown to significantly improve sample efficiency in many domains. In this work, we present pushdown reward machines (pdRMs), an extension of reward machines based on deterministic pushdown automata. pdRMs can recognise and reward temporally extended behaviours representable in deterministic context-free languages, making them more expressive than reward machines. We introduce two variants of pdRM-based policies, one which has access to the entire stack of the pdRM, and one which can only access the top k symbols (for a given constant k) of the stack. We propose a procedure to check when the two kinds of policies (for a given environment, pdRM, and constant k) achieve the same optimal state values. We then provide theoretical results establishing the expressive power of pdRMs, and space complexity results for the proposed learning problems. Lastly, we propose an approach for off-policy RL algorithms that exploits counterfactual experiences with pdRMs. We conclude by providing experimental results showing how agents can be trained to perform tasks representable in deterministic context-free languages using pdRMs.
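The step from regular to deterministic context-free behaviours can be illustrated with a toy pushdown variant: transitions may push and pop a stack, so the machine can reward balanced, nested behaviours no finite-state reward machine can count. The events and reward scheme below are hypothetical.

```python
# Illustrative sketch of a pushdown reward machine: transitions may
# push/pop a stack, so it can reward deterministic context-free
# behaviours such as balanced "open"/"close" actions. The event names
# and reward scheme are made up for illustration.

class PushdownRewardMachine:
    def __init__(self):
        self.stack = []

    def step(self, event):
        if event == "open":
            self.stack.append("X")      # push a marker for every "open"
            return 0.0
        if event == "close" and self.stack:
            self.stack.pop()            # pop the matching "open"
            # reward only when every open has been closed
            return 1.0 if not self.stack else 0.0
        return 0.0                      # unmatched "close": no reward

pdrm = PushdownRewardMachine()
rewards = [pdrm.step(e) for e in ["open", "open", "close", "close"]]
assert rewards == [0.0, 0.0, 0.0, 1.0]
```

The paper's top-k policy variant corresponds to letting the learner observe only `self.stack[-k:]` rather than the whole stack.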

AAAI Conference 2025 Conference Paper

Temporal Causal Reasoning with (Non-Recursive) Structural Equation Models

  • Maksim Gladyshev
  • Natasha Alechina
  • Mehdi Dastani
  • Dragan Doder
  • Brian Logan

Structural equation models (SEMs) are a standard approach to representing causal dependencies between variables. In this paper we propose a new interpretation of existing formalisms in the field of Actual Causality in which SEMs are viewed as mechanisms transforming the dynamics of exogenous variables into the dynamics of endogenous variables. This allows us to combine counterfactual causal reasoning with existing temporal logic formalisms, and to introduce a temporal logic, CPLTL, for causal reasoning about such structures. We then demonstrate that the standard restriction to so-called recursive models (with no cycles in the dependency graphs) is not necessary in our approach, which provides us with extra tools for reasoning about mutually dependent processes and feedback loops. Finally, we introduce a notion of model equivalence for temporal causal models and show that CPLTL has an efficient model-checking procedure.

IJCAI Conference 2024 Conference Paper

Intention Progression with Temporally Extended Goals

  • Yuan Yao
  • Natasha Alechina
  • Brian Logan

The Belief-Desire-Intention (BDI) approach to agent development has formed the basis for much of the research on architectures for autonomous agents. A key advantage of the BDI approach is that agents may pursue multiple intentions in parallel. However, previous approaches to managing possible interactions between concurrently executing intentions are limited to interactions between simple achievement goals (and in some cases maintenance goals). In this paper we present a new approach to intention progression for agents with temporally extended goals which allow mixing reachability and invariant properties, e.g., ``travel to location A while not exceeding a gradient of 5%''. Temporally extended goals may be specified at run-time (top-level goals), and as subgoals in plans. In addition, our approach allows human-authored plans and plans implemented as RL policies to be freely mixed in an agent program, allowing the development of agents with `neuro-symbolic' architectures.

AAAI Conference 2024 Conference Paper

Pure-Past Action Masking

  • Giovanni Varricchione
  • Natasha Alechina
  • Mehdi Dastani
  • Giuseppe De Giacomo
  • Brian Logan
  • Giuseppe Perelli

We present Pure-Past Action Masking (PPAM), a lightweight approach to action masking for safe reinforcement learning. In PPAM, actions are disallowed (“masked”) according to specifications expressed in Pure-Past Linear Temporal Logic (PPLTL). PPAM can enforce non-Markovian constraints, i.e., constraints based on the history of the system, rather than just the current state of the (possibly hidden) MDP. The features used in the safety constraint need not be the same as those used by the learning agent, allowing a clear separation of concerns between the safety constraints and reward specifications of the (learning) agent. We prove formally that an agent trained with PPAM can learn any optimal policy that satisfies the safety constraints, and that PPAMs are as expressive as shields, another approach to enforcing non-Markovian constraints in RL. Finally, we provide empirical results showing how PPAM can guarantee constraint satisfaction in practice.
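History-based masking of the kind PPAM describes can be sketched with a small finite-state monitor for a pure-past property, whose state decides which actions survive the mask. The property ("b is only allowed once a has occurred") and all names below are illustrative, not from the paper.

```python
# Minimal sketch of history-based action masking in the spirit of PPAM:
# a pure-past property is tracked by a small finite-state monitor, and
# any action that would violate the constraint is removed from the
# agent's choices. The constraint and names here are illustrative.

class PastMonitor:
    """Tracks the pure-past property 'a has occurred at least once'."""
    def __init__(self):
        self.seen_a = False

    def update(self, action):
        if action == "a":
            self.seen_a = True

def masked_actions(monitor, available):
    # "b" is only safe after "a" has happened somewhere in the history
    return [act for act in available if act != "b" or monitor.seen_a]

m = PastMonitor()
assert masked_actions(m, ["a", "b", "c"]) == ["a", "c"]   # "b" masked
m.update("a")
assert masked_actions(m, ["a", "b", "c"]) == ["a", "b", "c"]
```

Since PPLTL formulas can be evaluated by updating a fixed-size summary of the past, the monitor adds only constant overhead per step, which is the lightweight quality the abstract emphasises.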

JAIR Journal 2023 Journal Article

A Logic of East and West

  • Heshan Du
  • Natasha Alechina
  • Amin Farjudian
  • Brian Logan
  • Can Zhou
  • Anthony G. Cohn

We propose a logic of east and west (LEW) for points in 1D Euclidean space. It formalises primitive direction relations: east (E), west (W) and indeterminate east/west (Iew). It has a parameter τ ∈ N>1, which is referred to as the level of indeterminacy in directions. For every τ ∈ N>1, we provide a sound and complete axiomatisation of LEW, and prove that its satisfiability problem is NP-complete. In addition, we show that the finite axiomatisability of LEW depends on τ: if τ = 2 or τ = 3, then there exists a finite sound and complete axiomatisation; if τ > 3, then the logic is not finitely axiomatisable. LEW can be easily extended to higher-dimensional Euclidean spaces. Extending LEW to 2D Euclidean space makes it suitable for reasoning about not perfectly aligned representations of the same spatial objects in different datasets, for example, in crowd-sourced digital maps.

IJCAI Conference 2023 Conference Paper

Data-Driven Revision of Conditional Norms in Multi-Agent Systems (Extended Abstract)

  • Davide Dell'Anna
  • Natasha Alechina
  • Fabiano Dalpiaz
  • Mehdi Dastani
  • Brian Logan

In multi-agent systems, norm enforcement is a mechanism for steering the behavior of individual agents in order to achieve desired system-level objectives. Due to the dynamics of multi-agent systems, however, it is hard to design norms that guarantee the achievement of the objectives in every operating context. Also, these objectives may change over time, thereby making previously defined norms ineffective. In this paper, we investigate the use of system execution data to automatically synthesise and revise conditional prohibitions with deadlines, a type of norms aimed at preventing agents from exhibiting certain patterns of behaviors. We propose DDNR (Data-Driven Norm Revision), a data-driven approach to norm revision that synthesises revised norms with respect to a data set of traces describing the behavior of the agents in the system. We evaluate DDNR using a state-of-the-art, off-the-shelf urban traffic simulator. The results show that DDNR synthesises revised norms that are significantly more accurate than the original norms in distinguishing adequate and inadequate behaviors for the achievement of the system-level objectives.

IJCAI Conference 2023 Conference Paper

Multi-Agent Intention Recognition and Progression

  • Michael Dann
  • Yuan Yao
  • Natasha Alechina
  • Brian Logan
  • Felipe Meneguzzi
  • John Thangarajah

For an agent in a multi-agent environment, it is often beneficial to be able to predict what other agents will do next when deciding how to act. Previous work in multi-agent intention scheduling assumes a priori knowledge of the current goals of other agents. In this paper, we present a new approach to multi-agent intention scheduling in which an agent uses online goal recognition to identify the goals currently being pursued by other agents while acting in pursuit of its own goals. We show how online goal recognition can be incorporated into an MCTS-based intention scheduler, and evaluate our approach in a range of scenarios. The results demonstrate that our approach can rapidly recognise the goals of other agents even when they are pursuing multiple goals concurrently, and has similar performance to agents which know the goals of other agents a priori.

PRL Workshop 2023 Workshop Paper

Preemptive Restraining Bolts

  • Giovanni Varricchione
  • Natasha Alechina
  • Mehdi Dastani
  • Giuseppe De Giacomo
  • Brian Logan
  • Giuseppe Perelli

We present preemptive restraining bolts (PRBs), a new approach to safe reinforcement learning which uses non-Markovian action masking, i.e., actions are masked (disallowed) based on the history of the system, rather than just the current state. PRBs are expressed in Pure Past Linear Temporal Logic and have minimal overhead (linear in the size of the state) compared to Markovian action masking, while having the same expressive power as other non-Markovian approaches such as shields (that can express any safety Linear Temporal Logic property). As with restraining bolts, the language in which safety properties are expressed does not have to be the same as the language specifying the features of the state for the learning agent. Critically, PRBs can be applied in the learning process to learn an optimal safe policy while using only safe actions during learning. As a result, PRBs can be used to provide general safety guarantees, without compromising efficiency.

IJCAI Conference 2023 Conference Paper

Probabilistic Temporal Logic for Reasoning about Bounded Policies

  • Nima Motamed
  • Natasha Alechina
  • Mehdi Dastani
  • Dragan Doder
  • Brian Logan

To build a theory of intention revision for agents operating in stochastic environments, we need a logic in which we can explicitly reason about their decision-making policies and those policies' uncertain outcomes. Towards this end, we propose PLBP, a novel probabilistic temporal logic for Markov Decision Processes that allows us to reason about policies of bounded size. The logic is designed so that its expressive power is sufficient for the intended applications, whilst at the same time possessing strong computational properties. We prove that the satisfiability problem for our logic is decidable, and that its model checking problem is PSPACE-complete. This allows us to, for example, algorithmically verify whether an agent's intentions are coherent, or whether a specific policy satisfies safety and/or liveness properties.

KR Conference 2022 Conference Paper

Automatic Synthesis of Dynamic Norms for Multi-Agent Systems

  • Natasha Alechina
  • Giuseppe De Giacomo
  • Brian Logan
  • Giuseppe Perelli

Norms have been widely proposed to coordinate and regulate multi-agent systems (MAS) behaviour. We consider the problem of synthesising and revising the set of norms in a normative MAS to satisfy a design objective expressed in Alternating Time Temporal Logic (ATL*). ATL* is a well-established language for strategic reasoning, which allows the specification of norms that constrain the strategic behaviour of agents. We focus on dynamic norms, that is, norms corresponding to Mealy machines, which allow us to place different constraints on the agents' behaviour depending on the state of the norm and the state of the underlying MAS. We show that synthesising dynamic norms is (k + 1)-EXPTIME, where k is the alternation depth of quantifiers in the ATL* specification. Note that for typical cases of interest, k is either 1 or 2. We also study the problem of removing existing norms to satisfy a new objective, which we show to be 2EXPTIME-complete.

JAIR Journal 2022 Journal Article

Data-Driven Revision of Conditional Norms in Multi-Agent Systems

  • Davide Dell'Anna
  • Natasha Alechina
  • Fabiano Dalpiaz
  • Mehdi Dastani
  • Brian Logan

In multi-agent systems, norm enforcement is a mechanism for steering the behavior of individual agents in order to achieve desired system-level objectives. Due to the dynamics of multi-agent systems, however, it is hard to design norms that guarantee the achievement of the objectives in every operating context. Also, these objectives may change over time, thereby making previously defined norms ineffective. In this paper, we investigate the use of system execution data to automatically synthesise and revise conditional prohibitions with deadlines, a type of norms aimed at prohibiting agents from exhibiting certain patterns of behaviors. We propose DDNR (Data-Driven Norm Revision), a data-driven approach to norm revision that synthesises revised norms with respect to a data set of traces describing the behavior of the agents in the system. We evaluate DDNR using a state-of-the-art, off-the-shelf urban traffic simulator. The results show that DDNR synthesises revised norms that are significantly more accurate than the original norms in distinguishing adequate and inadequate behaviors for the achievement of the system-level objectives.

IJCAI Conference 2022 Conference Paper

Multi-Agent Intention Progression with Reward Machines

  • Michael Dann
  • Yuan Yao
  • Natasha Alechina
  • Brian Logan
  • John Thangarajah

Recent work in multi-agent intention scheduling has shown that enabling agents to predict the actions of other agents when choosing their own actions can be beneficial. However, existing approaches to 'intention-aware' scheduling assume that the programs of other agents are known, or are "similar" to that of the agent making the prediction. While this assumption is reasonable in some circumstances, it is less plausible when the agents are not co-designed. In this paper, we present a new approach to multi-agent intention scheduling in which agents predict the actions of other agents based on a high-level specification of the tasks performed by an agent in the form of a reward machine (RM) rather than on its (assumed) program. We show how a reward machine can be used to generate tree and rollout policies for an MCTS-based scheduler. We evaluate our approach in a range of multi-agent environments, and show that RM-based scheduling out-performs previous intention-aware scheduling approaches in settings where agents are not co-designed.

IJCAI Conference 2022 Conference Paper

Situation Calculus for Controller Synthesis in Manufacturing Systems with First-Order State Representation (Extended Abstract)

  • Giuseppe De Giacomo
  • Paolo Felli
  • Brian Logan
  • Fabio Patrizi
  • Sebastian Sardiña

Manufacturing is transitioning from a mass production model to a service model in which facilities `bid' for previously unseen products. To decide whether to bid for a previously unseen product, a facility must be able to synthesize, on the fly, a process plan controller that delegates abstract manufacturing tasks in a supplied process recipe to the available manufacturing resources. First-order representations of the state are commonly considered in reasoning about action in AI. Here we show that we can leverage the wide literature on the Situation Calculus to automatically synthesize such controllers. We identify two important decidable cases---finite domains and bounded action theories---for which we provide practical synthesis techniques.

AAMAS Conference 2021 Conference Paper

Agent Programming in the Cognitive Era

  • Rafael H. Bordini
  • Amal El Fallah Seghrouchni
  • Koen Hindriks
  • Brian Logan
  • Alessandro Ricci

It is claimed that, in the nascent ‘Cognitive Era’, intelligent systems will be trained using machine learning techniques rather than programmed by software developers [10]. A contrary point of view argues that machine learning has limitations, and, taken in isolation, cannot form the basis of autonomous systems capable of intelligent behaviour in complex environments [14]. In this paper, we argue that the unique strengths of Belief-Desire-Intention (BDI) agent programming languages provide an ideal framework for integrating the wide range of AI capabilities necessary for progress towards the next-generation of intelligent systems.

AAMAS Conference 2021 Conference Paper

Intention Progression using Quantitative Summary Information

  • Yuan Yao
  • Natasha Alechina
  • Brian Logan
  • John Thangarajah

A key problem for Belief-Desire-Intention (BDI) agents is intention progression, i.e., which plans should be selected and how the execution of these plans should be interleaved so as to achieve the agent’s goals. Monte-Carlo Tree Search (MCTS) has been shown to be a promising approach to the intention progression problem, out-performing other approaches in the literature. However, MCTS relies on runtime simulation of possible interleavings of the plans in each intention, which may be computationally costly. In this paper, we introduce the notion of quantitative summary information which can be used to estimate the likelihood of conflicts between an agent’s intentions. We show how offline simulation can be used to precompute quantitative summary information prior to execution of the agent’s program, and how the precomputed summary information can be used at runtime to guide the expansion of the MCTS search tree and avoid unnecessary runtime simulation. We compare the performance of our approach with standard MCTS in a range of scenarios of increasing difficulty. The results suggest our approach can significantly improve the efficiency of MCTS in terms of the number of runtime simulations performed.

IJCAI Conference 2021 Conference Paper

Multi-Agent Intention Progression with Black-Box Agents

  • Michael Dann
  • Yuan Yao
  • Brian Logan
  • John Thangarajah

We propose a new approach to intention progression in multi-agent settings where other agents are effectively black boxes. That is, while their goals are known, the precise programs used to achieve these goals are not known. In our approach, agents use an abstraction of their own program called a partially-ordered goal-plan tree (pGPT) to schedule their intentions and predict the actions of other agents. We show how a pGPT can be derived from the program of a BDI agent, and present an approach based on Monte Carlo Tree Search (MCTS) for scheduling an agent's intentions using pGPTs. We evaluate our pGPT-based approach in cooperative, selfish and adversarial multi-agent settings, and show that it out-performs MCTS-based scheduling where agents assume that other agents have the same program as themselves.

IJCAI Conference 2020 Conference Paper

BDI Agent Architectures: A Survey

  • Lavindra de Silva
  • Felipe Meneguzzi
  • Brian Logan

The BDI model forms the basis of much of the research on symbolic models of agency and agent-oriented software engineering. While many variants of the basic BDI model have been proposed in the literature, there has been no systematic review of research on BDI agent architectures in over 10 years. In this paper, we survey the main approaches to each component of the BDI architecture, how these have been realised in agent programming languages, and discuss the trade-offs inherent in each approach.

IJCAI Conference 2020 Conference Paper

Intention Progression under Uncertainty

  • Yuan Yao
  • Natasha Alechina
  • Brian Logan
  • John Thangarajah

A key problem in Belief-Desire-Intention agents is how an agent progresses its intentions, i.e., which plans should be selected and how the execution of these plans should be interleaved so as to achieve the agent’s goals. Previous approaches to the intention progression problem assume the agent has perfect information about the state of the environment. However, in many real-world applications, an agent may be uncertain about whether an environment condition holds, and hence whether a particular plan is applicable or an action is executable. In this paper, we propose SAU, a Monte-Carlo Tree Search (MCTS)-based scheduler for intention progression problems where the agent’s beliefs are uncertain. We evaluate the performance of our approach experimentally by varying the degree of uncertainty in the agent’s beliefs. The results suggest that SAU is able to successfully achieve the agent’s goals even in settings where there is significant uncertainty in the agent’s beliefs.

AAAI Conference 2020 Conference Paper

Parameterised Resource-Bounded ATL

  • Natasha Alechina
  • Stéphane Demri
  • Brian Logan

It is often advantageous to be able to extract resource requirements in resource logics of strategic ability, rather than to verify whether a fixed resource requirement is sufficient for achieving a goal. We study Parameterised Resource-Bounded Alternating Time Temporal Logic where parameter extraction is possible. We give a parameter extraction algorithm and prove that the model-checking problem is 2EXPTIME-complete.

AAMAS Conference 2019 Conference Paper

Decidable Model Checking with Uniform Strategies

  • Natasha Alechina
  • Mehdi Dastani
  • Brian Logan

The logic of strategic ability Resource-Bounded Alternating Time Syntactic Epistemic Logic (RB±ATSEL) has a decidable model-checking problem for coalition uniform strategies. A strategy is coalition uniform if agents in a coalition select the same joint action in all states where the knowledge of the coalition is the same. However, this presupposes free and unbounded communication between the agents in the coalition before every action selection. In this paper we present a modified version of RB±ATSEL, RB±ATSELc, with explicit (and explicitly costed) communication actions. RB±ATSELc is interpreted on communication models which have an explicit communication step before every action selection. We show that, unlike standard ATL under imperfect information, the model checking problem for RB±ATSELc is decidable under perfect recall uniform strategies. Our decidability result also applies to ATL with imperfect information and perfect recall when interpreted on communication models.

AAMAS Conference 2019 Conference Paper

Strategic Responsibility Under Imperfect Information

  • Vahid Yazdanpanah
  • Mehdi Dastani
  • Wojciech Jamroga
  • Natasha Alechina
  • Brian Logan

A central issue in the specification and verification of autonomous agents and multiagent systems is the ascription of responsibility to individual agents and groups of agents. When designing a (multi)agent system, we must specify which agents or groups of agents are responsible for bringing about a particular state of affairs. Similarly, when verifying a multiagent system, we may wish to determine the responsibility of agents or groups of agents for a particular state of affairs, and the contribution of each agent to bringing about that state of affairs. In this paper, we discuss several aspects of responsibility, including strategic ability of agents, their epistemic properties, and their relationship to the evolution of the system behavior. We introduce a formal framework for reasoning about the responsibility of individual agents and agent groups in terms of the agents’ strategies and epistemic properties, and state some properties of the framework.

AAAI Conference 2019 Conference Paper

Unbounded Orchestrations of Transducers for Manufacturing

  • Natasha Alechina
  • Tomáš Brázdil
  • Giuseppe De Giacomo
  • Paolo Felli
  • Brian Logan
  • Moshe Y. Vardi

There has recently been increasing interest in using reactive synthesis techniques to automate the production of manufacturing process plans. Previous work has assumed that the set of manufacturing resources is known and fixed in advance. In this paper, we consider the more general problem of whether a controller can be synthesized given sufficient resources. In the unbounded setting, only the types of available manufacturing resources are given, and we want to know whether it is possible to manufacture a product using only resources of those type(s), and, if so, how many resources of each type are needed. We model manufacturing processes and facilities as transducers (automata with output), and show that the unbounded orchestration problem is decidable and the (Pareto) optimal set of resources necessary to manufacture a product is computable for uni-transducers. However, for multi-transducers, the problem is undecidable.

IJCAI Conference 2018 Conference Paper

An Operational Semantics for a Fragment of PRS

  • Lavindra de Silva
  • Felipe Meneguzzi
  • Brian Logan

The Procedural Reasoning System (PRS) is arguably the first implementation of the Belief--Desire--Intention (BDI) approach to agent programming. PRS remains extremely influential, directly or indirectly inspiring the development of subsequent BDI agent programming languages. However, perhaps surprisingly given its centrality in the BDI paradigm, PRS lacks a formal operational semantics, making it difficult to determine its expressive power relative to other agent programming languages. This paper takes a first step towards closing this gap, by giving a formal semantics for a significant fragment of PRS. We prove key properties of the semantics relating to PRS-specific programming constructs, and show that even the fragment of PRS we consider is strictly more expressive than the plan constructs found in typical BDI languages.

JAIR Journal 2018 Journal Article

Incentive-Compatible Mechanisms for Norm Monitoring in Open Multi-Agent Systems

  • Natasha Alechina
  • Joseph Y. Halpern
  • Ian A. Kash
  • Brian Logan

We consider the problem of detecting norm violations in open multi-agent systems (MAS). We show how, using ideas from scrip systems, we can design mechanisms where the agents comprising the MAS are incentivised to monitor the actions of other agents for norm violations. The cost of providing the incentives is not borne by the MAS and does not come from fines charged for norm violations (fines may be impossible to levy in a system where agents are free to leave and rejoin again under a different identity). Instead, monitoring incentives come from (scrip) fees for accessing the services provided by the MAS. In some cases, perfect monitoring (and hence enforcement) can be achieved: no norms will be violated in equilibrium. In other cases, we show that, while it is impossible to achieve perfect enforcement, we can get arbitrarily close; we can make the probability of a norm violation in equilibrium arbitrarily small. We show using simulations that our theoretical results, which apply to systems with a large number of agents, hold for multi-agent systems with as few as 1000 agents: the system rapidly converges to the steady-state distribution of scrip tokens necessary to ensure monitoring and then remains close to the steady state.

IJCAI Conference 2018 Conference Paper

Incentive-Compatible Mechanisms for Norm Monitoring in Open Multi-Agent Systems (Extended Abstract)

  • Natasha Alechina
  • Joseph Y. Halpern
  • Ian A. Kash
  • Brian Logan

We consider the problem of detecting norm violations in open multi-agent systems (MAS). In this extended abstract, we outline the approach of [Alechina et al., 2018], and show how, using ideas from scrip systems, we can design mechanisms where the agents comprising the MAS are incentivised to monitor the actions of other agents for norm violations.

AAMAS Conference 2018 Conference Paper

Resource Logics with a Diminishing Resource

  • Natasha Alechina
  • Brian Logan

Model-checking resource logics with production and consumption of resources is a computationally hard and often undecidable problem. We show that it is more feasible under the assumption that there is at least one diminishing resource, that is, a resource which is consumed by every action.

AAAI Conference 2018 Conference Paper

Synthesis of Orchestrations of Transducers for Manufacturing

  • Giuseppe De Giacomo
  • Moshe Vardi
  • Paolo Felli
  • Natasha Alechina
  • Brian Logan

In this paper, we model manufacturing processes and facilities as transducers (automata with output). The problem of whether a given manufacturing process can be realized by a given set of manufacturing resources can then be stated as an orchestration problem for transducers. We first consider the conceptually simpler case of uni-transducers (transducers with a single input and a single output port), and show that synthesizing orchestrations for uni-transducers is EXPTIME-complete. Surprisingly, the complexity remains the same for the more expressive multi-transducer case, where transducers have multiple input and output ports and the orchestration is in charge of dynamically connecting ports during execution.

AAMAS Conference 2017 Conference Paper

Causality, Responsibility and Blame in Team Plans

  • Natasha Alechina
  • Joseph Y. Halpern
  • Brian Logan

Many objectives can be achieved (or may be achieved more effectively) only by a group of agents executing a team plan. If a team plan fails, it is often of interest to determine what caused the failure, the degree of responsibility of each agent for the failure, and the degree of blame attached to each agent. We show how team plans can be represented in terms of structural equations, and then apply the definitions of causality introduced by Halpern [11] and degree of responsibility and blame introduced by Chockler and Halpern [3] to determine the agent(s) who caused the failure and what their degree of responsibility/blame is. We also prove new results on the complexity of computing causality and degree of responsibility and blame, showing that they can be determined in polynomial time for many team plans of interest.
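The structural-equations view of a team plan can be illustrated with a toy but-for (counterfactual) check: the team outcome is an equation over each agent's action variable, and an agent's failure counts as a cause if flipping that variable changes the outcome. This is only the simplest counterfactual test, not the full Halpern definition of actual causality; the plan and variable names are made up.

```python
# Toy sketch of the structural-equations view of a team plan: the team
# outcome is an equation over the agents' action variables, and a
# but-for (counterfactual) check flags agents whose failure caused the
# team failure. Plan and variable names are illustrative.

def team_success(actions):
    # the team task succeeds only if both subtasks are performed
    return actions["lift"] and actions["carry"]

def but_for_cause(actions, agent_var):
    """agent_var is a but-for cause if flipping it changes the outcome."""
    flipped = dict(actions, **{agent_var: not actions[agent_var]})
    return team_success(actions) != team_success(flipped)

observed = {"lift": True, "carry": False}    # the plan failed
assert not team_success(observed)
assert but_for_cause(observed, "carry")      # carry's failure is a cause
assert not but_for_cause(observed, "lift")   # flipping lift changes nothing
```

The polynomial-time results in the abstract concern richer definitions of causality, responsibility, and blame, but the counterfactual flip above is the basic operation they build on.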

AAAI Conference 2017 Conference Paper

Incentivising Monitoring in Open Normative Systems

  • Natasha Alechina
  • Joseph Halpern
  • Ian Kash
  • Brian Logan

We present an approach to incentivising monitoring for norm violations in open multi-agent systems such as Wikipedia. In such systems, there is no crisp definition of a norm violation; rather, it is a matter of judgement whether an agent’s behaviour conforms to generally accepted standards of behaviour. Agents may legitimately disagree about borderline cases. Using ideas from scrip systems and peer prediction, we show how to design a mechanism that incentivises agents to monitor each other’s behaviour for norm violations. The mechanism keeps the probability of undetected violations (submissions that the majority of the community would consider not conforming to standards) low, and is robust against collusion by the monitoring agents.

IJCAI Conference 2017 Conference Paper

Process Plan Controllers for Non-Deterministic Manufacturing Systems

  • Paolo Felli
  • Lavindra de Silva
  • Brian Logan
  • Svetan Ratchev

Determining the most appropriate means of producing a given product, i.e., which manufacturing and assembly tasks need to be performed in which order and how, is termed process planning. In process planning, abstract manufacturing tasks in a process recipe are matched to available manufacturing resources, e.g., CNC machines and robots, to give an executable process plan. A process plan controller then delegates each operation in the plan to specific manufacturing resources. In this paper we present an approach to the automated computation of process plans and process plan controllers. We extend previous work to support both non-deterministic (i.e., partially controllable) resources, and to allow operations to be performed in parallel on the same part. We show how implicit fairness assumptions can be captured in this setting, and how this impacts the definition of process plans.

AAMAS Conference 2017 Conference Paper

Progressing Intention Progression: A Call for a Goal-Plan Tree Contest

  • Brian Logan
  • John Thangarajah
  • Neil Yorke-Smith

User-supplied domain control knowledge in the form of hierarchically structured Goal-Plan Trees (GPTs) is at the heart of a number of approaches to reasoning about action. Reasoning with GPTs connects the AAMAS community with other communities such as automated planning, and forms the foundation for important reasoning capabilities, especially intention progression in Belief-Desire-Intention (BDI) agents. Research on GPTs has a long history but suffers from fragmentation and lack of common terminology, data formats, and enabling tools. One way to address this fragmentation is through a competition. Competitions are increasingly being used as a means to foster research and challenge the state of the art. For example, the AAMAS conference has a number of associated competitions, such as the Trading Agent Competition, while agent research is showcased at competitions such as RoboCup. We therefore issue a call for a Goal-Plan Tree Contest, with the ambition of drawing together a community and incentivizing research in intention progression.

JAAMAS Journal 2016 Journal Article

Integrating BDI Agents with Agent-Based Simulation Platforms

  • Dhirendra Singh
  • Lin Padgham
  • Brian Logan

Agent-based models (ABMs) are increasingly being used for exploring and supporting decision making about social science scenarios involving modelling of human agents. However, existing agent-based simulation platforms (e.g., SWARM, Repast) provide limited support for the simulation of more complex cognitive agents required by such scenarios. We present a framework that allows Belief-Desire-Intention (BDI) cognitive agents to be embedded in an ABM system. Architecturally, this means that the “brains” of an agent can be modelled in the BDI system in the usual way, while the “body” exists in the ABM system. The architecture is flexible in that the ABM can still have non-BDI agents in the simulation, and the BDI side can have agents that do not have a physical counterpart (such as an organisation). The framework addresses a key integration challenge of coupling event-based BDI systems with time-stepped ABM systems. Our framework is modular and supports integration of off-the-shelf BDI systems with off-the-shelf ABM systems. The framework is Open Source, and all integrations and applications are available for use by the modelling community.

IJCAI Conference 2016 Conference Paper

Parallel Behavior Composition for Manufacturing

  • Paolo Felli
  • Brian Logan
  • Sebastian Sardina

A key problem in the manufacture of highly-customized products is the synthesis of controllers able to manufacture any instance of a given product type on a given production or assembly line. In this paper, we extend classical AI behavior composition to manufacturing settings. We first introduce a novel solution concept for manufacturing composition, target production processes, that are able to manufacture multiple instances of a product simultaneously in a given production plant. We then propose a technique for synthesizing the largest target production process, together with an associated controller for the machines in the plant.

AAAI Conference 2016 Conference Paper

Robust Execution of BDI Agent Programs by Exploiting Synergies Between Intentions

  • Yuan Yao
  • Brian Logan
  • John Thangarajah

A key advantage of the reactive planning approach adopted by BDI-based agents is the ability to recover from plan execution failures, and almost all BDI agent programming languages and platforms provide some form of failure handling mechanism. In general, these consist of simply choosing an alternative plan for the failed subgoal (e.g., JACK, Jadex). In this paper, we propose an alternative approach to recovering from execution failures that relies on exploiting positive interactions between an agent’s intentions. A positive interaction occurs when the execution of an action in one intention assists the execution of actions in other intentions (e.g., by (re)establishing their preconditions). We have implemented our approach in a scheduling algorithm for BDI agents which we call SP. The results of a preliminary empirical evaluation of SP suggest our approach out-performs existing failure handling mechanisms used by state-of-the-art BDI languages. Moreover, the computational overhead of SP is modest.

IJCAI Conference 2016 Conference Paper

Verifying Existence of Resource-Bounded Coalition Uniform Strategies

  • Natasha Alechina
  • Mehdi Dastani
  • Brian Logan

We consider the problem of whether a coalition of agents has a knowledge-based strategy to ensure some outcome under a resource bound. We extend previous work on verification of multi-agent systems where actions of agents produce and consume resources, by adding epistemic pre- and postconditions to actions. This allows us to model scenarios where agents perform both actions which change the world, and actions which change their knowledge about the world, such as observation and communication. To avoid logical omniscience and obtain a compact model of the system, our model of agents' knowledge is syntactic. We define a class of coalition-uniform strategies with respect to any (decidable) notion of coalition knowledge. We show that the model-checking problem for the resulting logic is decidable for any notion of coalition-uniform strategies in these classes.

IJCAI Conference 2015 Conference Paper

On the Boundary of (Un)decidability: Decidable Model-Checking for a Fragment of Resource Agent Logic

  • Natasha Alechina
  • Nils Bulling
  • Brian Logan
  • Hoang Nga Nguyen

The model-checking problem for Resource Agent Logic is known to be undecidable. We review existing (un)decidability results and identify a significant fragment of the logic for which model checking is decidable. We discuss aspects which make model checking decidable and prove undecidability of two open fragments over a class of models in which agents always have a choice of doing nothing.

IJCAI Conference 2015 Conference Paper

Symbolic Model Checking for One-Resource RB±ATL

  • Natasha Alechina
  • Brian Logan
  • Hoang Nga Nguyen
  • Franco Raimondi

RB±ATL is an extension of ATL where it is possible to model consumption and production of several resources by a set of agents. The model-checking problem for RB±ATL is known to be decidable. However, the only available model-checking algorithm for RB±ATL uses a forward search of the state space, and hence does not have an efficient symbolic implementation. In this paper, we consider a fragment of RB±ATL, 1RB±ATL, that allows only one resource type. We give a symbolic model-checking algorithm for this fragment of RB±ATL, and evaluate the performance of an MCMAS-based implementation of the algorithm on an example problem that can be scaled to large state spaces.

AAAI Conference 2013 Conference Paper

Multi-Cycle Query Caching in Agent Programming

  • Natasha Alechina
  • Tristan Behrens
  • Mehdi Dastani
  • Koen Hindriks
  • Jomi Hubner
  • Brian Logan
  • Hai Nguyen
  • Marc van Zee

In many logic-based BDI agent programming languages, plan selection involves inferencing over some underlying knowledge representation. While context-sensitive plan selection facilitates the development of flexible, declarative programs, the overhead of evaluating repeated queries to the agent’s beliefs and goals can result in poor run time performance. In this paper we present an approach to multi-cycle query caching for logic-based BDI agent programming languages. We extend the abstract performance model presented in (Alechina et al. 2012) to quantify the costs and benefits of caching query results over multiple deliberation cycles. We also present results of experiments with prototype implementations of both single- and multi-cycle caching in three logic-based BDI agent platforms, which demonstrate that significant performance improvements are achievable in practice.

IJCAI Conference 2013 Conference Paper

Reasoning about Normative Update

  • Natasha Alechina
  • Mehdi Dastani
  • Brian Logan

We consider the problem of updating a multi-agent system with a set of conditional norms. A norm comes into effect when its condition becomes true, and imposes either an obligation or a prohibition on an agent which remains in force until a state satisfying a deadline condition is reached. If the norm is violated, a sanction is imposed on the agent. We define a notion of a normative update of a multi-agent system by a set of conditional norms, and study the problem of checking whether the agent(s) can bring about a state satisfying a property without incurring a specified number of sanctions.

AAMAS Conference 2012 Conference Paper

Consensus Games

  • Julian Zappala
  • Natasha Alechina
  • Brian Logan

Consensus Games (CGs) are a novel approach to modelling coalition formation in multi-agent systems inspired by threshold models in sociology. In a CG, each agent’s degree of commitment to the coalitions in which it may participate is expressed as a quorum function. Agents are willing to form a coalition only if a quorum consensus can be achieved amongst all agents of the coalition.

AAMAS Conference 2012 Conference Paper

Programming Norm-Aware Agents

  • Natasha Alechina
  • Mehdi Dastani
  • Brian Logan

Normative organisations provide a means to coordinate the activities of individual agents in multiagent settings. The coordination is realized at run time by creating obligations and prohibitions (norms) for individual agents. If an agent cannot meet an obligation or violates a prohibition, the organisation imposes a sanction on the agent. In this paper, we consider norm-aware agents that deliberate on their goals, norms and sanctions before deciding which plan to select and execute. A norm-aware agent is able to violate norms (accepting the resulting sanctions) if it is in the agent's overall interests to do so, e.g., if meeting an obligation would result in an important goal of the agent becoming unachievable. Programming norm-aware agents in conventional BDI-based agent programming languages is difficult, as they lack support for deliberating about goals, norms, sanctions and deadlines. We present the norm-aware agent programming language N-2APL. N-2APL is based on 2APL and provides support for beliefs, goals, plans, norms, sanctions and deadlines. We give the syntax and semantics of N-2APL, and show that N-2APL agents are rational in the sense of committing to a set of plans that will achieve the agent's most important goals and obligations by their deadlines while respecting its most important prohibitions.

AAMAS Conference 2011 Conference Paper

Agent Programming with Priorities and Deadlines

  • Konstantin Vikhorev
  • Natasha Alechina
  • Brian Logan

We present AgentSpeak(RT), a real-time BDI agent programming language based on AgentSpeak(L). AgentSpeak(RT) extends AgentSpeak intentions with deadlines which specify the time by which the agent should respond to an event, and priorities which specify the relative importance of responding to a particular event. The AgentSpeak(RT) interpreter commits to a priority-maximal set of intentions: a set of intentions which is maximally feasible while preferring higher priority intentions. We prove some properties of the language, such as guaranteed reactivity delay of the AgentSpeak(RT) interpreter and probabilistic guarantees of successful execution of intentions by their deadlines.

AAMAS Conference 2010 Conference Paper

Resource-bounded alternating-time temporal logic

  • Natasha Alechina
  • Brian Logan
  • Hoang Nga Nguyen
  • Abdur Rakib

Many problems in AI and multi-agent systems research are most naturally formulated in terms of the abilities of a coalition of agents. There exist several excellent logical tools for reasoning about coalitional ability. However, coalitional ability can be affected by the availability of resources, and there is no straightforward way of reasoning about resource requirements in logics such as Coalition Logic (CL) and Alternating-time Temporal Logic (ATL). In this paper, we propose a logic for reasoning about coalitional ability under resource constraints. We extend ATL with costs of actions and hence of strategies. We give a complete and sound axiomatisation of the resulting logic Resource-Bounded ATL (RB-ATL) and an efficient model-checking algorithm for it.

IJCAI Conference 2009 Conference Paper

A Logic for Coalitions with Bounded Resources

  • Natasha Alechina
  • Brian Logan
  • Nguyen Hoang Nga
  • Abdur Rakib

Recent work on Alternating-Time Temporal Logic and Coalition Logic has allowed the expression of many interesting properties of coalitions and strategies. However, there is no natural way of expressing resource requirements in these logics. This paper presents a Resource-Bounded Coalition Logic (RBCL) which has explicit representation of resource bounds in the language, and gives a complete and sound axiomatisation of RBCL.

AAMAS Conference 2008 Conference Paper

Reasoning about agent deliberation

  • Natasha Alechina
  • Mehdi Dastani
  • Brian Logan
  • John-Jules Meyer

We present a logic for reasoning about properties of agent programs under different agent execution strategies. Using the agent programming language SimpleAPL as an example, we show how safety and liveness properties can be expressed by translating agent programs into expressions of the logic. We give sound and complete axiomatizations of two different program execution strategies for SimpleAPL programs, and, for each of those strategies, prove a correspondence between the operational semantics of SimpleAPL and the models of the corresponding logic.

AAMAS Conference 2008 Conference Paper

Verifying time, memory and communication bounds in systems of reasoning agents

  • Natasha Alechina
  • Brian Logan
  • Hoang Nga Nguyen
  • Abdur Rakib

We present a framework for verifying systems composed of heterogeneous reasoning agents, in which each agent may have differing knowledge and inferential capabilities, and where the resources each agent is prepared to commit to a goal (time, memory and communication bandwidth) are bounded. The framework allows us to investigate, for example, whether a goal can be achieved if a particular agent, perhaps possessing key information or inferential capabilities, is unable (or unwilling) to contribute more than a given portion of its available computational resources or bandwidth to the problem. We present a novel temporal epistemic logic, BMCL, which allows us to describe a set of reasoning agents with bounds on time, memory and the number of messages they can exchange. The bounds on memory and communication are expressed as axioms in the logic. As an example, we show how to axiomatize a system of agents which reason using resolution and prove that the resulting logic is sound and complete. We then show how to encode a simple system of reasoning agents specified in BMCL in the description language of a model checker, and verify that the agents can achieve a goal only if they are prepared to commit certain time, memory and communication resources.

AAAI Conference 2007 Conference Paper

A Logic of Agent Programs

  • Natasha Alechina
  • Brian Logan

We present a sound and complete logic for reasoning about SimpleAPL programs. SimpleAPL is a fragment of the agent programming language 3APL designed for the implementation of cognitive agents with beliefs, goals and plans. Our logic is a variant of PDL, and allows the specification of safety and liveness properties of agent programs. We prove a correspondence between the operational semantics of SimpleAPL and the models of the logic for two example program execution strategies. We show how to translate agent programs written in SimpleAPL into expressions of the logic, and give an example in which we show how to verify correctness properties for a simple agent program.

AAAI Conference 2006 Conference Paper

Model-Checking Memory Requirements of Resource-Bounded Reasoners

  • Alex Albore
  • Piergiorgio Bertoli
  • Brian Logan

Memory bounds may limit the ability of a reasoner to make inferences and therefore affect the reasoner’s usefulness. In this paper, we propose a framework to automatically verify the reasoning capabilities of propositional memory-bounded reasoners which have a sequential architecture. Our framework explicitly accounts for the use of memory both to store facts and to support backtracking in the course of deductions. We describe an implementation of our framework in which proof existence is recast as a strong planning problem, and present results of experiments using the MBP planner which indicate that memory bounds may not be trivial to infer even for simple problems, and that memory bounds and length of derivations are closely interrelated.

AAAI Conference 1998 Conference Paper

A* with Bounded Costs

  • Brian Logan

A key assumption of all problem-solving approaches based on utility theory, including heuristic search, is that we can assign a utility or cost to each state. This in turn requires that all criteria of interest can be reduced to a common ratio scale. However, many real-world problems are difficult or impossible to formulate in terms of minimising a single criterion, and it is often more natural to express problem requirements in terms of a set of constraints which a solution should satisfy. In this paper, we present a generalisation of the A* search algorithm, A* with bounded costs (ABC), which searches for a solution which best satisfies a set of prioritised soft constraints, and show that, given certain reasonable assumptions about the constraints, the algorithm is both complete and optimal. We briefly describe a route planner based on ABC and illustrate the advantages of our approach in a simple route planning problem.