Arrow Research search

Author name cluster

Aidan Scannell

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers
2 author rows

Possible papers (4)

ICLR 2025 · Conference Paper

Discrete Codebook World Models for Continuous Control

  • Aidan Scannell
  • Mohammadreza Nakhaeinezhadfard
  • Kalle Kujanpää
  • Yi Zhao 0014
  • Kevin Sebastian Luck
  • Arno Solin
  • Joni Pajarinen

In reinforcement learning (RL), world models serve as internal simulators, enabling agents to predict environment dynamics and future outcomes in order to make informed decisions. While previous approaches leveraging discrete latent spaces, such as DreamerV3, have demonstrated strong performance in discrete action settings and visual control tasks, their comparative performance in state-based continuous control remains underexplored. In contrast, methods with continuous latent spaces, such as TD-MPC2, have shown notable success in state-based continuous control benchmarks. In this paper, we demonstrate that modeling discrete latent states has benefits over continuous latent states and that discrete codebook encodings are more effective representations for continuous control, compared to alternative encodings, such as one-hot and label-based encodings. Based on these insights, we introduce DCWM: Discrete Codebook World Model, a self-supervised world model with a discrete and stochastic latent space, where latent states are codes from a codebook. We combine DCWM with decision-time planning to get our model-based RL algorithm, named DC-MPC: Discrete Codebook Model Predictive Control, which performs competitively against recent state-of-the-art algorithms, including TD-MPC2 and DreamerV3, on continuous control benchmarks.
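
The core representational idea described above (continuous encoder outputs snapped to the nearest codebook entry) can be illustrated with a minimal vector-quantization-style sketch. This is not the authors' DCWM/DC-MPC implementation; the codebook size, code dimension, and function names below are assumptions for illustration only.

```python
# Minimal sketch of a discrete codebook encoding (vector-quantization style),
# mapping continuous encoder outputs to discrete codebook codes.
# NOT the authors' DCWM implementation; all names and shapes are assumptions.
import numpy as np

rng = np.random.default_rng(0)

num_codes, code_dim = 512, 32          # assumed codebook size and code dimension
codebook = rng.normal(size=(num_codes, code_dim))

def encode_to_codes(z_continuous: np.ndarray) -> np.ndarray:
    """Map continuous latents (batch, code_dim) to nearest codebook indices."""
    # Squared Euclidean distance from each latent to every code.
    dists = ((z_continuous[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)        # discrete latent state: code indices

def lookup(indices: np.ndarray) -> np.ndarray:
    """Recover the quantized latent vectors the world model would operate on."""
    return codebook[indices]

# Example: quantize a batch of encoder outputs.
z = rng.normal(size=(8, code_dim))
idx = encode_to_codes(z)
z_q = lookup(idx)
print(idx.shape, z_q.shape)            # (8,), (8, 32)
```

In a full world model, dynamics would be predicted over these discrete codes; the abstract contrasts such codebook encodings with one-hot and label-based alternatives.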

AAAI 2025 · Conference Paper

Entropy Regularized Task Representation Learning for Offline Meta-Reinforcement Learning

  • Mohammadreza Nakhaeinezhadfard
  • Aidan Scannell
  • Joni Pajarinen

Offline meta-reinforcement learning aims to equip agents with the ability to rapidly adapt to new tasks by training on data from a set of different tasks. Context-based approaches utilize a history of state-action-reward transitions – referred to as the context – to infer a representation of the current task, and then condition the agent, i.e., the policy and value function, on this task representation. Intuitively, the better the task representation captures the underlying tasks, the better the agent can generalize to new tasks. Unfortunately, context-based approaches suffer from distribution mismatch, as the context in the offline data does not match the context at test time, limiting their ability to generalize to the test task. This leads to the task representation overfitting to the offline training data. Intuitively, the task representation should be independent of the behavior policy used to collect the offline data. To address this issue, we approximately minimize the mutual information between the distribution over the task representation and the behavior policy by maximizing the entropy of the behavior policy conditioned on the task representation. We validate our approach in MuJoCo environments, showing that, compared to baselines, our task representation more faithfully represents the underlying tasks, leading it to outperform prior methods on both in-distribution and out-of-distribution tasks.
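
The step from minimizing mutual information to maximizing a conditional entropy follows from the standard decomposition below. The notation (z for the task representation inferred from context, a for an action drawn from the behavior policy in state s) is assumed here and is not taken from the paper.

```latex
% Assumed notation (not from the paper): z = task representation inferred from
% the context, a ~ \pi_\beta(\cdot \mid s) = an action from the behavior policy.
\begin{align}
  I(a; z \mid s) \;=\; H(a \mid s) \;-\; H(a \mid s, z)
\end{align}
% H(a | s) is fixed by the offline data, so minimizing I(a; z | s) is
% equivalent to maximizing the conditional entropy H(a | s, z), i.e., the
% entropy of the behavior policy conditioned on the task representation.
```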

ICLR 2024 · Conference Paper

Function-space Parameterization of Neural Networks for Sequential Learning

  • Aidan Scannell
  • Riccardo Mereu
  • Paul Edmund Chang
  • Ella Tamir
  • Joni Pajarinen
  • Arno Solin

Sequential learning paradigms pose challenges for gradient-based deep learning due to difficulties incorporating new data and retaining prior knowledge. While Gaussian processes elegantly tackle these problems, they struggle with scalability and handling rich inputs, such as images. To address these issues, we introduce a technique that converts neural networks from weight space to function space, through a dual parameterization. Our parameterization offers: (i) a way to scale function-space methods to large data sets via sparsification, (ii) retention of prior knowledge when access to past data is limited, and (iii) a mechanism to incorporate new data without retraining. Our experiments demonstrate that we can retain knowledge in continual learning and incorporate new data efficiently. We further show its strengths in uncertainty quantification and guiding exploration in model-based RL. Further information and code are available on the project website.
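
As a rough illustration of point (iii), incorporating new data without retraining, the sketch below uses Bayesian linear regression on fixed features as a stand-in for a function-space (linearized-network) parameterization: sufficient statistics are updated in closed form as batches arrive. This is a simplified analogue, not the paper's dual parameterization; the feature map, class name, and hyperparameters are assumptions.

```python
# Simplified analogue of incorporating new data without retraining: Bayesian
# linear regression in a fixed feature space, updated incrementally.
# NOT the paper's dual parameterization; names and hyperparameters are assumed.
import numpy as np

class FunctionSpaceRegressor:
    def __init__(self, num_features: int, noise_var: float = 0.1, prior_var: float = 1.0):
        # Sufficient statistics: precision matrix and noise-weighted targets.
        self.precision = np.eye(num_features) / prior_var
        self.xty = np.zeros(num_features)
        self.noise_var = noise_var

    def update(self, phi: np.ndarray, y: np.ndarray) -> None:
        """Absorb a new batch (features phi: (n, d), targets y: (n,)) in closed form."""
        self.precision += phi.T @ phi / self.noise_var
        self.xty += phi.T @ y / self.noise_var

    def predict(self, phi: np.ndarray):
        """Posterior mean and per-point predictive variance at new feature locations."""
        cov = np.linalg.inv(self.precision)
        mean = phi @ cov @ self.xty
        var = np.einsum("nd,df,nf->n", phi, cov, phi) + self.noise_var
        return mean, var

# Example: sequential updates with random features standing in for network features.
rng = np.random.default_rng(0)
phi_old, y_old = rng.normal(size=(50, 16)), rng.normal(size=50)
phi_new, y_new = rng.normal(size=(10, 16)), rng.normal(size=10)

model = FunctionSpaceRegressor(num_features=16)
model.update(phi_old, y_old)   # "past" data
model.update(phi_new, y_new)   # new data incorporated without revisiting the past
mean, var = model.predict(rng.normal(size=(5, 16)))
```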

ICRA 2021 · Conference Paper

Trajectory Optimisation in Learned Multimodal Dynamical Systems via Latent-ODE Collocation

  • Aidan Scannell
  • Carl Henrik Ek
  • Arthur Richards

This paper presents a two-stage method to perform trajectory optimisation in multimodal dynamical systems with unknown nonlinear stochastic transition dynamics. The method finds trajectories that remain in a preferred dynamics mode where possible and in regions of the transition dynamics model that have been observed and can be predicted confidently. The first stage leverages a Mixture of Gaussian Process Experts method to learn a predictive dynamics model from historical data. Importantly, this model learns a gating function that indicates the probability of being in a particular dynamics mode at a given state location. This gating function acts as a coordinate map for a latent Riemannian manifold on which shortest trajectories are solutions to our trajectory optimisation problem. Based on this intuition, the second stage formulates a geometric cost function, which it then implicitly minimises by projecting the trajectory optimisation onto the second-order geodesic ODE, a classic result of Riemannian geometry. A set of collocation constraints is derived to ensure that trajectories are solutions to this ODE, implicitly solving the trajectory optimisation problem.
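
The second-stage idea, enforcing the geodesic ODE x'' + Gamma(x)(x', x') = 0 through collocation, can be sketched with an off-the-shelf collocation-based boundary-value solver under a toy metric. In the paper the metric comes from the learned gating function; the conformal metric, dimensions, and boundary points below are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch of solving the second-order geodesic ODE by collocation
# (scipy's collocation-based BVP solver), under a toy conformal metric.
# The paper's metric is derived from the learned gating function; everything
# here (metric, dimensions, boundary points) is an illustrative assumption.
import numpy as np
from scipy.integrate import solve_bvp

DIM = 2
EPS = 1e-4  # finite-difference step for metric derivatives

def metric(x):
    """Toy Riemannian metric G(x): a conformal scaling of the identity."""
    scale = 1.0 + 4.0 * np.exp(-np.sum((x - 0.5) ** 2))
    return scale * np.eye(DIM)

def christoffel(x):
    """Christoffel symbols Gamma[k, i, j] from finite-difference metric derivatives."""
    G_inv = np.linalg.inv(metric(x))
    dG = np.zeros((DIM, DIM, DIM))  # dG[l, i, j] = d g_{ij} / d x^l
    for l in range(DIM):
        dx = np.zeros(DIM)
        dx[l] = EPS
        dG[l] = (metric(x + dx) - metric(x - dx)) / (2 * EPS)
    gamma = np.zeros((DIM, DIM, DIM))
    for k in range(DIM):
        for i in range(DIM):
            for j in range(DIM):
                gamma[k, i, j] = 0.5 * np.sum(
                    G_inv[k, :] * (dG[i, :, j] + dG[j, :, i] - dG[:, i, j])
                )
    return gamma

def geodesic_ode(t, y):
    """First-order form of x'' + Gamma(x)(x', x') = 0; y stacks [x, v]."""
    dydt = np.zeros_like(y)
    for n in range(y.shape[1]):
        x, v = y[:DIM, n], y[DIM:, n]
        gamma = christoffel(x)
        dydt[:DIM, n] = v
        dydt[DIM:, n] = -np.einsum("kij,i,j->k", gamma, v, v)
    return dydt

def boundary_conditions(ya, yb):
    """Pin the start and end points of the trajectory (assumed boundary states)."""
    x_start, x_goal = np.zeros(DIM), np.ones(DIM)
    return np.concatenate([ya[:DIM] - x_start, yb[:DIM] - x_goal])

t = np.linspace(0.0, 1.0, 20)
y_init = np.zeros((2 * DIM, t.size))
y_init[:DIM] = np.linspace(0.0, 1.0, t.size)  # straight-line position guess
y_init[DIM:] = 1.0                            # constant-velocity guess
solution = solve_bvp(geodesic_ode, boundary_conditions, t, y_init)
print(solution.status, solution.y[:DIM, ::5])  # geodesic waypoints
```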