Author name cluster

Simon Schmitt

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers

2 author rows

AAAI Conference 2025 Conference Paper

General Uncertainty Estimation with Delta Variances

Simon Schmitt
John Shawe-Taylor
Hado van Hasselt

Decision makers may suffer from uncertainty induced by limited data. This may be mitigated by accounting for epistemic uncertainty, which is however challenging to estimate efficiently for large neural networks. To this end we investigate Delta Variances, a family of algorithms for epistemic uncertainty quantification that is computationally efficient and convenient to implement. It can be applied to neural networks and more general functions composed of neural networks. As an example we consider a weather simulator with a neural-network-based step function inside; here Delta Variances empirically obtain competitive results at the cost of a single gradient computation. The approach is convenient as it requires no changes to the neural network architecture or training procedure. We discuss multiple ways to derive Delta Variances theoretically, noting that special cases recover popular techniques, and present a unified perspective on multiple related methods. Finally we observe that this general perspective gives rise to a natural extension and empirically show its benefit.
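
As a rough illustration of what an uncertainty estimate "at the cost of a single gradient computation" can look like, the classical delta method approximates the variance of f(theta) by grad^T Sigma grad, where Sigma is a covariance estimate for the parameters. The sketch below is an assumption-laden toy and not the paper's algorithm: the function `f`, the covariance `sigma`, and the finite-difference gradient are hypothetical stand-ins chosen only to keep the example self-contained.

```python
def gradient(f, theta, eps=1e-6):
    """Central finite-difference gradient of f at theta (a list of floats)."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def delta_variance(f, theta, sigma):
    """Delta-method approximation of Var[f(theta)]: grad^T @ sigma @ grad."""
    g = gradient(f, theta)
    n = len(g)
    return sum(g[i] * sigma[i][j] * g[j] for i in range(n) for j in range(n))


# Hypothetical example: a linear prediction theta . x with input x = (2, 3).
def f(theta):
    return 2.0 * theta[0] + 3.0 * theta[1]


# Assumed (diagonal) parameter covariance; in practice this must be estimated.
sigma = [[0.5, 0.0], [0.0, 0.1]]
variance = delta_variance(f, [1.0, -1.0], sigma)
```

For a linear `f` the approximation is exact, which makes this toy easy to check by hand; for a neural network the gradient would typically come from automatic differentiation rather than finite differences.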


AAAI Conference 2023 Conference Paper

Exploration via Epistemic Value Estimation

Simon Schmitt
John Shawe-Taylor
Hado van Hasselt

How to efficiently explore in reinforcement learning is an open problem. Many exploration algorithms employ the epistemic uncertainty of their own value predictions -- for instance to compute an exploration bonus or upper confidence bound. Unfortunately the required uncertainty is difficult to estimate in general with function approximation. We propose epistemic value estimation (EVE): a recipe that is compatible with sequential decision making and with neural network function approximators. It equips agents with a tractable posterior over all their parameters from which epistemic value uncertainty can be computed efficiently. We use the recipe to derive an epistemic Q-Learning agent and observe competitive performance on a series of benchmarks. Experiments confirm that the EVE recipe facilitates efficient exploration in hard exploration tasks.


AAAI Conference 2022 Conference Paper

Chaining Value Functions for Off-Policy Learning

Simon Schmitt
John Shawe-Taylor
Hado van Hasselt

To accumulate knowledge and improve its policy of behaviour, a reinforcement learning agent can learn ‘off-policy’ about policies that differ from the policy used to generate its experience. This is important to learn counterfactuals, or because the experience was generated out of its own control. However, off-policy learning is non-trivial, and standard reinforcement learning algorithms can be unstable and divergent. In this paper we discuss a novel family of off-policy prediction algorithms which are convergent by construction. The idea is to first learn on-policy about the data-generating behaviour, and then bootstrap an off-policy value estimate on this on-policy estimate, thereby constructing a value estimate that is partially off-policy. This process can be repeated to build a chain of value functions, each time bootstrapping a new estimate on the previous estimate in the chain. Each step in the chain is stable and hence the complete algorithm is guaranteed to be stable. Under mild conditions this comes arbitrarily close to the off-policy TD solution when we increase the length of the chain. Hence it can compute the solution even in cases where off-policy TD diverges. We prove that the proposed scheme is convergent and corresponds to an iterative decomposition of the inverse key matrix. Furthermore it can be interpreted as estimating a novel objective, which we call a ‘k-step expedition’, of following the target policy for finitely many steps before continuing indefinitely with the behaviour policy. Empirically we evaluate the idea on challenging MDPs such as Baird’s counterexample and observe favourable results.
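
The chaining idea can be sketched in an idealized tabular setting, assuming the transition matrices and expected rewards under both policies are known exactly. The sketch starts from the on-policy value of the behaviour policy and repeatedly applies the target policy's Bellman backup, so that entry k of the chain is the value of a k-step expedition. The paper itself addresses learning from samples with function approximation; all names below (`P_mu`, `r_mu`, `P_pi`, `r_pi`) and the two-state example are hypothetical.

```python
def bellman_backup(P, r, v, gamma):
    """One expected Bellman backup: r + gamma * P @ v, using plain lists."""
    n = len(v)
    return [r[s] + gamma * sum(P[s][t] * v[t] for t in range(n)) for s in range(n)]


def evaluate(P, r, gamma, sweeps=2000):
    """Policy evaluation by repeated backups (converges since gamma < 1)."""
    v = [0.0] * len(r)
    for _ in range(sweeps):
        v = bellman_backup(P, r, v, gamma)
    return v


def chain_values(P_mu, r_mu, P_pi, r_pi, gamma, length):
    """chain[0] is the behaviour value; chain[k] follows the target policy
    for k steps and the behaviour policy afterwards (a k-step expedition)."""
    chain = [evaluate(P_mu, r_mu, gamma)]
    for _ in range(length):
        chain.append(bellman_backup(P_pi, r_pi, chain[-1], gamma))
    return chain


# Hypothetical two-state problem: reward 1 in state 1, 0 in state 0.
gamma = 0.9
P_mu = [[0.5, 0.5], [0.5, 0.5]]   # behaviour policy transitions
P_pi = [[0.1, 0.9], [0.1, 0.9]]   # target policy transitions
r_mu = [0.0, 1.0]
r_pi = [0.0, 1.0]

chain = chain_values(P_mu, r_mu, P_pi, r_pi, gamma, length=200)
v_pi = evaluate(P_pi, r_pi, gamma)
gap = max(abs(a - b) for a, b in zip(chain[-1], v_pi))
```

In this exact setting each link is a contraction step towards the target policy's value, so `gap` shrinks geometrically with the chain length.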


AAAI Conference 2021 Conference Paper

Gated Linear Networks

Joel Veness
Tor Lattimore
David Budden
Avishkar Bhoopchand
Christopher Mattern
Agnieszka Grabska-Barwinska
Eren Sezener
Jianan Wang

This paper presents a new family of backpropagation-free neural architectures, Gated Linear Networks (GLNs). What distinguishes GLNs from contemporary neural networks is the distributed and local nature of their credit assignment mechanism; each neuron directly predicts the target, forgoing the ability to learn feature representations in favor of rapid online learning. Individual neurons are able to model nonlinear functions via the use of data-dependent gating in conjunction with online convex optimization. We show that this architecture gives rise to universal learning capabilities in the limit, with effective model capacity increasing as a function of network size in a manner comparable with deep ReLU networks. Furthermore, we demonstrate that the GLN learning mechanism possesses extraordinary resilience to catastrophic forgetting, performing almost on par with an MLP with dropout and Elastic Weight Consolidation on standard benchmarks.


YNICL Journal 2021 Journal Article

Interaction of developmental factors and ordinary stressful life events on brain structure in adults

Kai G. Ringwald
Tina Meller
Simon Schmitt
Till F.M. Andlauer
Frederike Stein
Katharina Brosch
Julia-Katharina Pfarr
Olaf Steinsträter

An interplay of early environmental and genetic risk factors with recent stressful life events (SLEs) in adulthood increases the risk for adverse mental health outcomes. The interaction of early risk and current SLEs on brain structure has hardly been investigated. Whole brain voxel-based morphometry analysis was performed in N = 786 (64.6% female, mean age = 33.39) healthy subjects to identify correlations of brain clusters with commonplace recent SLEs. Genetic and early environmental risk factors, operationalized as those for severe psychopathology (i.e., polygenic scores for neuroticism, childhood maltreatment, urban upbringing and paternal age), were assessed as modulators of the impact of SLEs on the brain. SLEs were negatively correlated with grey matter volume in the left medial orbitofrontal cortex (mOFC, FWE p = 0.003). This association was present for both positive and negative life events. Cognitive-emotional variables, i.e., neuroticism, perceived stress, trait anxiety, intelligence, and current depressive symptoms, did not account for the SLE-mOFC association. Further, genetic and environmental risk factors were not correlated with grey matter volume in the left mOFC cluster and did not affect the association between SLEs and left mOFC grey matter volume. The orbitofrontal cortex has been implicated in stress-related psychopathology, particularly major depression, in previous studies. We find that SLEs are associated with this area. Important early life risk factors do not interact with current SLEs on brain morphology in healthy subjects.


ICML Conference 2021 Conference Paper

Learning and Planning in Complex Action Spaces

Thomas Hubert
Julian Schrittwieser
Ioannis Antonoglou
Mohammadamin Barekatain
Simon Schmitt
David Silver

Many important real-world problems have action spaces that are high-dimensional, continuous or both, making full enumeration of all possible actions infeasible. Instead, only small subsets of actions can be sampled for the purpose of policy evaluation and improvement. In this paper, we propose a general framework to reason in a principled way about policy evaluation and improvement over such sampled action subsets. This sample-based policy iteration framework can in principle be applied to any reinforcement learning algorithm based upon policy iteration. Concretely, we propose Sampled MuZero, an extension of the MuZero algorithm that is able to learn in domains with arbitrarily complex action spaces by planning over sampled actions. We demonstrate this approach on the classical board game of Go and on two continuous control benchmark domains: DeepMind Control Suite and Real-World RL Suite.


ICML Conference 2021 Conference Paper

Muesli: Combining Improvements in Policy Optimization

Matteo Hessel
Ivo Danihelka
Fabio Viola
Arthur Guez
Simon Schmitt
Laurent Sifre
Theophane Weber
David Silver

We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss. The update (henceforth Muesli) matches MuZero’s state-of-the-art performance on Atari. Notably, Muesli does so without using deep search: it acts directly with a policy network and has computation speed comparable to model-free baselines. The Atari results are complemented by extensive ablations, and by additional results on continuous control and 9x9 Go.


ICML Conference 2020 Conference Paper

Off-Policy Actor-Critic with Shared Experience Replay

Simon Schmitt
Matteo Hessel
Karen Simonyan

We investigate the combination of actor-critic reinforcement learning algorithms with a uniform large-scale experience replay and propose solutions for two ensuing challenges: (a) efficient actor-critic learning with experience replay and (b) the stability of off-policy learning where agents learn from other agents’ behaviour. To this end we analyze the bias-variance tradeoffs in V-trace, a form of importance sampling for actor-critic methods. Based on our analysis, we then argue for mixing experience sampled from replay with on-policy experience, and propose a new trust region scheme that scales effectively to data distributions where V-trace becomes unstable. We provide extensive empirical validation of the proposed solutions on DMLab-30 and further show the benefits of this setup in two training regimes for Atari: (1) a single agent is trained up until 200M environment frames per game; (2) a population of agents is trained up until 200M environment frames each and may share experience. We demonstrate state-of-the-art data efficiency among model-free agents in both regimes.
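
For readers unfamiliar with V-trace, the following is a minimal sketch of the V-trace value target as it is commonly defined, using truncated importance ratios between the target and behaviour policies. It does not implement the replay mixing or the trust region scheme proposed in the paper. The function name, argument layout, and default thresholds `rho_bar` and `c_bar` are assumptions made for this example.

```python
def vtrace_targets(values, bootstrap_value, rewards, ratios, gamma,
                   rho_bar=1.0, c_bar=1.0):
    """V-trace targets for one trajectory of length n.

    values[s]       : value estimate V(x_s) for s = 0..n-1 (a list)
    bootstrap_value : value estimate V(x_n) after the last step
    ratios[s]       : importance ratio pi(a_s | x_s) / mu(a_s | x_s)
    """
    n = len(rewards)
    next_values = values[1:] + [bootstrap_value]
    targets = [0.0] * n
    correction = 0.0  # holds v_{s+1} - V(x_{s+1}) while scanning backwards
    for s in reversed(range(n)):
        rho = min(rho_bar, ratios[s])   # truncated weight on the TD error
        c = min(c_bar, ratios[s])       # truncated weight on the trace
        delta = rho * (rewards[s] + gamma * next_values[s] - values[s])
        correction = delta + gamma * c * correction
        targets[s] = values[s] + correction
    return targets


# Hypothetical three-step trajectory.
values = [0.5, -0.2, 0.3]
rewards = [1.0, 0.0, 2.0]
on_policy = vtrace_targets(values, 1.0, rewards, [1.0, 1.0, 1.0], gamma=0.9)
fully_off = vtrace_targets(values, 1.0, rewards, [0.0, 0.0, 0.0], gamma=0.9)
```

With all ratios equal to one the targets reduce to ordinary n-step bootstrapped returns, and with all ratios equal to zero they collapse to the current value estimates, which illustrates the bias-variance tradeoff the truncation controls.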


AAAI Conference 2019 Conference Paper

Multi-Task Deep Reinforcement Learning with PopArt

Matteo Hessel
Hubert Soyer
Lasse Espeholt
Wojciech Czarnecki
Simon Schmitt
Hado van Hasselt

The reinforcement learning (RL) community has made great strides in designing algorithms capable of exceeding human performance on specific tasks. These algorithms are mostly trained one task at a time, with each new task requiring a brand new agent instance to be trained. This means the learning algorithm is general, but each solution is not; each agent can only solve the one task it was trained on. In this work, we study the problem of learning to master not one but multiple sequential decision tasks at once. A general issue in multi-task learning is that a balance must be found between the needs of multiple tasks competing for the limited resources of a single learning system. Many learning algorithms can get distracted by certain tasks in the set of tasks to solve. Such tasks appear more salient to the learning process, for instance because of the density or magnitude of the in-task rewards. This causes the algorithm to focus on those salient tasks at the expense of generality. We propose to automatically adapt the contribution of each task to the agent’s updates, so that all tasks have a similar impact on the learning dynamics. This resulted in state-of-the-art performance on learning to play all games in a set of 57 diverse Atari games. Excitingly, our method learned a single trained policy, with a single set of weights, that exceeds median human performance. To our knowledge, this was the first time a single agent surpassed human-level performance on this multi-task domain. The same approach also demonstrated state-of-the-art performance on a set of 30 tasks in the 3D reinforcement learning platform DeepMind Lab.
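
PopArt, named in the title, normalizes value targets with running statistics while rescaling the output layer so that the unnormalized predictions stay unchanged when the statistics move. The sketch below shows that mechanism for a single scalar output and a single task; a multi-task agent would keep one set of statistics per task. The class name, the step size `beta`, and the scalar layout are assumptions made for illustration, not the paper's implementation.

```python
import math


class PopArtScalar:
    """Scalar output head with PopArt-style adaptive target normalization."""

    def __init__(self, beta=0.1):
        self.beta = beta   # step size for the running statistics (assumed)
        self.mu = 0.0      # running mean of targets
        self.nu = 1.0      # running second moment of targets
        self.w = 1.0       # weight of the normalized output layer
        self.b = 0.0       # bias of the normalized output layer

    def sigma(self):
        """Running standard deviation, floored for numerical safety."""
        return math.sqrt(max(self.nu - self.mu ** 2, 1e-8))

    def predict(self, h):
        """Unnormalized prediction for a scalar feature h."""
        return self.sigma() * (self.w * h + self.b) + self.mu

    def update_stats(self, target):
        """Update the statistics, then rescale w and b so that predict()
        returns the same values as before the update."""
        old_mu, old_sigma = self.mu, self.sigma()
        self.mu = (1 - self.beta) * self.mu + self.beta * target
        self.nu = (1 - self.beta) * self.nu + self.beta * target ** 2
        new_sigma = self.sigma()
        self.w *= old_sigma / new_sigma
        self.b = (old_sigma * self.b + old_mu - self.mu) / new_sigma


head = PopArtScalar()
before = head.predict(2.0)
head.update_stats(100.0)   # a large target shifts the statistics sharply
after = head.predict(2.0)
```

Because the statistics update leaves outputs intact, tasks with large reward magnitudes change the normalization without abruptly changing the predictions, which is one way to give tasks a similar impact on the updates.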
