Arrow Research search

Author name cluster

Peter Dayan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

82 papers
2 author rows

Possible papers

82

JAIR Journal 2026 Journal Article

ℵ-IPOMDP: Mitigating Deception in a Cognitive Hierarchy with Off-Policy Counterfactual Anomaly Detection

  • Nitay Alon
  • Joseph M. Barnby
  • Stefan Sarkadi
  • Lion Schulz
  • Jeffrey S. Rosenschein
  • Peter Dayan

Social agents with finitely nested opponent models are vulnerable to manipulation by agents with deeper recursive capabilities. This imbalance, rooted in logic and the theory of recursive modelling frameworks, cannot be solved directly. We propose a computational framework called ℵ-IPOMDP, which augments the Bayesian inference of model-based RL agents with an anomaly detection algorithm and an out-of-belief policy. Our mechanism allows agents to realize that they are being deceived, even if they cannot understand how, and to deter opponents via a credible threat. We test this framework in both a mixed-motive and a zero-sum game. Our results demonstrate the ℵ-mechanism’s effectiveness, leading to more equitable outcomes and less exploitation by more sophisticated agents. We discuss implications for AI safety, cybersecurity, cognitive science, and psychiatry.

ICLR Conference 2025 Conference Paper

Building, Reusing, and Generalizing Abstract Representations from Concrete Sequences

  • Shuchen Wu
  • Mirko Thalmann
  • Peter Dayan
  • Zeynep Akata
  • Eric Schulz

Humans excel at learning abstract patterns across different sequences, filtering out irrelevant details, and transferring these generalized concepts to new sequences. In contrast, many sequence learning models lack the ability to abstract, which leads to memory inefficiency and poor transfer. We introduce a non-parametric hierarchical variable learning model (HVM) that learns chunks from sequences and abstracts contextually similar chunks as variables. HVM efficiently organizes memory while uncovering abstractions, leading to compact sequence representations. When learning on language datasets such as babyLM, HVM learns a more efficient dictionary than standard compression algorithms such as Lempel-Ziv. In a sequence recall task requiring the acquisition and transfer of variables embedded in sequences, we demonstrate HVM’s sequence likelihood correlates with human recall times. In contrast, large language models (LLMs) struggle to transfer abstract variables as effectively as humans. From HVM’s adjustable layer of abstraction, we demonstrate that the model realizes a precise trade-off between compression and generalization. Our work offers a cognitive model that captures the learning and transfer of abstract representations in human cognition and differentiates itself from LLMs.
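The chunk-dictionary idea at the heart of this line of work can be illustrated with a simple LZ78-style greedy parse (an illustration only, not the paper's actual HVM, which additionally abstracts contextually similar chunks into variables):

```python
def build_chunk_dictionary(seq):
    """LZ78-style greedy chunking: extend the longest known chunk by one symbol,
    and add each newly seen extension to the dictionary."""
    chunks = {}
    current = ()
    for sym in seq:
        candidate = current + (sym,)
        if candidate in chunks:
            current = candidate                   # keep growing a known chunk
        else:
            chunks[candidate] = len(chunks) + 1   # new chunk enters the dictionary
            current = ()                          # restart parsing
    return chunks
```

On a repetitive sequence like `"abababab"` the dictionary quickly accumulates multi-symbol chunks such as `('a', 'b')`, which is the sense in which chunk learning yields a more compact representation than storing raw symbols.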

NeurIPS Conference 2025 Conference Paper

Concept-Guided Interpretability via Neural Chunking

  • Shuchen Wu
  • Stephan Alaniz
  • Shyamgopal Karthik
  • Peter Dayan
  • Eric Schulz
  • Zeynep Akata

Neural networks are often described as black boxes, reflecting the significant challenge of understanding their internal workings and interactions. We propose a different perspective that challenges the prevailing view: rather than being inscrutable, neural networks exhibit patterns in their raw population activity that mirror regularities in the training data. We refer to this as the Reflection Hypothesis and provide evidence for this phenomenon in both simple recurrent neural networks (RNNs) and complex large language models (LLMs). Building on this insight, we propose to leverage cognitively-inspired methods of chunking to segment high-dimensional neural population dynamics into interpretable units that reflect underlying concepts. We propose three methods to extract these emerging entities, complementing each other based on label availability and neural data dimensionality. Discrete sequence chunking (DSC) creates a dictionary of entities in a lower-dimensional neural space; population averaging (PA) extracts recurring entities that correspond to known labels; and unsupervised chunk discovery (UCD) can be used when labels are absent. We demonstrate the effectiveness of these methods in extracting entities across varying model sizes, ranging from inducing compositionality in RNNs to uncovering recurring neural population states in large language models with diverse architectures, and illustrate their advantage over other interpretability methods. Throughout, we observe a robust correspondence between the extracted entities and concrete or abstract concepts in the sequence. Artificially inducing the extracted entities in neural populations effectively alters the network's generation of associated concepts. Our work points to a new direction for interpretability, one that harnesses both cognitive principles and the structure of naturalistic data to reveal the hidden computations of complex learning systems, gradually transforming them from black boxes into systems we can begin to understand. Implementation and code are publicly available at https://github.com/swu32/Chunk-Interpretability

RLDM Conference 2025 Conference Abstract

Illuminating hidden state inference in mice with artificial neural networks

  • Sebastian A. Bruijns
  • Maria K. Eckstein
  • Peter Dayan

An impressive wealth of cognitive neuroscience tasks involve combining perceptual data with prior estimates of an informational state to make a decision. Such tasks have driven the development of theoretically-motivated cognitive models which offer compactly parameterized, and thereby insightful, accounts of the internal processes by which this might happen. However, the very large amounts of data that can now be collected present a challenge and an opportunity for this programme. The challenge is that these models can be shown to underfit the data systematically (Wilson et al., 2019; Palminteri et al., 2017; Nassar et al., 2016) – particularly in how they characterize the effect of the informational state; the opportunity is that richer, more highly parameterized and flexible models can be employed in a statistically sound, data-driven manner (Dezfouli et al., 2019b). Unfortunately, it is tremendously difficult to interpret these models because of their flexibility. Here, we follow a recent approach (Eckstein et al., 2024), in which components of compact models are progressively replaced with more flexible homologues, but, where possible, the resulting insights are mapped back into theoretically-transparent forms. We prove the effectiveness of our scheme by applying it to one of the largest available decision-making data sets for mice – more than 300,000 choices from 139 subjects studied by the International Brain Lab (The International Brain Laboratory et al., 2021). We found widely generalizable phenomena such as notable effects of continual fluctuations in task engagement, and systematic differences between learning and forgetting of chosen and unchosen options in determining the subjective informational state. Our results show that combining theory-driven and data-driven methods can reveal latent cognitive processes that would have been difficult to discover using either method on its own.

RLDM Conference 2025 Conference Abstract

A maze in action: behavioural insights into exploration

  • Georgy Antonov
  • Thomas Akam
  • Peter Dayan

A long-standing problem in artificial intelligence is the development of efficient exploration policies. Animals, by contrast, often seem to explore well. Thus, it is compelling to use animal exploration to help us specify potentially powerful algorithms by systematizing the processes that might govern their choices. A starting point for this systematisation is the collection of data and theories in fairly simple exploration tasks such as the multi-armed bandits; however, much less is known about exploration in more naturalistic environments. Here, we describe a novel behavioural task in which rodents have to make difficult explore-exploit decisions in navigating a complex maze to collect rewards. Our task is best described as a partially observable Markov decision process which, on the one hand, offers sustained stable periods of certainty during which the acquired information can be exploited, whilst on the other hand continually evolves, presenting subjects with uncertainty and hence necessitating continual exploration. We report preliminary behavioural results showing that rodents indeed learn the task well and exposing a rich repertoire of exploratory behaviour.

RLC Conference 2024 Conference Paper

Exploring Uncertainty in Distributional Reinforcement Learning

  • Georgy Antonov
  • Peter Dayan

Epistemic uncertainty, which stems from what a learning algorithm does not know, is the natural signal for exploration. Capturing and exploiting epistemic uncertainty for efficient exploration is conceptually straightforward for model-based methods. However, it is computationally ruinous, prompting a search for model-free approaches. One of the most seminal and venerable of these is Bayesian Q-learning, which maintains and updates an approximation to the distribution of the long-run returns associated with state-action pairs. However, this approximation can be rather severe. Recent work on distributional reinforcement learning (DRL) provides many powerful methods for modelling return distributions which offer the prospect of improving upon Bayesian Q-learning's parametric scheme, but have not been fully investigated for their exploratory potential. Here, we examine the characteristics of a number of DRL algorithms in the context of exploration and propose a novel Bayesian analogue of the categorical temporal-difference algorithm. We show that this works well, converging appropriately to a close approximation to the true return distribution.
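The categorical temporal-difference machinery the paper builds on can be sketched as the standard C51-style projection step, which maps the shifted distribution r + γz back onto a fixed support of atoms (this is the generic algorithm, not the paper's Bayesian analogue):

```python
import numpy as np

def categorical_td_target(p_next, r, gamma, atoms):
    """Project the Bellman-shifted distribution r + gamma*z of the next state
    onto the fixed support `atoms` (categorical TD / C51 projection)."""
    v_min, v_max = atoms[0], atoms[-1]
    dz = atoms[1] - atoms[0]
    target = np.zeros_like(p_next)
    for p, z in zip(p_next, atoms):
        tz = np.clip(r + gamma * z, v_min, v_max)   # shifted, clipped atom
        b = (tz - v_min) / dz                       # fractional index on support
        lo, hi = int(np.floor(b)), int(np.ceil(b))
        if lo == hi:
            target[lo] += p                         # lands exactly on an atom
        else:
            target[lo] += p * (hi - b)              # split mass between the
            target[hi] += p * (b - lo)              # two neighbouring atoms
    return target
```

With atoms at 0, 1, ..., 10, a next-state distribution concentrated on value 5, reward 1 and γ = 1, all the mass is projected onto the atom at 6.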

RLJ Journal 2024 Journal Article

Exploring Uncertainty in Distributional Reinforcement Learning

  • Georgy Antonov
  • Peter Dayan

Epistemic uncertainty, which stems from what a learning algorithm does not know, is the natural signal for exploration. Capturing and exploiting epistemic uncertainty for efficient exploration is conceptually straightforward for model-based methods. However, it is computationally ruinous, prompting a search for model-free approaches. One of the most seminal and venerable of these is Bayesian Q-learning, which maintains and updates an approximation to the distribution of the long-run returns associated with state-action pairs. However, this approximation can be rather severe. Recent work on distributional reinforcement learning (DRL) provides many powerful methods for modelling return distributions which offer the prospect of improving upon Bayesian Q-learning's parametric scheme, but have not been fully investigated for their exploratory potential. Here, we examine the characteristics of a number of DRL algorithms in the context of exploration and propose a novel Bayesian analogue of the categorical temporal-difference algorithm. We show that this works well, converging appropriately to a close approximation to the true return distribution.

NeurIPS Conference 2024 Conference Paper

Simplifying Latent Dynamics with Softly State-Invariant World Models

  • Tankred Saanum
  • Peter Dayan
  • Eric Schulz

To solve control problems via model-based reasoning or planning, an agent needs to know how its actions affect the state of the world. The actions an agent has at its disposal often change the state of the environment in systematic ways. However, existing techniques for world modelling do not guarantee that the effect of actions are represented in such systematic ways. We introduce the Parsimonious Latent Space Model (PLSM), a world model that regularizes the latent dynamics to make the effect of the agent's actions more predictable. Our approach minimizes the mutual information between latent states and the change that an action produces in the agent's latent state, in turn minimizing the dependence the state has on the dynamics. This makes the world model softly state-invariant. We combine PLSM with different model classes used for i) future latent state prediction, ii) planning, and iii) model-free reinforcement learning. We find that our regularization improves accuracy, generalization, and performance in downstream tasks, highlighting the importance of systematic treatment of actions in world models.
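The quantity PLSM drives toward zero, the mutual information between latent states and the change an action produces, can be illustrated with a discrete empirical estimator (a sketch only; the paper works with continuous learned latents, not tabulated counts):

```python
import numpy as np

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences,
    estimated from joint frequencies."""
    xs, ys = np.asarray(xs), np.asarray(ys)
    mi = 0.0
    for x in np.unique(xs):
        for y in np.unique(ys):
            p_xy = np.mean((xs == x) & (ys == y))
            if p_xy > 0:
                p_x, p_y = np.mean(xs == x), np.mean(ys == y)
                mi += p_xy * np.log(p_xy / (p_x * p_y))
    return mi
```

If an action produces the same latent change regardless of the state the agent is in (a softly state-invariant world model), the mutual information between states and changes is zero; state-dependent effects drive it above zero.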

NeurIPS Conference 2023 Conference Paper

Reinforcement Learning with Simple Sequence Priors

  • Tankred Saanum
  • Noémi Éltető
  • Peter Dayan
  • Marcel Binz
  • Eric Schulz

In reinforcement learning (RL), simplicity is typically quantified on an action-by-action basis -- but this timescale ignores temporal regularities, like repetitions, often present in sequential strategies. We therefore propose an RL algorithm that learns to solve tasks with sequences of actions that are compressible. We explore two possible sources of simple action sequences: Sequences that can be learned by autoregressive models, and sequences that are compressible with off-the-shelf data compression algorithms. Distilling these preferences into sequence priors, we derive a novel information-theoretic objective that incentivizes agents to learn policies that maximize rewards while conforming to these priors. We show that the resulting RL algorithm leads to faster learning, and attains higher returns than state-of-the-art model-free approaches in a series of continuous control tasks from the DeepMind Control Suite. These priors also produce a powerful information-regularized agent that is robust to noisy observations and can perform open-loop control.
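The off-the-shelf-compressor variant of the sequence prior can be sketched with `zlib`: the shorter the compressed action sequence, the simpler it is judged to be (the function name and scaling are illustrative, not the paper's exact objective):

```python
import zlib

def compressibility_bonus(actions, scale=1.0):
    """Score an action sequence by how well a generic compressor shrinks it:
    more compressible (simpler) sequences receive a larger (less negative) bonus."""
    compressed = zlib.compress(bytes(actions), level=9)
    return -scale * len(compressed)   # negative compressed length in bytes
```

A highly repetitive sequence such as 128 repeats of the same action compresses far better, and so scores higher, than a sequence of 128 distinct actions.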

ICML Conference 2022 Conference Paper

Neural Network Poisson Models for Behavioural and Neural Spike Train Data

  • Moein Khajehnejad
  • Forough Habibollahi
  • Richard Nock
  • Ehsan Arabzadeh
  • Peter Dayan
  • Amir Dezfouli

One of the most important and challenging application areas for complex machine learning methods is to predict, characterize and model rich, multi-dimensional, neural data. Recent advances in neural recording techniques have made it possible to monitor the activity of a large number of neurons across different brain regions as animals perform behavioural tasks. This poses the critical challenge of establishing links between neural activity at a microscopic scale, which might for instance represent sensory input, and at a macroscopic scale, which then generates behaviour. Predominant modeling methods apply rather disjoint techniques to these scales; by contrast, we suggest an end-to-end model which exploits recent developments of flexible, but tractable, neural network point-process models to characterize dependencies between stimuli, actions, and neural data. We apply this model to a public dataset collected using Neuropixel probes in mice performing a visually-guided behavioural task as well as a synthetic dataset produced from a hierarchical network model with reciprocally connected sensory and integration circuits intended to characterize animal behaviour in a fixed-duration motion discrimination task. We show that our model outperforms previous approaches and contributes novel insights into the relationships between neural activity and behaviour.

ICLR Conference 2021 Conference Paper

Correcting experience replay for multi-agent communication

  • Sanjeevan Ahilan
  • Peter Dayan

We consider the problem of learning to communicate using multi-agent reinforcement learning (MARL). A common approach is to learn off-policy, using data sampled from a replay buffer. However, messages received in the past may not accurately reflect the current communication policy of each agent, and this complicates learning. We therefore introduce a 'communication correction' which accounts for the non-stationarity of observed communication induced by multi-agent learning. It works by relabelling the received message to make it likely under the communicator's current policy, and thus be a better reflection of the receiver's current environment. To account for cases in which agents are both senders and receivers, we introduce an ordered relabelling scheme. Our correction is computationally efficient and can be integrated with a range of off-policy algorithms. We find in our experiments that it substantially improves the ability of communicating MARL systems to learn across a variety of cooperative and competitive tasks.
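The relabelling step can be sketched as follows (field names and the single-sender setting are assumptions for illustration; the paper's ordered relabelling for agents that both send and receive is omitted):

```python
def relabel_replay(batch, current_message_policy):
    """Communication correction (sketch): replace the stale message stored in
    each replay transition with the message the sender's *current* policy
    would emit given the sender's stored observation."""
    corrected = []
    for transition in batch:
        t = dict(transition)                                  # leave buffer intact
        t["message"] = current_message_policy(t["sender_obs"])
        corrected.append(t)
    return corrected
```

The receiver then learns from messages consistent with the sender's present policy, rather than from messages reflecting a policy that no longer exists.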

NeurIPS Conference 2021 Conference Paper

Two steps to risk sensitivity

  • Christopher Gagne
  • Peter Dayan

Distributional reinforcement learning (RL) – in which agents learn about all the possible long-term consequences of their actions, and not just the expected value – is of great recent interest. One of the most important affordances of a distributional view is facilitating a modern, measured, approach to risk when outcomes are not completely certain. By contrast, psychological and neuroscientific investigations into decision making under risk have utilized a variety of more venerable theoretical models such as prospect theory that lack axiomatically desirable properties such as coherence. Here, we consider a particularly relevant risk measure for modeling human and animal planning, called conditional value-at-risk (CVaR), which quantifies worst-case outcomes (e.g., vehicle accidents or predation). We first adopt a conventional distributional approach to CVaR in a sequential setting and reanalyze the choices of human decision-makers in the well-known two-step task, revealing substantial risk aversion that had been lurking under stickiness and perseveration. We then consider a further critical property of risk sensitivity, namely time consistency, showing alternatives to this form of CVaR that enjoy this desirable characteristic. We use simulations to examine settings in which the various forms differ in ways that have implications for human and animal planning and behavior.
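The empirical lower-tail CVaR at level α is simply the mean of the worst α-fraction of sampled returns (a minimal sketch; the paper's nested versus precommitted sequential formulations are a separate matter):

```python
import numpy as np

def cvar(samples, alpha):
    """Empirical conditional value-at-risk: the mean of the worst
    alpha-fraction of outcomes (lower tail, so smaller is worse for returns)."""
    samples = np.sort(np.asarray(samples, dtype=float))
    k = max(1, int(np.ceil(alpha * len(samples))))  # size of the worst tail
    return samples[:k].mean()
```

At α = 1 this reduces to the ordinary expected value; as α shrinks it focuses ever more on worst-case outcomes, which is what makes it a natural model of risk aversion.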

NeurIPS Conference 2020 Conference Paper

A Local Temporal Difference Code for Distributional Reinforcement Learning

  • Pablo Tano
  • Peter Dayan
  • Alexandre Pouget

Recent theoretical and experimental results suggest that the dopamine system implements distributional temporal difference backups, allowing learning of the entire distributions of the long-run values of states rather than just their expected values. However, the distributional codes explored so far rely on a complex imputation step which crucially relies on spatial non-locality: in order to compute reward prediction errors, units must know not only their own state but also the states of the other units. It is far from clear how these steps could be implemented in realistic neural circuits. Here, we introduce the Laplace code: a local temporal difference code for distributional reinforcement learning that is representationally powerful and computationally straightforward. The code decomposes value distributions and prediction errors across three separated dimensions: reward magnitude (related to distributional quantiles), temporal discounting (related to the Laplace transform of future rewards) and time horizon (related to eligibility traces). Besides lending itself to a local learning rule, the decomposition recovers the temporal evolution of the immediate reward distribution, indicating all possible rewards at all future times. This increases representational capacity and allows for temporally-flexible computations that immediately adjust to changing horizons or discount factors.

UAI Conference 2020 Conference Paper

Static and Dynamic Values of Computation in MCTS

  • Eren Sezener
  • Peter Dayan

Monte-Carlo Tree Search (MCTS) is one of the most widely used methods for planning, and has powered many recent advances in artificial intelligence. In MCTS, one typically performs computations (i.e., simulations) to collect statistics about the possible future consequences of actions, and then chooses accordingly. Many popular MCTS methods such as UCT and its variants decide which computations to perform by trading off exploration and exploitation. In this work, we take a more direct approach, and explicitly quantify the value of a computation based on its expected impact on the quality of the action eventually chosen. Our approach goes beyond the myopic limitations of existing computation-value-based methods in two senses: (I) we are able to account for the impact of non-immediate (i.e., future) computations (II) on non-immediate actions. We show that policies that greedily optimize computation values are optimal under certain assumptions and obtain results that are competitive with the state-of-the-art.

NeurIPS Conference 2019 Conference Paper

Disentangled behavioural representations

  • Amir Dezfouli
  • Hassan Ashtiani
  • Omar Ghattas
  • Richard Nock
  • Peter Dayan
  • Cheng Soon Ong

Individual characteristics in human decision-making are often quantified by fitting a parametric cognitive model to subjects' behavior and then studying differences between them in the associated parameter space. However, these models often fit behavior more poorly than recurrent neural networks (RNNs), which are more flexible and make fewer assumptions about the underlying decision-making processes. Unfortunately, the parameter and latent activity spaces of RNNs are generally high-dimensional and uninterpretable, making it hard to use them to study individual differences. Here, we show how to benefit from the flexibility of RNNs while representing individual differences in a low-dimensional and interpretable space. To achieve this, we propose a novel end-to-end learning framework in which an encoder is trained to map the behavior of subjects into a low-dimensional latent space. These low-dimensional representations are used to generate the parameters of individual RNNs corresponding to the decision-making process of each subject. We introduce terms into the loss function that ensure that the latent dimensions are informative and disentangled, i.e., encouraged to have distinct effects on behavior. This allows them to align with separate facets of individual differences. We illustrate the performance of our framework on synthetic data as well as a dataset including the behavior of patients with psychiatric disorders.

RLDM Conference 2019 Conference Abstract

Hippocampal-midbrain circuit enhances the pleasure of anticipation in the prefrontal cortex

  • Kiyohito Iigaya
  • Tobias Hauser
  • Zeb Kurth-Nelson
  • Peter Dayan
  • Raymond J Dolan

Whether it is a pleasant dinner or a dream vacation, having something to look forward to is a keystone in building a happy life. Recent studies suggest that reward prediction errors can enhance the pleasure of anticipation. This enhanced anticipation is linked to why people seek information that cannot be acted upon, and is potentially associated with a vulnerability to addiction. However, the neural roots of the pleasure from anticipation are largely unknown. To address this issue, we studied how the brain generates and enhances anticipation, by exposing human participants to a delayed reward decision-making task while imaging their brain activities. Using a computational model of anticipation, we identified a novel anticipatory network consisting of three regions. We found that the ventromedial prefrontal cortex (vmPFC) tracked an anticipation signal, while dopaminergic midbrain responded to an unexpectedly good forecast. We found that hippocampus was coupled both to the vmPFC and to the dopaminergic midbrain, through the model’s computation for boosting anticipation. This result suggests that people might experience greater anticipation when vividly imagining future outcomes. Thus, our findings propose a cognitive circuit for anticipatory value computation, unifying interpretations of separate notions such as risk and delay preference. Our study opens up a new avenue to understanding complex human decisions that are driven by reward anticipation, rather than well-studied reward consumption, and offers a novel intervention target for psychiatric disorders that involve motivation and future rewards.

ICML Conference 2018 Conference Paper

Fast Parametric Learning with Activation Memorization

  • Jack W. Rae
  • Chris Dyer
  • Peter Dayan
  • Timothy P. Lillicrap

Neural networks trained with backpropagation often struggle to identify classes that have been observed a small number of times. In applications where most class labels are rare, such as language modelling, this can become a performance bottleneck. One potential remedy is to augment the network with a fast-learning non-parametric model which stores recent activations and class labels into an external memory. We explore a simplified architecture where we treat a subset of the model parameters as fast memory stores. This can help retain information over longer time intervals than a traditional memory, and does not require additional space or compute. In the case of image classification, we display faster binding of novel classes on an Omniglot image curriculum task. We also show improved performance for word-based language models on news reports (GigaWord), books (Project Gutenberg) and Wikipedia articles (WikiText-103) - the latter achieving a state-of-the-art perplexity of 29.2.

NeurIPS Conference 2018 Conference Paper

Integrated accounts of behavioral and neuroimaging data using flexible recurrent neural network models

  • Amir Dezfouli
  • Richard Morris
  • Fabio Ramos
  • Peter Dayan
  • Bernard Balleine

Neuroscience studies of human decision-making abilities commonly involve subjects completing a decision-making task while BOLD signals are recorded using fMRI. Hypotheses are tested about which brain regions mediate the effect of past experience, such as rewards, on future actions. One standard approach to this is model-based fMRI data analysis, in which a model is fitted to the behavioral data, i.e., a subject's choices, and then the neural data are parsed to find brain regions whose BOLD signals are related to the model's internal signals. However, the internal mechanics of such purely behavioral models are not constrained by the neural data, and therefore might miss or mischaracterize aspects of the brain. To address this limitation, we introduce a new method using recurrent neural network models that are flexible enough to be jointly fitted to the behavioral and neural data. We trained a model so that its internal states were suitably related to neural activity during the task, while at the same time its output predicted the next action a subject would execute. We then used the fitted model to create a novel visualization of the relationship between the activity in brain regions at different times following a reward and the choices the subject subsequently made. Finally, we validated our method using a previously published dataset. We found that the model was able to recover the underlying neural substrates that were discovered by explicit model engineering in the previous work, and also derived new results regarding the temporal pattern of brain activity.

RLDM Conference 2017 Conference Abstract

Approximate Planning from Better Bounds on Q

  • Can Eren Sezener
  • Peter Dayan

Planning problems are often solved approximately using simulation based methods such as Monte Carlo Tree Search (MCTS). Indeed, UCT, perhaps the most popular MCTS algorithm, lies at the heart of many successful applications. However, UCT is fundamentally inefficient as a planning algorithm, since it is not focused exclusively on the value of the action that is ultimately chosen. Accordingly, even as simple a modification to UCT as accounting for myopic information values at the root of the search tree can result in significant performance improvements. Here, we propose a method that extends value-of-information-like computations to arbitrarily many nodes of the search tree for simple acyclic MDPs. We demonstrate significant performance improvements over other planning algorithms.

RLDM Conference 2017 Conference Abstract

Forgetful Inference in a Sophisticated World Model

  • Sanjeevan Ahilan
  • Rebecca Solomon
  • Kent Conover
  • Ritwik Niyogi
  • Peter Shizgal
  • Peter Dayan

Humans and other animals are able to discover underlying statistical structure in their environments and exploit it to achieve efficient and effective performance. However, the largest scale structures such as ‘world models’ are often difficult to learn and use because they are obscure, involving long-range temporal dependencies. Here, we analyzed behavioral data from a lengthy experiment with rats, showing that subjects discovered such hidden structure, using it to respond more quickly to rewarding states whilst responding more slowly, or not at all, to unrewarding states. We also identified surprising occasions where subjects responded rapidly to unrewarding states, despite the structure of the task seemingly having been learned. We attributed these instances to immediate inferential imperfections caused by the partial observability of hidden states. To describe this process statistically, we built a hidden Markov model (HMM) of the subjects’ models of the experiment, describing overall behavior as integrating recent observations with the recollections of an imperfect memory. Over the course of training, we found that subjects came to track their progress through the task more accurately, indicating an improved ability to infer state. Model fits attributed this improvement to decreased forgetting of the previous state. This ‘learning to remember’ decreased reliance on more recent observations, which can be misleading, in favor of a more dependable memory.
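The forgetful-inference idea, integrating the current observation with a leaky memory of the previous belief, can be sketched as a single HMM forward-filtering step (the leak-to-uniform parameterization here is an illustrative assumption, not the authors' fitted model):

```python
import numpy as np

def forgetful_forward_step(belief, trans, emit_probs, rho):
    """One HMM filtering step with imperfect memory: with weight rho the
    previous belief is retained; with weight (1 - rho) it is forgotten
    toward a uniform prior before the usual predict-and-update."""
    n = len(belief)
    remembered = rho * belief + (1 - rho) * np.ones(n) / n  # leaky memory
    predicted = trans.T @ remembered                         # state dynamics
    posterior = predicted * emit_probs                       # weight by evidence
    return posterior / posterior.sum()
```

With rho = 1 this is exact Bayesian filtering; as rho falls toward 0 the belief relies ever more on the most recent observation alone, which is the regime the fitted models attribute to early training.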

RLDM Conference 2017 Conference Abstract

Interrupting Options: Minimizing Decision Costs via Temporal Commitment and Low-Level Interrupt*

  • Kevin Lloyd
  • Peter Dayan

Ideal decision-makers should constantly monitor all sources of external information about opportunities and threats, and thus be able to redetermine their choices promptly in the face of change. However, perpetual monitoring and reassessment can impose substantial computational costs, making them impractical for animals and machines alike. The obvious alternative of committing for extended periods of time to particular courses of action can be dangerous and wasteful. Here, we explore the intermediate option of making provisional temporal commitments, but engaging in limited broader observation with the possibility of interruption - effectively a form of option (Sutton et al., Artificial Intelligence, 112, 181-211, 1999). We illustrate the issues using a simple example of foraging under predation risk, in which a decision-maker must trade off energetic gain against the danger of predation. We first show that an agent equipped with the capacity for self-interruption outperforms an agent without this capacity. Next, we observe that the optimal interruption policy is particularly uncomplicated in our example, and show that performance is essentially identical when using an approximation based on placing simple thresholds in belief space. This is consistent with the idea that a relatively simple, low-level mechanism can prompt behavioural interruption, analogous to the operation of peripherally-induced interrupts in digital computers. We interpret our results in the context of putative neural mechanisms, such as noradrenergic neuromodulation, and diseases of distractibility and roving attention.

RLDM Conference 2015 Conference Abstract

Cognitive biases: dissecting the influence of affect on decision-making under ambiguity in humans and animals

  • Mike Mendl
  • Elizabeth Paul
  • Samantha Jones
  • Aurelie Jolivald
  • Iain Gilchrist
  • Kiyohito Iigaya
  • Peter Dayan

There have been many battles about how best to formalise the affective states of humans and other animals in ways that can be self-evidently tied to quantifiable behaviours. One recent suggestion is that positive and negative moods can be treated as prior expectations over the future delivery of rewards and punishments, and that these priors affect behaviour through the conventional workings of Bayesian decision theory (Mendl et al., 2010). Amongst other characteristics, this suggestion provides an inferential foundation for a task that has become a widely-used method for assessing mood states in animals (Harding et al., 2004). This so-called ‘cognitive bias’ task extracts information about affect from the optimistic or pessimistic manner in which subjects resolve ambiguities in sensory input. Here, we describe experiments in humans and rodents aimed at elucidating further aspects of this notion. The human studies assessed the extent to which subjects can incorporate information about explicitly-imposed external loss functions into their inference about ambiguous inputs, and the way this incorporation interacts with mood. Subjects found it hard to integrate these sources of information well, which was unexpected given their apparently admirable capacities in related circumstances (Whiteley & Sahani, 2008), so we are exploring modifications. The rodent studies sought to examine the interaction between the experimenter-imposed instrumental demands of the task and inherent Pavlovian effects, such as ineluctable approach and avoidance in the face of the prospect respectively of rewards and punishment (Guitart-Masip et al., 2014). The latter might provide an account of the differences between rats and mice that we were surprised to observe.

RLDM Conference 2015 Conference Abstract

Contingency and Correlation in Reversal Learning

  • Bradley Pietras
  • Peter Dayan
  • Thomas Stalnaker
  • Geoffrey Schoenbaum
  • Tzu-Lan Yu

Reversal learning is one of the most venerable paradigms for studying the acquisition, extinction, and reacquisition of knowledge in humans and other animals. It has been of particular value in asking questions about the roles played by prefrontal structures such as the orbitofrontal cortex (OFC). Indeed, evidence from rats and monkeys suggests that these areas are involved in various forms of context-sensitive inference about the contingencies linking cues and actions over time to the value and identity of predicted outcomes. In order to explore these roles in depth, we fit data from a substantial behavioural neuroscience study in rodents who experienced blocks of free- and forced-choice instrumental learning trials with identity or value reversals at each block transition. We constructed two classes of models, fit their parameters using a random effects treatment, tested their generative competence, and selected between them based on a complexity-sensitive integrated Bayesian Information Criterion score. One class of ‘return’-based models was based on elaborations of a standard Q-learning algorithm, including parameters such as different learning rates or combination rules for forced- and free-choice trials, behavioural lapses, and eligibility traces. The other novel class of ‘income’-based models exploited the weak notion of contingency over time advocated by Walton et al. (2010) in their analysis of the choices of monkeys with OFC lesions. We show that income-based and return-based models are both able to predict the behaviour well, and examine their performance and implications for reinforcement learning. The outcome of this study sets the stage for the next phase of the research that will attempt to correlate the values of the parameters to neural recordings taken in the rats while performing the task.

RLDM Conference 2015 Conference Abstract

Pre-response dopamine transients in the nucleus accumbens

  • Kevin Lloyd
  • Peter Dayan

The observation that the phasic activity of dopamine neurons resembles closely an appetitive temporal difference prediction error associated with conditioned sensory cues does not exhaust either the characteristics of the signal or its putative role in influencing behaviour. In particular, experiments using operant paradigms and fast timescale measurements of the concentration of dopamine in one of its key targets, the nucleus accumbens, have shown transient increases just prior to the emission of actions that deliver rewards or the avoidance of punishment. This signal might play a causal role in driving behaviour, for instance through an effect on basal ganglia dynamics. It might also be a consequence of locally gated release at the level of the striatum, without any change in phasic activity of the dopamine neurons, as has been argued for the case of dopamine ramps. However, we study a third possibility that it reflects the outcome of an internal decision to respond. This conceives of the systems controlling dopamine as monitoring the internal state of the subject, and responding when this state implies that the appetitive outcome consequent on the action is impending. We consider the implications of this view for the informational relationship between the predictive critic and a temporally-sophisticated actor.
Poster Session 2, Tuesday, June 9, 2015.

RLDM Conference 2015 Conference Abstract

Temporal structure in associative retrieval

  • Zeb Kurth-Nelson
  • Gareth Barnes
  • Dino Sejdinovic
  • Ray Dolan
  • Peter Dayan

Electrophysiological data disclose rich dynamics in patterns of neural activity evoked by sensory objects. Retrieving such objects from memory reinstates components of this activity. In humans the temporal structure of this retrieved activity remains largely unexplored, and here we address this gap using the spatiotemporal precision of magnetoencephalography (MEG). In a sensory preconditioning paradigm, ‘indirect’ objects were paired with ‘direct’ objects to form associative links, and the latter were then paired with rewards. Using multivariate analysis methods we examined the short-time evolution of neural representations of indirect objects retrieved during reward-learning about direct objects. We found that two separate components of the representation of the indirect stimulus appeared at distinct times during learning. The strength of retrieval of one, but not the other, representational component correlated with generalization of reward learning from direct to indirect stimuli. We suggest that decomposing the temporal structure within retrieved neural representations may be key to understanding their function.

RLDM Conference 2015 Conference Abstract

The dopaminergic midbrain mediates an effect of average reward on Pavlovian vigour

  • Benjamin Chew
  • Peter Dayan
  • Ray Dolan

Phasic and tonic facets of dopamine release have been postulated as playing distinct roles in representing respectively appetitive prediction errors that mediate learning, and average rates of reward that mediate motivational vigour. However, empirical research has yet to provide evidence for the latter in a manner uncorrupted by influences of the former. We therefore designed a simple visual-search task in which we measured the force exerted when subjects reported the location of a target. In addition to a fixed reward for correct responses, subjects earned a performance-independent baseline monetary amount which varied across blocks. To decorrelate an influence of baseline reward from a prediction error, we provided subjects information at the start of each block regarding the amount they would receive in the subsequent block. Despite force not having any instrumental consequence, participants pressed harder for a larger baseline reward, consistent with the expression of a form of Pavlovian vigour. This larger baseline reward was associated with enhanced activity in dopamine-rich midbrain structures (ventral tegmental area/substantia nigra pars compacta; VTA/SN) to a degree that correlated across subjects with the strength of their behavioural coupling between reward and force. An opposite pattern was observed in subgenual cingulate cortex (sGC), a region involved in regulating negative emotional responses. These findings highlight a crucial role for VTA/SN and sGC in mediating an effect of average reward on tonic aspects of motivation.

NeurIPS Conference 2014 Conference Paper

Bayes-Adaptive Simulation-based Search with Value Function Approximation

  • Arthur Guez
  • Nicolas Heess
  • David Silver
  • Peter Dayan

Bayes-adaptive planning offers a principled solution to the exploration-exploitation trade-off under model uncertainty. It finds the optimal policy in belief space, which explicitly accounts for the expected effect on future rewards of reductions in uncertainty. However, the Bayes-adaptive solution is typically intractable in domains with large or continuous state spaces. We present a tractable method for approximating the Bayes-adaptive solution by combining simulation-based search with a novel value function approximation technique that generalises over belief space. Our method outperforms prior approaches in both discrete bandit tasks and simple continuous navigation and control tasks.

RLDM Conference 2013 Conference Abstract

A normative theory of approach-avoidance conflicts during dynamic foraging in humans

  • Arthur Guez
  • Ritwik Niyogi
  • Dominik Bach
  • Marc Guitart-Masip
  • Raymond Dolan
  • Peter Dayan

We propose a normative model of the behaviour of human subjects playing a dynamic foraging game containing a time-stochastic threat. The game is intended to capture the essence of the conflict between approach and avoidance. The realistic nature of the task makes planning challenging; we therefore rely on recent innovations in model-based methods to approximate the optimal policy. We observe that our optimal model captures many aspects of the behaviour, but there remain discrepancies between real and simulated data that will be used to elucidate the nature of the suboptimalities induced by the conflict. We hope to use elaborations of the model to capture the variance in the behaviour across groups of normal subjects and patients.

NeurIPS Conference 2013 Conference Paper

Correlations strike back (again): the case of associative memory retrieval

  • Cristina Savin
  • Peter Dayan
  • Mate Lengyel

It has long been recognised that statistical dependencies in neuronal activity need to be taken into account when decoding stimuli encoded in a neural population. Less studied, though equally pernicious, is the need to take account of dependencies between synaptic weights when decoding patterns previously encoded in an auto-associative memory. We show that activity-dependent learning generically produces such correlations, and failing to take them into account in the dynamics of memory retrieval leads to catastrophically poor recall. We derive optimal network dynamics for recall in the face of synaptic correlations caused by a range of synaptic plasticity rules. These dynamics involve well-studied circuit motifs, such as forms of feedback inhibition and experimentally observed dendritic nonlinearities. We therefore show how addressing the problem of synaptic correlations leads to a novel functional account of key biophysical features of the neural substrate.

RLDM Conference 2013 Conference Abstract

Neural Mechanisms of Overcoming Pavlovian Biases

  • Woo-Young Ahn
  • Peter Dayan
  • Kevin Hill
  • Terry Lohrenz
  • Read Montague

Pavlovian biases, the best known of which is the approach and engagement engendered by reward predictors, are well established in animals and have been related to drug-seeking behavior. However, the neural mechanisms underlying individual differences in the ability to overcome Pavlovian biases remain unclear. To address this, we scanned 74 healthy human participants with functional magnetic resonance imaging while they played a pre-existing reinforcement learning task that is designed to elucidate instrumental learning and its modulation by Pavlovian biases. Via computational modeling, we found strong behavioral evidence for a Pavlovian bias in the face of rewards but not punishments, which was consistent with previous reports. Using model-based fMRI, we found several regions that were important for overcoming the Pavlovian bias, including the medial prefrontal cortex (mPFC), inferior frontal gyrus (IFG), superior frontal gyrus, and hippocampus/parahippocampal gyrus. We also found that dorsolateral PFC (DLPFC), IFG, and superior temporal gyrus/medial temporal gyrus (STG/MTG) showed positive functional connectivity with mPFC while subjects successfully overcame the bias. By revealing behavioral and neural measures of individual differences in the propensity to exhibit Pavlovian biases, and a network of brain regions important for overcoming them, this work may have important implications for predicting/preventing relapse for drug addiction.

RLDM Conference 2013 Conference Abstract

The effect of reward-rescaling on risk preference

  • Peter Dayan
  • Raymond Dolan

Several findings indicate that value-based decision-making is context-dependent. However, the influence of the value range of preceding choices within a context has not been investigated. We studied this influence on risk preference, using a paradigm where people chose between risky and non-risky options with the same expected value. Crucially, the same choice could be either relatively good or bad, depending on the preceding choices presented in the context. At variance with standard economic models such as expected utility and prospect theory, participants were more risk prone with good choices. Moreover, risk preferences for the same choices in different contexts increased when these choices were relatively good rather than relatively bad. Overall, these results suggest that participants rescaled choice values depending on the value range of the preceding choices within a context. In addition, the rescaling was non-linear, thus leading to modifications in risk preference.

RLDM Conference 2013 Conference Abstract

The role of prefrontal cortex and basal ganglia in model-based and model-free reinforcement learning

  • Bruno Miranda
  • Nishantha Malalasekera
  • Peter Dayan
  • Steven Kennerley

Animals can learn to influence their environment either by exploiting stimulus-response associations that have been productive in the past, or by predicting the likely worth of actions in the future based on their causal relationships with outcomes. These respectively model-free (MF) and model-based (MB) strategies are supported by structures including midbrain dopaminergic neurons, striatum and prefrontal cortex (PFC), but it is not clear how they interact to realize these two types of reinforcement learning (RL). We trained rhesus monkeys to perform a two-stage Markov decision task that induces a combination of MB and MF behavior. The task starts with a choice between two options. Each of these is more often associated with one of two second-stage states with probabilities that are fixed throughout the experiment. A second two-option choice is required in order to obtain one of three different levels of reward. These second-stage outcomes change independently, according to a random walk, and thus induce exploration. A descriptive analysis of our behavioral data shows that the immediate reward history (of MF and MB importance) and the interaction between reward history and the structure of the task (of MB importance) both significantly influenced stage one choices. On the other hand, only the immediate reward history seemed to influence reaction time. When we performed a trial-by-trial computational analysis on our data using different RL algorithms, we found that in the model that best fit the data, choices were made according to a weighted combination of MF-RL and MB-RL action values (with a weight for MB-RL of 84.3 ± 3.2%). Our behavioral findings support a more integrated view of MF and MB learning strategies. They also illuminate the way that the vigor of responding relates to average rate of reward delivery. Neurophysiological recordings are currently being performed in subregions of PFC and the striatum during task performance.
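The weighted MB/MF combination described above can be sketched for a toy two-stage task. The transition matrix, the value arrays, and the helper name `hybrid_q` are hypothetical; only the weighting rule and the reported weight of roughly 0.84 come from the abstract.

```python
import numpy as np

# Hypothetical two-stage task: 2 first-stage options lead, with fixed
# probabilities known to the model-based (MB) system, to 2 second-stage states.
P = np.array([[0.8, 0.2],
              [0.2, 0.8]])  # P[a, s2] = p(second-stage state s2 | option a)

def hybrid_q(q_mf, q_s2, w=0.84):
    """Hybrid first-stage values: w * MB + (1 - w) * MF.

    q_mf : (2,) cached model-free values of the first-stage options
    q_s2 : (2,) value of the best option in each second-stage state
    w    : weight on the model-based component (~0.84 in the abstract)
    """
    q_mb = P @ q_s2                   # MB: expectation under the task structure
    return w * q_mb + (1 - w) * q_mf  # weighted combination driving stage-one choice
```

With `w` near 1, stage-one choices track the task structure (reward history interacting with the transitions); with `w` near 0, they track only the cached reward history.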

RLDM Conference 2013 Conference Abstract

Towards a practical Bayes-optimal agent

  • Arthur Guez
  • David Silver
  • Peter Dayan

Only rich and sophisticated statistical models are adequate for agents that must learn to navigate complex environments. However, it has not been clear how methods for planning can take advantage of models, such as those incorporating Bayesian non-parametric devices, that are sufficiently intricate as to demand approximate sampling schemes. We show that Bayes-Adaptive planning can be combined in a principled way with approximate sampling, and demonstrate the power of the resulting method in a challenging task involving safe exploration which defeats myopic methods such as Thompson Sampling. This highlights the importance of propagating beliefs in realistic cases involving trade-offs between exploration and exploitation. The next challenge is to employ function approximation to represent the belief-state value to improve search efficiency further and thus enable longer search horizons.

NeurIPS Conference 2012 Conference Paper

Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search

  • Arthur Guez
  • David Silver
  • Peter Dayan

Bayesian model-based reinforcement learning is a formally elegant approach to learning optimal behaviour under model uncertainty, trading off exploration and exploitation in an ideal way. Unfortunately, finding the resulting Bayes-optimal policies is notoriously taxing, since the search space becomes enormous. In this paper we introduce a tractable, sample-based method for approximate Bayes-optimal planning which exploits Monte-Carlo tree search. Our approach outperformed prior Bayesian model-based RL algorithms by a significant margin on several well-known benchmark problems -- because it avoids expensive applications of Bayes rule within the search tree by lazily sampling models from the current beliefs. We illustrate the advantages of our approach by showing it working in an infinite state space domain which is qualitatively out of reach of almost all previous work in Bayesian exploration.
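The lazy-sampling idea can be shown in miniature for a two-armed Bernoulli bandit with Beta posteriors: each simulation draws one model at the root and evaluates the entire rollout under that sample, so no Bayes-rule updates are needed inside the simulation. The rollout policy (act greedily under the sampled model after the first step) and all numbers are illustrative assumptions; the actual method embeds this idea in Monte-Carlo tree search rather than flat rollouts.

```python
import numpy as np

def root_sample_value(alpha, beta, action, horizon=10, n_sims=2000, seed=0):
    """Estimate the value of taking `action` first in a 2-armed Bernoulli
    bandit whose arm probabilities are uncertain, with Beta(alpha, beta)
    posteriors over each arm.

    In the spirit of root sampling, each simulation draws the arm
    probabilities once, at the root, and scores the whole rollout under
    that single sample -- avoiding posterior updates within the simulation.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_sims):
        p = rng.beta(alpha, beta)         # lazily sample one model at the root
        total += p[action]                # expected first-step reward under it
        total += (horizon - 1) * p.max()  # crude rollout: greedy under the sample
    return total / n_sims
```

Because every simulation commits to one sampled model, the expensive step (applying Bayes rule at each tree node) disappears, which is the source of the speed-up the abstract describes.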

NeurIPS Conference 2011 Conference Paper

Two is better than one: distinct roles for familiarity and recollection in retrieving palimpsest memories

  • Cristina Savin
  • Peter Dayan
  • Máté Lengyel

Storing a new pattern in a palimpsest memory system comes at the cost of interfering with the memory traces of previously stored items. Knowing the age of a pattern thus becomes critical for recalling it faithfully. This implies that there should be a tight coupling between estimates of age, as a form of familiarity, and the neural dynamics of recollection, something which current theories omit. Using a normative model of autoassociative memory, we show that a dual memory system, consisting of two interacting modules for familiarity and recollection, has best performance for both recollection and recognition. This finding provides a new window onto actively contentious psychological and neural aspects of recognition memory.

NeurIPS Conference 2009 Conference Paper

Know Thy Neighbour: A Normative Theory of Synaptic Depression

  • Jean-Pascal Pfister
  • Peter Dayan
  • Máté Lengyel

Synapses exhibit an extraordinary degree of short-term malleability, with release probabilities and effective synaptic strengths changing markedly over multiple timescales. From the perspective of a fixed computational operation in a network, this seems like a most unacceptable degree of added noise. We suggest an alternative theory according to which short term synaptic plasticity plays a normatively-justifiable role. This theory starts from the commonplace observation that the spiking of a neuron is an incomplete, digital, report of the analog quantity that contains all the critical information, namely its membrane potential. We suggest that one key task for a synapse is to solve the inverse problem of estimating the pre-synaptic membrane potential from the spikes it receives and prior expectations, as in a recursive filter. We show that short-term synaptic depression has canonical dynamics which closely resemble those required for optimal estimation, and that it indeed supports high quality estimation. Under this account, the local postsynaptic potential and the level of synaptic resources track the (scaled) mean and variance of the estimated presynaptic membrane potential. We make experimentally testable predictions for how the statistics of subthreshold membrane potential fluctuations and the form of spiking non-linearity should be related to the properties of short-term plasticity in any particular cell type.

NeurIPS Conference 2009 Conference Paper

Statistical Models of Linear and Nonlinear Contextual Interactions in Early Visual Processing

  • Ruben Coen-Cagli
  • Peter Dayan
  • Odelia Schwartz

A central hypothesis about early visual processing is that it represents inputs in a coordinate system matched to the statistics of natural scenes. Simple versions of this lead to Gabor-like receptive fields and divisive gain modulation from local surrounds; these have led to influential neural and psychological models of visual processing. However, these accounts are based on an incomplete view of the visual context surrounding each point. Here, we consider an approximate model of linear and non-linear correlations between the responses of spatially distributed Gabor-like receptive fields, which, when trained on an ensemble of natural scenes, unifies a range of spatial context effects. The full model accounts for neural surround data in primary visual cortex (V1), provides a statistical foundation for perceptual phenomena associated with Li's (2002) hypothesis that V1 builds a saliency map, and fits data on the tilt illusion.

NeurIPS Conference 2008 Conference Paper

Bayesian Model of Behaviour in Economic Games

  • Debajyoti Ray
  • Brooks King-Casas
  • P. Montague
  • Peter Dayan

Classical Game Theoretic approaches that make strong rationality assumptions have difficulty modeling observed behaviour in Economic games of human subjects. We investigate the role of finite levels of iterated reasoning and non-selfish utility functions in a Partially Observable Markov Decision Process model that incorporates Game Theoretic notions of interactivity. Our generative model captures a broad class of characteristic behaviours in a multi-round Investment game. We invert the generative process for a recognition model that is used to classify 200 subjects playing an Investor-Trustee game against randomly matched opponents.

NeurIPS Conference 2008 Conference Paper

Load and Attentional Bayes

  • Peter Dayan

Selective attention is a most intensively studied psychological phenomenon, rife with theoretical suggestions and schisms. A critical idea is that of limited capacity, the allocation of which has produced half a century's worth of conflict about such phenomena as early and late selection. An influential resolution of this debate is based on the notion of perceptual load (Lavie, 2005, TICS, 9: 75), which suggests that low-load, easy tasks, because they underuse the total capacity of attention, mandatorily lead to the processing of stimuli that are irrelevant to the current attentional set; whereas high-load, difficult tasks grab all resources for themselves, leaving distractors high and dry. We argue that this theory presents a challenge to Bayesian theories of attention, and suggest an alternative, statistical, account of key supporting data.

NeurIPS Conference 2008 Conference Paper

Psychiatry: Insights into depression through normative decision-making models

  • Quentin Huys
  • Joshua Vogelstein
  • Peter Dayan

Decision making lies at the very heart of many psychiatric diseases. It is also a central theoretical concern in a wide variety of fields and has undergone detailed, in-depth analyses. We take as an example Major Depressive Disorder (MDD), applying insights from a Bayesian reinforcement learning framework. We focus on anhedonia and helplessness. Helplessness—a core element in the conceptualizations of MDD that has led to major advances in its treatment, pharmacological and neurobiological understanding—is formalized as a simple prior over the outcome entropy of actions in uncertain environments. Anhedonia, which is an equally fundamental aspect of the disease, is related to the effective reward size. These formulations allow for the design of specific tasks to measure anhedonia and helplessness behaviorally. We show that these behavioral measures capture explicit, questionnaire-based cognitions. We also provide evidence that these tasks may allow classification of subjects into healthy and MDD groups based purely on a behavioural measure and avoiding any verbal reports. There are strong ties between decision making and psychiatry, with maladaptive decisions and behaviors being very prominent in people with psychiatric disorders. Depression is classically seen as following life events such as divorces and job losses. Longitudinal studies, however, have revealed that a significant fraction of the stressors associated with depression do in fact follow MDD onset, and that they are likely due to maladaptive behaviors prominent in MDD (Kendler et al., 1999). Clinically effective ‘talking’ therapies for MDD such as cognitive and dialectical behavior therapies (DeRubeis et al., 1999; Bortolotti et al., 2008; Gotlib and Hammen, 2002; Power, 2005) explicitly concentrate on altering patients’ maladaptive behaviors and decision making processes. Decision making is a promising avenue into psychiatry for at least two more reasons.
First, it offers powerful analytical tools. Control problems related to decision making are prevalent in a huge diversity of fields, ranging from ecology to economics, computer science and engineering. These fields have produced well-founded and thoroughly characterized frameworks within which many issues in decision making can be framed. Here, we will focus on framing issues identified in psychiatric settings within a normative decision making framework. Its second major strength comes from its relationship to neurobiology, and particularly those neuromodulatory systems which are powerfully affected by all major clinically effective pharmacotherapies in psychiatry. The understanding of these systems has benefited significantly from theoretical accounts of optimal control such as reinforcement learning (Montague et al., 1996; Kapur and Remington, 1996; Smith et al., 1999; Yu and Dayan, 2005; Dayan and Yu, 2006). Such accounts may be useful to identify in more specific terms the roles of the neuromodulators in psychiatry (Smith et al., 2004; Williams and Dayan, 2005; Moutoussis et al., 2008; Dayan and Huys, 2008).

NeurIPS Conference 2007 Conference Paper

Hippocampal Contributions to Control: The Third Way

  • Máté Lengyel
  • Peter Dayan

Recent experimental studies have focused on the specialization of different neural structures for different types of instrumental behavior. Recent theoretical work has provided normative accounts for why there should be more than one control system, and how the output of different controllers can be integrated. Two particular controllers have been identified, one associated with a forward model and the prefrontal cortex and a second associated with computationally simpler, habitual, actor-critic methods and part of the striatum. We argue here for the normative appropriateness of an additional, but so far marginalized control system, associated with episodic memory, and involving the hippocampus and medial temporal cortices. We analyze in depth a class of simple environments to show that episodic control should be useful in a range of cases characterized by complexity and inferential noise, and most particularly at the very early stages of learning, long before habitization has set in. We interpret data on the transfer of control from the hippocampus to the striatum in the light of this hypothesis.

NeurIPS Conference 2006 Conference Paper

Uncertainty, phase and oscillatory hippocampal recall

  • Máté Lengyel
  • Peter Dayan

Many neural areas, notably, the hippocampus, show structured, dynamical, population behavior such as coordinated oscillations. It has long been observed that such oscillations provide a substrate for representing analog information in the firing phases of neurons relative to the underlying population rhythm. However, it has become increasingly clear that it is essential for neural populations to represent uncertainty about the information they capture, and the substantial recent work on neural codes for uncertainty has omitted any analysis of oscillatory systems. Here, we observe that, since neurons in an oscillatory network need not only fire once in each cycle (or even at all), uncertainty about the analog quantities each neuron represents by its firing phase might naturally be reported through the degree of concentration of the spikes that it fires. We apply this theory to memory in a model of oscillatory associative recall in hippocampal area CA3. Although it is not well treated in the literature, representing and manipulating uncertainty is fundamental to competent memory; our theory enables us to view CA3 as an effective uncertainty-aware, retrieval system.

NeurIPS Conference 2005 Conference Paper

A Bayesian Framework for Tilt Perception and Confidence

  • Odelia Schwartz
  • Peter Dayan
  • Terrence Sejnowski

The misjudgement of tilt in images lies at the heart of entertaining visual illusions and rigorous perceptual psychophysics. A wealth of findings has attracted many mechanistic models, but few clear computational principles. We adopt a Bayesian approach to perceptual tilt estimation, showing how a smoothness prior offers a powerful way of addressing much confusing data. In particular, we faithfully model recent results showing that confidence in estimation can be systematically affected by the same aspects of images that affect bias. Confidence is central to Bayesian modeling approaches, and is applicable in many other perceptual domains. Perceptual anomalies and illusions, such as the misjudgements of motion and tilt evident in so many psychophysical experiments, have intrigued researchers for decades. 13 A Bayesian view48 has been particularly influential in models of motion processing, treating such anomalies as the normative product of prior information (often statistically codifying Gestalt laws) with likelihood information from the actual scenes presented. Here, we expand the range of statistically normative accounts to tilt estimation, for which there are classes of results (on estimation confidence) that are so far not available for motion. The tilt illusion arises when the perceived tilt of a center target is misjudged (ie bias) in the presence of flankers. Another phenomenon, called Crowding, refers to a loss in the confidence (ie sensitivity) of perceived target tilt in the presence of flankers. Attempts have been made to formalize these phenomena quantitatively. Crowding has been modeled as compulsory feature pooling (ie averaging of orientations), ignoring spatial positions. 9, 10 The tilt illusion has been explained by lateral interactions11, 12 in populations of orientationtuned units; and by calibration. 13 However, most models of this form cannot explain a number of crucial aspects of the data. 
First, the geometry of the positional arrangement of the stimuli affects attraction versus repulsion in bias, as emphasized by Kapadia et al.14 (figure 1A), and others.15,16 Second, Solomon et al. recently measured bias and sensitivity simultaneously.11 The rich and surprising range of sensitivities, far from flat as a function of flanker angles (figure 1B), is outside the reach of standard models. Moreover, current explanations do not offer a computational account of tilt perception as the outcome of a normative inference process. Here, we demonstrate that a Bayesian framework for orientation estimation, with a prior favoring smoothness, can naturally explain a range of seemingly puzzling tilt data. We explicitly consider both the geometry of the stimuli, and the issue of confidence in the estimation.

NeurIPS Conference 2005 Conference Paper

How fast to work: Response vigor, motivation and tonic dopamine

  • Yael Niv
  • Nathaniel Daw
  • Peter Dayan

Reinforcement learning models have long promised to unify computational, psychological and neural accounts of appetitively conditioned behavior. However, the bulk of data on animal conditioning comes from free-operant experiments measuring how fast animals will work for reinforcement. Existing reinforcement learning (RL) models are silent about these tasks, because they lack any notion of vigor. They thus fail to address the simple observation that hungrier animals will work harder for food, as well as stranger facts such as their sometimes greater productivity even when working for irrelevant outcomes such as water. Here, we develop an RL framework for free-operant behavior, suggesting that subjects choose how vigorously to perform selected actions by optimally balancing the costs and benefits of quick responding. Motivational states such as hunger shift these factors, skewing the tradeoff. This accounts normatively for the effects of motivation on response rates, as well as many other classic findings. Finally, we suggest that tonic levels of dopamine may be involved in the computation linking motivational state to optimal responding, thereby explaining the complex vigor-related effects of pharmacological manipulation of dopamine.

NeurIPS Conference 2005 Conference Paper

Norepinephrine and Neural Interrupts

  • Peter Dayan
  • Angela Yu

Experimental data indicate that norepinephrine is critically involved in aspects of vigilance and attention. Previously, we considered the function of this neuromodulatory system on a time scale of minutes and longer, and suggested that it signals global uncertainty arising from gross changes in environmental contingencies. However, norepinephrine is also known to be activated phasically by familiar stimuli in well-learned tasks. Here, we extend our uncertainty-based treatment of norepinephrine to this phasic mode, proposing that it is involved in the detection and reaction to state uncertainty within a task. This role of norepinephrine can be understood through the metaphor of neural interrupts.

NeurIPS Conference 2004 Conference Paper

Assignment of Multiplicative Mixtures in Natural Images

  • Odelia Schwartz
  • Terrence Sejnowski
  • Peter Dayan

In the analysis of natural images, Gaussian scale mixtures (GSM) have been used to account for the statistics of filter responses, and to inspire hierarchical cortical representational learning schemes. GSMs pose a critical assignment problem: working out which filter responses were generated by a common multiplicative factor. We present a new approach to solving this assignment problem through a probabilistic extension to the basic GSM, and show how to perform inference in the model using Gibbs sampling. We demonstrate the efficacy of the approach on both synthetic and image data. Understanding the statistical structure of natural images is an important goal for visual neuroscience. Neural representations in early cortical areas decompose images (and likely other sensory inputs) in a way that is sensitive to sophisticated aspects of their probabilistic structure. This structure also plays a key role in methods for image processing and coding. A striking aspect of natural images that has reflections in both top-down and bottom-up modeling is coordination across nearby locations, scales, and orientations. From a top-down perspective, this structure has been modeled using what is known as a Gaussian Scale Mixture model (GSM).1–3 GSMs involve a multi-dimensional Gaussian (each dimension of which captures local structure as in a linear filter), multiplied by a spatialized collection of common hidden scale variables or mixer variables* (which capture the coordination). GSMs have wide implications in theories of cortical receptive field development, eg the comprehensive bubbles framework of Hyvärinen.4 The mixer variables provide the top-down account of two bottom-up characteristics of natural image statistics, namely the 'bowtie' statistical dependency,5,6 and the fact that the marginal distributions of receptive field-like filters have high kurtosis.7,8
In hindsight, these ideas also bear a close relationship with Ruderman and Bialek's multiplicative bottom-up image analysis framework9 and statistical models for divisive gain control.6 Coordinated structure has also been addressed in other image work,10–14 and in other domains such as speech15 and finance.16 Many approaches to the unsupervised specification of representations in early cortical areas rely on the coordinated structure.17–21 The idea is to learn linear filters (eg modeling simple cells as in22,23), and then, based on the coordination, to find combinations of these (perhaps non-linearly transformed) as a way of finding higher-order filters (eg complex cells). One critical facet whose specification from data is not obvious is the neighborhood arrangement, ie which linear filters share which mixer variables. (*Mixer variables are also called multipliers, but are unrelated to the scales of a wavelet.) Here, we suggest a method for finding the neighborhood based on Bayesian inference of the GSM random variables. In section 1, we consider estimating these components based on information from different-sized neighborhoods and show the modes of failure when inference is too local or too global. Based on these observations, in section 2 we propose an extension to the GSM generative model, in which the mixer variables can overlap probabilistically. We solve the neighborhood assignment problem using Gibbs sampling, and demonstrate the technique on synthetic data. In section 3, we apply the technique to image data.

1 GSM inference of Gaussian and mixer variables

In a simple, n-dimensional version of a GSM, filter responses l are synthesized by multiplying an n-dimensional Gaussian with values g = {g1, ..., gn} by a common mixer variable v:

l = vg (1)

We assume g are uncorrelated (σ² along the diagonal of the covariance matrix).
For the analytical calculations, we assume that v has a Rayleigh distribution:
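The two statistical signatures that motivate the GSM — heavy-tailed marginals and the 'bowtie' magnitude dependency between filters sharing a mixer — are easy to reproduce synthetically. A minimal sketch of the generative model l = vg with a shared unit-scale Rayleigh mixer (illustrative only; the variable names and parameter values are assumptions, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
n, samples = 4, 200_000

# Gaussian components g (independent, unit variance) and one shared
# Rayleigh mixer v per sample, following the generative model l = v g.
g = rng.normal(0.0, 1.0, size=(samples, n))
v = rng.rayleigh(scale=1.0, size=(samples, 1))
l = v * g

def kurtosis(x):
    x = x - x.mean()
    return (x**4).mean() / (x**2).mean() ** 2

# Marginals are heavy-tailed (Gaussian kurtosis is 3; here it is ~6),
# and responses sharing a mixer show the 'bowtie' dependency: their raw
# correlation is ~0, but their magnitudes are positively correlated.
print(kurtosis(l[:, 0]))
print(np.corrcoef(l[:, 0], l[:, 1])[0, 1])
print(np.corrcoef(np.abs(l[:, 0]), np.abs(l[:, 1]))[0, 1])
```

The assignment problem the paper addresses starts exactly here: given only l, infer which responses shared the mixer v.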

NeurIPS Conference 2004 Conference Paper

Inference, Attention, and Decision in a Bayesian Neural Architecture

  • Angela Yu
  • Peter Dayan

We study the synthesis of neural coding, selective attention and perceptual decision making. A hierarchical neural architecture is proposed, which implements Bayesian integration of noisy sensory input and top-down attentional priors, leading to sound perceptual discrimination. The model offers an explicit explanation for the experimentally observed modulation that prior information in one stimulus feature (location) can have on an independent feature (orientation). The network's intermediate levels of representation instantiate known physiological properties of visual cortical neurons. The model also illustrates a possible reconciliation of cortical and neuromodulatory representations of uncertainty.

NeurIPS Conference 2004 Conference Paper

Probabilistic Computation in Spiking Populations

  • Richard Zemel
  • Rama Natarajan
  • Peter Dayan
  • Quentin Huys

As animals interact with their environments, they must constantly update estimates about their states. Bayesian models combine prior probabilities, a dynamical model and sensory evidence to update estimates optimally. These models are consistent with the results of many diverse psychophysical studies. However, little is known about the neural representation and manipulation of such Bayesian information, particularly in populations of spiking neurons. We consider this issue, suggesting a model based on standard neural architecture and activations. We illustrate the approach on a simple random walk example, and apply it to a sensorimotor integration task that provides a particularly compelling example of dynamic probabilistic computation. Bayesian models have been used to explain a gamut of experimental results in tasks which require estimates to be derived from multiple sensory cues. These include a wide range of psychophysical studies of perception,13 motor action7 and decision-making.3,5 Central to Bayesian inference is that computations are sensitive to uncertainties about afferent and efferent quantities, arising from ignorance, noise, or inherent ambiguity (e.g., the aperture problem), and that these uncertainties change over time as information accumulates and dissipates. Understanding how neurons represent and manipulate uncertain quantities is therefore key to understanding the neural instantiation of these Bayesian inferences. Most previous work on representing probabilistic inference in neural populations has focused on the representation of static information.1,12,15 These encompass various strategies for encoding and decoding uncertain quantities, but do not readily generalize to real-world dynamic information processing tasks, particularly the most interesting cases with stimuli changing over the same timescale as spiking itself.11
Notable exceptions are the recent, seminal, but, as we argue, representationally restricted, models proposed by Gold and Shadlen,5 Rao,10 and Deneve.4 In this paper, we first show how probabilistic information varying over time can be represented in a spiking population code. Second, we present a method for producing spiking codes that facilitate further processing of the probabilistic information. Finally, we show the utility of this method by applying it to a temporal sensorimotor integration task.

1 TRAJECTORY ENCODING AND DECODING

We assume that population spikes R(t) arise stochastically in relation to the trajectory X(t) of an underlying (but hidden) variable. We use R_T and X_T for the whole spike train and trajectory respectively from time 0 to T. The spikes R_T constitute the observations and are assumed to be probabilistically related to the signal by a tuning function f(X, θ_i):

P(R(i, T) | X(T)) ∝ f(X, θ_i) (1)

for the spike train of the ith neuron, with parameters θ_i. Therefore, via standard Bayesian inference, R_T determines a distribution over the hidden variable at time T, P(X(T) | R_T). We first consider a version of the dynamics and input coding that permits an analytical examination of the impact of spikes. Let X(t) follow a stationary Gaussian process such that the joint distribution P(X(t_1), X(t_2), ..., X(t_m)) is Gaussian for any finite collection of times, with a covariance matrix which depends on time differences: C_tt' = c(|t − t'|). The function c(|t|) controls the smoothness of the resulting random walks. Then

P(X(T) | R_T) ∝ p(X(T)) ∫ dX̃(T) P(R_T | X̃(T)) P(X̃(T) | X(T)) (2)

where P(X̃(T) | X(T)) is the distribution over the whole trajectory X̃(T) conditional on the value of X(T) at its end point. If R_T are a set of conditionally independent inhomogeneous Poisson processes, we have

P(R_T | X̃(T)) ∝ ∏_i [ ∏_{t_i} f(X(t_i), θ_i) ] exp(−Σ_i ∫ dτ f(X(τ), θ_i)) (3)

where t_i are the spike times of neuron i in R_T.
Let x = [X(t_i)] be the vector of stimulus positions at the times at which we observed a spike, and θ = [θ_{t_i}] be the vector of spike positions. If the tuning functions are Gaussian, f(X, θ_i) ∝ exp(−(X − θ_i)²/2σ²), and sufficiently dense that Σ_i ∫ dτ f(X(τ), θ_i) is independent of X (a standard assumption in population coding), then P(R_T | X̃(T)) ∝ exp(−|θ − x|²/2σ²), and in Equation 2 we can marginalize out X̃(T) except at the spike times t_i:

P(X(T) | R_T) ∝ p(X(T)) ∫ dx exp( −[x, X(T)]ᵀ C⁻¹ [x, X(T)]/2 − |θ − x|²/2σ² ) (4)

where C is the block covariance matrix between X(t_i) and X(T) at the spike times [t_i] and the final time T. This Gaussian integral gives P(X(T) | R_T) = N(μ(T), Σ(T)), with

μ(T) = C_Tt (C_tt + Iσ²)⁻¹ θ = kθ,    Σ(T) = C_TT − k C_tT (5)

Here C_TT is the (T, T)th element of the covariance matrix and C_Tt is similarly a row vector. The dependence of μ on past spike times is specified chiefly by the inverse covariance matrix, and acts through an effective kernel k. This kernel is not stationary, since it depends on factors such as the local density of spiking in the spike train R_T. For example, consider the case in which X(t) evolves according to a diffusion process with drift:

dX = −αX dt + dN(t) (6)

where α prevents it from wandering too far, and N(t) is white Gaussian noise with mean zero and variance σ_N². Figure 1A shows sample kernels for this process. Inspection of Figure 1A reveals some important traits. First, the monotonically decreasing kernel magnitude as the time span between the spike and the current time T grows matches the intuition that recent spikes play a more significant role in determining the posterior over X(T). Second, the kernel is nearly exponential, with a time constant that depends on the time constant of the covariance function and the density of the spikes; two settings of these parameters produced the two groupings of kernels in the figure. Finally, the fully adaptive kernel k can be locally well approximated by a metronomic kernel k_s (shown in red in Figure 1A) that assumes regular spiking.
This takes advantage of the general fact, indicated by the grouping of kernels, that the kernel depends weakly on the actual spike pattern, but strongly on the average rate. The merits of the metronomic kernel are that it is stationary and only depends on a single mean rate rather than the full spike train R_T. It also justifies the form of decoder used for the network model in the next section.6

Figure 1: Exact and approximate spike decoding with the Gaussian process prior. Spikes are shown in yellow, the true stimulus in green, and P(X(T) | R_T) in gray. Blue: exact inference with the nonstationary kernel; red: approximate inference with regular spiking. A: Kernel samples for a diffusion process as defined by Equations 5 and 6. B, C: Mean and variance of the inference. D: Exact inference with the full kernel k. E: Approximation based on the metronomic kernel k_s (Equation 7).

Figure 1D shows an example of how well Equation 5 specifies a distribution over X(t) through very few spikes. Finally, Figure 1E shows a factorized approximation with the stationary kernel similar to that used by Hinton and Brown6 and in our recurrent network:

P̂(X(t) | R_t) ∝ ∏_i f(X, θ_i)^(Σ_{j=0}^{t} k_s(j) R(i, t−j)) = exp(−E(X(t), R_t, t)) (7)

By design, the mean is captured very well, but not the variance, which in this example grows too rapidly for long interspike intervals (Figure 1B, C). Using a slower kernel improves performance on the variance, but at the expense of the mean. We thus turn to the network model with recurrent connections that are available to reinstate the spike-conditional characteristics of the full kernel.
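Once a covariance function c(|t − t'|) is chosen, the posterior of Equation 5 is a small linear-algebra computation. A numerical sketch with assumed illustrative values (the exponential covariance, spike times, tuning width and preferred positions below are inventions for demonstration, not values from the paper):

```python
import numpy as np

# Assumed illustrative values: an exponential (Ornstein-Uhlenbeck-style)
# covariance c(|t - t'|), four spike times, and the preferred positions
# theta of the neurons that spiked.
tau_c, sigma = 0.5, 0.2
c = lambda dt: np.exp(-np.abs(dt) / tau_c)

t_spikes = np.array([0.02, 0.35, 0.60, 0.81])
theta = np.array([0.10, 0.30, 0.25, 0.40])
T = 1.0

Ctt = c(t_spikes[:, None] - t_spikes[None, :])  # covariance among spike times
CTt = c(T - t_spikes)                           # row vector C_Tt
CTT = c(0.0)                                    # prior variance at time T

# Effective kernel k and Gaussian posterior over X(T), as in Equation 5.
k = np.linalg.solve(Ctt + sigma**2 * np.eye(len(t_spikes)), CTt)
mu = k @ theta            # posterior mean of X(T)
var = CTT - k @ CTt       # posterior variance of X(T)

print(mu, var)
print(k)  # the most recent spike carries by far the largest weight
```

Printing k shows the trait discussed above: the weight on the most recent spike dominates, and the posterior variance is strictly below the prior variance C_TT.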
2 NETWORK MODEL FORMULATION

Above we considered how population spikes R_T specify a distribution over X(T). We now extend this to consider how interconnected populations of neurons can specify distributions over time-varying variables. We frame the problem and our approach in terms of a two-level network, connecting one population of neurons to another; this construction is intended to apply to any level of processing. The network maps input population spikes R(t) to output population spikes S(t), where input and output evolve over time. As with the input spikes, S_T indicates the output spike trains from time 0 to T, and these output spikes are assumed to determine a distribution over a related hidden variable. For the recurrent and feedforward computation in the network, we start with the deceptively simple goal9 of producing output spikes in such a way that the distribution Q(X(T) | S_T) they imply over the same hidden variable X(T) as the input faithfully matches P(X(T) | R_T). This might seem a strange goal, since one could surely just listen to the input spikes. However, in order for the output spikes to track the hidden variable, the dynamics of the interactions between the neurons must explicitly capture the dynamics of the process X(T). Once this 'identity mapping' problem has been solved, more general, complex computations can be performed with ease. We illustrate this on a multisensory integration task, tracking a hidden variable that depends on multiple sensory cues. The aim of the recurrent network is to take the spikes R(t) as inputs, and produce output spikes that capture the probabilistic dynamics. We proceed in two steps. We first consider the probabilistic decoding process which turns S_T into Q(X(t) | S_T). Then we discuss the recurrent and feedforward processing that produce appropriate S_T given this decoder.
Note that this decoding process is not required for the network processing; it instead provides a computational objective for the spiking dynamics in the system. We use a simple log-linear decoder based on a spatiotemporal kernel:6

Q(X(T) | S_T) ∝ exp(−E(X(T), S_T, T)), where (8)

E(X, S_T, T) = Σ_j Σ_{τ=0}^{T} S(j, T − τ) φ_j(X, τ) (9)

is an energy function, and the spatiotemporal kernels are assumed separable: φ_j(X, τ) = g_j(X) φ(τ). The spatial kernel g_j(X) is related to the receptive field f(X, θ_j) of neuron j, and the temporal kernel φ(τ) to k. The dynamics of processing in the network follows a standard recurrent neural architecture for modeling cortical responses, in the case that network inputs R(t) and outputs S(t) are spikes. The effect of a spike on other neurons in the network is assumed to have some simple temporal dynamics, described here again by the temporal kernel φ(τ):

r_i(t) = Σ_{τ=0}^{T'} R(i, t − τ) φ(τ)    s_j(t) = Σ_{τ=0}^{T'} S(j, t − τ) φ(τ)

where T' is the extent of the kernel. The response of an output neuron is governed by a stochastic spiking rule, where the probability that neuron j spikes at time t is given by

P(S(j, t) = 1) = σ(u_j(t)) = σ( Σ_i w_ij r_i(t) + Σ_k v_kj s_k(t − 1) ) (10)

where σ(·) is the logistic function, and W and V are the feedforward and recurrent weights. If φ(τ) = exp(−ατ), then u_j(T) = φ(0)(W_j·R(T) + V_j·S(T)) + φ(1) u_j(T − 1); this corresponds to a discretization of the standard dynamics for the membrane potential of a leaky integrate-and-fire neuron, du_j/dt = −u_j + WR + VS, where the leak is determined by the temporal kernel. The task of the network is to make Q(X(T) | S_T) of Equation 8 match P(X(T) | R_T) coming from one of the two models above (exact dynamic or approximate stationary kernel). We measure the discrepancy using the Kullback-Leibler (KL) divergence:

J = Σ_t KL[ P(X(t) | R_t) || Q(X(t) | S_t) ] (11)

and, as a proof of principle in the experiments below, find optimal W and V by minimizing the KL divergence J using back-propagation through time (BPTT).
In order to implement this in the most straightforward way, we convert the stochastic spiking rule (Equation 10) to a deterministic rule via the mean-field assumption: S̄_j(t) = σ( Σ_i w_ij r_i(t) + Σ_k v_kj s̄_k(t − 1) ). The gradients are tedious, but can be neatly expressed in a temporally recursive form. Note that our current focus is on the representational capability of the system, rather than its learning. Our results establish that the system can faithfully represent the posterior distribution. We return to the issue of more plausible learning rules below. The resulting network can be seen as a dynamic spiking analogue of the recurrent network scheme of Pouget et al.:9 both methods formulate feedforward and recurrent connections so that a simple decoding of the output can match optimal but complex decoding applied to the inputs. A further advantage of the scheme proposed here is that it facilitates downstream processing of the probabilistic information, as the objective encourages the formation of distributions at the output that factorize across the units.

3 RELATED MODELS

Ideas about the representation of probabilistic information in spiking neurons are in vogue. One treatment considers Poisson spiking in populations with regular tuning functions, assuming that stimuli change slowly compared with the inter-spike intervals.8 This leads to a Kalman filter account with much formal similarity to the models of P(X(T) | R_T). However, because of the slow timescale, recurrent dynamics can be allowed to settle to an underlying attractor. In another approach, the spiking activity of either a single neuron4 or a pair of neurons5 is considered as reporting (logarithmic) probabilistic information about an underlying binary hypothesis. A third treatment proposes that a population of neurons directly represents the (logarithmic) probability over the state of a hidden Markov model.10 Our method is closely related to the latter two models.
Like Deneve's,4 we consider the transformation of input spikes to output spikes with a fixed assumed decoding scheme, so that the dynamics of an underlying process is captured. Our decoding mechanism produces something like the predictive coding apparent in Deneve's scheme, except that here, a neuron may not need to spike not only if it itself has recently spiked and thereby conveyed the appropriate information, but also if one of its population neighbors has recently spiked. This is explicitly captured by the recurrent interactions among the population. Our scheme also resembles Rao's10 approach in that it involves population codes. Our representational scheme is more general, however, in that the spatiotemporal decoder defines the relationship between output spikes and Q(X(T) | S_T), whereas his method assumes a direct encoding, with each output neuron's activity proportional to log Q(X(T) | S_T). Our decoder can produce such a direct encoding if the spatial and temporal kernels are delta functions, but other kernels permit coordination amongst the population to take into account temporal effects, and to produce higher fidelity in the output distribution.
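The recurrent dynamics of Equation 10 amount to a leaky membrane potential driven by filtered input and output spikes, passed through a logistic spiking probability. A self-contained sketch under assumed random weights (purely illustrative: the paper trains W and V by minimizing the KL objective, which is omitted here, and all sizes and parameter values below are inventions):

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

n_in, n_out, T = 8, 5, 50
W = rng.normal(0.0, 0.5, size=(n_in, n_out))   # feedforward weights (random, untrained)
V = rng.normal(0.0, 0.1, size=(n_out, n_out))  # recurrent weights (random, untrained)
lam = np.exp(-1.0)  # exponential temporal kernel phi(tau) = exp(-tau), so phi(1) = e**-1

R = (rng.random((T, n_in)) < 0.2).astype(float)  # input spike trains
S = np.zeros((T, n_out))                         # output spike trains
u = np.zeros(n_out)                              # membrane potentials

for t in range(T):
    s_prev = S[t - 1] if t > 0 else np.zeros(n_out)
    # u_j(t) = phi(0) * (W_j . R(t) + V_j . S(t-1)) + phi(1) * u_j(t-1)
    u = (R[t] @ W + s_prev @ V) + lam * u
    # Stochastic spiking rule of Equation 10: P(S(j,t) = 1) = sigmoid(u_j(t))
    S[t] = (rng.random(n_out) < sigmoid(u)).astype(float)
```

The update line makes the leaky integrate-and-fire discretization explicit: the previous potential decays by φ(1) while new filtered input arrives with weight φ(0) = 1.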

NeurIPS Conference 2004 Conference Paper

Rate- and Phase-coded Autoassociative Memory

  • Máté Lengyel
  • Peter Dayan

Areas of the brain involved in various forms of memory exhibit patterns of neural activity quite unlike those in canonical computational models. We show how to use well-founded Bayesian probabilistic autoassociative recall to derive biologically reasonable neuronal dynamics in recurrently coupled models, together with appropriate values for parameters such as the membrane time constant and inhibition. We explicitly treat two cases. One arises from a standard Hebbian learning rule, and involves activity patterns that are coded by graded firing rates. The other arises from a spike timing dependent learning rule, and involves patterns coded by the phase of spike times relative to a coherent local field potential oscillation. Our model offers a new and more complete understanding of how neural dynamics may support autoassociation.

NeurIPS Conference 2003 Conference Paper

Dopamine Modulation in a Basal Ganglio-Cortical Network of Working Memory

  • Aaron Gruber
  • Peter Dayan
  • Boris Gutkin
  • Sara Solla

Dopamine exerts two classes of effect on the sustained neural activity in prefrontal cortex that underlies working memory. Direct release in the cortex increases the contrast of prefrontal neurons, enhancing the robustness of storage. Release of dopamine in the striatum is associated with salient stimuli and makes medium spiny neurons bistable; this modulation of the output of spiny neurons affects prefrontal cortex so as to indirectly gate access to working memory and additionally damp sensitivity to noise. Existing models have treated dopamine in one or other structure, or have addressed basal ganglia gating of working memory exclusive of dopamine effects. In this paper we combine these mechanisms and explore their joint effect. We model a memory-guided saccade task to illustrate how dopamine’s actions lead to working memory that is selective for salient input and has increased robustness to distraction.

NeurIPS Conference 2003 Conference Paper

Plasticity Kernels and Temporal Statistics

  • Peter Dayan
  • Michael Häusser
  • Michael London

Computational mysteries surround the kernels relating the magnitude and sign of changes in efficacy as a function of the time difference between pre- and post-synaptic activity at a synapse. One important idea34 is that kernels result from filtering, ie an attempt by synapses to eliminate noise corrupting learning. This idea has hitherto been applied to trace learning rules; we apply it to experimentally-defined kernels, using it to reverse-engineer assumed signal statistics. We also extend it to consider the additional goal for filtering of weighting learning according to statistical surprise, as in the Z-score transform. This provides a fresh view of observed kernels and can lead to different, and more natural, signal statistics.

NeurIPS Conference 2002 Conference Paper

Adaptation and Unsupervised Learning

  • Peter Dayan
  • Maneesh Sahani
  • Gregoire Deback

Adaptation is a ubiquitous neural and psychological phenomenon, with a wealth of instantiations and implications. Although a basic form of plasticity, it has, bar some notable exceptions, attracted computational theory of only one main variety. In this paper, we study adaptation from the perspective of factor analysis, a paradigmatic technique of unsupervised learning. We use factor analysis to re-interpret a standard view of adaptation, and apply our new model to some recent data on adaptation in the domain of face discrimination.

NeurIPS Conference 2002 Conference Paper

Expected and Unexpected Uncertainty: ACh and NE in the Neocortex

  • Peter Dayan
  • Angela Yu

Inference and adaptation in noisy and changing, rich sensory environments are rife with a variety of specific sorts of variability. Experimental and theoretical studies suggest that these different forms of variability play different behavioral, neural and computational roles, and may be reported by different (notably neuromodulatory) systems. Here, we refine our previous theory of acetylcholine’s role in cortical inference in the (oxymoronic) terms of expected uncertainty, and advocate a theory for norepinephrine in terms of unexpected uncertainty. We suggest that norepinephrine reports the radical divergence of bottom-up inputs from prevailing top-down interpretations, to influence inference and plasticity. We illustrate this proposal using an adaptive factor analysis model.

NeurIPS Conference 2002 Conference Paper

Replay, Repair and Consolidation

  • Szabolcs Káli
  • Peter Dayan

A standard view of memory consolidation is that episodes are stored temporarily in the hippocampus, and are transferred to the neocortex through replay. Various recent experimental challenges to the idea of transfer, particularly for human memory, are forcing its re-evaluation. However, although there is independent neurophysiological evidence for replay, short of transfer, there are few theoretical ideas for what it might be doing. We suggest and demonstrate two important computational roles associated with neocortical indices.

NeurIPS Conference 2001 Conference Paper

ACh, Uncertainty, and Cortical Inference

  • Peter Dayan
  • Angela Yu

Acetylcholine (ACh) has been implicated in a wide variety of tasks involving attentional processes and plasticity. Following extensive animal studies, it has previously been suggested that ACh reports on uncertainty and controls hippocampal, cortical and cortico-amygdalar plasticity. We extend this view and consider its effects on cortical representational inference, arguing that ACh controls the balance between bottom-up inference, influenced by input stimuli, and top-down inference, influenced by contextual information. We illustrate our proposal using a hierarchical hidden Markov model.

NeurIPS Conference 2001 Conference Paper

Motivated Reinforcement Learning

  • Peter Dayan

The standard reinforcement learning view of the involvement of neuromodulatory systems in instrumental conditioning includes a rather straightforward conception of motivation as prediction of summed future reward. Competition between actions is based on the motivating characteristics of their consequent states in this sense. Substantial, careful experiments reviewed in Dickinson & Balleine12,13 into the neurobiology and psychology of motivation show that this view is incomplete. In many cases, animals are faced with the choice not between many different actions at a given state, but rather whether a single response is worth executing at all. Evidence suggests that the motivational process underlying this choice has different psychological and neural properties from that underlying action choice. We describe and model these motivational systems, and consider the way they interact.

NeurIPS Conference 2000 Conference Paper

Competition and Arbors in Ocular Dominance

  • Peter Dayan

Hebbian and competitive Hebbian algorithms are almost ubiquitous in modeling pattern formation in cortical development. We analyse in theoretical detail a particular model (adapted from Piepenbrock & Obermayer, 1999) for the development of 1d stripe-like patterns, which places competitive and interactive cortical influences, and free and restricted initial arborisation, onto a common footing.

NeurIPS Conference 2000 Conference Paper

Dopamine Bonuses

  • Sham Kakade
  • Peter Dayan

Substantial data support a temporal difference (TD) model of dopamine (DA) neuron activity in which the cells provide a global error signal for reinforcement learning. However, in certain circumstances, DA activity seems anomalous under the TD model, responding to non-rewarding stimuli. We address these anomalies by suggesting that DA cells multiplex information about reward bonuses, including Sutton's exploration bonuses and Ng et al's non-distorting shaping bonuses. We interpret this additional role for DA in terms of the unconditional attentional and psychomotor effects of dopamine, having the computational role of guiding exploration.

NeurIPS Conference 2000 Conference Paper

Explaining Away in Weight Space

  • Peter Dayan
  • Sham Kakade

Explaining away has mostly been considered in terms of inference of states in belief networks. We show how it can also arise in a Bayesian context in inference about the weights governing relationships, such as those between stimuli and reinforcers in conditioning experiments such as backward blocking. We show how explaining away in weight space can be accounted for using an extension of a Kalman filter model; provide a new approximate way of looking at the Kalman gain matrix as a whitener for the correlation matrix of the observation process; suggest a network implementation of this whitener using an architecture due to Goodall; and show that the resulting model exhibits backward blocking.

NeurIPS Conference 2000 Conference Paper

Hippocampally-Dependent Consolidation in a Hierarchical Model of Neocortex

  • Szabolcs Káli
  • Peter Dayan

In memory consolidation, declarative memories which initially require the hippocampus for their recall ultimately become independent of it. Consolidation has been the focus of numerous experimental and qualitative modeling studies, but little quantitative exploration. We present a consolidation model in which hierarchical connections in the cortex, which initially instantiate purely semantic information acquired through probabilistic unsupervised learning, come to instantiate episodic information as well. The hippocampus is responsible for helping complete partial input patterns before consolidation is complete, while also training the cortex to perform appropriate completion by itself.

NeurIPS Conference 2000 Conference Paper

Position Variance, Recurrence and Perceptual Learning

  • Zhaoping Li
  • Peter Dayan

Stimulus arrays are inevitably presented at different positions on the retina in visual tasks, even those that nominally require fixation. In particular, this applies to many perceptual learning tasks. We show that perceptual inference or discrimination in the face of positional variance has a structurally different quality from inference about fixed-position stimuli, involving a particular, quadratic, non-linearity rather than a purely linear discrimination. We show the advantage that taking this non-linearity into account has for discrimination, and suggest it as a role for recurrent connections in area V1, by demonstrating the superior discrimination performance of a recurrent network. We propose that learning the feedforward and recurrent neural connections for these tasks corresponds to the fast and slow components of learning observed in perceptual learning tasks.

NeurIPS Conference 1999 Conference Paper

Acquisition in Autoshaping

  • Sham Kakade
  • Peter Dayan

Quantitative data on the speed with which animals acquire behavioral responses during classical conditioning experiments should provide strong constraints on models of learning. However, most models have simply ignored these data; the few that have attempted to address them have failed by at least an order of magnitude. We discuss key data on the speed of acquisition, and show how to account for them using a statistically sound model of learning, in which differential reliabilities of stimuli play a crucial role.

NeurIPS Conference 1998 Conference Paper

Computational Differences between Asymmetrical and Symmetrical Networks

  • Zhaoping Li
  • Peter Dayan

Symmetrically connected recurrent networks have recently been used as models of a host of neural computations. However, because of the separation between excitation and inhibition, biological neural networks are asymmetrical. We study characteristic differences between asymmetrical networks and their symmetrical counterparts, showing that they have dramatically different dynamical behavior, and also how the differences can be exploited for computational ends. We illustrate our results in the case of a network that is a selective amplifier.

NeurIPS Conference 1998 Conference Paper

Distributional Population Codes and Multiple Motion Models

  • Richard Zemel
  • Peter Dayan

Most theoretical and empirical studies of population codes make the assumption that underlying the neuronal activities is a unique and unambiguous value of an encoded quantity. However, population activities can contain additional information about such things as multiple values of, or uncertainty about, the quantity. We have previously suggested a method to recover this extra information by treating the activities of the population of cells as coding for a complete distribution over the coded quantity rather than just a single value. We now show how this approach bears on psychophysical and neurophysiological studies of population codes for motion direction in tasks involving transparent motion stimuli. We show that, unlike standard approaches, it is able to recover multiple motions from population responses, and also that its output is consistent with both correct and erroneous human performance on psychophysical tasks.

A population code can be defined as a set of units whose activities collectively encode some underlying variable (or variables). The standard view is that population codes are useful for accurately encoding the underlying variable when the individual units are noisy. Current statistical approaches to interpreting population activity reflect this view, in that they determine the optimal single value that explains the observed activity pattern given a particular model of the noise (and possibly a loss function). In our work, we have pursued an alternative hypothesis: that the population encodes additional information about the underlying variable, including multiple values and uncertainty. The Distributional Population Coding (DPC) framework finds the best probability distribution across values that fits the population activity (Zemel, Dayan, & Pouget, 1998).
The DPC framework is appealing since it makes clear how extra information can be conveyed in a population code. In this paper, we use it to address a particular case: population codes for motion direction under transparent motion stimuli.

NeurIPS Conference 1997 Conference Paper

Hippocampal Model of Rat Spatial Abilities Using Temporal Difference Learning

  • David Foster
  • Richard Morris
  • Peter Dayan

We provide a model of the standard watermaze task, and of a more challenging task involving novel platform locations, in which rats exhibit one-trial learning after a few days of training. The model uses hippocampal place cells to support reinforcement learning, and also, in an integrated manner, to build and use allocentric coordinates.

NeurIPS Conference 1997 Conference Paper

Statistical Models of Conditioning

  • Peter Dayan
  • Theresa Long

Conditioning experiments probe the ways that animals make predictions about rewards and punishments and use those predictions to control their behavior. One standard model of conditioning paradigms which involve many conditioned stimuli suggests that individual predictions should be added together. Various key results show that this model fails in some circumstances, and motivate an alternative model, in which there is attentional selection between different available stimuli. The new model is a form of mixture of experts, has a close relationship with some other existing psychological suggestions, and is statistically well-founded.

NeurIPS Conference 1996 Conference Paper

A Hierarchical Model of Visual Rivalry

  • Peter Dayan

Binocular rivalry is the alternating percept that can result when the two eyes see different scenes. Recent psychophysical evidence supports an account for one component of binocular rivalry similar to that for other bistable percepts. We test the hypothesis [19, 16, 18] that alternation can be generated by competition between top-down cortical explanations for the inputs, rather than by direct competition between the inputs. Recent neurophysiological evidence shows that some binocular neurons are modulated with the changing percept; others are not, even if they are selective between the stimuli presented to the eyes. We extend our model to a hierarchy to address these effects.

NeurIPS Conference 1996 Conference Paper

Analytical Mean Squared Error Curves in Temporal Difference Learning

  • Satinder Singh
  • Peter Dayan

We have calculated analytical expressions for how the bias and variance of the estimators provided by various temporal difference value estimation algorithms change with offline updates over trials in absorbing Markov chains using lookup-table representations. We illustrate classes of learning-curve behavior in various chains, and show the manner in which TD is sensitive to the choice of its step-size and eligibility-trace parameters.
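An empirical companion to the analytical curves: TD(0) with a lookup table on a small absorbing chain. The chain, reward probabilities, and step size below are illustrative choices of ours, not the paper's.

```python
import random
random.seed(0)

# States 0 -> 1 -> terminal; each step pays +1 with probability 0.5,
# so the true values are V = [1.0, 0.5]. One sweep of updates per
# episode, constant step size, gamma = 1.

def run_td(alpha, episodes=2000):
    V = [0.0, 0.0]
    for _ in range(episodes):
        for s in (0, 1):
            r = 1.0 if random.random() < 0.5 else 0.0
            v_next = V[1] if s == 0 else 0.0      # state 1 is pre-terminal
            V[s] += alpha * (r + v_next - V[s])   # TD(0) update
    return V

V = run_td(alpha=0.05)
mse = ((V[0] - 1.0) ** 2 + (V[1] - 0.5) ** 2) / 2
print(V, mse)   # estimates near [1.0, 0.5]; residual MSE set by the step size
```

With a larger `alpha` the estimates converge faster but fluctuate more around the true values, which is the step-size sensitivity the abstract refers to.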

NeurIPS Conference 1996 Conference Paper

Neural Models for Part-Whole Hierarchies

  • Maximilian Riesenhuber
  • Peter Dayan

We present a connectionist method for representing images that explicitly addresses their hierarchical nature. It blends data from neuroscience about whole-object viewpoint-sensitive cells in inferotemporal cortex [8] and attentional basis-field modulation in V4 [3] with ideas about hierarchical descriptions based on microfeatures [5, 11]. The resulting model makes critical use of bottom-up and top-down pathways for analysis and synthesis [6]. We illustrate the model with a simple example of representing information about faces.

1 Hierarchical Models

Images of objects constitute an important paradigm case of a representational hierarchy, in which 'wholes', such as faces, consist of 'parts', such as eyes, noses and mouths. The representation and manipulation of part-whole hierarchical information in fixed hardware is a heavy millstone around connectionist necks, and has consequently been the inspiration for many interesting proposals, such as Pollack's RAAM [11]. We turned to the primate visual system for clues. Anterior inferotemporal cortex (IT) appears to construct representations of visually presented objects. Mouths and faces are both objects, and so require fully elaborated representations, presumably at the level of anterior IT, probably using different (or possibly partially overlapping) sets of cells. The natural way to represent the part-whole relationship between mouths and faces is to have a neuronal hierarchy, with connections bottom-up from the mouth units to the face units so that information about the mouth can be used to help recognize or analyze the image of a face, and connections top-down from the face units to the mouth units expressing the generative or synthetic knowledge that if there is a face in a scene, then there is (usually) a mouth too.

There is little empirical support for or against such a neuronal hierarchy, but it seems extremely unlikely on the grounds that arranging for one with the correct set of levels for all classes of objects seems to be impossible. There is recent evidence that activities of cells in intermediate areas in the visual processing hierarchy (such as V4) are influenced by the locus of visual attention [3]. This suggests an alternative strategy for representing part-whole information, in which there is an interaction, subject to attentional control, between top-down generative and bottom-up recognition processing. In one version of our example, activating units in IT that represent a particular face leads, through the top-down generative model, to a pattern of activity in lower areas that is closely related to the pattern of activity that would be seen when the entire face is viewed. This activation in the lower areas in turn provides bottom-up input to the recognition system. In the bottom-up direction, the attentional signal controls which aspects of that activation are actually processed, for example, specifying that only the activity reflecting the lower part of the face should be recognized. In this case, the mouth units in IT can then recognize this restricted pattern of activity as being a particular sort of mouth. Therefore, we have provided a way by which the visual system can represent the part-whole relationship between faces and mouths. This describes just one of many possibilities. For instance, attentional control could be mainly active during the top-down phase instead. Then it would create in V1 (or indeed in intermediate areas) just the activity corresponding to the lower portion of the face in the first place. Also, the focus of attention need not be so ineluctably spatial.

The overall scheme is based on a hierarchical top-down synthesis and bottom-up analysis model for visual processing, as in the Helmholtz machine [6] (note that "hierarchy" here refers to a processing hierarchy rather than the part-whole hierarchy discussed above), with a synthetic model forming the effective map:

'object' ⊗ 'attentional eye-position' → 'image' (1)

(shown in cartoon form in figure 1), where 'image' stands in for the (probabilities over the) activities of units at various levels in the system that would be caused by seeing the aspect of the 'object' selected by placing the focus and scale of attention appropriately. We use this generative model during synthesis in the way described above to traverse the hierarchical description of any particular image. We use the statistical inverse of the synthetic model as the way of analyzing images to determine what objects they depict. This inversion process is clearly also sensitive to the attentional eye-position: it actually determines not only the nature of the object in the scene, but also the way that it is depicted (i.e. its instantiation parameters), as reflected in the attentional eye-position. In particular, the bottom-up analysis model exists in the connections leading to the 2D viewpoint-selective image cells in IT reported by Logothetis et al. [8], which form population codes for all the represented images (mouths, noses, etc). The top-down synthesis model exists in the connections leading in the reverse direction. In generalizations of our scheme, it may, of course, not be necessary to generate an image all the way down in V1. The map (1) specifies a top-down computational task very like the bottom-up one addressed using a multiplicatively controlled synaptic matrix in the shifter model.

(We thank Larry Abbott, Geoff Hinton, Bruno Olshausen, Tomaso Poggio, Alex Pouget, Emilio Salinas and Pawan Sinha for discussions and comments.)

NeurIPS Conference 1996 Conference Paper

Probabilistic Interpretation of Population Codes

  • Richard Zemel
  • Peter Dayan
  • Alexandre Pouget

We present a theoretical framework for population codes which generalizes naturally to the important case where the population provides information about a whole probability distribution over an underlying quantity rather than just a single value. We use the framework to analyze two existing models, and to suggest and evaluate a third model for encoding such probability distributions.

NeurIPS Conference 1995 Conference Paper

Does the Wake-sleep Algorithm Produce Good Density Estimators?

  • Brendan Frey
  • Geoffrey Hinton
  • Peter Dayan

The wake-sleep algorithm (Hinton, Dayan, Frey and Neal 1995) is a relatively efficient method of fitting a multilayer stochastic generative model to high-dimensional data. In addition to the top-down connections in the generative model, it makes use of bottom-up connections for approximating the probability distribution over the hidden units given the data, and it trains these bottom-up connections using a simple delta rule. We use a variety of synthetic and real data sets to compare the performance of the wake-sleep algorithm with Monte Carlo and mean field methods for fitting the same generative model, and also compare it with other models that are less powerful but easier to fit.

NeurIPS Conference 1995 Conference Paper

Improving Policies without Measuring Merits

  • Peter Dayan
  • Satinder Singh

Performing policy iteration in dynamic programming should only require knowledge of relative rather than absolute measures of the utility of actions (Werbos, 1991): what Baird (1993) calls the advantages of actions at states. Nevertheless, most existing methods in dynamic programming (including Baird's) compute some form of absolute utility function. For smooth problems, advantages satisfy two differential consistency conditions (including the requirement that they be free of curl), and we show that enforcing these can lead to appropriate policy improvement solely in terms of advantages.
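The premise here, in miniature: greedy policy improvement depends only on relative utilities. Shifting every Q(s, ·) by a per-state constant leaves the advantages A(s, a) = Q(s, a) − V(s), and hence the greedy policy, unchanged. The numbers below are purely illustrative.

```python
# Toy demonstration that advantages, and the greedy policy, are
# invariant to adding a state-dependent constant to all action values.

Q = {"s": {"left": 1.0, "right": 2.0}}

def advantages(Q, s):
    v = max(Q[s].values())              # V(s) under the greedy policy
    return {a: q - v for a, q in Q[s].items()}

def greedy(Q, s):
    return max(Q[s], key=Q[s].get)

# add an arbitrary per-state constant to all action values
shifted = {"s": {a: q + 100.0 for a, q in Q["s"].items()}}

print(greedy(Q, "s"), greedy(shifted, "s"))            # right right
print(advantages(Q, "s") == advantages(shifted, "s"))  # True
```

This invariance is why, as the abstract argues, absolute utility functions carry more information than policy improvement actually needs.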

NeurIPS Conference 1994 Conference Paper

Recognizing Handwritten Digits Using Mixtures of Linear Models

  • Geoffrey Hinton
  • Michael Revow
  • Peter Dayan

We construct a mixture of locally linear generative models of a collection of pixel-based images of digits, and use them for recognition. Different models of a given digit are used to capture different styles of writing, and new images are classified by evaluating their log-likelihoods under each model. We use an EM-based algorithm in which the M-step is computationally straightforward principal components analysis (PCA). Incorporating tangent-plane information [12] about expected local deformations only requires adding tangent vectors into the sample covariance matrices for the PCA, and it demonstrably improves performance.

NeurIPS Conference 1993 Conference Paper

Foraging in an Uncertain Environment Using Predictive Hebbian Learning

  • P. Montague
  • Peter Dayan
  • Terrence Sejnowski

Survival is enhanced by an ability to predict the availability of food, the likelihood of predators, and the presence of mates. We present a concrete model that uses diffuse neurotransmitter systems to implement a predictive version of a Hebb learning rule embedded in a neural architecture based on anatomical and physiological studies on bees. The model captured the strategies seen in the behavior of bees and a number of other animals when foraging in an uncertain environment. The predictive model suggests a unified way in which neuromodulatory influences can be used to bias actions and control synaptic plasticity. Successful predictions enhance adaptive behavior by allowing organisms to prepare for future actions, rewards, or punishments. Moreover, it is possible to improve upon behavioral choices if the consequences of executing different actions can be reliably predicted. Although classical and instrumental conditioning results from the psychological literature [1] demonstrate that the vertebrate brain is capable of reliable prediction, how these predictions are computed in brains is not yet known. The brains of vertebrates and invertebrates possess small nuclei which project axons throughout large expanses of target tissue and deliver various neurotransmitters such as dopamine, norepinephrine, and acetylcholine [4]. The activity in these systems may report on reinforcing stimuli in the world or may reflect an expectation of future reward [5, 6, 7, 8]. (*Division of Neuroscience, Baylor College of Medicine, Houston, TX 77030)

NeurIPS Conference 1993 Conference Paper

Temporal Difference Learning of Position Evaluation in the Game of Go

  • Nicol Schraudolph
  • Peter Dayan
  • Terrence Sejnowski

The game of Go has a high branching factor that defeats the tree-search approach used in computer chess, and long-range spatiotemporal interactions that make position evaluation extremely difficult. Development of conventional Go programs is hampered by their knowledge-intensive nature. We demonstrate a viable alternative by training networks to evaluate Go positions via temporal difference (TD) learning. Our approach is based on network architectures that reflect the spatial organization of both input and reinforcement signals on the Go board, and training protocols that provide exposure to competent (though unlabelled) play. These techniques yield far better performance than undifferentiated networks trained by self-play alone. A network with less than 500 weights learned, within 3,000 games of 9x9 Go, a position evaluation function that enables a primitive one-ply search to defeat a commercial Go program at a low playing level.

NeurIPS Conference 1992 Conference Paper

Feudal Reinforcement Learning

  • Peter Dayan
  • Geoffrey Hinton

One way to speed up reinforcement learning is to enable learning to happen simultaneously at multiple resolutions in space and time. This paper shows how to create a Q-learning managerial hierarchy in which high-level managers learn how to set tasks for their sub-managers who, in turn, learn how to satisfy them. Sub-managers need not initially understand their managers' commands. They simply learn to maximise their reinforcement in the context of the current command. We illustrate the system using a simple maze task. As the system learns how to get around, satisfying commands at the multiple levels, it explores more efficiently than standard, flat Q-learning and builds a more comprehensive map.
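The sub-manager side of this idea can be sketched in a toy form: a "worker" learns command-conditioned Q-values and is reinforced for satisfying the command rather than for any external goal. The 1-D corridor, parameters, and reward scheme below are our invention, not the paper's maze.

```python
import random
random.seed(1)

ACTIONS = [-1, +1]                 # step left / step right
COMMANDS = ["left", "right"]
Q = {(c, s, a): 0.0 for c in COMMANDS for s in range(5) for a in ACTIONS}

def worker_reward(command, s, s2):
    # the worker is reinforced for obeying the current command,
    # regardless of any external task reward
    return 1.0 if s2 != s and (s2 < s) == (command == "left") else 0.0

alpha, eps = 0.2, 0.2
for _ in range(3000):
    c = random.choice(COMMANDS)        # command issued by the manager
    s = random.randrange(5)
    if random.random() < eps:
        a = random.choice(ACTIONS)     # occasional exploration
    else:
        a = max(ACTIONS, key=lambda act: Q[(c, s, act)])
    s2 = min(4, max(0, s + a))
    r = worker_reward(c, s, s2)
    Q[(c, s, a)] += alpha * (r - Q[(c, s, a)])   # one-step update, gamma = 0

# from a middle state, the trained worker obeys either command
print(max(ACTIONS, key=lambda act: Q[("left", 2, act)]))   # -1
print(max(ACTIONS, key=lambda act: Q[("right", 2, act)]))  # 1
```

In the full feudal scheme the manager would itself be a Q-learner choosing which command to issue; here the commands are random, which is enough to show that a worker can learn to satisfy commands it does not initially understand.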

NeurIPS Conference 1991 Conference Paper

Perturbing Hebbian Rules

  • Peter Dayan
  • Geoffrey Goodhill

Feedforward networks composed of units which compute a sigmoidal function of a weighted sum of their inputs have been much investigated. We tested the approximation and estimation capabilities of networks using functions more complex than sigmoids. Three classes of functions were tested: polynomials, rational functions, and flexible Fourier series. Unlike sigmoids, these classes can fit non-monotonic functions. They were compared on three problems: prediction of Boston housing prices, the sunspot count, and robot arm inverse dynamics. The complex units attained clearly superior performance on the robot arm problem, which is a highly non-monotonic, pure approximation problem. On the noisy and only mildly nonlinear Boston housing and sunspot problems, differences among the complex units were revealed; polynomials did poorly, whereas rationals and flexible Fourier series were comparable to sigmoids.

NeurIPS Conference 1990 Conference Paper

Navigating through Temporal Difference

  • Peter Dayan

Barto, Sutton and Watkins [2] introduced a grid task as a didactic example of temporal difference planning and asynchronous dynamic programming. This paper considers the effects of changing the coding of the input stimulus, and demonstrates that the self-supervised learning of a particular form of hidden unit representation improves performance.