Arrow Research search

Author name cluster

Nando de Freitas

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

56 papers
2 author rows

Possible papers

56

ICML Conference 2024 Conference Paper

Genie: Generative Interactive Environments

  • Jake Bruce
  • Michael D. Dennis
  • Ashley Edwards
  • Jack Parker-Holder
  • Yuge Shi
  • Edward Hughes 0001
  • Matthew Lai
  • Aditi Mavalankar

We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. At 11B parameters, Genie can be considered a foundation world model. It is comprised of a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model. Genie enables users to act in the generated environments on a frame-by-frame basis despite training without any ground-truth action labels or other domain specific requirements typically found in the world model literature. Further the resulting learned latent action space facilitates training agents to imitate behaviors from unseen videos, opening the path for training generalist agents of the future.

TMLR Journal 2022 Journal Article

A Generalist Agent

  • Scott Reed
  • Konrad Zolna
  • Emilio Parisotto
  • Sergio Gómez Colmenarejo
  • Alexander Novikov
  • Gabriel Barth-maron
  • Mai Giménez
  • Yury Sulsky

Inspired by progress in large-scale language modeling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens. In this report we describe the model and the data, and document the current capabilities of Gato.

JMLR Journal 2022 Journal Article

On Instrumental Variable Regression for Deep Offline Policy Evaluation

  • Yutian Chen
  • Liyuan Xu
  • Caglar Gulcehre
  • Tom Le Paine
  • Arthur Gretton
  • Nando de Freitas
  • Arnaud Doucet

We show that the popular reinforcement learning (RL) strategy of estimating the state-action value (Q-function) by minimizing the mean squared Bellman error leads to a regression problem with confounding, the inputs and output noise being correlated. Hence, direct minimization of the Bellman error can result in significantly biased Q-function estimates. We explain why fixing the target Q-network in Deep Q-Networks and Fitted Q Evaluation provides a way of overcoming this confounding, thus shedding new light on this popular but not well understood trick in the deep RL literature. An alternative approach to address confounding is to leverage techniques developed in the causality literature, notably instrumental variables (IV). We bring together here the literature on IV and RL by investigating whether IV approaches can lead to improved Q-function estimates. This paper analyzes and compares a wide range of recent IV methods in the context of offline policy evaluation (OPE), where the goal is to estimate the value of a policy using logged data only. By applying different IV techniques to OPE, we are not only able to recover previously proposed OPE methods such as model-based techniques but also to obtain competitive new techniques. We find empirically that state-of-the-art OPE methods are closely matched in performance by some IV methods such as AGMM, which were not developed for OPE. We open-source all our code and datasets at https://github.com/liyuan9988/IVOPEwithACME. [abs] [ pdf ][ bib ] [ code ] &copy JMLR 2022. ( edit, beta )

NeurIPS Conference 2022 Conference Paper

Towards Learning Universal Hyperparameter Optimizers with Transformers

  • Yutian Chen
  • Xingyou Song
  • Chansoo Lee
  • Zi Wang
  • Richard Zhang
  • David Dohan
  • Kazuya Kawakami
  • Greg Kochanski

Meta-learning hyperparameter optimization (HPO) algorithms from prior experiments is a promising approach to improve optimization efficiency over objective functions from a similar distribution. However, existing methods are restricted to learning from experiments sharing the same set of hyperparameters. In this paper, we introduce the OptFormer, the first text-based Transformer HPO framework that provides a universal end-to-end interface for jointly learning policy and function prediction when trained on vast tuning data from the wild, such as Google’s Vizier database, one of the world’s largest HPO datasets. Our extensive experiments demonstrate that the OptFormer can simultaneously imitate at least 7 different HPO algorithms, which can be further improved via its function uncertainty estimates. Compared to a Gaussian Process, the OptFormer also learns a robust prior distribution for hyperparameter response functions, and can thereby provide more accurate and better calibrated predictions. This work paves the path to future extensions for training a Transformer-based model as a general HPO optimizer.

NeurIPS Conference 2021 Conference Paper

Active Offline Policy Selection

  • Ksenia Konyushova
  • Yutian Chen
  • Thomas Paine
  • Caglar Gulcehre
  • Cosmin Paduraru
  • Daniel J. Mankowitz
  • Misha Denil
  • Nando de Freitas

This paper addresses the problem of policy selection in domains with abundant logged data, but with a restricted interaction budget. Solving this problem would enable safe evaluation and deployment of offline reinforcement learning policies in industry, robotics, and recommendation domains among others. Several off-policy evaluation (OPE) techniques have been proposed to assess the value of policies using only logged data. However, there is still a big gap between the evaluation by OPE and the full online evaluation in the real environment. Yet, large amounts of online interactions are often not possible in practice. To overcome this problem, we introduce active offline policy selection --- a novel sequential decision approach that combines logged data with online interaction to identify the best policy. This approach uses OPE estimates to warm start the online evaluation. Then, in order to utilize the limited environment interactions wisely we decide which policy to evaluate next based on a Bayesian optimization method with a kernel function that represents policy similarity. We use multiple benchmarks with a large number of candidate policies to show that the proposed approach improves upon state-of-the-art OPE estimates and pure online policy evaluation.

ICLR Conference 2021 Conference Paper

Learning Deep Features in Instrumental Variable Regression

  • Liyuan Xu
  • Yutian Chen 0001
  • Siddarth Srinivasan
  • Nando de Freitas
  • Arnaud Doucet
  • Arthur Gretton

Instrumental variable (IV) regression is a standard strategy for learning causal relationships between confounded treatment and outcome variables from observational data by using an instrumental variable, which affects the outcome only through the treatment. In classical IV regression, learning proceeds in two stages: stage 1 performs linear regression from the instrument to the treatment; and stage 2 performs linear regression from the treatment to the outcome, conditioned on the instrument. We propose a novel method, deep feature instrumental variable regression (DFIV), to address the case where relations between instruments, treatments, and outcomes may be nonlinear. In this case, deep neural nets are trained to define informative nonlinear features on the instruments and treatments. We propose an alternating training regime for these features to ensure good end-to-end performance when composing stages 1 and 2, thus obtaining highly flexible feature maps in a computationally efficient manner. DFIV outperforms recent state-of-the-art methods on challenging IV benchmarks, including settings involving high dimensional image data. DFIV also exhibits competitive performance in off-policy policy evaluation for reinforcement learning, which can be understood as an IV regression task.

NeurIPS Conference 2020 Conference Paper

Critic Regularized Regression

  • Ziyu Wang
  • Alexander Novikov
  • Konrad Zolna
  • Josh S. Merel
  • Jost Tobias Springenberg
  • Scott E. Reed
  • Bobak Shahriari
  • Noah Siegel

Offline reinforcement learning (RL), also known as batch RL, offers the prospect of policy optimization from large pre-recorded datasets without online environment interaction. It addresses challenges with regard to the cost of data collection and safety, both of which are particularly pertinent to real-world applications of RL. Unfortunately, most off-policy algorithms perform poorly when learning from a fixed dataset. In this paper, we propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR). We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces -- outperforming several state-of-the-art offline RL algorithms by a significant margin on a wide range of benchmark tasks.

ICLR Conference 2020 Conference Paper

Making Efficient Use of Demonstrations to Solve Hard Exploration Problems

  • Çaglar Gülçehre
  • Tom Le Paine
  • Bobak Shahriari
  • Misha Denil
  • Matthew Hoffman 0002
  • Hubert Soyer
  • Richard Tanburn
  • Steven Kapturowski

This paper introduces R2D3, an agent that makes efficient use of demonstrations to solve hard exploration problems in partially observable environments with highly variable initial conditions. We also introduce a suite of eight tasks that combine these three properties, and show that R2D3 can solve several of the tasks where other state of the art methods (both with and without demonstrations) fail to see even a single successful trajectory after tens of billions of steps of exploration.

NeurIPS Conference 2020 Conference Paper

Modular Meta-Learning with Shrinkage

  • Yutian Chen
  • Abram L. Friesen
  • Feryal Behbahani
  • Arnaud Doucet
  • David Budden
  • Matthew Hoffman
  • Nando de Freitas

Many real-world problems, including multi-speaker text-to-speech synthesis, can greatly benefit from the ability to meta-learn large models with only a few task- specific components. Updating only these task-specific modules then allows the model to be adapted to low-data tasks for as many steps as necessary without risking overfitting. Unfortunately, existing meta-learning methods either do not scale to long adaptation or else rely on handcrafted task-specific architectures. Here, we propose a meta-learning approach that obviates the need for this often sub-optimal hand-selection. In particular, we develop general techniques based on Bayesian shrinkage to automatically discover and learn both task-specific and general reusable modules. Empirically, we demonstrate that our method discovers a small set of meaningful task-specific modules and outperforms existing meta- learning approaches in domains like few-shot text-to-speech that have little task data and long adaptation horizons. We also show that existing meta-learning methods including MAML, iMAML, and Reptile emerge as special cases of our method.

NeurIPS Conference 2020 Conference Paper

RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning

  • Caglar Gulcehre
  • Ziyu Wang
  • Alexander Novikov
  • Thomas Paine
  • Sergio Gómez
  • Konrad Zolna
  • Rishabh Agarwal
  • Josh S. Merel

Offline methods for reinforcement learning have a potential to help bridge the gap between reinforcement learning research and real-world applications. They make it possible to learn policies from offline datasets, thus overcoming concerns associated with online data collection in the real-world, including cost, safety, or ethical concerns. In this paper, we propose a benchmark called RL Unplugged to evaluate and compare offline RL methods. RL Unplugged includes data from a diverse range of domains including games e. g. , Atari benchmark) and simulated motor control problems (e. g. , DM Control Suite). The datasets include domains that are partially or fully observable, use continuous or discrete actions, and have stochastic vs. deterministic dynamics. We propose detailed evaluation protocols for each domain in RL Unplugged and provide an extensive analysis of supervised learning and offline RL methods using these protocols. We will release data for all our tasks and open-source all algorithms presented in this paper. We hope that our suite of benchmarks will increase the reproducibility of experiments and make it possible to study challenging tasks with a limited computational budget, thus making RL research both more systematic and more accessible across the community. Moving forward, we view RL Unplugged as a living benchmark suite that will evolve and grow with datasets contributed by the research community and ourselves. Our project page is available on github.

NeurIPS Conference 2019 Conference Paper

Learning Compositional Neural Programs with Recursive Tree Search and Planning

  • Thomas PIERROT
  • Guillaume Ligner
  • Scott Reed
  • Olivier Sigaud
  • Nicolas Perrin
  • Alexandre Laterre
  • David Kas
  • Karim Beguir

We propose a novel reinforcement learning algorithm, AlphaNPI, that incorpo- rates the strengths of Neural Programmer-Interpreters (NPI) and AlphaZero. NPI contributes structural biases in the form of modularity, hierarchy and recursion, which are helpful to reduce sample complexity, improve generalization and in- crease interpretability. AlphaZero contributes powerful neural network guided search algorithms, which we augment with recursion. AlphaNPI only assumes a hierarchical program specification with sparse rewards: 1 when the program execution satisfies the specification, and 0 otherwise. This specification enables us to overcome the need for strong supervision in the form of execution traces and consequently train NPI models effectively with reinforcement learning. The experiments show that AlphaNPI can sort as well as previous strongly supervised NPI variants. The AlphaNPI agent is also trained on a Tower of Hanoi puzzle with two disks and is shown to generalize to puzzles with an arbitrary number of disks. The experiments also show that when deploying our neural network policies, it is advantageous to do planning with guided Monte Carlo tree search.

ICML Conference 2019 Conference Paper

Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning

  • Natasha Jaques
  • Angeliki Lazaridou
  • Edward Hughes 0001
  • Çaglar Gülçehre
  • Pedro A. Ortega
  • DJ Strouse
  • Joel Z. Leibo
  • Nando de Freitas

We propose a unified mechanism for achieving coordination and communication in Multi-Agent Reinforcement Learning (MARL), through rewarding agents for having causal influence over other agents’ actions. Causal influence is assessed using counterfactual reasoning. At each timestep, an agent simulates alternate actions that it could have taken, and computes their effect on the behavior of other agents. Actions that lead to bigger changes in other agents’ behavior are considered influential and are rewarded. We show that this is equivalent to rewarding agents for having high mutual information between their actions. Empirical results demonstrate that influence leads to enhanced coordination and communication in challenging social dilemma environments, dramatically increasing the learning curves of the deep RL agents, and leading to more meaningful learned communication protocols. The influence rewards for all agents can be computed in a decentralized way by enabling agents to learn a model of other agents using deep neural networks. In contrast, key previous works on emergent communication in the MARL setting were unable to learn diverse policies in a decentralized manner and had to resort to centralized training. Consequently, the influence reward opens up a window of new opportunities for research in this area.

NeurIPS Conference 2018 Conference Paper

Playing hard exploration games by watching YouTube

  • Yusuf Aytar
  • Tobias Pfaff
  • David Budden
  • Thomas Paine
  • Ziyu Wang
  • Nando de Freitas

Deep reinforcement learning methods traditionally struggle with tasks where environment rewards are particularly sparse. One successful method of guiding exploration in these domains is to imitate trajectories provided by a human demonstrator. However, these demonstrations are typically collected under artificial conditions, i. e. with access to the agent’s exact environment setup and the demonstrator’s action and reward trajectories. Here we propose a method that overcomes these limitations in two stages. First, we learn to map unaligned videos from multiple sources to a common representation using self-supervised objectives constructed over both time and modality (i. e. vision and sound). Second, we embed a single YouTube video in this representation to learn a reward function that encourages an agent to imitate human gameplay. This method of one-shot imitation allows our agent to convincingly exceed human-level performance on the infamously hard exploration games Montezuma’s Revenge, Pitfall! and Private Eye for the first time, even if the agent is not presented with any environment rewards.

NeurIPS Conference 2017 Conference Paper

Cortical microcircuits as gated-recurrent neural networks

  • Rui Costa
  • Ioannis Alexandros Assael
  • Brendan Shillingford
  • Nando de Freitas
  • Tim Vogels

Cortical circuits exhibit intricate recurrent architectures that are remarkably similar across different brain areas. Such stereotyped structure suggests the existence of common computational principles. However, such principles have remained largely elusive. Inspired by gated-memory networks, namely long short-term memory networks (LSTMs), we introduce a recurrent neural network in which information is gated through inhibitory cells that are subtractive (subLSTM). We propose a natural mapping of subLSTMs onto known canonical excitatory-inhibitory cortical microcircuits. Our empirical evaluation across sequential image classification and language modelling tasks shows that subLSTM units can achieve similar performance to LSTM units. These results suggest that cortical circuits can be optimised to solve complex contextual problems and proposes a novel view on their computational function. Overall our work provides a step towards unifying recurrent networks as used in machine learning with their biological counterparts.

ICML Conference 2017 Conference Paper

Learned Optimizers that Scale and Generalize

  • Olga Wichrowska
  • Niru Maheswaranathan
  • Matthew Hoffman 0002
  • Sergio Gomez Colmenarejo
  • Misha Denil
  • Nando de Freitas
  • Jascha Sohl-Dickstein

Learning to learn has emerged as an important direction for achieving artificial intelligence. Two of the primary barriers to its adoption are an inability to scale to larger problems and a limited ability to generalize to new tasks. We introduce a learned gradient descent optimizer that generalizes well to new tasks, and which has significantly reduced memory and computation overhead. We achieve this by introducing a novel hierarchical RNN architecture, with minimal per-parameter overhead, augmented with additional architectural features that mirror the known structure of optimization tasks. We also develop a meta-training ensemble of small, diverse, optimization tasks capturing common properties of loss landscapes. The optimizer learns to outperform RMSProp/ADAM on problems in this corpus. More importantly, it performs comparably or better when applied to small convolutional neural networks, despite seeing no neural networks in its meta-training set. Finally, it generalizes to train Inception V3 and ResNet V2 architectures on the ImageNet dataset for thousands of steps, optimization problems that are of a vastly different scale than those it was trained on.

ICML Conference 2017 Conference Paper

Learning to Learn without Gradient Descent by Gradient Descent

  • Yutian Chen 0001
  • Matthew Hoffman 0002
  • Sergio Gomez Colmenarejo
  • Misha Denil
  • Timothy P. Lillicrap
  • Matthew M. Botvinick
  • Nando de Freitas

We learn recurrent neural network optimizers trained on simple synthetic functions by gradient descent. We show that these learned optimizers exhibit a remarkable degree of transfer in that they can be used to efficiently optimize a broad range of derivative-free black-box functions, including Gaussian process bandits, simple control objectives, global optimization benchmarks and hyper-parameter tuning tasks. Up to the training horizon, the learned optimizers learn to trade-off exploration and exploitation, and compare favourably with heavily engineered Bayesian optimization packages for hyper-parameter tuning.

ICML Conference 2017 Conference Paper

Parallel Multiscale Autoregressive Density Estimation

  • Scott E. Reed
  • Aäron van den Oord
  • Nal Kalchbrenner
  • Sergio Gomez Colmenarejo
  • Ziyu Wang 0001
  • Yutian Chen 0001
  • Dan Belov
  • Nando de Freitas

PixelCNN achieves state-of-the-art results in density estimation for natural images. Although training is fast, inference is costly, requiring one network evaluation per pixel; O(N) for N pixels. This can be sped up by caching activations, but still involves generating each pixel sequentially. In this work, we propose a parallelized PixelCNN that allows more efficient inference by modeling certain pixel groups as conditionally independent. Our new PixelCNN model achieves competitive density estimation and orders of magnitude speedup – O(log N) sampling instead of O(N) – enabling the practical generation of 512x512 images. We evaluate the model on class-conditional image generation, text-to-image synthesis, and action-conditional video generation, showing that our model achieves the best results among non-pixel-autoregressive density models that allow efficient sampling.

NeurIPS Conference 2017 Conference Paper

Robust Imitation of Diverse Behaviors

  • Ziyu Wang
  • Josh Merel
  • Scott Reed
  • Nando de Freitas
  • Gregory Wayne
  • Nicolas Heess

Deep generative models have recently shown great promise in imitation learning for motor control. Given enough data, even supervised approaches can do one-shot imitation learning; however, they are vulnerable to cascading failures when the agent trajectory diverges from the demonstrations. Compared to purely supervised methods, Generative Adversarial Imitation Learning (GAIL) can learn more robust controllers from fewer demonstrations, but is inherently mode-seeking and more difficult to train. In this paper, we show how to combine the favourable aspects of these two approaches. The base of our model is a new type of variational autoencoder on demonstration trajectories that learns semantic policy embeddings. We show that these embeddings can be learned on a 9 DoF Jaco robot arm in reaching tasks, and then smoothly interpolated with a resulting smooth interpolation of reaching behavior. Leveraging these policy representations, we develop a new version of GAIL that (1) is much more robust than the purely-supervised controller, especially with few demonstrations, and (2) avoids mode collapse, capturing many diverse behaviors when GAIL on its own does not. We demonstrate our approach on learning diverse gaits from demonstration on a 2D biped and a 62 DoF 3D humanoid in the MuJoCo physics environment.

ICML Conference 2016 Conference Paper

Dueling Network Architectures for Deep Reinforcement Learning

  • Ziyu Wang 0001
  • Tom Schaul
  • Matteo Hessel
  • Hado van Hasselt
  • Marc Lanctot
  • Nando de Freitas

In recent years there have been many successes of using deep representations in reinforcement learning. Still, many of these applications use conventional architectures, such as convolutional networks, LSTMs, or auto-encoders. In this paper, we present a new neural network architecture for model-free reinforcement learning. Our dueling network represents two separate estimators: one for the state value function and one for the state-dependent action advantage function. The main benefit of this factoring is to generalize learning across actions without imposing any change to the underlying reinforcement learning algorithm. Our results show that this architecture leads to better policy evaluation in the presence of many similar-valued actions. Moreover, the dueling architecture enables our RL agent to outperform the state-of-the-art on the Atari 2600 domain.

JMLR Journal 2016 Journal Article

Herded Gibbs Sampling

  • Yutian Chen
  • Luke Bornn
  • Nando de Freitas
  • Mareija Eskelin
  • Jing Fang
  • Max Welling

The Gibbs sampler is one of the most popular algorithms for inference in statistical models. In this paper, we introduce a herding variant of this algorithm, called herded Gibbs, that is entirely deterministic. We prove that herded Gibbs has an $O(1/T)$ convergence rate for models with independent variables and for fully connected probabilistic graphical models. Herded Gibbs is shown to outperform Gibbs in the tasks of image denoising with MRFs and named entity recognition with CRFs. However, the convergence for herded Gibbs for sparsely connected probabilistic graphical models is still an open problem. [abs] [ pdf ][ bib ] &copy JMLR 2016. ( edit, beta )

NeurIPS Conference 2016 Conference Paper

Learning to Communicate with Deep Multi-Agent Reinforcement Learning

  • Jakob Foerster
  • Ioannis Alexandros Assael
  • Nando de Freitas
  • Shimon Whiteson

We consider the problem of multiple agents sensing and acting in environments with the goal of maximising their shared utility. In these environments, agents must learn communication protocols in order to share information that is needed to solve the tasks. By embracing deep neural networks, we are able to demonstrate end-to-end learning of protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability. We propose two approaches for learning in these domains: Reinforced Inter-Agent Learning (RIAL) and Differentiable Inter-Agent Learning (DIAL). The former uses deep Q-learning, while the latter exploits the fact that, during learning, agents can backpropagate error derivatives through (noisy) communication channels. Hence, this approach uses centralised learning but decentralised execution. Our experiments introduce new environments for studying the learning of communication protocols and present a set of engineering innovations that are essential for success in these domains.

NeurIPS Conference 2016 Conference Paper

Learning to learn by gradient descent by gradient descent

  • Marcin Andrychowicz
  • Misha Denil
  • Sergio Gómez
  • Matthew Hoffman
  • David Pfau
  • Tom Schaul
  • Brendan Shillingford
  • Nando de Freitas

The move from hand-designed features to learned features in machine learning has been wildly successful. In spite of this, optimization algorithms are still designed by hand. In this paper we show how the design of an optimization algorithm can be cast as a learning problem, allowing the algorithm to learn to exploit structure in the problems of interest in an automatic way. Our learned algorithms, implemented by LSTMs, outperform generic, hand-designed competitors on the tasks for which they are trained, and also generalize well to new tasks with similar structure. We demonstrate this on a number of tasks, including simple convex problems, training neural networks, and styling images with neural art.

ICLR Conference 2016 Conference Paper

Neural Programmer-Interpreters

  • Scott E. Reed
  • Nando de Freitas

We propose the neural programmer-interpreter (NPI): a recurrent and compositional neural network that learns to represent and execute programs. NPI has three learnable components: a task-agnostic recurrent core, a persistent key-value program memory, and domain-specific encoders that enable a single NPI to operate in multiple perceptually diverse environments with distinct affordances. By learning to compose lower-level programs to express higher-level programs, NPI reduces sample complexity and increases generalization ability compared to sequence-to-sequence LSTMs. The program memory allows efficient learning of additional tasks by building on existing programs. NPI can also harness the environment (e.g. a scratch pad with read-write pointers) to cache intermediate results of computation, lessening the long-term memory burden on recurrent hidden units. In this work we train the NPI with fully-supervised execution traces; each program has example sequences of calls to the immediate subprograms conditioned on the input. Rather than training on a huge number of relatively weak labels, NPI learns from a small number of rich examples. We demonstrate the capability of our model to learn several types of compositional programs: addition, sorting, and canonicalizing 3D models. Furthermore, a single NPI learns to execute these programs and all 21 associated subprograms.

NeurIPS Conference 2014 Conference Paper

Distributed Parameter Estimation in Probabilistic Graphical Models

  • Yariv Mizrahi
  • Misha Denil
  • Nando de Freitas

This paper presents foundational theoretical results on distributed parameter estimation for undirected probabilistic graphical models. It introduces a general condition on composite likelihood decompositions of these models which guarantees the global consistency of distributed estimators, provided the local estimators are consistent.

ICML Conference 2014 Conference Paper

Linear and Parallel Learning of Markov Random Fields

  • Yariv Dror Mizrahi
  • Misha Denil
  • Nando de Freitas

We introduce a new embarrassingly parallel parameter learning algorithm for Markov random fields which is efficient for a large class of practical models. Our algorithm parallelizes naturally over cliques and, for graphs of bounded degree, its complexity is linear in the number of cliques. Unlike its competitors, our algorithm is fully parallel and for log-linear models it is also data efficient, requiring only the local sufficient statistics of the data to estimate parameters.

ICML Conference 2014 Conference Paper

Narrowing the Gap: Random Forests In Theory and In Practice

  • Misha Denil
  • David Matheson
  • Nando de Freitas

Despite widespread interest and practical use, the theoretical properties of random forests are still not well understood. In this paper we contribute to this understanding in two ways. We present a new theoreti- cally tractable variant of random regression forests and prove that our algorithm is con- sistent. We also provide an empirical eval- uation, comparing our algorithm and other theoretically tractable random forest models to the random forest algorithm used in prac- tice. Our experiments provide insight into the relative importance of different simplifi- cations that theoreticians have made to ob- tain tractable models for analysis.

ICML Conference 2013 Conference Paper

Adaptive Hamiltonian and Riemann Manifold Monte Carlo

  • Ziyu Wang 0001
  • Shakir Mohamed
  • Nando de Freitas

In this paper we address the widely-experienced difficulty in tuning Hamiltonian-based Monte Carlo samplers. We develop an algorithm that allows for the adaptation of Hamiltonian and Riemann manifold Hamiltonian Monte Carlo samplers using Bayesian optimization that allows for infinite adaptation of the parameters of these samplers. We show that the resulting sampling algorithms are ergodic, and demonstrate on several models and data sets that the use of our adaptive algorithms makes it is easy to obtain more efficient samplers, in some precluding the need for more complex models. Hamiltonian-based Monte Carlo samplers are widely known to be an excellent choice of MCMC method, and we aim with this paper to remove a key obstacle towards the more widespread use of these samplers in practice.

IJCAI Conference 2013 Conference Paper

Bayesian Optimization in High Dimensions via Random Embeddings

  • Ziyu Wang
  • Masrour Zoghi
  • Frank Hutter
  • David Matheson
  • Nando de Freitas

Bayesian optimization techniques have been successfully applied to robotics, planning, sensor placement, recommendation, advertising, intelligent user interfaces and automatic algorithm configuration. Despite these successes, the approach is restricted to problems of moderate dimension, and several workshops on Bayesian optimization have identified its scaling to high dimensions as one of the holy grails of the field. In this paper, we introduce a novel random embedding idea to attack this problem. The resulting Random EMbedding Bayesian Optimization (REMBO) algorithm is very simple and applies to domains with both categorical and continuous variables. The experiments demonstrate that REMBO can effectively solve high-dimensional problems, including automatic parameter configuration of a popular mixed integer linear programming solver.

ICML Conference 2013 Conference Paper

Consistency of Online Random Forests

  • Misha Denil
  • David Matheson
  • Nando de Freitas

As a testament to their success, the theory of random forests has long been outpaced by their application in practice. In this paper, we take a step towards narrowing this gap by providing a consistency result for online random forests.

ICLR Conference 2013 Conference Paper

Herded Gibbs Sampling

  • Luke Bornn
  • Yutian Chen 0001
  • Nando de Freitas
  • Mareija Eskelin
  • Jing Fang
  • Max Welling

The Gibbs sampler is one of the most popular algorithms for inference in statistical models. In this paper, we introduce a herding variant of this algorithm, called herded Gibbs, that is entirely deterministic. We prove that herded Gibbs has an $O(1/T)$ convergence rate for models with independent variables and for fully connected probabilistic graphical models. Herded Gibbs is shown to outperform Gibbs in the tasks of image denoising with MRFs and named entity recognition with CRFs. However, the convergence for herded Gibbs for sparsely connected probabilistic graphical models is still an open problem.

NeurIPS Conference 2013 Conference Paper

Predicting Parameters in Deep Learning

  • Misha Denil
  • Babak Shakibi
  • Laurent Dinh
  • Marc'Aurelio Ranzato
  • Nando de Freitas

We demonstrate that there is significant redundancy in the parameterization of several deep learning models. Given only a few weight values for each feature it is possible to accurately predict the remaining values. Moreover, we show that not only can the parameter values be predicted, but many of them need not be learned at all. We train several different architectures by learning only a small number of weights and predicting the rest. In the best case we are able to predict more than 95% of the weights of a network without any drop in accuracy.

AAAI Conference 2012 Conference Paper

Prediction and Fault Detection of Environmental Signals with Uncharacterised Faults

  • Michael Osborne
  • Roman Garnett
  • Kevin Swersky
  • Nando de Freitas

Many signals of interest are corrupted by faults of an unknown type. We propose an approach that uses Gaussian processes and a general “fault bucket” to capture a priori uncharacterised faults, along with an approximate method for marginalising the potential faultiness of all observations. This gives rise to an efficient, flexible algorithm for the detection and automatic correction of faults. Our method is deployed in the domain of water monitoring and management, where it is able to solve several fault detection, correction, and prediction problems. The method works well despite the fact that the data is plagued with numerous difficulties, including missing observations, multiple discontinuities, nonlinearity and many unanticipated types of fault.

UAI Conference 2011 Conference Paper

Asymptotic Efficiency of Deterministic Estimators for Discrete Energy-Based Models: Ratio Matching and Pseudolikelihood

  • Benjamin M. Marlin
  • Nando de Freitas

Standard maximum likelihood estimation cannot be applied to discrete energy-based models in the general case because the computation of exact model probabilities is intractable. Recent research has seen the proposal of several new estimators designed specifically to overcome this intractability, but virtually nothing is known about their theoretical properties. In this paper, we present a generalized estimator that unifies many of the classical and recently proposed estimators. We use results from the standard asymptotic theory for M-estimators to derive a generic expression for the asymptotic covariance matrix of our generalized estimator. We apply these results to study the relative statistical efficiency of classical pseudolikelihood and the recently-proposed ratio matching estimator.

UAI Conference 2011 Conference Paper

Portfolio Allocation for Bayesian Optimization

  • Matthew Hoffman 0002
  • Eric Brochu
  • Nando de Freitas

Bayesian optimization with Gaussian processes has become an increasingly popular tool in the machine learning community. It is efficient and can be used when very little is known about the objective function, making it popular in expensive black-box optimization scenarios. It uses Bayesian methods to sample the objective efficiently using an acquisition function which incorporates the posterior estimate of the objective. However, there are several different parameterized acquisition functions in the literature, and it is often unclear which one to use. Instead of using a single acquisition function, we adopt a portfolio of acquisition functions governed by an online multi-armed bandit strategy. We propose several portfolio strategies, the best of which we call GP-Hedge, and show that this method outperforms the best individual acquisition function. We also provide a theoretical bound on the algorithm's performance.

UAI Conference 2010 Conference Paper

Intracluster Moves for Constrained Discrete-Space MCMC

  • Firas Hamze
  • Nando de Freitas

This paper addresses the problem of sampling from binary distributions with constraints. In particular, it proposes an MCMC method to draw samples from a distribution of the set of all states at a specified distance from some reference state. For example, when the reference state is the vector of zeros, the algorithm can draw samples from a binary distribution with a constraint on the number of active variables, say the number of 1’s. We motivate the need for this algorithm with examples from statistical physics and probabilistic inference. Unlike previous algorithms proposed to sample from binary distributions with these constraints, the new algorithm allows for large moves in state space and tends to propose them such that they are energetically favourable. The algorithm is demonstrated on three Boltzmann machines of varying difficulty: A ferromagnetic Ising model (with positive potentials), a restricted Boltzmann machine with learned Gabor-like filters as potentials, and a challenging three-dimensional spin-glass (with positive and negative potentials).

UAI Conference 2009 Conference Paper

New inference strategies for solving Markov Decision Processes using reversible jump MCMC

  • Matthias Hoffman
  • Hendrik Kück
  • Nando de Freitas
  • Arnaud Doucet

In this paper we build on previous work which uses inferences techniques, in particular Markov Chain Monte Carlo (MCMC) methods, to solve parameterized control problems. We propose a number of modifications in order to make this approach more practical in general, higher-dimensional spaces. We first introduce a new target distribution which is able to incorporate more reward information from sampled trajectories. We also show how to break strong correlations between the policy parameters and sampled trajectories in order to sample more freely. Finally, we show how to incorporate these techniques in a principled manner to obtain estimates of the optimal policy.

ICRA Conference 2008 Conference Paper

Target-directed attention: Sequential decision-making for gaze planning

  • Julia Vogel
  • Nando de Freitas

It is widely agreed that efficient visual search requires the integration of target-driven top-down information and image-driven bottom-up information. Yet the problem of gaze planning - that is, selecting the next best gaze location given the current observations - remains largely unsolved. We propose a probabilistic system that models the gaze sequence as a finite-horizon Bayesian sequential decision process. Direct policy search is used to reason about the next best gaze locations. The system integrates bottom-up saliency information, top-down target knowledge and additional context information through principled Bayesian priors. This results in proposal gaze locations that depend not only the featural visual saliency, but also on prior knowledge and the spatial likelihood of locating the target. The system has been implemented using state-of- the-art object detectors and evaluated on a real-world dataset by comparing it to gaze sequences proposed by a pure bottom-up saliency-based process and to an object detection approach that analyzes the full image. The target-directed attention system is shown to result in higher object detection precision than both competitors, to attend to more relevant targets than the bottom-up attention system, and to require significantly less computation time than the exhaustive approach.

ICRA Conference 2007 Conference Paper

Analysis of Particle Methods for Simultaneous Robot Localization and Mapping and a New Algorithm: Marginal-SLAM

  • Ruben Martinez-Cantin
  • Nando de Freitas
  • José A. Castellanos 0001

This paper presents a new particle method, with stochastic parameter estimation, to solve the SLAM problem. The underlying algorithm is rooted on a solid probabilistic foundation and is guaranteed to converge asymptotically, unlike many existing popular approaches. Moreover, it is efficient in storage and computation. The new algorithm carries out filtering only in the marginal filtering space, thereby allowing for the recursive computation of low variance estimates of the map. The paper provides mathematical arguments and empirical evidence to substantiate the fact that the new method represents an improvement over the existing particle filtering approaches for SLAM, which work on the joint path state space.

ICML Conference 2006 Conference Paper

Fast particle smoothing: if I had a million particles

  • Mike Klaas
  • Mark Briers
  • Nando de Freitas
  • Arnaud Doucet
  • Simon Maskell
  • Dustin Lang

We propose efficient particle smoothing methods for generalized state-spaces models. Particle smoothing is an expensive O(N 2 ) algorithm, where N is the number of particles. We overcome this problem by integrating dual tree recursions and fast multipole techniques with forward-backward smoothers, a new generalized two-filter smoother and a maximum a posteriori (MAP) smoother. Our experiments show that these improvements can substantially increase the practicality of particle smoothing.

ICRA Conference 2005 Conference Paper

Fast Computational Methods for Visually Guided Robots

  • Maryam Mahdaviani
  • Nando de Freitas
  • Bob Fraser
  • Firas Hamze

This paper proposes numerical algorithms for reducing the computational cost of semi-supervised and active learning procedures for visually guided mobile robots from O(M 3 to O(M), while reducing the storage requirements from M 2 to M. This reduction in cost is essential for real-time interaction with mobile robots. The considerable speed ups are achieved using Krylov subspace methods and the fast Gauss transform. Although these state-of-the-art numerical algorithms are known, their application to semi-supervised learning, active learning and mobile robotics is new and should be of interest and great value to the robotics community. We apply our fast algorithms to interactive object recognition on Sony’s ERS-7 Aibo. We provide comparisons that clearly demonstrate remarkable improvements in computational speed.

NeurIPS Conference 2005 Conference Paper

Hot Coupling: A Particle Approach to Inference and Normalization on Pairwise Undirected Graphs

  • Firas Hamze
  • Nando de Freitas

This paper presents a new sampling algorithm for approximating func- tions of variables representable as undirected graphical models of arbi- trary connectivity with pairwise potentials, as well as for estimating the notoriously dif(cid: 2)cult partition function of the graph. The algorithm (cid: 2)ts into the framework of sequential Monte Carlo methods rather than the more widely used MCMC, and relies on constructing a sequence of in- termediate distributions which get closer to the desired one. While the idea of using (cid: 147)tempered(cid: 148) proposals is known, we construct a novel se- quence of target distributions where, rather than dropping a global tem- perature parameter, we sequentially couple individual pairs of variables that are, initially, sampled exactly from a spanning tree of the variables. We present experimental results on inference and estimation of the parti- tion function for sparse and densely-connected graphs.

UAI Conference 2005 Conference Paper

Learning about Individuals from Group Statistics

  • Nando de Freitas
  • Hendrik Kück

We propose a new problem formulation which is similar to, but more informative than, the binary multiple-instance learning problem. In this setting, we are given groups of instances (described by feature vectors) along with estimates of the fraction of positively-labeled instances per group. The task is to learn an instance level classifier from this information. That is, we are trying to estimate the unknown binary labels of individuals from knowledge of group statistics. We propose a principled probabilistic model to solve this problem that accounts for uncertainty in the parameters and in the unknown individual labels. This model is trained with an efficient MCMC algorithm. Its performance is demonstrated on both synthetic and real-world data arising in general object recognition.

UAI Conference 2005 Conference Paper

Nonparametric Bayesian Logic

  • Peter Carbonetto
  • Jacek Kisynski
  • Nando de Freitas
  • David Poole 0001

The Bayesian Logic (BLOG) language was recently developed for defining first-order probability models over worlds with unknown numbers of objects. It handles important problems in AI, including data association and population estimation. This paper extends BLOG by adopting generative processes over function spaces - known as nonparametrics in the Bayesian literature. We introduce syntax for reasoning about arbitrary collections of objects, and their properties, in an intuitive manner. By exploiting exchangeability, distributions over unknown objects and their attributes are cast as Dirichlet processes, which resolve difficulties in model selection and inference caused by varying numbers of objects. We demonstrate these concepts with application to citation matching.

UAI Conference 2005 Conference Paper

Toward Practical N2 Monte Carlo: the Marginal Particle Filter

  • Mike Klaas
  • Nando de Freitas
  • Arnaud Doucet

Sequential Monte Carlo techniques are useful for state estimation in non-linear, non-Gaussian dynamic models. These methods allow us to approximate the joint posterior distribution using sequential importance sampling. In this framework, the dimension of the target distribution grows with each time step, thus it is necessary to introduce some resampling steps to ensure that the estimates provided by the algorithm have a reasonable variance. In many applications, we are only interested in the marginal filtering distribution which is defined on a space of fixed dimension. We present a Sequential Monte Carlo algorithm called the Marginal Particle Filter which operates directly on the marginal distribution, hence avoiding having to perform importance sampling on a space of growing dimension. Using this idea, we also derive an improved version of the auxiliary particle filter. We show theoretic and empirical results which demonstrate a reduction in variance over conventional particle filtering, and present techniques for reducing the cost of the marginal particle filter with N particles from O(N2) to O(N logN).

UAI Conference 2004 Conference Paper

From Fields to Trees

  • Firas Hamze
  • Nando de Freitas

We present new MCMC algorithms for computing the posterior distributions and expectations of the unknown variables in undirected graphical models with regular structure. For demonstration purposes, we focus on Markov Random Fields (MRFs). By partitioning the MRFs into non-overlapping trees, it is possible to compute the posterior distribution of a particular tree exactly by conditioning on the remaining tree. These exact solutions allow us to construct efficient blocked and Rao-Blackwellised MCMC algorithms. We show empirically that tree sampling is considerably more efficient than other partitioned sampling schemes and the naive Gibbs sampler, even in cases where loopy belief propagation fails to converge. We prove that tree sampling exhibits lower variance than the naive Gibbs sampler and other naive partitioning schemes using the theoretical measure of maximal correlation. We also construct new information theory tools for comparing different MCMC schemes and show that, under these, tree sampling is more efficient.

JMLR Journal 2003 Journal Article

Matching Words and Pictures

  • Kobus Barnard
  • Pinar Duygulu
  • David Forsyth
  • Nando de Freitas
  • David M. Blei
  • Michael I. Jordan

We present a new approach for modeling multi-modal data sets, focusing on the specific case of segmented images with associated text. Learning the joint distribution of image regions and words has many applications. We consider in detail predicting words associated with whole images (auto-annotation) and corresponding to particular image regions (region naming). Auto-annotation might help organize and access large collections of images. Region naming is a model of object recognition as a process of translating image regions to words, much as one might translate from one language to another. Learning the relationships between image regions and semantic correlates (words) is an interesting example of multi-modal data mining, particularly because it is typically hard to apply data mining techniques to collections of images. We develop a number of models for the joint distribution of image regions and words, including several which explicitly learn the correspondence between regions and words. We study multi-modal and correspondence extensions to Hofmann's hierarchical clustering/aspect model, a translation model adapted from statistical machine translation (Brown et al.), and a multi-modal extension to mixture of latent Dirichlet allocation (MoM-LDA). All models are assessed using a large collection of annotated images of real scenes. We study in depth the difficult problem of measuring performance. For the annotation task, we look at prediction performance on held out data. We present three alternative measures, oriented toward different types of task. Measuring the performance of correspondence methods is harder, because one must determine whether a word has been placed on the right region of an image. We can use annotation performance as a proxy measure, but accurate measurement requires hand labeled data, and thus must occur on a smaller scale. We show results using both an annotation proxy, and manually labeled data.

NeurIPS Conference 2002 Conference Paper

"Name That Song!" A Probabilistic Approach to Querying on Music and Text

  • Brochu Eric
  • Nando de Freitas

We present a novel, flexible statistical approach for modelling music and text jointly. The approach is based on multi-modal mixture models and maximum a posteriori estimation using EM. The learned models can be used to browse databases with documents containing music and text, to search for music using queries consisting of music and text (lyrics and other contextual information), to annotate text documents with music, and to automatically recommend or identify similar songs.

NeurIPS Conference 2002 Conference Paper

Real-Time Monitoring of Complex Industrial Processes with Particle Filters

  • Rubén Morales-Menéndez
  • Nando de Freitas
  • David Poole

This paper discusses the application of particle filtering algorithms to fault diagnosis in complex industrial processes. We consider two ubiq- uitous processes: an industrial dryer and a level tank. For these appli- cations, we compared three particle filtering variants: standard parti- cle filtering, Rao-Blackwellised particle filtering and a version of Rao- Blackwellised particle filtering that does one-step look-ahead to select good sampling regions. We show that the overhead of the extra process- ing per particle of the more sophisticated methods is more than compen- sated by the decrease in error and variance.

UAI Conference 2001 Conference Paper

Variational MCMC

  • Nando de Freitas
  • Pedro A. d. F. R. Højen-Sørensen
  • Stuart Russell 0001

We propose a new class of learning algorithms that combines variational approximation and Markov chain Monte Carlo (MCMC) simulation. Naive algorithms that use the variational approximation as proposal distribution can perform poorly because this approximation tends to underestimate the true variance and other features of the data. We solve this problem by introducing more sophisticated MCMC algorithms. One of these algorithms is a mixture of two MCMC kernels: a random walk Metropolis kernel and a blockMetropolis-Hastings (MH) kernel with a variational approximation as proposaldistribution. The MH kernel allows one to locate regions of high probability efficiently. The Metropolis kernel allows us to explore the vicinity of these regions. This algorithm outperforms variationalapproximations because it yields slightly better estimates of the mean and considerably better estimates of higher moments, such as covariances. It also outperforms standard MCMC algorithms because it locates theregions of high probability quickly, thus speeding up convergence. We demonstrate this algorithm on the problem of Bayesian parameter estimation for logistic (sigmoid) belief networks.

UAI Conference 2000 Conference Paper

Rao-Blackwellised Particle Filtering for Dynamic Bayesian Networks

  • Arnaud Doucet
  • Nando de Freitas
  • Kevin Murphy 0002
  • Stuart Russell 0001

Particle filters (PFs) are powerful sampling-based inference/learning algorithms for dynamic Bayesian networks (DBNs). They allow us to treat, in a principled way, any type of probability distribution, nonlinearity and non-stationarity. They have appeared in several fields under such names as ``condensation'', ``sequential Monte Carlo'' and ``survival of the fittest''. In this paper, we show how we can exploit the structure of the DBN to increase the efficiency of particle filtering, using a technique known as Rao-Blackwellisation. Essentially, this samples some of the variables, and marginalizes out the rest exactly, using the Kalman filter, HMM filter, junction tree algorithm, or any other finite dimensional optimal filter. We show that Rao-Blackwellised particle filters (RBPFs) lead to more accurate estimates than standard PFs. We demonstrate RBPFs on two problems, namely non-stationary online regression with radial basis function networks and robot localization and map building. We also discuss other potential application areas and provide references to some finite dimensional optimal filters.

UAI Conference 2000 Conference Paper

Reversible Jump MCMC Simulated Annealing for Neural Networks

  • Christophe Andrieu
  • Nando de Freitas
  • Arnaud Doucet

We propose a novel reversible jump Markov chain Monte Carlo (MCMC) simulated annealing algorithm to optimize radial basis function (RBF) networks. This algorithm enables us to maximize the joint posterior distribution of the network parameters and the number of basis functions. It performs a global search in the joint space of the parameters and number of parameters, thereby surmounting the problem of local minima. We also show that by calibrating a Bayesian model, we can obtain the classical AIC, BIC and MDL model selection criteria within a penalized likelihood framework. Finally, we show theoretically and empirically that the algorithm converges to the modes of the full posterior distribution in an efficient way.

NeurIPS Conference 2000 Conference Paper

The Unscented Particle Filter

  • Rudolph van der Merwe
  • Arnaud Doucet
  • Nando de Freitas
  • Eric Wan

In this paper, we propose a new particle filter based on sequential importance sampling. The algorithm uses a bank of unscented fil(cid: 173) ters to obtain the importance proposal distribution. This proposal has two very "nice" properties. Firstly, it makes efficient use of the latest available information and, secondly, it can have heavy tails. As a result, we find that the algorithm outperforms stan(cid: 173) dard particle filtering and other nonlinear filtering methods very substantially. This experimental finding is in agreement with the theoretical convergence proof for the algorithm. The algorithm also includes resampling and (possibly) Markov chain Monte Carlo (MCMC) steps.