Author name cluster

Michael Laskin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers

2 author rows

ICLR Conference 2023 Conference Paper

In-context Reinforcement Learning with Algorithm Distillation

Michael Laskin
Luyu Wang
Junhyuk Oh
Emilio Parisotto
Stephen Spencer
Richie Steigerwald
DJ Strouse
Steven Stenberg Hansen

We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model. Algorithm Distillation treats learning to reinforcement learn as an across-episode sequential prediction problem. A dataset of learning histories is generated by a source RL algorithm, and then a causal transformer is trained by autoregressively predicting actions given their preceding learning histories as context. Unlike sequential policy prediction architectures that distill post-learning or expert sequences, AD is able to improve its policy entirely in-context without updating its network parameters. We demonstrate that AD can reinforcement learn in-context in a variety of environments with sparse rewards, combinatorial task structure, and pixel-based observations, and find that AD learns a more data-efficient RL algorithm than the one that generated the source data.

Details

ICLR Conference 2022 Conference Paper

Hierarchical Few-Shot Imitation with Skill Transition Models

Kourosh Hakhamaneshi
Ruihan Zhao 0001
Albert Zhan
Pieter Abbeel
Michael Laskin

A desirable property of autonomous agents is the ability to both solve long-horizon problems and generalize to unseen tasks. Recent advances in data-driven skill learning have shown that extracting behavioral priors from offline data can enable agents to solve challenging long-horizon tasks with reinforcement learning. However, generalization to tasks unseen during behavioral prior training remains an outstanding challenge. To this end, we present Few-shot Imitation with Skill Transition Models (FIST), an algorithm that extracts skills from offline data and utilizes them to generalize to unseen tasks given a few downstream demonstrations. FIST learns an inverse skill dynamics model, a distance function, and utilizes a semi-parametric approach for imitation. We show that FIST is capable of generalizing to new tasks and substantially outperforms prior baselines in navigation experiments requiring traversing unseen parts of a large maze and 7-DoF robotic arm experiments requiring manipulating previously unseen objects in a kitchen.

Details

IROS Conference 2022 Conference Paper

Learning Visual Robotic Control Efficiently with Contrastive Pre-training and Data Augmentation

Albert Zhan
Ruihan Zhao 0001
Lerrel Pinto
Pieter Abbeel
Michael Laskin

Recent advances in unsupervised representation learning significantly improved the sample efficiency of training Reinforcement Learning policies in simulated environments. However, similar gains have not yet been seen for real-robot reinforcement learning. In this work, we focus on enabling data-efficient real-robot learning from pixels. We present Contrastive Pre-training and Data Augmentation for Efficient Robotic Learning (CoDER), a method that utilizes data augmentation and unsupervised learning to achieve sample-efficient training of real-robot arm policies from sparse rewards. While contrastive pre-training, data augmentation, demonstrations, and reinforcement learning are alone insufficient for efficient learning, our main contribution is showing that the combination of these disparate techniques results in a simple yet data-efficient method. We show that, given only 10 demonstrations, a single robotic arm can learn sparse-reward manipulation policies from pixels, such as reaching, picking, moving, pulling a large object, flipping a switch, and opening a drawer in just 30 minutes of mean real-world training time. We include videos and code on the project website: https://sites.google.com/view/efficient-robotic-manipulation/home.

Details

NeurIPS Conference 2022 Conference Paper

Unsupervised Reinforcement Learning with Contrastive Intrinsic Control

Michael Laskin
Hao Liu
Xue Bin Peng
Denis Yarats
Aravind Rajeswaran
Pieter Abbeel

We introduce Contrastive Intrinsic Control (CIC), an unsupervised reinforcement learning (RL) algorithm that maximizes the mutual information between state-transitions and latent skill vectors. CIC utilizes contrastive learning between state-transitions and skills vectors to learn behaviour embeddings and maximizes the entropy of these embeddings as an intrinsic reward to encourage behavioural diversity. We evaluate our algorithm on the Unsupervised RL Benchmark (URLB) in the asymptotic state-based setting, which consists of a long reward-free pre-training phase followed by a short adaptation phase to downstream tasks with extrinsic rewards. We find that CIC improves over prior exploration algorithms in terms of adaptation efficiency to downstream tasks on state-based URLB.

PDF Details

ICML Conference 2021 Conference Paper

Decoupling Representation Learning from Reinforcement Learning

Adam Stooke
Kimin Lee
Pieter Abbeel
Michael Laskin

In an effort to overcome limitations of reward-driven feature learning in deep reinforcement learning (RL) from images, we propose decoupling representation learning from policy learning. To this end, we introduce a new unsupervised learning (UL) task, called Augmented Temporal Contrast (ATC), which trains a convolutional encoder to associate pairs of observations separated by a short time difference, under image augmentations and using a contrastive loss. In online RL experiments, we show that training the encoder exclusively using ATC matches or outperforms end-to-end RL in most environments. Additionally, we benchmark several leading UL algorithms by pre-training encoders on expert demonstrations and using them, with weights frozen, in RL agents; we find that agents using ATC-trained encoders outperform all others. We also train multi-task encoders on data from multiple environments and show generalization to different downstream RL tasks. Finally, we ablate components of ATC, and introduce a new data augmentation to enable replay of (compressed) latent images from pre-trained encoders when RL requires augmentation. Our experiments span visually diverse RL benchmarks in DeepMind Control, DeepMind Lab, and Atari, and our complete code is available at \url{https: //github. com/astooke/rlpyt/tree/master/rlpyt/ul}.

Details

ICML Conference 2021 Conference Paper

SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning

Kimin Lee
Michael Laskin
Aravind Srinivas
Pieter Abbeel

Off-policy deep reinforcement learning (RL) has been successful in a range of challenging domains. However, standard off-policy RL algorithms can suffer from several issues, such as instability in Q-learning and balancing exploration and exploitation. To mitigate these issues, we present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy RL algorithms. SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration. By enforcing the diversity between agents using Bootstrap with random initialization, we show that these different ideas are largely orthogonal and can be fruitfully integrated, together further improving the performance of existing off-policy RL algorithms, such as Soft Actor-Critic and Rainbow DQN, for both continuous and discrete control tasks on both low-dimensional and high-dimensional environments.

Details

ICML Conference 2020 Conference Paper

CURL: Contrastive Unsupervised Representations for Reinforcement Learning

Michael Laskin
Aravind Srinivas
Pieter Abbeel

We present CURL: Contrastive Unsupervised Representations for Reinforcement Learning. CURL extracts high-level features from raw pixels using contrastive learning and performs off-policy control on top of the extracted features. CURL outperforms prior pixel-based methods, both model-based and model-free, on complex tasks in the DeepMind Control Suite and Atari Games showing 1. 9x and 1. 2x performance gains at the 100K environment and interaction steps benchmarks respectively. On the DeepMind Control Suite, CURL is the first image-based algorithm to nearly match the sample-efficiency of methods that use state-based features. Our code is open-sourced and available at https: //www. github. com/MishaLaskin/curl.

Details

RLDM Conference 2019 Conference Abstract

Auxiliary Goal Generation in Deep Reinforcement Learning

Michael Laskin

Hindsight Experience Replay (HER) lets reinforcement learning agents solve sparse reward prob- lems by rewarding the achievement of auxiliary goals. In prior work, HER goals were generated by sampling uniformly from states visited at a future timestep inside an episode. However, this approach rewards agents for achieving random goals regardless of their utility. We present a method, called Auxiliary Goal Genera- tion (AuxGen), that automatically learns the optimal goals to replay. Auxiliary goals are generated with a network that maximizes the action-value function in the Bellman equation, allowing the agent to learn more efficiently. We show that this method leads to substantial improvements over HER in sparse reward settings. We also show that AuxGen can learn auxiliary goals in any off-policy setting. As such, it can be a useful tool for training unsupervised autonomous agents.

PDF Details