Arrow Research search

Author name cluster

Yan Duan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers
2 author rows

Possible papers

ICML 2020 Conference Paper

Variable Skipping for Autoregressive Range Density Estimation

  • Eric Liang
  • Zongheng Yang
  • Ion Stoica
  • Pieter Abbeel
  • Yan Duan
  • Xi Chen 0022

Deep autoregressive models compute point likelihood estimates of individual data points. However, many applications (e.g., database cardinality estimation) require estimating range densities, a capability that is under-explored by the current neural density estimation literature. In these applications, fast and accurate range density estimates over high-dimensional data directly impact user-perceived performance. In this paper, we explore a technique for accelerating range density estimation over deep autoregressive models. This technique, called variable skipping, exploits the sparse structure of range density queries to avoid sampling unnecessary variables during approximate inference. We show that variable skipping provides 10-100x efficiency improvements when targeting challenging high-quantile error metrics, enables complex applications such as text pattern matching, and can be realized via a simple data augmentation procedure without changing the usual maximum likelihood objective.
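The data augmentation the abstract mentions can be sketched in a few lines: during training, a random subset of input variables is replaced by an "absent" token, so the model learns conditionals in which some inputs are marked as skipped. The `MASK` sentinel and the function name below are illustrative, not from the paper:

```python
import random

MASK = -1  # hypothetical sentinel marking a variable as skipped/absent

def mask_augment(x, rng):
    """Replace a random subset of variables with MASK, so the model is
    trained on conditionals where some inputs are marked as absent
    (a minimal sketch of the augmentation described in the abstract)."""
    k = rng.randrange(len(x) + 1)        # how many variables to mask
    idx = rng.sample(range(len(x)), k)   # which positions to mask
    out = list(x)
    for i in idx:
        out[i] = MASK
    return out
```

At inference time, variables left unconstrained by a range query can then be fed in as `MASK` rather than sampled, which is where the reported 10-100x savings come from.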

NeurIPS 2019 Conference Paper

Evaluating Protein Transfer Learning with TAPE

  • Roshan Rao
  • Nicholas Bhattacharya
  • Neil Thomas
  • Yan Duan
  • Peter Chen
  • John Canny
  • Pieter Abbeel
  • Yun Song

Protein modeling is an increasingly popular area of machine learning research. Semi-supervised learning has emerged as an important paradigm in protein modeling due to the high cost of acquiring supervised protein labels, but the current literature is fragmented when it comes to datasets and standardized evaluation techniques. To facilitate progress in this field, we introduce the Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology. We curate tasks into specific training, validation, and test splits to ensure that each task tests biologically relevant generalization that transfers to real-life scenarios. We benchmark a range of approaches to semi-supervised protein representation learning, which span recent work as well as canonical sequence learning techniques. We find that self-supervised pretraining is helpful for almost all models on all tasks, more than doubling performance in some cases. Despite this increase, in several cases features learned by self-supervised pretraining still lag behind features extracted by state-of-the-art non-neural techniques. This gap in performance suggests a huge opportunity for innovative architecture design and improved modeling paradigms that better capture the signal in biological sequences. TAPE will help the machine learning community focus effort on scientifically relevant problems. Toward this end, all data and code used to run these experiments are available at https://github.com/songlab-cal/tape.

ICML 2019 Conference Paper

Flow++: Improving Flow-Based Generative Models with Variational Dequantization and Architecture Design

  • Jonathan Ho
  • Xi Chen 0022
  • Aravind Srinivas
  • Yan Duan
  • Pieter Abbeel

Flow-based generative models are powerful exact likelihood models with efficient sampling and inference. Despite their computational efficiency, flow-based models generally have much worse density modeling performance compared to state-of-the-art autoregressive models. In this paper, we investigate and improve upon three limiting design choices employed by flow-based models in prior work: the use of uniform noise for dequantization, the use of inexpressive affine flows, and the use of purely convolutional conditioning networks in coupling layers. Based on our findings, we propose Flow++, a new flow-based model that is now the state-of-the-art non-autoregressive model for unconditional density estimation on standard image benchmarks. Our work has begun to close the significant performance gap that has so far existed between autoregressive models and flow-based models.

NeurIPS 2018 Conference Paper

The Importance of Sampling in Meta-Reinforcement Learning

  • Bradly Stadie
  • Ge Yang
  • Rein Houthooft
  • Peter Chen
  • Yan Duan
  • Yuhuai Wu
  • Pieter Abbeel
  • Ilya Sutskever

We interpret meta-reinforcement learning as the problem of learning how to quickly find a good sampling distribution in a new environment. This interpretation leads to the development of two new meta-reinforcement learning algorithms: E-MAML and E-$\text{RL}^2$. Results are presented on a new environment we call 'Krazy World': a difficult high-dimensional gridworld which is designed to highlight the importance of correctly differentiating through sampling distributions in meta-reinforcement learning. Further results are presented on a set of maze environments. We show E-MAML and E-$\text{RL}^2$ deliver better performance than baseline algorithms on both tasks.

ICLR 2018 Conference Paper

Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines

  • Cathy Wu 0002
  • Aravind Rajeswaran
  • Yan Duan
  • Vikash Kumar
  • Alexandre M. Bayen
  • Sham M. Kakade
  • Igor Mordatch
  • Pieter Abbeel

Policy gradient methods have enjoyed great success in deep reinforcement learning but suffer from high variance of gradient estimates. The high variance problem is particularly exacerbated in problems with long horizons or high-dimensional action spaces. To mitigate this issue, we derive a bias-free action-dependent baseline for variance reduction which fully exploits the structural form of the stochastic policy itself and does not make any additional assumptions about the MDP. We demonstrate and quantify the benefit of the action-dependent baseline through both theoretical analysis as well as numerical results, including an analysis of the suboptimality of the optimal state-dependent baseline. The result is a computationally efficient policy gradient algorithm, which scales to high-dimensional control problems, as demonstrated by a synthetic 2000-dimensional target matching task. Our experimental results indicate that action-dependent baselines allow for faster learning on standard reinforcement learning benchmarks and high-dimensional hand manipulation and synthetic tasks. Finally, we show that the general idea of including additional information in baselines for improved variance reduction can be extended to partially observed and multi-agent tasks.

NeurIPS 2017 Conference Paper

#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning

  • Haoran Tang
  • Rein Houthooft
  • Davis Foote
  • Adam Stooke
  • OpenAI Xi Chen
  • Yan Duan
  • John Schulman
  • Filip De Turck

Count-based exploration algorithms are known to perform near-optimally when used in conjunction with tabular reinforcement learning (RL) methods for solving small discrete Markov decision processes (MDPs). It is generally thought that count-based methods cannot be applied in high-dimensional state spaces, since most states will only occur once. Recent deep RL exploration strategies are able to deal with high-dimensional continuous state spaces through complex heuristics, often relying on optimism in the face of uncertainty or intrinsic motivation. In this work, we describe a surprising finding: a simple generalization of the classic count-based approach can reach near state-of-the-art performance on various high-dimensional and/or continuous deep RL benchmarks. States are mapped to hash codes, which allows their occurrences to be counted with a hash table. These counts are then used to compute a reward bonus according to the classic count-based exploration theory. We find that simple hash functions can achieve surprisingly good results on many challenging tasks. Furthermore, we show that a domain-dependent learned hash code may further improve these results. Detailed analysis reveals important aspects of a good hash function: 1) having appropriate granularity and 2) encoding information relevant to solving the MDP. This exploration strategy achieves near state-of-the-art performance on both continuous control tasks and Atari 2600 games, hence providing a simple yet powerful baseline for solving MDPs that require considerable exploration.
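The pipeline the abstract describes (hash states, count occurrences in a hash table, convert counts into a reward bonus) can be sketched as follows. The grid-rounding hash and the parameter names stand in for the static and learned hash functions of the paper and are assumptions of this sketch:

```python
import math
from collections import defaultdict

def discretize(state, granularity=1.0):
    """Hypothetical hash: round a continuous state to a grid cell
    (a stand-in for the hash functions discussed in the abstract)."""
    return tuple(int(s // granularity) for s in state)

class CountBonus:
    """Count-based exploration bonus: r_bonus = beta / sqrt(n(phi(s)))."""

    def __init__(self, beta=0.01):
        self.beta = beta
        self.counts = defaultdict(int)  # hash table of state-cell counts

    def bonus(self, state):
        code = discretize(state)
        self.counts[code] += 1
        return self.beta / math.sqrt(self.counts[code])
```

During training the bonus is simply added to the environment reward at each step, e.g. `r_total = r_env + cb.bonus(state)`, so rarely visited cells earn larger bonuses.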

NeurIPS 2017 Conference Paper

One-Shot Imitation Learning

  • Yan Duan
  • Marcin Andrychowicz
  • Bradly Stadie
  • OpenAI Jonathan Ho
  • Jonas Schneider
  • Ilya Sutskever
  • Pieter Abbeel
  • Wojciech Zaremba

Imitation learning has been commonly applied to solve different tasks in isolation. This usually requires either careful feature engineering, or a significant number of samples. This is far from what we desire: ideally, robots should be able to learn from very few demonstrations of any given task, and instantly generalize to new situations of the same task, without requiring task-specific engineering. In this paper, we propose a meta-learning framework for achieving such capability, which we call one-shot imitation learning. Specifically, we consider the setting where there is a very large (maybe infinite) set of tasks, and each task has many instantiations. For example, a task could be to stack all blocks on a table into a single tower, another task could be to place all blocks on a table into two-block towers, etc. In each case, different instances of the task would consist of different sets of blocks with different initial states. At training time, our algorithm is presented with pairs of demonstrations for a subset of all tasks. A neural net is trained that takes as input one demonstration and the current state (which initially is the initial state of the other demonstration of the pair), and outputs an action with the goal that the resulting sequence of states and actions matches as closely as possible with the second demonstration. At test time, a demonstration of a single instance of a new task is presented, and the neural net is expected to perform well on new instances of this new task. Our experiments show that the use of soft attention allows the model to generalize to conditions and tasks unseen in the training data. We anticipate that by training this model on a much greater variety of tasks and settings, we will obtain a general system that can turn any demonstrations into robust policies that can accomplish an overwhelming variety of tasks.

ICML 2016 Conference Paper

Benchmarking Deep Reinforcement Learning for Continuous Control

  • Yan Duan
  • Xi Chen 0022
  • Rein Houthooft
  • John Schulman
  • Pieter Abbeel

Recently, researchers have made significant progress combining the advances in deep learning for learning feature representations with reinforcement learning. Some notable examples include training agents to play Atari games based on raw pixel data and to acquire advanced manipulation skills using raw sensory inputs. However, it has been difficult to quantify progress in the domain of continuous control due to the lack of a commonly adopted benchmark. In this work, we present a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure. We report novel findings based on the systematic evaluation of a range of implemented reinforcement learning algorithms. Both the benchmark and reference implementations are released at https://github.com/rllab/rllab in order to facilitate experimental reproducibility and to encourage adoption by other researchers.

ICRA 2016 Conference Paper

Deep spatial autoencoders for visuomotor learning

  • Chelsea Finn
  • Xin Yu Tan
  • Yan Duan
  • Trevor Darrell
  • Sergey Levine
  • Pieter Abbeel

Reinforcement learning provides a powerful and flexible framework for automated acquisition of robotic motion skills. However, applying reinforcement learning requires a sufficiently detailed representation of the state, including the configuration of task-relevant objects. We present an approach that automates state-space construction by learning a state representation directly from camera images. Our method uses a deep spatial autoencoder to acquire a set of feature points that describe the environment for the current task, such as the positions of objects, and then learns a motion skill with these feature points using an efficient reinforcement learning method based on local linear models. The resulting controller reacts continuously to the learned feature points, allowing the robot to dynamically manipulate objects in the world with closed-loop control. We demonstrate our method with a PR2 robot on tasks that include pushing a free-standing toy block, picking up a bag of rice using a spatula, and hanging a loop of rope on a hook at various positions. In each task, our method automatically learns to track task-relevant objects and manipulate their configuration with the robot's arm.
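The feature points at the heart of this approach come from a spatial soft-argmax: a softmax over each activation map followed by an expectation over pixel coordinates, which yields a differentiable (x, y) location. A minimal sketch on plain Python lists (a real implementation would operate on a conv-net's feature maps, one point per channel):

```python
import math

def spatial_soft_argmax(feature_map):
    """Convert a 2-D activation map into an expected (x, y) feature point:
    softmax over all positions, then the expectation of the coordinates."""
    h, w = len(feature_map), len(feature_map[0])
    m = max(max(row) for row in feature_map)  # subtract max for stability
    exps = [[math.exp(v - m) for v in row] for row in feature_map]
    z = sum(sum(row) for row in exps)
    ex = sum(exps[i][j] * j for i in range(h) for j in range(w)) / z
    ey = sum(exps[i][j] * i for i in range(h) for j in range(w)) / z
    return ex, ey
```

A sharp peak in the map maps to coordinates near that peak, which is what lets the learned points track object positions for the downstream controller.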

NeurIPS 2016 Conference Paper

InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

  • Xi Chen
  • Yan Duan
  • Rein Houthooft
  • John Schulman
  • Ilya Sutskever
  • Pieter Abbeel

This paper describes InfoGAN, an information-theoretic extension to the Generative Adversarial Network that is able to learn disentangled representations in a completely unsupervised manner. InfoGAN is a generative adversarial network that also maximizes the mutual information between a small subset of the latent variables and the observation. We derive a lower bound to the mutual information objective that can be optimized efficiently, and show that our training procedure can be interpreted as a variation of the Wake-Sleep algorithm. Specifically, InfoGAN successfully disentangles writing styles from digit shapes on the MNIST dataset, pose from lighting of 3D rendered images, and background digits from the central digit on the SVHN dataset. It also discovers visual concepts that include hair styles, presence/absence of eyeglasses, and emotions on the CelebA face dataset. Experiments show that InfoGAN learns interpretable representations that are competitive with representations learned by existing fully supervised methods.
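The mutual-information term InfoGAN adds can be written as the variational lower bound L_I = E[log Q(c|x)] + H(c) for a latent code c. A minimal numeric sketch for a categorical code, where `q_log_probs[i]` stands for the average log Q(c=i|x) over samples generated with code i (an assumption of this sketch, not the paper's exact estimator):

```python
import math

def mi_lower_bound(code_probs, q_log_probs):
    """Variational lower bound on I(c; G(z, c)):
       L_I = E_c[log Q(c | x)] + H(c),
    for a categorical code with prior code_probs."""
    entropy = -sum(p * math.log(p) for p in code_probs if p > 0)
    expected_log_q = sum(p * lq for p, lq in zip(code_probs, q_log_probs))
    return expected_log_q + entropy
```

With a perfect recognition network (log Q(c|x) = 0 for the true code), the bound saturates at H(c); in training, this term is weighted and added to the usual GAN objective.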

NeurIPS 2016 Conference Paper

VIME: Variational Information Maximizing Exploration

  • Rein Houthooft
  • Xi Chen
  • Yan Duan
  • John Schulman
  • Filip De Turck
  • Pieter Abbeel

Scalable and effective exploration remains a key challenge in reinforcement learning (RL). While there are methods with optimality guarantees in the setting of discrete state and action spaces, these methods cannot be applied in high-dimensional deep RL scenarios. As such, most contemporary RL relies on simple heuristics such as epsilon-greedy exploration or adding Gaussian noise to the controls. This paper introduces Variational Information Maximizing Exploration (VIME), an exploration strategy based on maximization of information gain about the agent's belief of environment dynamics. We propose a practical implementation, using variational inference in Bayesian neural networks which efficiently handles continuous state and action spaces. VIME modifies the MDP reward function, and can be applied with several different underlying RL algorithms. We demonstrate that VIME achieves significantly better performance compared to heuristic exploration methods across a variety of continuous control tasks and algorithms, including tasks with very sparse rewards.

ICRA 2014 Conference Paper

Gaussian belief space planning with discontinuities in sensing domains

  • Sachin Patil
  • Yan Duan
  • John Schulman
  • Ken Goldberg
  • Pieter Abbeel

Discontinuities in sensing domains are common when planning for many robotic navigation and manipulation tasks. For cameras and 3D sensors, discontinuities may be inherent in sensor field of view or may change over time due to occlusions that are created by moving obstructions and movements of the sensor. The associated gaps in sensor information due to missing measurements pose a challenge for belief space and related optimization-based planning methods since there is no gradient information when the system state is outside the sensing domain. We address this in a belief space context by considering the signed distance to the sensing region. We smooth out sensing discontinuities by assuming that measurements can be obtained outside the sensing region with noise levels depending on a sigmoid function of the signed distance. We sequentially improve the continuous approximation by increasing the sigmoid slope over an outer loop to find plans that cope with sensor discontinuities. We also incorporate the information contained in not obtaining a measurement about the state during execution by appropriately truncating the Gaussian belief state. We present results in simulation for tasks with uncertainty involving navigation of mobile robots and reaching tasks with planar robot arms. Experiments suggest that the approach can be used to cope with discontinuities in sensing domains by effectively re-planning during execution.
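The sigmoid smoothing of the sensing discontinuity can be sketched directly: measurement noise interpolates between a small in-region value and a large out-of-region value as a sigmoid of the signed distance, and the slope is sharpened over the outer loop to recover the hard boundary. Parameter names and the sign convention (negative signed distance inside the sensing region) are illustrative, not from the paper:

```python
import math

def measurement_noise(signed_dist, base_noise=0.01, max_noise=10.0, slope=1.0):
    """Smoothed measurement-noise level as a function of the signed
    distance to the sensing region: near base_noise well inside
    (signed_dist << 0), near max_noise well outside (signed_dist >> 0)."""
    w = 1.0 / (1.0 + math.exp(-slope * signed_dist))  # ~0 inside, ~1 outside
    return base_noise + (max_noise - base_noise) * w
```

Because the noise model is now differentiable everywhere, gradient-based trajectory optimization gets a useful signal even when the planned state lies outside the sensing domain; increasing `slope` between outer-loop iterations tightens the approximation.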

ICRA 2014 Conference Paper

Planning locally optimal, curvature-constrained trajectories in 3D using sequential convex optimization

  • Yan Duan
  • Sachin Patil
  • John Schulman
  • Ken Goldberg
  • Pieter Abbeel

3D curvature-constrained motion planning finds applications in a wide variety of domains, including motion planning for flexible, bevel-tip medical needles, planning curvature-constrained channels in 3D printed implants for targeted brachytherapy dose delivery or channels for cooling turbine blades, and path planning for unmanned aerial vehicles (UAVs). In this work, we present a motion planning technique using sequential convex optimization for computing locally optimal, curvature-constrained trajectories to desired targets while avoiding obstacles in 3D environments. We report two main contributions in this work: (i) curvature-constrained trajectory optimization in 6D pose (position and orientation) space, and (ii) planning multiple trajectories that are mutually collision-free. We demonstrate the performance of our approach on two clinically motivated applications. Our experiments indicate that our approach can compute high-quality plans for medical needle steering in 1.6 seconds on a commodity PC, enabling re-planning during execution to correct for perturbations. Our approach can also be used for designing optimized channel layouts within 3D printed implants for intracavitary brachytherapy.

IROS 2013 Conference Paper

Sigma hulls for Gaussian belief space planning for imprecise articulated robots amid obstacles

  • Alex X. Lee
  • Yan Duan
  • Sachin Patil
  • John Schulman
  • Zoe McCarthy
  • Jur van den Berg
  • Ken Goldberg
  • Pieter Abbeel

In many home and service applications, an emerging class of articulated robots such as the Raven and Baxter trade off precision in actuation and sensing to reduce costs and to reduce the potential for injury to humans in their workspaces. For planning and control of such robots, planning in belief space, i.e., modeling such problems as POMDPs, has shown great promise, but existing belief space planning methods have primarily been applied to cases where robots can be approximated as points or spheres. In this paper, we extend the belief space framework to treat articulated robots where the linkage can be decomposed into convex components. To allow planning and collision avoidance in Gaussian belief spaces, we introduce the concept of sigma hulls: convex hulls of robot links transformed according to the sigma standard deviation boundary points generated by the Unscented Kalman filter (UKF). We characterize the signed distances between sigma hulls and obstacles in the workspace to formulate efficient collision avoidance constraints compatible with the Gilbert-Johnson-Keerthi (GJK) and Expanding Polytope Algorithm (EPA) within an optimization-based planning framework. We report results in simulation for planning motions for a 4-DOF planar robot and a 7-DOF articulated robot with imprecise actuation and inaccurate sensors. These experiments suggest that the sigma hull framework can significantly reduce the probability of collision and is computationally efficient enough to permit iterative re-planning for model predictive control.