Arrow Research search

Author name cluster

Derek Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers
1 author row

Possible papers

4

NeurIPS Conference 2021 Conference Paper

ManiSkill: Generalizable Manipulation Skill Benchmark with Large-Scale Demonstrations

  • Tongzhou Mu
  • Zhan Ling
  • Fanbo Xiang
  • Derek Yang
  • Xuanlin Li
  • Stone Tao
  • Zhiao Huang
  • Zhiwei Jia

Object manipulation from 3D visual inputs poses many challenges on building generalizable perception and policy models. However, 3D assets in existing benchmarks mostly lack the diversity of 3D shapes that align with real-world intra-class complexity in topology and geometry. Here we propose SAPIEN Manipulation Skill Benchmark (ManiSkill) to benchmark manipulation skills over diverse objects in a full-physics simulator. 3D assets in ManiSkill include large intra-class topological and geometric variations. Tasks are carefully chosen to cover distinct types of manipulation challenges. Latest progress in 3D vision also makes us believe that we should customize the benchmark so that the challenge is inviting to researchers working on 3D deep learning. To this end, we simulate a moving panoramic camera that returns ego-centric point clouds or RGB-D images. In addition, we would like ManiSkill to serve a broad set of researchers interested in manipulation research. Besides supporting the learning of policies from interactions, we also support learning-from-demonstrations (LfD) methods, by providing a large number of high-quality demonstrations (~36, 000 successful trajectories, ~1. 5M point cloud/RGB-D frames in total). We provide baselines using 3D deep learning and LfD algorithms. All code of our benchmark (simulator, environment, SDK, and baselines) is open-sourced (\href{https: //github. com/haosulab/ManiSkill}{Github repo}), and a challenge facing interdisciplinary researchers will be held based on the benchmark.

NeurIPS Conference 2020 Conference Paper

RD$^2$: Reward Decomposition with Representation Decomposition

  • Zichuan Lin
  • Derek Yang
  • Li Zhao
  • Tao Qin
  • Guangwen Yang
  • Tie-Yan Liu

Reward decomposition, which aims to decompose the full reward into multiple sub-rewards, has been proven beneficial for improving sample efficiency in reinforcement learning. Existing works on discovering reward decomposition are mostly policy dependent, which constrains diverse or disentangled behavior between different policies induced by different sub-rewards. In this work, we propose a set of novel reward decomposition principles by constraining uniqueness and compactness of different state features/representations relevant to different sub-rewards. Our principles encourage sub-rewards with minimal relevant features, while maintaining the uniqueness of each sub-reward. We derive a deep learning algorithm based on our principle, and term our method as RD$^2$, since we learn reward decomposition and representation decomposition jointly. RD$^2$ is evaluated on a toy case, where we have the true reward structure, and some Atari environments where reward structure exists but is unknown to the agent to demonstrate the effectiveness of RD$^2$ against existing reward decomposition methods.

NeurIPS Conference 2019 Conference Paper

Distributional Reward Decomposition for Reinforcement Learning

  • Zichuan Lin
  • Li Zhao
  • Derek Yang
  • Tao Qin
  • Tie-Yan Liu
  • Guangwen Yang

Many reinforcement learning (RL) tasks have specific properties that can be leveraged to modify existing RL algorithms to adapt to those tasks and further improve performance, and a general class of such properties is the multiple reward channel. In those environments the full reward can be decomposed into sub-rewards obtained from different channels. Existing work on reward decomposition either requires prior knowledge of the environment to decompose the full reward, or decomposes reward without prior knowledge but with degraded performance. In this paper, we propose Distributional Reward Decomposition for Reinforcement Learning (DRDRL), a novel reward decomposition algorithm which captures the multiple reward channel structure under distributional setting. Empirically, our method captures the multi-channel structure and discovers meaningful reward decomposition, without any requirements on prior knowledge. Consequently, our agent achieves better performance than existing methods on environments with multiple reward channels.

NeurIPS Conference 2019 Conference Paper

Fully Parameterized Quantile Function for Distributional Reinforcement Learning

  • Derek Yang
  • Li Zhao
  • Zichuan Lin
  • Tao Qin
  • Jiang Bian
  • Tie-Yan Liu

Distributional Reinforcement Learning (RL) differs from traditional RL in that, rather than the expectation of total returns, it estimates distributions and has achieved state-of-the-art performance on Atari Games. The key challenge in practical distributional RL algorithms lies in how to parameterize estimated distributions so as to better approximate the true continuous distribution. Existing distributional RL algorithms parameterize either the probability side or the return value side of the distribution function, leaving the other side uniformly fixed as in C51, QR-DQN or randomly sampled as in IQN. In this paper, we propose fully parameterized quantile function that parameterizes both the quantile fraction axis (i. e. , the x-axis) and the value axis (i. e. , y-axis) for distributional RL. Our algorithm contains a fraction proposal network that generates a discrete set of quantile fractions and a quantile value network that gives corresponding quantile values. The two networks are jointly trained to find the best approximation of the true distribution. Experiments on 55 Atari Games show that our algorithm significantly outperforms existing distributional RL algorithms and creates a new record for the Atari Learning Environment for non-distributed agents.