Author name cluster

Xinlei Pan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
2 author rows

Possible papers

IROS Conference 2023 Conference Paper

Imitation Is Not Enough: Robustifying Imitation with Reinforcement Learning for Challenging Driving Scenarios

  • Yiren Lu 0001
  • Justin Fu
  • George Tucker
  • Xinlei Pan
  • Eli Bronstein
  • Rebecca Roelofs
  • Benjamin Sapp
  • Brandyn White

Imitation learning (IL) is a simple and powerful way to use high-quality human driving data, which can be collected at scale, to produce human-like behavior. However, policies based on imitation learning alone often fail to sufficiently account for safety and reliability concerns. In this paper, we show how imitation learning combined with reinforcement learning using simple rewards can substantially improve the safety and reliability of driving policies over those learned from imitation alone. In particular, we train a policy on over 100k miles of urban driving data, and measure its effectiveness in test scenarios grouped by different levels of collision likelihood. Our analysis shows that while imitation can perform well in low-difficulty scenarios that are well-covered by the demonstration data, our proposed approach significantly improves robustness on the most challenging scenarios (over 38% reduction in failures). To our knowledge, this is the first application of a combined imitation and reinforcement learning approach in autonomous driving that utilizes large amounts of real-world human driving data.

NeurIPS Conference 2023 Conference Paper

Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research

  • Cole Gulino
  • Justin Fu
  • Wenjie Luo
  • George Tucker
  • Eli Bronstein
  • Yiren Lu
  • Jean Harb
  • Xinlei Pan

Simulation is an essential tool to develop and benchmark autonomous vehicle planning software in a safe and cost-effective manner. However, realistic simulation requires accurate modeling of multi-agent interactive behaviors to be trustworthy, behaviors which can be highly nuanced and complex. To address these challenges, we introduce Waymax, a new data-driven simulator for autonomous driving in multi-agent scenes, designed for large-scale simulation and testing. Waymax uses publicly-released, real-world driving data (e.g., the Waymo Open Motion Dataset) to initialize or play back a diverse set of multi-agent simulated scenarios. It runs entirely on hardware accelerators such as TPUs/GPUs and supports in-graph simulation for training, making it suitable for modern large-scale, distributed machine learning workflows. To support online training and evaluation, Waymax includes several learned and hard-coded behavior models that allow for realistic interaction within simulation. To supplement Waymax, we benchmark a suite of popular imitation and reinforcement learning algorithms with ablation studies on different design decisions, where we highlight the effectiveness of routes as guidance for planning agents and the ability of RL to overfit against simulated agents.

AAMAS Conference 2022 Conference Paper

Characterizing Attacks on Deep Reinforcement Learning

  • Xinlei Pan
  • Chaowei Xiao
  • Warren He
  • Shuang Yang
  • Jian Peng
  • Mingjie Sun
  • Mingyan Liu
  • Bo Li

Recent studies show that Deep Reinforcement Learning (DRL) models are vulnerable to adversarial attacks, which attack DRL models by adding small perturbations to the observations. However, some attacks assume full availability of the victim model, and some require a huge amount of computation, making them less feasible for real world applications. In this work, we make further explorations of the vulnerabilities of DRL by studying other aspects of attacks on DRL using realistic and efficient attacks. First, we adapt and propose efficient black-box attacks when we do not have access to DRL model parameters. Second, to address the high computational demands of existing attacks, we introduce efficient online sequential attacks that exploit temporal consistency across consecutive steps. Third, we explore the possibility of an attacker perturbing other aspects in the DRL setting, such as the environment dynamics. Finally, to account for imperfections in how an attacker would inject perturbations in the physical world, we devise a method for generating robust physical perturbations to be printed. The attack is evaluated on a real-world robot under various conditions. We conduct extensive experiments both in simulation such as Atari games, robotics and autonomous driving, and on real-world robotics, to compare the effectiveness of the proposed attacks with baseline approaches. To the best of our knowledge, we are the first to apply adversarial attacks on DRL systems to physical robots.

ICRA Conference 2021 Conference Paper

Emergent Hand Morphology and Control from Optimizing Robust Grasps of Diverse Objects

  • Xinlei Pan
  • Animesh Garg
  • Anima Anandkumar
  • Yuke Zhu

Evolution in nature illustrates that the creatures’ biological structure and their sensorimotor skills adapt to the environmental changes for survival. Likewise, the ability to morph and acquire new skills can facilitate an embodied agent to solve tasks of varying complexities. In this work, we introduce a data-driven approach where effective hand designs naturally emerge for the purpose of grasping diverse objects. Jointly optimizing morphology and control imposes computational challenges since it requires constant evaluation of a black-box function that measures the performance of a combination of embodiment and behavior. We develop a novel Bayesian Optimization algorithm that efficiently co-designs the morphology and grasping skills through learned latent-space representations. We design the grasping tasks based on a taxonomy of human grasp types: power grasp, pinch grasp, and lateral grasp. Through experimentation and comparative study, we demonstrate that our approach discovers robust and cost-efficient hand morphologies for grasping novel objects. Additional videos and results at https://xinleipan.github.io/emergent_morphology

ICRA Conference 2020 Conference Paper

Zero-shot Imitation Learning from Demonstrations for Legged Robot Visual Navigation

  • Xinlei Pan
  • Tingnan Zhang
  • Brian Ichter
  • Aleksandra Faust
  • Jie Tan 0001
  • Sehoon Ha

Imitation learning is a popular approach for training effective visual navigation policies. However, collecting expert demonstrations for legged robots is challenging as these robots can be hard to control, move slowly, and cannot operate continuously for long periods of time. In this work, we propose a zero-shot imitation learning framework for training a goal-driven visual navigation policy on a legged robot from human demonstrations (third-person perspective), allowing for high-quality navigation and cost-effective data collection. However, imitation learning from third-person demonstrations raises unique challenges. First, these demonstrations are captured from different camera perspectives, which we address via a feature disentanglement network (FDN) that extracts perspective-invariant state features. Second, as transition dynamics vary between systems, we reconstruct missing action labels by either building an inverse model of the robot's dynamics in the feature space and applying it to the human demonstrations or developing a Graphic User Interface (GUI) to label human demonstrations. To train a navigation policy we use a model-based imitation learning approach with FDN and action-labeled human demonstrations. We show that our framework can learn an effective policy for a legged robot, Laikago, from human demonstrations in both simulated and real-world environments. Our approach is zero-shot as the robot never navigates the same paths during training as those at testing time. We justify our framework by performing a comparative study.

AAMAS Conference 2019 Conference Paper

How You Act Tells a Lot: Privacy-Leaking Attack on Deep Reinforcement Learning

  • Xinlei Pan
  • Weiyao Wang
  • Xiaoshuai Zhang
  • Bo Li
  • Jinfeng Yi
  • Dawn Song

Machine learning has been widely applied to various applications, some of which involve training with privacy-sensitive data. A modest number of data breaches have been studied, including credit card information in natural language data and identities from face dataset. However, most of these studies focus on supervised learning models. As deep reinforcement learning (DRL) has been deployed in a number of real-world systems, such as indoor robot navigation, whether trained DRL policies can leak private information requires in-depth study. To explore such privacy breaches in general, we mainly propose two methods: environment dynamics search via genetic algorithm and candidate inference based on shadow policies. We conduct extensive experiments to demonstrate such privacy vulnerabilities in DRL under various settings. We leverage the proposed algorithms to infer floor plans from some trained Grid World navigation DRL agents with LiDAR perception. The proposed algorithm can correctly infer most of the floor plans and reaches an average recovery rate of 95.83% using policy gradient trained agents. In addition, we are able to recover the robot configuration in continuous control environments and an autonomous driving simulator with high accuracy. To the best of our knowledge, this is the first work to investigate privacy leakage in DRL settings and we show that DRL-based agents do potentially leak privacy-sensitive information from the trained policies.

ICRA Conference 2019 Conference Paper

Risk Averse Robust Adversarial Reinforcement Learning

  • Xinlei Pan
  • Daniel Seita
  • Yang Gao
  • John F. Canny

Deep reinforcement learning has recently made significant progress in solving computer games and robotic control tasks. A known problem, though, is that policies overfit to the training environment and may not avoid rare, catastrophic events such as automotive accidents. A classical technique for improving the robustness of reinforcement learning algorithms is to train on a set of randomized environments, but this approach only guards against common situations. Recently, robust adversarial reinforcement learning (RARL) was developed, which allows efficient applications of random and systematic perturbations by a trained adversary. A limitation of RARL is that only the expected control objective is optimized; there is no explicit modeling or optimization of risk. Thus the agents do not consider the probability of catastrophic events (i.e., those inducing abnormally large negative reward), except through their effect on the expected objective. In this paper we introduce risk-averse robust adversarial reinforcement learning (RARARL), using a risk-averse protagonist and a risk-seeking adversary. We test our approach on a self-driving vehicle controller. We use an ensemble of policy networks to model risk as the variance of value functions. We show through experiments that a risk-averse agent is better equipped to handle a risk-seeking adversary, and experiences substantially fewer crashes compared to agents trained without an adversary. Supplementary materials are available at https://sites.google.com/view/rararl.
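
The abstract above models risk as the variance of value functions across an ensemble. The following is a minimal sketch of that idea, not the authors' implementation: the function name `risk_adjusted_action`, the `risk_weight` parameter, and the linear "mean minus weighted variance" score are all assumptions made for illustration. A positive weight gives risk-averse behavior; a negative weight gives risk-seeking behavior.

```python
from statistics import fmean, pvariance


def risk_adjusted_action(q_ensemble, risk_weight):
    """Pick an action from ensemble value estimates (hypothetical helper).

    q_ensemble: list of per-member lists, q_ensemble[k][a] is member k's
                value estimate for action a.
    risk_weight: > 0 penalizes disagreement (risk-averse),
                 < 0 rewards disagreement (risk-seeking).
    """
    n_actions = len(q_ensemble[0])
    scores = []
    for a in range(n_actions):
        # Collect every ensemble member's estimate for this action.
        qs = [member[a] for member in q_ensemble]
        # Risk is approximated by the population variance across members.
        scores.append(fmean(qs) - risk_weight * pvariance(qs))
    # Return the index of the best-scoring action.
    return max(range(n_actions), key=scores.__getitem__)


# Two ensemble members, two actions: action 0 has a higher mean value
# but the members disagree strongly about it.
q = [[1.0, 0.9], [3.0, 1.1]]
risk_neutral_choice = risk_adjusted_action(q, 0.0)
risk_averse_choice = risk_adjusted_action(q, 2.0)
risk_seeking_choice = risk_adjusted_action(q, -1.0)
```

With these toy numbers, the risk-averse setting moves away from the high-variance action, while the risk-neutral and risk-seeking settings keep it.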

ICRA Conference 2019 Conference Paper

Semantic Predictive Control for Explainable and Efficient Policy Learning

  • Xinlei Pan
  • Xiangyu Chen
  • Qi-Zhi Cai
  • John F. Canny
  • Fisher Yu 0001

Visual anticipation of ego and object motion over short time horizons is a key feature of human-level performance in complex environments. We propose a driving policy learning framework that predicts feature representations of future visual inputs; our predictive model infers not only future events but also semantics, which provide a visual explanation of policy decisions. Our Semantic Predictive Control (SPC) framework predicts future semantic segmentation and events by aggregating multi-scale feature maps. A guidance model assists action selection and enables efficient sampling-based optimization. Experiments on multiple simulation environments show that networks which implement SPC can outperform existing model-based reinforcement learning algorithms in terms of data efficiency and total rewards while providing clear explanations for the policy's behavior.

IJCAI Conference 2018 Conference Paper

An Efficient Minibatch Acceptance Test for Metropolis-Hastings

  • Daniel Seita
  • Xinlei Pan
  • Haoyu Chen
  • John Canny

We present a novel Metropolis-Hastings method for large datasets that uses small expected-size mini-batches of data. Previous work on reducing the cost of Metropolis-Hastings tests yields only constant factor reductions versus using the full dataset for each sample. Here we present a method that can be tuned to provide arbitrarily small batch sizes, by adjusting either proposal step size or temperature. Our test uses the noise-tolerant Barker acceptance test with a novel additive correction variable. The resulting test has similar cost to a normal SGD update. Our experiments demonstrate several order-of-magnitude speedups over previous work.

AAMAS Conference 2018 Conference Paper

Human-Interactive Subgoal Supervision for Efficient Inverse Reinforcement Learning

  • Xinlei Pan
  • Yilin Shen

Humans are able to understand and perform complex tasks by strategically structuring tasks into incremental steps or sub-goals. For a robot attempting to learn to perform a sequential task with critical subgoal states, these subgoal states can provide a natural opportunity for interaction with a human expert. This paper analyzes the benefit of incorporating a notion of subgoals into Inverse Reinforcement Learning (IRL) with a Human-In-The-Loop (HITL) framework. The learning process is interactive, with a human expert first providing input in the form of full demonstrations along with some subgoal states. These subgoal states define a set of sub-tasks for the learning agent to complete in order to achieve the final goal. The learning agent queries for partial demonstrations corresponding to each sub-task as needed when the learning agent struggles with an individual sub-task. The proposed Human Interactive IRL (HI-IRL) framework is evaluated on several discrete path-planning tasks. We demonstrate that subgoal-based interactive structuring of the learning task results in significantly more efficient learning, requiring only a fraction of the demonstration data needed for learning the underlying reward function with a baseline IRL model.

UAI Conference 2017 Conference Paper

An Efficient Minibatch Acceptance Test for Metropolis-Hastings

  • Daniel Seita
  • Xinlei Pan
  • Haoyu Chen
  • John F. Canny

We present a novel Metropolis-Hastings method for large datasets that uses small expected-size minibatches of data. Previous work on reducing the cost of Metropolis-Hastings tests yields variable data consumed per sample, with only constant factor reductions versus using the full dataset for each sample. Here we present a method that can be tuned to provide arbitrarily small batch sizes, by adjusting either proposal step size or temperature. Our test uses the noise-tolerant Barker acceptance test with a novel additive correction variable. The resulting test has similar cost to a normal SGD update. Our experiments demonstrate several order-of-magnitude speedups over previous work.
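
For readers unfamiliar with the Barker acceptance test named in the abstract above, here is a minimal sketch of the standard full-data form of the test. It accepts a proposal with probability 1 / (1 + exp(-delta)), where delta is the log acceptance ratio, which is equivalent to checking delta + X > 0 for a standard logistic variate X. It does not implement the paper's minibatch estimate or its additive correction variable; the function name `barker_accept` and the clamping constant are assumptions for illustration.

```python
import math
import random


def barker_accept(delta, rng):
    """Barker test: accept with probability 1 / (1 + exp(-delta)).

    delta: log acceptance ratio of the proposed move (full-data value here).
    rng:   a random.Random instance supplying uniform draws.
    """
    u = rng.random()
    # Clamp away from 0 and 1 so the logit below stays finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    # Inverse-CDF sample from the standard logistic distribution.
    x_log = math.log(u / (1.0 - u))
    return delta + x_log > 0


# Empirical check: with delta = 0 the acceptance rate should be near 0.5.
rng = random.Random(0)
trials = 20000
rate_at_zero = sum(barker_accept(0.0, rng) for _ in range(trials)) / trials
```

Writing the test as a comparison against additive logistic noise is what makes it tolerant of noise in delta, since other noise sources can be folded into the same sum.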