Author name cluster

Rowan McAllister

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers

2 author rows

NeurIPS Conference 2023 Conference Paper

Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research

Cole Gulino
Justin Fu
Wenjie Luo
George Tucker
Eli Bronstein
Yiren Lu
Jean Harb
Xinlei Pan

Simulation is an essential tool to develop and benchmark autonomous vehicle planning software in a safe and cost-effective manner. However, realistic simulation requires accurate modeling of multi-agent interactive behaviors to be trustworthy, behaviors which can be highly nuanced and complex. To address these challenges, we introduce Waymax, a new data-driven simulator for autonomous driving in multi-agent scenes, designed for large-scale simulation and testing. Waymax uses publicly-released, real-world driving data (e. g. , the Waymo Open Motion Dataset) to initialize or play back a diverse set of multi-agent simulated scenarios. It runs entirely on hardware accelerators such as TPUs/GPUs and supports in-graph simulation for training, making it suitable for modern large-scale, distributed machine learning workflows. To support online training and evaluation, Waymax includes several learned and hard-coded behavior models that allow for realistic interaction within simulation. To supplement Waymax, we benchmark a suite of popular imitation and reinforcement learning algorithms with ablation studies on different design decisions, where we highlight the effectiveness of routes as guidance for planning agents and the ability of RL to overfit against simulated agents.

PDF Details

ICRA Conference 2022 Conference Paper

Control-Aware Prediction Objectives for Autonomous Driving

Rowan McAllister
Blake Wulfe
Jean Mercat
Logan Ellis
Sergey Levine
Adrien Gaidon

Autonomous vehicle software is typically structured as a modular pipeline of individual components (e. g. , perception, prediction, and planning) to help separate concerns into interpretable sub-tasks. Even when end-to-end training is possible, each module has its own set of objectives used for safety assurance, sample efficiency, regularization, or interpretability. However, intermediate objectives do not always align with overall system performance. For example, optimizing the likelihood of a trajectory prediction module might focus more on easy-to-predict agents than safety-critical or rare behaviors (e. g. , jaywalking). In this paper, we present control-aware prediction objectives (CAPOs), to evaluate the down-stream effect of predictions on control without requiring the planner be differentiable. We propose two types of importance weights that weight the predictive likelihood: one using an attention model between agents, and another based on control variation when exchanging predicted trajectories for ground truth trajectories. Experimentally, we show our objectives improve overall system performance in suburban driving scenarios using the CARLA simulator.

Details

ICLR Conference 2022 Conference Paper

Dynamics-Aware Comparison of Learned Reward Functions

Blake Wulfe
Logan Ellis
Jean Mercat
Rowan McAllister
Adrien Gaidon

The ability to learn reward functions plays an important role in enabling the deployment of intelligent agents in the real world. However, $\textit{comparing}$ reward functions, for example as a means of evaluating reward learning methods, presents a challenge. Reward functions are typically compared by considering the behavior of optimized policies, but this approach conflates deficiencies in the reward function with those of the policy search algorithm used to optimize it. To address this challenge, Gleave et al. (2020) propose the Equivalent-Policy Invariant Comparison (EPIC) distance. EPIC avoids policy optimization, but in doing so requires computing reward values at transitions that may be impossible under the system dynamics. This is problematic for learned reward functions because it entails evaluating them outside of their training distribution, resulting in inaccurate reward values that we show can render EPIC ineffective at comparing rewards. To address this problem, we propose the Dynamics-Aware Reward Distance (DARD), a new reward pseudometric. DARD uses an approximate transition model of the environment to transform reward functions into a form that allows for comparisons that are invariant to reward shaping while only evaluating reward functions on transitions close to their training distribution. Experiments in simulated physical domains demonstrate that DARD enables reliable reward comparisons without policy optimization and is significantly more predictive than baseline methods of downstream policy performance when dealing with learned reward functions.

Details

IROS Conference 2022 Conference Paper

Heterogeneous-Agent Trajectory Forecasting Incorporating Class Uncertainty

Boris Ivanovic
Kuan-Hui Lee
Pavel Tokmakov
Blake Wulfe
Rowan McAllister
Adrien Gaidon
Marco Pavone 0001

Reasoning about the future behavior of other agents is critical to safe robot navigation. The multiplicity of plausible futures is further amplified by the uncertainty inherent to agent state estimation from data, including positions, velocities, and semantic class. Forecasting methods, however, typically neglect class uncertainty, conditioning instead only on the agent's most likely class, even though perception models often return full class distributions. To exploit this information, we present HAICU, a method for heterogeneous-agent trajectory forecasting that explicitly incorporates agents' class probabilities. We additionally present PUP, a new challenging real-world autonomous driving dataset, to investigate the im-pact of Perceptual Uncertainty in Prediction. It contains chal-lenging crowded scenes with unfiltered agent class probabilities that reflect the long-tail of current state-of-the-art perception systems. We demonstrate that incorporating class probabilities in trajectory forecasting significantly improves performance in the face of uncertainty, and enables new forecasting capabilities such as counterfactual predictions.

Details

ICRA Conference 2021 Conference Paper

Contingencies from Observations: Tractable Contingency Planning with Learned Behavior Models

Nicholas Rhinehart
Jeff He
Charles Packer
Matthew A. Wright
Rowan McAllister
Joseph E. Gonzalez
Sergey Levine

Humans have a remarkable ability to accurately reason about future events, including the behaviors and states of mind of other agents. Consider driving a car through a busy intersection: it is necessary to reason about the physics of the vehicle, the intentions of other drivers, and their beliefs about your own intentions. For example, if you signal a turn, another driver might yield to you; or if you enter the passing lane, another driver might decelerate to give you room to merge in front. Competent drivers must plan how they can safely react to a variety of potential future behaviors of other agents before they make their next move. This requires contingency planning: explicitly planning a set of conditional actions that depend on the stochastic outcome of future events. In this work, we develop a general-purpose contingency planner that is learned end-to-end using high-dimensional scene observations and low-dimensional behavioral observations. We use a conditional autoregressive flow model for contingency planning. We show how this model can tractably learn contingencies from behavioral observations. We developed a closed-loop control benchmark of realistic multi-agent scenarios in a driving simulator (CARLA), on which we compare our method to various noncontingent methods that reason about multi-agent future behavior, and find that our contingency planning method achieves qualitatively and quantitatively superior performance.

Details

ICLR Conference 2021 Conference Paper

Learning Invariant Representations for Reinforcement Learning without Reconstruction

Amy Zhang 0001
Rowan McAllister
Roberto Calandra
Yarin Gal
Sergey Levine

We study how representation learning can accelerate reinforcement learning from rich observations, such as images, without relying either on domain knowledge or pixel-reconstruction. Our goal is to learn representations that provide for effective downstream control and invariance to task-irrelevant details. Bisimulation metrics quantify behavioral similarity between states in continuous MDPs, which we propose using to learn robust latent representations which encode only the task-relevant information from observations. Our method trains encoders such that distances in latent space equal bisimulation distances in state space. We demonstrate the effectiveness of our method at disregarding task-irrelevant information using modified visual MuJoCo tasks, where the background is replaced with moving distractors and natural videos, while achieving SOTA performance. We also test a first-person highway driving task where our method learns invariance to clouds, weather, and time of day. Finally, we provide generalization results drawn from properties of bisimulation metrics, and links to causal inference.

Details

NeurIPS Conference 2021 Conference Paper

Outcome-Driven Reinforcement Learning via Variational Inference

Tim G. J. Rudner
Vitchyr Pong
Rowan McAllister
Yarin Gal
Sergey Levine

While reinforcement learning algorithms provide automated acquisition of optimal policies, practical application of such methods requires a number of design decisions, such as manually designing reward functions that not only define the task, but also provide sufficient shaping to accomplish it. In this paper, we view reinforcement learning as inferring policies that achieve desired outcomes, rather than as a problem of maximizing rewards. To solve this inference problem, we establish a novel variational inference formulation that allows us to derive a well-shaped reward function which can be learned directly from environment interactions. From the corresponding variational objective, we also derive a new probabilistic Bellman backup operator and use it to develop an off-policy algorithm to solve goal-directed tasks. We empirically demonstrate that this method eliminates the need to hand-craft reward functions for a suite of diverse manipulation and locomotion tasks and leads to effective goal-directed behaviors.

PDF Details

ICML Conference 2020 Conference Paper

Can Autonomous Vehicles Identify, Recover From, and Adapt to Distribution Shifts?

Angelos Filos
Panagiotis Tigas
Rowan McAllister
Nicholas Rhinehart
Sergey Levine
Yarin Gal

Out-of-training-distribution (OOD) scenarios are a common challenge of learning agents at deployment, typically leading to arbitrary deductions and poorly-informed decisions. In principle, detection of and adaptation to OOD scenes can mitigate their adverse effects. In this paper, we highlight the limitations of current approaches to novel driving scenes and propose an epistemic uncertainty-aware planning method, called \emph{robust imitative planning} (RIP). Our method can detect and recover from some distribution shifts, reducing the overconfident and catastrophic extrapolations in OOD scenes. If the model’s uncertainty is too great to suggest a safe course of action, the model can instead query the expert driver for feedback, enabling sample-efficient online adaptation, a variant of our method we term \emph{adaptive robust imitative planning} (AdaRIP). Our methods outperform current state-of-the-art approaches in the nuScenes \emph{prediction} challenge, but since no benchmark evaluating OOD detection and adaption currently exists to assess \emph{control}, we introduce an autonomous car novel-scene benchmark, \texttt{CARNOVEL}, to evaluate the robustness of driving agents to a suite of tasks with distribution shifts, where our methods outperform all the baselines.

Details

ICLR Conference 2020 Conference Paper

Deep Imitative Models for Flexible Inference, Planning, and Control

Nicholas Rhinehart
Rowan McAllister
Sergey Levine

Imitation Learning (IL) is an appealing approach to learn desirable autonomous behavior. However, directing IL to achieve arbitrary goals is difficult. In contrast, planning-based algorithms use dynamics models and reward functions to achieve goals. Yet, reward functions that evoke desirable behavior are often difficult to specify. In this paper, we propose "Imitative Models" to combine the benefits of IL and goal-directed planning. Imitative Models are probabilistic predictive models of desirable behavior able to plan interpretable expert-like trajectories to achieve specified goals. We derive families of flexible goal objectives, including constrained goal regions, unconstrained goal sets, and energy-based goals. We show that our method can use these objectives to successfully direct behavior. Our method substantially outperforms six IL approaches and a planning-based approach in a dynamic simulated autonomous driving task, and is efficiently learned from expert demonstrations without online data collection. We also show our approach is robust to poorly-specified goals, such as goals on the wrong side of the road.

Details

ICRA Conference 2019 Conference Paper

Robustness to Out-of-Distribution Inputs via Task-Aware Generative Uncertainty

Rowan McAllister
Gregory Kahn
Jeff Clune
Sergey Levine

Deep learning provides a powerful tool for robotic perception in the open world. However, real-world robotic systems, especially mobile robots, must be able to react intelligently and safely even in unexpected circumstances. This requires a system that knows what it knows, and can estimate its own uncertainty for unfamiliar, out-of-distribution observations. Approximate Bayesian approaches are commonly used to estimate uncertainty for neural network predictions, but struggle with out-of-distribution observations. Generative models can in principle detect out-of-distribution observations as those with a low estimated density, but overly pessimistic as an uncertainty measure, since the mere presence of an out-of-distribution input does not by itself indicate an unsafe situation. Intuitively, we would like a perception system that can detect when task-salient parts of the image are unfamiliar or uncertain, while ignoring task-irrelevant features. In this paper, we present a method for uncertainty-aware robotic perception that combines generative modeling and model uncertainty. Our method estimates an uncertainty measure about the model's prediction, taking into account an explicit generative model of the observation distribution to handle out-of-distribution inputs. We evaluate our method on an action-conditioned collision prediction task with both simulated and real data, and demonstrate that our approach improves on a variety of Bayesian neural network techniques.

Details

NeurIPS Conference 2018 Conference Paper

Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models

Kurtland Chua
Roberto Calandra
Rowan McAllister
Sergey Levine

Model-based reinforcement learning (RL) algorithms can attain excellent sample efficiency, but often lag behind the best model-free algorithms in terms of asymptotic performance. This is especially true with high-capacity parametric function approximators, such as deep networks. In this paper, we study how to bridge this gap, by employing uncertainty-aware dynamics models. We propose a new algorithm called probabilistic ensembles with trajectory sampling (PETS) that combines uncertainty-aware deep network dynamics models with sampling-based uncertainty propagation. Our comparison to state-of-the-art model-based and model-free deep RL algorithms shows that our approach matches the asymptotic performance of model-free algorithms on several challenging benchmark tasks, while requiring significantly fewer samples (e. g. 8 and 125 times fewer samples than Soft Actor Critic and Proximal Policy Optimization respectively on the half-cheetah task).

PDF Details

IJCAI Conference 2017 Conference Paper

Concrete Problems for Autonomous Vehicle Safety: Advantages of Bayesian Deep Learning

Rowan McAllister
Yarin Gal
Alex Kendall
Mark van der Wilk
Amar Shah
Roberto Cipolla
Adrian Weller

Autonomous vehicle (AV) software is typically composed of a pipeline of individual components, linking sensor inputs to motor outputs. Erroneous component outputs propagate downstream, hence safe AV software must consider the ultimate effect of each component’s errors. Further, improving safety alone is not sufficient. Passengers must also feel safe to trust and use AV systems. To address such concerns, we investigate three under-explored themes for AV research: safety, interpretability, and compliance. Safety can be improved by quantifying the uncertainties of component outputs and propagating them forward through the pipeline. Interpretability is concerned with explaining what the AV observes and why it makes the decisions it does, building reassurance with the passenger. Compliance refers to maintaining some control for the passenger. We discuss open challenges for research within these themes. We highlight the need for concrete evaluation metrics, propose example problems, and highlight possible solutions.

PDF Details

NeurIPS Conference 2017 Conference Paper

Data-Efficient Reinforcement Learning in Continuous State-Action Gaussian-POMDPs

Rowan McAllister
Carl Edward Rasmussen

We present a data-efficient reinforcement learning method for continuous state-action systems under significant observation noise. Data-efficient solutions under small noise exist, such as PILCO which learns the cartpole swing-up task in 30s. PILCO evaluates policies by planning state-trajectories using a dynamics model. However, PILCO applies policies to the observed state, therefore planning in observation space. We extend PILCO with filtering to instead plan in belief space, consistent with partially observable Markov decisions process (POMDP) planning. This enables data-efficient learning under significant observation noise, outperforming more naive methods such as post-hoc application of a filter to policies optimised by the original (unfiltered) PILCO algorithm. We test our method on the cartpole swing-up task, which involves nonlinear dynamics and requires nonlinear control.

PDF Details

IROS Conference 2012 Conference Paper

Motion planning and stochastic control with experimental validation on a planetary rover

Rowan McAllister
Thierry Peynot
Robert Fitch
Salah Sukkarieh

Motion planning for planetary rovers must consider control uncertainty in order to maintain the safety of the platform during navigation. Modelling such control uncertainty is difficult due to the complex interaction between the platform and its environment. In this paper, we propose a motion planning approach whereby the outcome of control actions is learned from experience and represented statistically using a Gaussian process regression model. This model is used to construct a control policy for navigation to a goal region in a terrain map built using an on-board RGB-D camera. The terrain includes flat ground, small rocks, and non-traversable rocks. We report the results of 200 simulated and 35 experimental trials that validate the approach and demonstrate the value of considering control uncertainty in maintaining platform safety.

Details