Arrow Research search

Author name cluster

Alexander Amini

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

25 papers
2 author rows

Possible papers

25

ICLR Conference 2025 Conference Paper

STAR: Synthesis of Tailored Architectures

  • Armin W. Thomas
  • Rom N. Parnichkun
  • Alexander Amini
  • Stefano Massaroli
  • Michael Poli

Iterative improvement of model architectures is fundamental to deep learning: Transformers first enabled scaling, and recent advances in model hybridization have pushed the quality-efficiency frontier. However, optimizing architectures remains challenging and expensive, with a variety of automated or manual approaches falling short due to limited progress in the design of search spaces and the simplicity of the resulting patterns and heuristics. In this work, we propose a new approach for the synthesis of tailored architectures (STAR). Our approach combines a novel search space based on the theory of linear input-varying systems with a hierarchical numerical encoding of architectures into genomes. STAR genomes are automatically refined and recombined with gradient-free evolutionary algorithms to optimize for multiple model quality and efficiency metrics. Using STAR, we optimize large populations of new architectures, leveraging diverse computational units and interconnection patterns, improving over highly optimized Transformers and striped hybrid models on the frontier of quality, parameter size, and inference cache for autoregressive language modeling.

ICRA Conference 2024 Conference Paper

Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models

  • Tsun-Hsuan Wang
  • Alaa Maalouf
  • Wei Xiao 0003
  • Yutong Ban
  • Alexander Amini
  • Guy Rosman
  • Sertac Karaman
  • Daniela Rus

As autonomous driving technology matures, end-to-end methodologies have emerged as a leading strategy, promising seamless integration from perception to control via deep learning. However, existing systems grapple with challenges such as unexpected open-set environments and the complexity of black-box models. At the same time, the evolution of deep learning introduces larger multimodal foundation models, offering multi-modal visual and textual understanding. In this paper, we harness these multimodal foundation models to enhance the robustness and adaptability of autonomous driving systems. We introduce a method to extract nuanced spatial features from transformers and incorporate latent space simulation for improved training and policy debugging. We use pixel/patch-aligned feature descriptors to expand foundation-model capabilities to create an end-to-end multimodal driving model, demonstrating unparalleled results in diverse tests. Our solution combines language with visual perception and achieves significantly greater robustness in out-of-distribution situations.

IROS Conference 2024 Conference Paper

Exploring Latent Pathways: Enhancing the Interpretability of Autonomous Driving with a Variational Autoencoder

  • Anass Bairouk
  • Mirjana Maras
  • Simon Herlin
  • Alexander Amini
  • Marc Blanchon
  • Ramin M. Hasani
  • Patrick Chareyre
  • Daniela Rus

Autonomous driving presents a complex challenge, which is usually addressed with artificial intelligence models that are end-to-end or modular in nature. Within the landscape of modular approaches, a bio-inspired neural circuit policy model has emerged as an innovative control module, offering a compact and inherently interpretable system to infer a steering wheel command from abstract visual features. Here, we take a leap forward by integrating a variational autoencoder with the neural circuit policy controller, forming a solution that directly generates steering commands from input camera images. By substituting the traditional convolutional neural network approach to feature extraction with a variational autoencoder, we enhance the system’s interpretability, enabling a more transparent and understandable decision-making process. In addition to the architectural shift toward a variational autoencoder, this study introduces the automatic latent perturbation tool, a novel contribution designed to probe and elucidate the latent features within the variational autoencoder. The automatic latent perturbation tool automates the interpretability process, offering granular insights into how specific latent variables influence the overall model’s behavior. Through a series of numerical experiments, we demonstrate the interpretative power of the variational autoencoder-neural circuit policy model and the utility of the automatic latent perturbation tool in making the inner workings of autonomous driving systems more transparent.

ICML Conference 2024 Conference Paper

Large Scale Dataset Distillation with Domain Shift

  • Noel Loo
  • Alaa Maalouf
  • Ramin M. Hasani
  • Mathias Lechner
  • Alexander Amini
  • Daniela Rus

Dataset Distillation seeks to summarize a large dataset by generating a reduced set of synthetic samples. While there has been much success at distilling small datasets such as CIFAR-10 on smaller neural architectures, Dataset Distillation methods fail to scale to larger high-resolution datasets and architectures. In this work, we introduce Dataset Distillation with Domain Shift (D3S), a scalable distillation algorithm, made by reframing the dataset distillation problem as a domain shift one. In doing so, we derive a universal bound on the distillation loss, and provide a method for efficiently approximately optimizing it. We achieve state-of-the-art results on Tiny-ImageNet, ImageNet-1k, and ImageNet-21K over a variety of recently proposed baselines, including high cross-architecture generalization. Additionally, our ablation studies provide lessons on the importance of validation-time hyperparameters on distillation performance, motivating the need for standardization.

ICRA Conference 2024 Conference Paper

Overparametrization helps offline-to-online generalization of closed-loop control from pixels

  • Mathias Lechner
  • Ramin M. Hasani
  • Alexander Amini
  • Tsun-Hsuan Wang
  • Thomas A. Henzinger
  • Daniela Rus

There is an ever-growing zoo of modern neural network models that can efficiently learn end-to-end control from visual observations. These advanced deep models, ranging from convolutional networks to Vision Transformers, from small to gigantic networks, have been extensively tested on offline image classification tasks. In this paper, we study these vision models with respect to their open-loop training to closed-loop generalization abilities, i.e., deployment realizes a causal feedback loop that is not present during training. This causality gap typically emerges in robotics applications such as autonomous driving, where a network is trained to imitate the control commands of a human. In this setting, two situations arise: 1) Closed-loop testing in-distribution, where the test environment shares properties with those of the offline training data. 2) Closed-loop testing under distribution shifts and out-of-distribution. Contrary to recently reported results, we show that under proper training guidelines, all vision architectures perform indistinguishably well on in-distribution deployment, resolving the causality gap. In situation 2, we observe that scale is the strongest factor in improving closed-loop generalization regardless of the choice of model architecture. Our results predict the trend that in the future we will see larger and larger models being used in offline-training-online-deployment imitation learning tasks in robotic applications.

ICLR Conference 2024 Conference Paper

Understanding Reconstruction Attacks with the Neural Tangent Kernel and Dataset Distillation

  • Noel Loo
  • Ramin M. Hasani
  • Mathias Lechner
  • Alexander Amini
  • Daniela Rus

Modern deep learning requires large volumes of data, which could contain sensitive or private information that cannot be leaked. Recent work has shown that for homogeneous neural networks a large portion of this training data could be reconstructed with only access to the trained network parameters. While the attack was shown to work empirically, there exists little formal understanding of its effective regime and which datapoints are susceptible to reconstruction. In this work, we first build a stronger version of the dataset reconstruction attack and show how it can provably recover the entire training set in the infinite width regime. We then empirically study the characteristics of this attack on two-layer networks and reveal that its success heavily depends on deviations from the frozen infinite-width Neural Tangent Kernel limit. Next, we study the nature of easily-reconstructed images. We show, both theoretically and empirically, that reconstructed images tend to be outliers in the dataset, and that these reconstruction attacks can be used for dataset distillation, that is, we can retrain on reconstructed images and obtain high predictive accuracy.

ICRA Conference 2023 Conference Paper

BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation

  • Zhijian Liu
  • Haotian Tang
  • Alexander Amini
  • Xinyu Yang 0002
  • Huizi Mao
  • Daniela Rus
  • Song Han 0003

Multi-sensor fusion is essential for an accurate and reliable autonomous driving system. Recent approaches are based on point-level fusion: augmenting the LiDAR point cloud with camera features. However, the camera-to-LiDAR projection throws away the semantic density of camera features, hindering the effectiveness of such methods, especially for semantic-oriented tasks (such as 3D scene segmentation). In this paper, we propose BEVFusion, an efficient and generic multi-task multi-sensor fusion framework. It unifies multi-modal features in the shared bird's-eye view (BEV) representation space, which nicely preserves both geometric and semantic information. To achieve this, we diagnose and lift the key efficiency bottlenecks in the view transformation with optimized BEV pooling, reducing latency by more than 40×. BEVFusion is fundamentally task-agnostic and seamlessly supports different 3D perception tasks with almost no architectural changes. It establishes the new state of the art on the nuScenes benchmark, achieving 1.3% higher mAP and NDS on 3D object detection and 13.6% higher mIoU on BEV map segmentation, with 1.9× lower computation cost. Code to reproduce our results is available at https://github.com/mit-han-lab/bevfusion.
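The BEV pooling step mentioned above scatters per-point features into a shared bird's-eye-view grid. A minimal sketch of that scatter-add, assuming sum pooling over square cells (names and shapes are ours; the paper's version is a heavily optimized CUDA kernel):

```python
import numpy as np

def bev_sum_pool(features, xy, grid_shape, cell_size):
    """Sum-pool per-point features into a flat BEV grid.

    features: (N, C) per-point feature vectors
    xy: (N, 2) point coordinates in metres (assumed non-negative here)
    grid_shape: (H, W) number of BEV cells
    cell_size: edge length of one square cell in metres
    """
    H, W = grid_shape
    ix = np.clip((xy[:, 0] / cell_size).astype(int), 0, W - 1)
    iy = np.clip((xy[:, 1] / cell_size).astype(int), 0, H - 1)
    flat = iy * W + ix                        # linear cell index per point
    bev = np.zeros((H * W, features.shape[1]))
    np.add.at(bev, flat, features)            # scatter-add: many points -> one cell
    return bev.reshape(H, W, features.shape[1])

# Two of the three points fall in the same cell and get summed.
feats = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
xy = np.array([[0.1, 0.1], [0.2, 0.3], [1.7, 0.2]])
grid = bev_sum_pool(feats, xy, grid_shape=(4, 4), cell_size=0.5)
```

The unbuffered `np.add.at` handles duplicate indices correctly, which a plain fancy-index assignment would not.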

ICLR Conference 2023 Conference Paper

Liquid Structural State-Space Models

  • Ramin M. Hasani
  • Mathias Lechner
  • Tsun-Hsuan Wang
  • Makram Chahine
  • Alexander Amini
  • Daniela Rus

A proper parametrization of state transition matrices of linear state-space models (SSMs) followed by standard nonlinearities enables them to efficiently learn representations from sequential data, establishing the state-of-the-art on an extensive series of long-range sequence modeling benchmarks. In this paper, we show that we can improve further when the structured SSM, such as S4, is given by a linear liquid time-constant (LTC) state-space model. LTC neural networks are causal continuous-time neural networks with an input-dependent state transition module, which makes them learn to adapt to incoming inputs at inference. We show that by using a diagonal plus low-rank decomposition of the state transition matrix introduced in S4, and a few simplifications, the LTC-based structured state-space model, dubbed Liquid-S4, improves generalization across sequence modeling tasks with long-term dependencies such as image, text, audio, and medical time-series, with an average performance of 87.32% on the Long-Range Arena benchmark. On the full raw Speech Command recognition dataset, Liquid-S4 achieves 96.78% accuracy with a 30% reduction in parameter counts compared to S4. The additional gain in performance is the direct result of the Liquid-S4's kernel structure that takes into account the similarities of the input sequence samples during training and inference.

AAMAS Conference 2022 Conference Paper

Autonomous Flight Arcade Challenge: Single- and Multi-Agent Learning Environments for Aerial Vehicles

  • Paul Tylkin
  • Tsun-Hsuan Wang
  • Tim Seyde
  • Kyle Palko
  • Ross Allen
  • Alexander Amini
  • Daniela Rus

The Autonomous Flight Arcade (AFA) is a novel suite of single- and multi-agent learning environments for control of aerial vehicles. These environments incorporate realistic physics using the Unity game engine with diverse objectives and levels of decision-making sophistication. In addition to the environments themselves, we introduce an interface for interacting with them, including the ability to vary key parameters, thereby both changing the difficulty and the core challenges. We also introduce a pipeline for collecting human gameplay within the environments. We demonstrate the performance of artificial agents in these environments trained using deep reinforcement learning, and also motivate these environments as a benchmark for designing non-learned classical control policies and agents trained using imitation learning from human demonstrations. Finally, we motivate the use of AFA environments as a testbed for training artificial agents capable of cooperative human-AI decision making, including parallel autonomy.

NeurIPS Conference 2022 Conference Paper

Efficient Dataset Distillation using Random Feature Approximation

  • Noel Loo
  • Ramin Hasani
  • Alexander Amini
  • Daniela Rus

Dataset distillation compresses large datasets into smaller synthetic coresets which retain performance, with the aim of reducing the storage and computational burden of processing the entire dataset. Today's best performing algorithm, Kernel Inducing Points (KIP), which makes use of the correspondence between infinite-width neural networks and kernel-ridge regression, is prohibitively slow due to the exact computation of the neural tangent kernel matrix, scaling $O(|S|^2)$, with $|S|$ being the coreset size. To improve this, we propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel, which reduces the kernel matrix computation to $O(|S|)$. Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU. Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets, both in kernel regression and finite-width network training. We demonstrate the effectiveness of our approach on tasks involving model interpretability and privacy preservation.

NeurIPS Conference 2022 Conference Paper

Evolution of Neural Tangent Kernels under Benign and Adversarial Training

  • Noel Loo
  • Ramin Hasani
  • Alexander Amini
  • Daniela Rus

Two key challenges facing modern deep learning are mitigating deep networks' vulnerability to adversarial attacks and understanding deep learning's generalization capabilities. Towards the first issue, many defense strategies have been developed, with the most common being Adversarial Training (AT). Towards the second challenge, one of the dominant theories that has emerged is the Neural Tangent Kernel (NTK) -- a characterization of neural network behavior in the infinite-width limit. In this limit, the kernel is frozen and the underlying feature map is fixed. In finite-widths however, there is evidence that feature learning happens at the earlier stages of the training (kernel learning) before a second phase where the kernel remains fixed (lazy training). While prior work has aimed at studying adversarial vulnerability through the lens of the frozen infinite-width NTK, there is no work which studies adversarial robustness of the NTK during training. In this work, we perform an empirical study of the evolution of the NTK under standard and adversarial training, aiming to disambiguate the effect of adversarial training on kernel learning and lazy training. We find that under adversarial training, the NTK rapidly converges to a different kernel (and feature map) than standard training. This new kernel provides adversarial robustness, even when non-robust training is performed on top of it. Furthermore, we find that adversarial training on top of a fixed kernel can yield a classifier with 76.1% robust accuracy under PGD attacks with $\varepsilon = 4/255$ on CIFAR-10.

ICRA Conference 2022 Conference Paper

Learning Interactive Driving Policies via Data-driven Simulation

  • Tsun-Hsuan Wang
  • Alexander Amini
  • Wilko Schwarting
  • Igor Gilitschenski
  • Sertac Karaman
  • Daniela Rus

Data-driven simulators promise high data-efficiency for driving policy learning. When used for modelling interactions, this data-efficiency becomes a bottleneck: small underlying datasets often lack interesting and challenging edge cases for learning interactive driving. We address this challenge by proposing a data-driven simulation engine that uses inpainted ado vehicles for learning robust driving policies. Thus, our approach can be used to learn policies that involve multi-agent interactions and allows for training via state-of-the-art policy learning methods. We evaluate the approach for learning standard interaction scenarios in driving. In extensive experiments, our work demonstrates that the resulting policies can be directly transferred to a full-scale autonomous vehicle without making use of any traditional sim-to-real transfer techniques such as domain randomization.

ICRA Conference 2022 Conference Paper

VISTA 2.0: An Open, Data-driven Simulator for Multimodal Sensing and Policy Learning for Autonomous Vehicles

  • Alexander Amini
  • Tsun-Hsuan Wang
  • Igor Gilitschenski
  • Wilko Schwarting
  • Zhijian Liu
  • Song Han 0003
  • Sertac Karaman
  • Daniela Rus

Simulation has the potential to transform the development of robust algorithms for mobile agents deployed in safety-critical scenarios. However, the poor photorealism and lack of diverse sensor modalities of existing simulation engines remain key hurdles towards realizing this potential. Here, we present VISTA, an open-source, data-driven simulator that integrates multiple types of sensors for autonomous vehicles (full code release for the VISTA data-driven simulation engine is available at vista.csail.mit.edu). Using high-fidelity, real-world datasets, VISTA represents and simulates RGB cameras, 3D LiDAR, and event-based cameras, enabling the rapid generation of novel viewpoints in simulation and thereby enriching the data available for policy learning with corner cases that are difficult to capture in the physical world. Using VISTA, we demonstrate the ability to train and test perception-to-control policies across each of the sensor types and showcase the power of this approach via deployment on a full-scale autonomous vehicle. The policies learned in VISTA exhibit sim-to-real transfer without modification and greater robustness than those trained exclusively on real-world data.

NeurIPS Conference 2021 Conference Paper

Causal Navigation by Continuous-time Neural Networks

  • Charles Vorbach
  • Ramin Hasani
  • Alexander Amini
  • Mathias Lechner
  • Daniela Rus

Imitation learning enables high-fidelity, vision-based learning of policies within rich, photorealistic environments. However, such techniques often rely on traditional discrete-time neural models and face difficulties in generalizing to domain shifts by failing to account for the causal relationships between the agent and the environment. In this paper, we propose a theoretical and experimental framework for learning causal representations using continuous-time neural networks, specifically over their discrete-time counterparts. We evaluate our method in the context of visual-control learning of drones over a series of complex tasks, ranging from short- and long-term navigation, to chasing static and dynamic objects through photorealistic environments. Our results demonstrate that causal continuous-time deep models can perform robust navigation tasks, where advanced recurrent models fail. These models learn complex causal control representations directly from raw visual inputs and scale to solve a variety of tasks using imitation learning.

ICRA Conference 2021 Conference Paper

Efficient and Robust LiDAR-Based End-to-End Navigation

  • Zhijian Liu
  • Alexander Amini
  • Sibo Zhu
  • Sertac Karaman
  • Song Han 0003
  • Daniela Rus

Deep learning has been used to demonstrate end-to-end neural network learning for autonomous vehicle control from raw sensory input. While LiDAR sensors provide reliably accurate information, existing end-to-end driving solutions are mainly based on cameras since processing 3D data requires a large memory footprint and computation cost. On the other hand, increasing the robustness of these systems is also critical; however, even estimating the model’s uncertainty is very challenging due to the cost of sampling-based methods. In this paper, we present an efficient and robust LiDAR-based end-to-end navigation framework. We first introduce Fast-LiDARNet that is based on sparse convolution kernel optimization and hardware-aware model design. We then propose Hybrid Evidential Fusion that directly estimates the uncertainty of the prediction from only a single forward pass and then fuses the control predictions intelligently. We evaluate our system on a full-scale vehicle and demonstrate lane-stable as well as navigation capabilities. In the presence of out-of-distribution events (e.g., sensor failures), our system significantly improves robustness and reduces the number of takeovers in the real world.

AAAI Conference 2021 Conference Paper

Liquid Time-constant Networks

  • Ramin Hasani
  • Mathias Lechner
  • Alexander Amini
  • Daniela Rus
  • Radu Grosu

We introduce a new class of time-continuous recurrent neural network models. Instead of declaring a learning system’s dynamics by implicit nonlinearities, we construct networks of linear first-order dynamical systems modulated via nonlinear interlinked gates. The resulting models represent dynamical systems with varying (i.e., liquid) time-constants coupled to their hidden state, with outputs being computed by numerical differential equation solvers. These neural networks exhibit stable and bounded behavior, yield superior expressivity within the family of neural ordinary differential equations, and give rise to improved performance on time-series prediction tasks. To demonstrate these properties, we first take a theoretical approach to find bounds over their dynamics, and compute their expressive power by the trajectory length measure in a latent trajectory space. We then conduct a series of time-series prediction experiments to manifest the approximation capability of Liquid Time-Constant Networks (LTCs) compared to classical and modern RNNs.
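The construction above amounts to an ODE of the form dx/dt = -(1/tau + f(x, I)) * x + f(x, I) * A, where the gate f couples state and input so that the effective time constant varies with the incoming signal. A toy explicit-Euler sketch (our own scalar instantiation with a sigmoid gate; the paper uses a fused ODE solver), which also exhibits the bounded-state behavior the abstract mentions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ltc_step(x, I, tau, A, w, b, dt):
    """One explicit-Euler step of a scalar liquid time-constant unit.

    dx/dt = -(1/tau + f(x, I)) * x + f(x, I) * A
    The gate f depends on state and input, so the effective time
    constant 1 / (1/tau + f) changes as the signal changes.
    """
    f = sigmoid(w * (x + I) + b)   # toy nonlinear gate (our choice)
    return x + dt * (-(1.0 / tau + f) * x + f * A)

x, tau, A, w, b, dt = 0.0, 1.0, 1.0, 1.0, 0.0, 0.01
traj = []
for t in range(2000):
    I = np.sin(0.01 * t)           # slowly varying input signal
    x = ltc_step(x, I, tau, A, w, b, dt)
    traj.append(x)
traj = np.array(traj)              # state stays finite and inside (0, A)
```

Because the gate is positive, the drift pushes the state back whenever it approaches 0 or A, which is the intuition behind the stability bound.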

NeurIPS Conference 2021 Conference Paper

Sparse Flows: Pruning Continuous-depth Models

  • Lucas Liebenwein
  • Ramin Hasani
  • Alexander Amini
  • Daniela Rus

Continuous deep learning architectures enable learning of flexible probabilistic models for predictive modeling as neural ordinary differential equations (ODEs), and for generative modeling as continuous normalizing flows. In this work, we design a framework to decipher the internal dynamics of these continuous depth models by pruning their network architectures. Our empirical results suggest that pruning improves generalization for neural ODEs in generative modeling. We empirically show that the improvement is because pruning helps avoid mode-collapse and flatten the loss surface. Moreover, pruning finds efficient neural ODE representations with up to 98% fewer parameters compared to the original network, without loss of accuracy. We hope our results will invigorate further research into the performance-size trade-offs of modern continuous-depth models.

ICML Conference 2020 Conference Paper

A Natural Lottery Ticket Winner: Reinforcement Learning with Ordinary Neural Circuits

  • Ramin M. Hasani
  • Mathias Lechner
  • Alexander Amini
  • Daniela Rus
  • Radu Grosu

We propose a neural information processing system obtained by re-purposing the function of a biological neural circuit model to govern simulated and real-world control tasks. Inspired by the structure of the nervous system of the soil-worm, C. elegans, we introduce ordinary neural circuits (ONCs), defined as the model of biological neural circuits reparameterized for the control of alternative tasks. We first demonstrate that ONCs realize networks with higher maximum flow compared to arbitrary wired networks. We then learn instances of ONCs to control a series of robotic tasks, including the autonomous parking of a real-world rover robot. For reconfiguration of the purpose of the neural circuit, we adopt a search-based optimization algorithm. Ordinary neural circuits perform on par and, in some cases, significantly surpass the performance of contemporary deep learning models. ONC networks are compact, 77% sparser than their counterpart neural controllers, and their neural dynamics are fully interpretable at the cell-level.

NeurIPS Conference 2020 Conference Paper

Deep Evidential Regression

  • Alexander Amini
  • Wilko Schwarting
  • Ava Soleimany
  • Daniela Rus

Deterministic neural networks (NNs) are increasingly being deployed in safety critical domains, where calibrated, robust, and efficient measures of uncertainty are crucial. In this paper, we propose a novel method for training non-Bayesian NNs to estimate a continuous target as well as its associated evidence in order to learn both aleatoric and epistemic uncertainty. We accomplish this by placing evidential priors over the original Gaussian likelihood function and training the NN to infer the hyperparameters of the evidential distribution. We additionally impose priors during training such that the model is regularized when its predicted evidence is not aligned with the correct output. Our method does not rely on sampling during inference or on out-of-distribution (OOD) examples for training, thus enabling efficient and scalable uncertainty learning. We demonstrate learning well-calibrated measures of uncertainty on various benchmarks, scaling to complex computer vision tasks, as well as robustness to adversarial and OOD test samples.
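Concretely, under the Normal-Inverse-Gamma parameterization used in evidential regression, the network outputs four values (gamma, nu, alpha, beta) per target, and the prediction plus both uncertainty types follow in closed form. A sketch of those standard NIG readouts (the training loss and network are omitted):

```python
def evidential_readout(gamma, nu, alpha, beta):
    """Closed-form readouts of a Normal-Inverse-Gamma evidential head.

    gamma: predicted mean; nu > 0: virtual observations of the mean;
    alpha > 1, beta > 0: Inverse-Gamma parameters of the variance.
    """
    prediction = gamma                       # E[mu]
    aleatoric = beta / (alpha - 1.0)         # E[sigma^2]: irreducible data noise
    epistemic = beta / (nu * (alpha - 1.0))  # Var[mu]: model uncertainty
    return prediction, aleatoric, epistemic

# More evidence (larger nu) shrinks epistemic but not aleatoric uncertainty.
_, alea_lo, epi_lo = evidential_readout(0.0, nu=1.0, alpha=2.0, beta=1.0)
_, alea_hi, epi_hi = evidential_readout(0.0, nu=10.0, alpha=2.0, beta=1.0)
```

The split is what lets a single forward pass report both how noisy the data is and how unsure the model itself is, with no sampling.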

ICLR Conference 2020 Conference Paper

Deep Orientation Uncertainty Learning based on a Bingham Loss

  • Igor Gilitschenski
  • Roshni Sahoo
  • Wilko Schwarting
  • Alexander Amini
  • Sertac Karaman
  • Daniela Rus

Reasoning about uncertain orientations is one of the core problems in many perception tasks such as object pose estimation or motion estimation. In these scenarios, poor illumination conditions, sensor limitations, or appearance invariance may result in highly uncertain estimates. In this work, we propose a novel learning-based representation for orientation uncertainty. By characterizing uncertainty over unit quaternions with the Bingham distribution, we formulate a loss that naturally captures the antipodal symmetry of the representation. We discuss the interpretability of the learned distribution parameters and demonstrate the feasibility of our approach on several challenging real-world pose estimation tasks involving uncertain orientations.
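The antipodal symmetry mentioned above comes for free in the Bingham density, whose unnormalized form is exp(q^T M Z M^T q): the exponent is quadratic in q, so q and -q (which encode the same rotation) receive identical density. A minimal check with our own toy parameters, not the learned ones:

```python
import numpy as np

def bingham_log_density_unnorm(q, M, Z):
    """Unnormalized Bingham log-density over unit quaternions.

    M: 4x4 orthogonal matrix of principal directions;
    Z: concentration parameters (non-positive, largest fixed to 0).
    """
    C = M @ np.diag(Z) @ M.T
    return float(q @ C @ q)   # quadratic form: invariant under q -> -q

rng = np.random.default_rng(1)
M, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # random orthogonal basis
Z = np.array([-10.0, -5.0, -1.0, 0.0])
q = rng.standard_normal(4)
q /= np.linalg.norm(q)                             # random unit quaternion

same = bingham_log_density_unnorm(q, M, Z)
flipped = bingham_log_density_unnorm(-q, M, Z)
```

A loss built on this density therefore never penalizes the network for predicting the "other" quaternion of a rotation, which a naive quaternion regression loss would.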

IROS Conference 2020 Conference Paper

Uncertainty Aware Texture Classification and Mapping Using Soft Tactile Sensors

  • Alexander Amini
  • Jeffrey I. Lipton
  • Daniela Rus

Spatial mapping of surface roughness is a critical enabling technology for automating adaptive sanding operations. We leverage GelSight sensors to convert the problem of surface roughness measurement into a vision classification problem. By combining GelSight sensors with Optitrack positioning systems we attempt to develop an accurate spatial mapping of surface roughness that can compare to human touch, the current state of the art for large scale manufacturing. To perform the classification, we propose the use of Bayesian neural networks in conjunction with uncertainty-aware prediction. We compare the sensor and network with a human baseline for both absolute and relative texture classification. To establish a baseline, we collected performance data from humans on their ability to classify materials into 60, 120, and 180 grit sanded pine boards. Our results showed that the probabilistic network performs at the level of human touch for absolute and relative classifications. Using the Bayesian approach enables establishing a confidence bound on our prediction. We were able to integrate the sensor with Optitrack to provide a spatial map of sanding grit applied to pine boards. From this result, we can conclude that GelSight with Bayesian neural networks can learn accurate representations for sanding, and could be a significant enabling technology for closed loop robotic sanding operations.

IROS Conference 2019 Conference Paper

Infrastructure-free NLoS Obstacle Detection for Autonomous Cars

  • Felix Naser
  • Igor Gilitschenski
  • Alexander Amini
  • Christina Liao
  • Guy Rosman
  • Sertac Karaman
  • Daniela Rus

Current perception systems mostly require direct line of sight to anticipate and ultimately prevent potential collisions at intersections with other road users. We present a fully integrated autonomous system capable of detecting shadows or weak illumination changes on the ground caused by a dynamic obstacle in NLoS scenarios. This additional virtual sensor “ShadowCam” extends the signal range utilized so far by computer-vision ADASs. We show that (1) our algorithm maintains a mean classification accuracy of around 70% even when it doesn’t rely on infrastructure (such as AprilTags) as an image registration method. We validate (2) in real-world experiments that our autonomous car, driving in nighttime conditions, detects a hidden approaching car earlier with our virtual sensor than with the front-facing 2-D LiDAR.

ICRA Conference 2019 Conference Paper

Variational End-to-End Navigation and Localization

  • Alexander Amini
  • Guy Rosman
  • Sertac Karaman
  • Daniela Rus

Deep learning has revolutionized the ability to learn “end-to-end” autonomous vehicle control directly from raw sensory data. While there have been recent extensions to handle forms of navigation instruction, these works are unable to capture the full distribution of possible actions that could be taken and to reason about localization of the robot within the environment. In this paper, we extend end-to-end driving networks with the ability to perform point-to-point navigation as well as probabilistic localization using only noisy GPS data. We define a novel variational network capable of learning from raw camera data of the environment as well as higher level roadmaps to predict (1) a full probability distribution over the possible control commands; and (2) a deterministic control command capable of navigating on the route specified within the map. Additionally, we formulate how our model can be used to localize the robot according to correspondences between the map and the observed visual road topology, inspired by the rough localization that human drivers can perform. We test our algorithms on real-world driving data that the vehicle has never driven through before, and integrate our point-to-point navigation algorithms onboard a full-scale autonomous vehicle for real-time performance. Our localization algorithm is also evaluated over a new set of roads and intersections to demonstrate rough pose localization even in situations without any GPS prior.

ICRA Conference 2018 Conference Paper

Learning Steering Bounds for Parallel Autonomous Systems

  • Alexander Amini
  • Liam Paull
  • Thomas Balch
  • Sertac Karaman
  • Daniela Rus

Deep learning has been successfully applied to “end-to-end” learning of the autonomous driving task, where a deep neural network learns to predict steering control commands from camera data input. However, the learned representations do not support higher-level decision making required for autonomous navigation, nor the uncertainty estimates required for parallel autonomy, where vehicle control is shared between human and robot. This paper tackles the problem of learning a representation to predict a continuous control probability distribution, and thus steering control options and bounds for those options, which can be used for autonomous navigation. Each mode of the distribution encodes a possible macro-action that the system could execute at that instant, and the covariances of the modes place bounds on safe steering control values. Our approach has the added advantage of being trained on unlabeled data collected from inexpensive cameras. The deep neural network based algorithm generates a probability distribution over the space of steering angles, from which we leverage Variational Bayesian methods to extract a mixture model and compute the different possible actions in the environment. A bound, which the autonomous vehicle must respect in our parallel autonomy setting, is then computed for each of these actions. We evaluate our approach on a challenging dataset containing a wide variety of driving conditions, and show that our algorithm is capable of parameterizing Gaussian Mixture Models for possible actions and extracting steering bounds with a mean error of only 2 degrees. Additionally, we demonstrate our system working on a full-scale autonomous vehicle and evaluate its ability to successfully handle various parallel autonomy situations.
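The pipeline sketched in the abstract, fitting a mixture over steering angles and reading a per-mode bound off each mode's variance, can be illustrated with a small 1-D EM loop (our own minimal implementation; in the paper a network predicts the mixture parameters directly):

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_gmm_1d(x, k, iters=100):
    """Tiny EM for a 1-D Gaussian mixture; returns weights, means, stds."""
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))   # spread-out initialization
    sd = np.full(k, x.std())
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each mode for each sample
        dens = w * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / sd
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from responsibilities
        n = r.sum(axis=0)
        w = n / len(x)
        mu = (r * x[:, None]).sum(axis=0) / n
        sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / n)
    return w, mu, sd

# Two steering "macro-actions": slight left and moderate right (radians).
angles = np.concatenate([rng.normal(-0.3, 0.05, 200), rng.normal(0.4, 0.05, 200)])
w, mu, sd = fit_gmm_1d(angles, k=2)
bounds = np.stack([mu - 2 * sd, mu + 2 * sd], axis=1)  # per-mode safe interval
```

Each row of `bounds` is the interval a parallel-autonomy supervisor could enforce for the corresponding macro-action.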

IROS Conference 2018 Conference Paper

Variational Autoencoder for End-to-End Control of Autonomous Driving with Novelty Detection and Training De-biasing

  • Alexander Amini
  • Wilko Schwarting
  • Guy Rosman
  • Brandon Araki
  • Sertac Karaman
  • Daniela Rus

This paper introduces a new method for end-to-end training of deep neural networks (DNNs) and evaluates it in the context of autonomous driving. DNN training has been shown to result in high accuracy for perception to action learning given sufficient training data. However, the trained models may fail without warning in situations with insufficient or biased training data. In this paper, we propose and evaluate a novel architecture for self-supervised learning of latent variables to detect the insufficiently trained situations. Our method also addresses training data imbalance, by learning a set of underlying latent variables that characterize the training data and evaluate potential biases. We show how these latent distributions can be leveraged to adapt and accelerate the training pipeline by training on only a fraction of the total dataset. We evaluate our approach on a challenging dataset for driving. The data is collected from a full-scale autonomous vehicle. Our method provides a qualitative explanation for the latent variables learned in the model. Finally, we show how our model can be additionally trained as an end-to-end controller, directly outputting a steering control command for an autonomous vehicle.