Arrow Research search

Author name cluster

Guy Rosman

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

30 papers
2 author rows

Possible papers

30

ICRA 2025 Conference Paper

Computational Teaching for Driving via Multi-Task Imitation Learning

  • Deepak E. Gopinath
  • Xiongyi Cui
  • Jonathan A. DeCastro
  • Emily Sumner
  • Jean Costa
  • Hiroshi Yasuda
  • Allison Morgan
  • Laporsha Dees

Learning motor skills for sports or performance driving is often done with professional instruction from expert human teachers, whose availability is limited. Our goal is to enable automated teaching via a learned model that interacts with the student similarly to a human teacher. However, training such automated teaching systems is limited by the availability of high-quality annotated datasets of expert teacher and student interactions, as they are difficult to collect at scale. To address this data scarcity problem, we propose an approach for training a coaching system for complex motor tasks such as high-performance driving via a Multi-Task Imitation Learning (MTIL) paradigm. MTIL allows our model to learn robust representations by utilizing self-supervised training signals from more readily available non-interactive datasets of humans performing the task of interest. We validate our approach with (1) a semi-synthetic dataset created from real human driving trajectories, (2) a professional track driving instruction dataset, (3) a track-racing driving simulator human-subject study, and (4) a system demonstration on an instrumented car at a race track. Our experiments show that the right set of auxiliary machine learning tasks improves prediction of teaching instructions. Moreover, in the human-subject study, students exposed to the instructions from our teaching system improve their ability to stay within track limits, and show favorable perception of the model's interaction with them in terms of usefulness and satisfaction.

NeurIPS 2025 Conference Paper

Estimating cognitive biases with attention-aware inverse planning

  • Sounak Banerjee
  • Daphne Cornelisse
  • Deepak Gopinath
  • Emily Sumner
  • Jonathan DeCastro
  • Guy Rosman
  • Eugene Vinitsky
  • Mark Ho

People's goal-directed behaviors are influenced by their cognitive biases, and autonomous systems that interact with people should be aware of this. For example, people's attention to objects in their environment will be biased in a way that systematically affects how they perform everyday tasks such as driving to work. Here, building on recent work in computational cognitive science, we formally articulate the attention-aware inverse planning problem, in which the goal is to estimate a person's attentional biases from their actions. We demonstrate how attention-aware inverse planning systematically differs from standard inverse reinforcement learning and how cognitive biases can be inferred from behavior. Finally, we present an approach to attention-aware inverse planning that combines deep reinforcement learning with computational cognitive modeling. We use this approach to infer the attentional strategies of RL agents in real-life driving scenarios selected from the Waymo Open Dataset, demonstrating the scalability of estimating cognitive biases with attention-aware inverse planning.

ICRA 2025 Conference Paper

Generating Out-of-Distribution Scenarios Using Language Models

  • Erfan Aasi
  • Phat Nguyen
  • Shiva Sreeram
  • Guy Rosman
  • Sertac Karaman
  • Daniela Rus

The deployment of autonomous vehicles controlled by machine learning techniques requires extensive testing in diverse real-world environments, robust handling of edge cases and out-of-distribution scenarios, and comprehensive safety validation to ensure that these systems can navigate safely and effectively under unpredictable conditions. Addressing Out-of-Distribution (OOD) driving scenarios is essential for enhancing safety, as OOD scenarios help validate the reliability of the models within the vehicle's autonomy stack. However, generating OOD scenarios is challenging due to their long-tailed distribution and rarity in urban driving datasets. Recently, Large Language Models (LLMs) have shown promise in autonomous driving, particularly for their zero-shot generalization and common-sense reasoning capabilities. In this paper, we leverage these LLM strengths to introduce a framework for generating diverse OOD driving scenarios. Our approach uses LLMs to construct a branching tree, where each branch represents a unique OOD scenario. These scenarios are then simulated in the CARLA simulator using an automated framework that aligns scene augmentation with the corresponding textual descriptions. We evaluate our framework through extensive simulations, and assess its performance via a diversity metric that measures the richness of the scenarios. Additionally, we introduce a new “OOD-ness” metric, which quantifies how much the generated scenarios deviate from typical urban driving conditions. Furthermore, we explore the capacity of modern Vision-Language Models (VLMs) to interpret and safely navigate through the simulated OOD scenarios. Our findings offer valuable insights into the reliability of language models in addressing OOD scenarios within the context of urban driving.
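The branching-tree idea in the abstract above can be illustrated with a toy sketch. Everything here is hypothetical: the node class, the scenario text, and `propose_variations` (a stand-in for an LLM call) are not the paper's actual implementation or prompts.

```python
# Toy sketch of a branching scenario tree (hypothetical names; the paper's
# actual LLM prompting and CARLA integration are not reproduced here).

class ScenarioNode:
    def __init__(self, description):
        self.description = description
        self.children = []

def propose_variations(description):
    """Stand-in for an LLM call that proposes OOD variations of a scenario."""
    templates = ["{} during dense fog", "{} with a stalled vehicle ahead"]
    return [t.format(description) for t in templates]

def expand(node, depth):
    """Grow the tree: each branch is a distinct candidate OOD scenario."""
    if depth == 0:
        return
    for desc in propose_variations(node.description):
        child = ScenarioNode(desc)
        node.children.append(child)
        expand(child, depth - 1)

root = ScenarioNode("vehicle approaches an unsignalized crosswalk")
expand(root, depth=2)
leaves = [c.description for n in root.children for c in n.children]
```

Replacing the stub with a real LLM call (and a simulator hookup per branch) would recover the general shape of such a pipeline.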

ICRA 2025 Conference Paper

Hypergraph-Transformer (HGT) for Interaction Event Prediction in Laparoscopic and Robotic Surgery

  • Lianhao Yin
  • Yutong Ban
  • Jennifer A. Eckhoff
  • Ozanan R. Meireles
  • Daniela Rus
  • Guy Rosman

Understanding and anticipating events and actions is critical for intraoperative assistance and decision-making during minimally invasive surgery. We propose a predictive neural network that is capable of understanding and predicting critical interaction aspects of surgical workflow based on endoscopic, intracorporeal video data, while flexibly leveraging surgical knowledge graphs. The approach incorporates a hypergraph-transformer (HGT) structure that encodes expert knowledge into the network design and predicts the hidden embedding of the graph. We verify our approach on established surgical datasets and applications, including the prediction of action triplets and the achievement of the Critical View of Safety (CVS), a critical safety measure. Moreover, we address specific, safety-related forecasts of surgical processes, such as predicting the clipping of the cystic duct or artery without prior achievement of the CVS. Our results demonstrate improved prediction of interaction events with our approach compared to unstructured alternatives.

ICLR 2025 Conference Paper

ReGen: Generative Robot Simulation via Inverse Design

  • Phat Nguyen
  • Tsun-Hsuan Wang
  • Zhang-Wei Hong
  • Erfan Aasi
  • Andrew Silva
  • Guy Rosman
  • Sertac Karaman
  • Daniela Rus

Simulation plays a key role in scaling robot learning and validating policies, but constructing simulations remains labor-intensive. In this paper, we introduce ReGen, a generative simulation framework that automates this process using inverse design. Given an agent's behavior (such as a motion trajectory or objective function) and its textual description, we infer the underlying scenarios and environments that could have caused the behavior. Our approach leverages large language models to construct and expand a graph that captures cause-and-effect relationships and relevant entities with properties in the environment, which is then processed to configure a robot simulation environment. Our approach supports (i) augmenting simulations based on ego-agent behaviors, (ii) controllable, counterfactual scenario generation, (iii) reasoning about agent cognition and mental states, and (iv) reasoning with distinct sensing modalities, such as braking due to faulty GPS signals. We demonstrate our method in autonomous driving and robot manipulation tasks, generating more diverse, complex simulated environments compared to existing simulations with high success rates, and enabling controllable generation for corner cases. This approach enhances the validation of robot policies and supports data or simulation augmentation, advancing scalable robot learning for improved generalization and robustness.

ICRA 2025 Conference Paper

Think Deep and Fast: Learning Neural Nonlinear Opinion Dynamics from Inverse Dynamic Games for Split-Second Interactions

  • Haimin Hu
  • Jaime F. Fisac
  • Naomi Ehrich Leonard
  • Deepak E. Gopinath
  • Jonathan A. DeCastro
  • Guy Rosman

Non-cooperative interactions commonly occur in multi-agent scenarios such as car racing, where an ego vehicle can choose to overtake the rival, or stay behind it until a safe overtaking “corridor” opens. While an expert human can do well at making such time-sensitive decisions, autonomous agents are incapable of rapidly reasoning about complex, potentially conflicting options, leading to suboptimal behaviors such as deadlocks. Recently, the nonlinear opinion dynamics (NOD) model has proven to exhibit fast opinion formation and avoidance of decision deadlocks. However, NOD modeling parameters are oftentimes assumed fixed, limiting their applicability in complex and dynamic environments. It remains an open challenge to determine such parameters automatically and adaptively, accounting for the ever-changing environment. In this work, we propose for the first time a learning-based and game-theoretic approach to synthesize a Neural NOD model from expert demonstrations, given as a dataset containing (possibly incomplete) state and action trajectories of interacting agents. We demonstrate Neural NOD's ability to make fast and deadlock-free decisions in a simulated autonomous racing example. We find that Neural NOD consistently outperforms the state-of-the-art data-driven inverse game baseline in terms of safety and overtaking performance.
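The fast, deadlock-free opinion formation described above can be illustrated with a minimal one-dimensional nonlinear opinion dynamics (NOD) integration, dz/dt = -d*z + u*tanh(z) + b: when attention u exceeds the critical value (here u > d), even a tiny bias b drives the opinion rapidly away from the neutral deadlock z = 0. Parameters below are illustrative; the paper's contribution is learning such parameters from demonstrations, which this sketch does not attempt.

```python
import math

# One-option NOD sketch: dz/dt = -d*z + u*tanh(z) + b, forward-Euler integrated.
# Below critical attention the opinion stays near neutral; above it, the
# neutral state destabilizes and the agent commits quickly.

def simulate(u, d=1.0, b=0.01, dt=0.01, steps=2000):
    z = 0.0
    for _ in range(steps):
        z += dt * (-d * z + u * math.tanh(z) + b)
    return z

undecided = simulate(u=0.5)   # below critical attention: lingers near z = 0
decided = simulate(u=2.0)     # above critical attention: commits to an option
```

The qualitative switch between the two regimes is the deadlock-breaking property the abstract refers to.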

ICRA 2024 Conference Paper

Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models

  • Tsun-Hsuan Wang
  • Alaa Maalouf
  • Wei Xiao 0003
  • Yutong Ban
  • Alexander Amini
  • Guy Rosman
  • Sertac Karaman
  • Daniela Rus

As autonomous driving technology matures, end-to-end methodologies have emerged as a leading strategy, promising seamless integration from perception to control via deep learning. However, existing systems grapple with challenges such as unexpected open-set environments and the complexity of black-box models. At the same time, the evolution of deep learning introduces larger, multimodal foundation models, offering multi-modal visual and textual understanding. In this paper, we harness these multimodal foundation models to enhance the robustness and adaptability of autonomous driving systems. We introduce a method for extracting nuanced spatial features from transformers and incorporate latent space simulation for improved training and policy debugging. We use pixel/patch-aligned feature descriptors to expand foundation model capabilities to create an end-to-end multimodal driving model, demonstrating unparalleled results in diverse tests. Our solution combines language with visual perception and achieves significantly greater robustness in out-of-distribution situations.

IROS 2024 Conference Paper

Learning autonomous driving from aerial imagery

  • Varun Murali
  • Guy Rosman
  • Sertac Karaman
  • Daniela Rus

In this work, we consider the problem of learning end-to-end perception-to-control for ground vehicles solely from aerial imagery. Photogrammetric simulators allow the synthesis of novel views through the transformation of pre-generated assets into novel views. However, they have a large setup cost, require careful collection of data and often human effort to create usable simulators. We use a Neural Radiance Field (NeRF) as an intermediate representation to synthesize novel views from the point of view of a ground vehicle. These novel viewpoints can then be used for several downstream autonomous navigation applications. In this work, we demonstrate the utility of novel view synthesis through the application of training a policy for end-to-end learning from images and depth data. In a traditional real-to-sim-to-real framework, the collected data would be transformed into a visual simulator which could then be used to generate novel views. In contrast, using a NeRF allows a compact representation and the ability to optimize over the parameters of the visual simulator as more data is gathered in the environment. We demonstrate the efficacy of our method in a custom-built mini-city environment through the deployment of imitation policies on robotic cars. We additionally consider the task of place localization and demonstrate that our method is able to relocalize the car in the real world.

IROS 2024 Conference Paper

Online Adaptation of Learned Vehicle Dynamics Model with Meta-Learning Approach

  • Yuki Tsuchiya
  • Thomas Balch
  • Paul Drews
  • Guy Rosman

We represent a vehicle dynamics model for autonomous driving near the limits of handling via a multilayer neural network. Online adaptation is desirable in order to address unseen environments. However, the model needs to adapt to new environments without forgetting previously encountered ones. In this study, we apply Continual-MAML to overcome this difficulty. It enables the model to adapt to previously encountered environments quickly and efficiently by starting updates from optimized initial parameters. We evaluate the impact of online model adaptation on both inference performance and the control performance of a model predictive path integral (MPPI) controller using the TRIKart platform. The neural network was pre-trained using driving data collected in our test environment, and experiments for online adaptation were executed on multiple different road conditions not contained in the training data. Empirical results show that the model using Continual-MAML outperforms the fixed model and the model using gradient descent in test-set loss and online tracking performance of MPPI.
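The core MAML-style mechanism mentioned above, restarting adaptation from optimized initial parameters rather than drifting from the last fit, can be sketched in miniature. A 1-D linear "dynamics model" stands in for the multilayer network; this is illustrative, not the paper's implementation or the Continual-MAML algorithm itself.

```python
import numpy as np

# Adapt y ≈ theta * x by gradient descent on MSE, starting from the
# meta-learned initialization theta_init (the MAML-style ingredient).

def adapt(theta_init, xs, ys, lr=0.1, steps=20):
    theta = theta_init
    for _ in range(steps):
        grad = np.mean(2 * (theta * xs - ys) * xs)  # d/dtheta of mean (theta*x - y)^2
        theta -= lr * grad
    return theta

theta_meta = 1.0                      # meta-learned initialization
xs = np.array([0.5, 1.0, 1.5])
ys = 2.0 * xs                         # new road condition: true slope is 2
theta_new = adapt(theta_meta, xs, ys)
```

In the full method, the initialization itself is meta-optimized so that a few such steps suffice across the previously encountered environments.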

ICRA 2023 Conference Paper

MPOGames: Efficient Multimodal Partially Observable Dynamic Games

  • Oswin So
  • Paul Drews
  • Thomas Balch
  • Velin D. Dimitrov
  • Guy Rosman
  • Evangelos A. Theodorou

Game-theoretic methods have become popular for planning and prediction in situations involving rich multi-agent interactions. However, these methods often assume the existence of a single local Nash equilibrium and are hence unable to handle uncertainty in the intentions of different agents. While maximum entropy (MaxEnt) dynamic games try to address this issue, practical approaches solve for MaxEnt Nash equilibria using linear-quadratic approximations which are restricted to unimodal responses and unsuitable for scenarios with multiple local Nash equilibria. By reformulating the problem as a POMDP, we propose MPOGames, a method for efficiently solving MaxEnt dynamic games that captures the interactions between local Nash equilibria. We show the importance of uncertainty-aware game-theoretic methods via a two-agent merge case study. Finally, we demonstrate the real-time capabilities of our approach with hardware experiments on a 1/10th-scale car platform.

ICRA 2022 Conference Paper

A Deep Concept Graph Network for Interaction-Aware Trajectory Prediction

  • Yutong Ban
  • Xiao Li 0025
  • Guy Rosman
  • Igor Gilitschenski
  • Ozanan R. Meireles
  • Sertac Karaman
  • Daniela Rus

Temporal patterns (how vehicles have behaved in the observed past) underlie our reasoning about how people drive on the road, and can explain why we make certain predictions about interactions among road agents. In this paper we propose the ConceptNet trajectory predictor - a novel prediction framework that is able to incorporate agent interactions as explicit edges in a temporal knowledge graph. We demonstrate the sample efficiency and the overall accuracy of the proposed approach, and show that using the graphical structure to explicitly model interactions enables better detection of agent interactions and improved trajectory predictions on a large real-world driving dataset.

ICRA 2022 Conference Paper

HYPER: Learned Hybrid Trajectory Prediction via Factored Inference and Adaptive Sampling

  • Xin Huang 0018
  • Guy Rosman
  • Igor Gilitschenski
  • Ashkan Jasour
  • Stephen G. McGill
  • John J. Leonard
  • Brian Williams 0001

Modeling multi-modal high-level intent is important for ensuring diversity in trajectory prediction. Existing approaches explore the discrete nature of human intent before predicting continuous trajectories, to improve accuracy and support explainability. However, these approaches often assume the intent to remain fixed over the prediction horizon, which is problematic in practice, especially over longer horizons. To overcome this limitation, we introduce HYPER, a general and expressive hybrid prediction framework that models evolving human intent. By modeling traffic agents as a hybrid discrete-continuous system, our approach is capable of predicting discrete intent changes over time. We learn the probabilistic hybrid model via a maximum likelihood estimation problem and leverage neural proposal distributions to sample adaptively from the exponentially growing discrete space. The overall approach affords a better trade-off between accuracy and coverage. We train and validate our model on the Argoverse dataset, and demonstrate its effectiveness through comprehensive ablation studies and comparisons with state-of-the-art models.

ICRA 2022 Conference Paper

Leveraging Smooth Attention Prior for Multi-Agent Trajectory Prediction

  • Zhangjie Cao
  • Erdem Biyik
  • Guy Rosman
  • Dorsa Sadigh

Multi-agent interactions are important to model for forecasting other agents' behaviors and trajectories. At a certain time, to forecast a reasonable future trajectory, each agent needs to pay attention to the interactions with only a small group of most relevant agents instead of unnecessarily paying attention to all the other agents. However, existing attention modeling works ignore that human attention in driving does not change rapidly, and may introduce fluctuating attention across time steps. In this paper, we formulate an attention model for multi-agent interactions based on a total variation temporal smoothness prior and propose a trajectory prediction architecture that leverages the knowledge of these attended interactions. We demonstrate how the total variation attention prior along with the new sequence prediction loss terms leads to smoother attention and more sample-efficient learning of multi-agent trajectory prediction, and show its advantages in terms of prediction accuracy by comparing it with the state-of-the-art approaches on both synthetic and naturalistic driving data. We demonstrate the performance of our algorithm for trajectory prediction on the INTERACTION dataset on our website: https://sites.google.com/view/smoothness-attention.
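The total-variation smoothness idea can be sketched with a toy penalty on attention weights across time steps; penalizing |a_t - a_{t-1}| discourages attention that flickers between agents from one step to the next. This is an illustrative sketch of the prior, not the paper's architecture or loss terms.

```python
import numpy as np

# Total-variation penalty over a (T, N) attention matrix: T time steps,
# N other agents. Smooth attention incurs a small penalty; flickering
# attention incurs a large one.

def tv_penalty(attention):
    return np.abs(np.diff(attention, axis=0)).sum()

smooth = np.array([[0.9, 0.1], [0.9, 0.1], [0.8, 0.2]])    # gradual shift
flicker = np.array([[0.9, 0.1], [0.1, 0.9], [0.9, 0.1]])   # rapid switching
```

Adding such a term to the training loss biases the learned attention toward the slowly varying behavior human drivers exhibit.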

IROS 2022 Conference Paper

TIP: Task-Informed Motion Prediction for Intelligent Vehicles

  • Xin Huang 0018
  • Guy Rosman
  • Ashkan Jasour
  • Stephen G. McGill
  • John J. Leonard
  • Brian Williams 0001

When predicting trajectories of road agents, motion predictors often approximate the future distribution by a limited number of samples. This constraint requires the predictors to generate samples that best support the task given task specifications. However, existing predictors are often optimized and evaluated via task-agnostic measures without accounting for the use of predictions in downstream tasks, and thus could result in sub-optimal task performance. In this paper, we propose a task-informed motion prediction model that better supports the tasks through its predictions by jointly reasoning about prediction accuracy and the utility of the downstream tasks during training. The task utility function, commonly used to evaluate task performance, does not require full task information but only a specification of the task's utility, yielding predictors that are tailored to different downstream tasks. We demonstrate our approach on two use cases of common decision-making tasks and their utility functions, in the context of autonomous driving and parallel autonomy. Experiment results show that our predictor produces accurate predictions that improve the task performance by a large margin in both tasks when compared to task-agnostic baselines on the Waymo Open Motion dataset.
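The joint objective described above can be sketched as a loss mixing a prediction-accuracy term with a (negated) task-utility term, so samples are rewarded both for matching the ground truth and for supporting the downstream task. The utility function and weight below are hypothetical placeholders, not the paper's formulation.

```python
import numpy as np

# Task-informed training sketch: lower loss = more accurate AND more useful.

def task_informed_loss(pred, gt, utility, weight=0.5):
    accuracy_term = np.mean((pred - gt) ** 2)   # task-agnostic accuracy
    utility_term = -utility(pred)               # higher utility -> lower loss
    return accuracy_term + weight * utility_term

# Toy utility: prefer predictions that stay below a comfort threshold.
utility = lambda p: float(np.all(p < 1.0))
loss_safe = task_informed_loss(np.array([0.4, 0.5]), np.array([0.5, 0.5]), utility)
loss_unsafe = task_informed_loss(np.array([1.4, 1.5]), np.array([0.5, 0.5]), utility)
```

Swapping in a planner's or controller's utility function is what tailors the predictor to a specific downstream task.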

ICRA 2022 Conference Paper

Trajectory Prediction with Linguistic Representations

  • Yen-Ling Kuo
  • Xin Huang 0018
  • Andrei Barbu
  • Stephen G. McGill
  • Boris Katz
  • John J. Leonard
  • Guy Rosman

Language allows humans to build mental models that interpret what is happening around them resulting in more accurate long-term predictions. We present a novel trajectory prediction model that uses linguistic intermediate representations to forecast trajectories, and is trained using trajectory samples with partially-annotated captions. The model learns the meaning of each of the words without direct per-word supervision. At inference time, it generates a linguistic description of trajectories which captures maneuvers and interactions over an extended time interval. This generated description is used to refine predictions of the trajectories of multiple agents. We train and validate our model on the Argoverse dataset, and demonstrate improved accuracy results in trajectory prediction. In addition, our model is more interpretable: it presents part of its reasoning in plain language as captions, which can aid model development and can aid in building confidence in the model before deploying it.

ICRA 2021 Conference Paper

Aggregating Long-Term Context for Learning Laparoscopic and Robot-Assisted Surgical Workflows

  • Yutong Ban
  • Guy Rosman
  • Thomas M. Ward
  • Daniel A. Hashimoto
  • Taisei Kondo
  • Hidekazu Iwaki
  • Ozanan R. Meireles
  • Daniela Rus

Analyzing surgical workflow is crucial for surgical assistance robots to understand surgeries. With the understanding of the complete surgical workflow, the robots are able to assist the surgeons in intra-operative events, such as giving a warning when the surgeon is entering specific key or high-risk phases. Deep learning techniques have recently been widely applied to recognizing surgical workflows. Many existing temporal neural network models are limited in their capability to handle long-term dependencies in the data, relying instead on the strong performance of the underlying per-frame visual models. We propose a new temporal network structure that leverages task-specific network representation to collect long-term sufficient statistics that are propagated by a sufficient statistics model (SSM). We implement our approach within an LSTM backbone for the task of surgical phase recognition and explore several choices for propagated statistics. We demonstrate superior results over existing and novel state-of-the-art segmentation techniques on two laparoscopic cholecystectomy datasets: the publicly available Cholec80 dataset and MGH100, a novel dataset with more challenging and clinically meaningful segment labels.

IROS 2021 Conference Paper

Risk Conditioned Neural Motion Planning

  • Xin Huang 0018
  • Meng Feng
  • Ashkan Jasour
  • Guy Rosman
  • Brian Williams 0001

Risk-bounded motion planning is an important yet difficult problem for safety-critical tasks. While existing mathematical programming methods offer theoretical guarantees in the context of constrained Markov decision processes, they either lack scalability in solving larger problems or produce conservative plans. Recent advances in deep reinforcement learning improve scalability by learning policy networks as function approximators. In this paper, we propose an extension of the soft actor-critic model to estimate the execution risk of a plan through a risk critic and to produce risk-bounded policies efficiently by adding an extra risk term to the loss function of the policy network. We define the execution risk in an accurate form, as opposed to approximating it through a summation of immediate risks at each time step that leads to conservative plans. Our proposed model is conditioned on a continuous spectrum of risk bounds, allowing the user to adjust the risk-averse level of the agent on the fly. Through a set of experiments, we show the advantage of our model in terms of both computational time and plan quality, compared to a state-of-the-art mathematical programming baseline, and validate its performance in more complicated scenarios, including nonlinear dynamics and larger state space.
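The extra risk term can be illustrated with a toy actor loss: alongside the usual Q-value, a risk critic's estimate is penalized when it exceeds the commanded risk bound. The function names and the hinge-style penalty form are illustrative stand-ins, not the paper's exact formulation.

```python
# Sketch of a risk-augmented actor loss: maximize Q while keeping the
# risk critic's estimated execution risk under the conditioning bound.

def policy_loss(q_value, risk_estimate, risk_bound, risk_weight=10.0):
    risk_violation = max(0.0, risk_estimate - risk_bound)
    return -q_value + risk_weight * risk_violation

cautious = policy_loss(q_value=1.0, risk_estimate=0.02, risk_bound=0.05)
risky = policy_loss(q_value=1.2, risk_estimate=0.20, risk_bound=0.05)
```

Conditioning the policy network on `risk_bound` as an input, as the abstract describes, is what lets a single trained agent span a spectrum of risk-averse behaviors.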

IROS 2020 Conference Paper

Behaviorally Diverse Traffic Simulation via Reinforcement Learning

  • Shinya Shiroshita
  • Shirou Maruyama
  • Daisuke Nishiyama
  • Mario Ynocente Castro
  • Karim Hamzaoui
  • Guy Rosman
  • Jonathan A. DeCastro
  • Kuan-Hui Lee

Traffic simulators are important tools in autonomous driving development. While continuous progress has been made to provide developers more options for modeling various traffic participants, tuning these models to increase their behavioral diversity while maintaining quality is often very challenging. This paper introduces an easily-tunable policy generation algorithm for autonomous driving agents. The proposed algorithm balances diversity and driving skills by leveraging the representation and exploration abilities of deep reinforcement learning via a distinct policy set selector. Moreover, we present an algorithm utilizing intrinsic rewards to widen behavioral differences in the training. To provide quantitative assessments, we develop two trajectory-based evaluation metrics which measure the differences among policies and behavioral coverage. We experimentally show the effectiveness of our methods on several challenging intersection scenes.

IROS 2020 Conference Paper

Driving Through Ghosts: Behavioral Cloning with False Positives

  • Andreas Bühler
  • Adrien Gaidon
  • Andrei Cramariuc
  • Rares Ambrus
  • Guy Rosman
  • Wolfram Burgard

Safe autonomous driving requires robust detection of other traffic participants. However, robust does not mean perfect, and safe systems typically minimize missed detections at the expense of a higher false positive rate. This results in conservative and yet potentially dangerous behavior such as avoiding imaginary obstacles. In the context of behavioral cloning, perceptual errors at training time can lead to learning difficulties or wrong policies, as expert demonstrations might be inconsistent with the perceived world state. In this work, we propose a behavioral cloning approach that can safely leverage imperfect perception without being conservative. Our core contribution is a novel representation of perceptual uncertainty for learning to plan. We propose a new probabilistic bird's-eye-view semantic grid to encode the noisy output of object perception systems. We then leverage expert demonstrations to learn an imitative driving policy using this probabilistic representation. Using the CARLA simulator, we show that our approach can safely overcome critical false positives that would otherwise lead to catastrophic failures or conservative behavior.
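The representational idea can be illustrated in miniature: instead of a binary occupancy grid, each cell stores the perception system's detection confidence, letting the policy distinguish a certain obstacle from a probable false positive. Grid size and values are made up for illustration; the paper's grid is semantic and far richer.

```python
import numpy as np

# Toy probabilistic bird's-eye-view grid: cells hold detection confidence.
grid = np.zeros((4, 4))
grid[1, 2] = 0.95   # confident detection
grid[3, 0] = 0.30   # low-confidence detection (a possible "ghost")

# A naive binary occupancy grid would treat both cells identically,
# discarding exactly the information an imitative policy needs.
binary = (grid > 0.0).astype(float)
```

A policy trained on `grid` rather than `binary` can learn from demonstrations when experts drove through low-confidence "obstacles".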

IROS 2019 Conference Paper

Infrastructure-free NLoS Obstacle Detection for Autonomous Cars

  • Felix Naser
  • Igor Gilitschenski
  • Alexander Amini
  • Christina Liao
  • Guy Rosman
  • Sertac Karaman
  • Daniela Rus

Current perception systems mostly require direct line of sight to anticipate and ultimately prevent potential collisions at intersections with other road users. We present a fully integrated autonomous system capable of detecting shadows or weak illumination changes on the ground caused by a dynamic obstacle in NLoS scenarios. This additional virtual sensor, “ShadowCam”, extends the signal range utilized so far by computer-vision ADASs. We show that (1) our algorithm maintains a mean classification accuracy of around 70% even when it does not rely on infrastructure, such as AprilTags, as an image registration method, and (2) in real-world experiments, our autonomous car driving in nighttime conditions detects a hidden approaching car earlier with our virtual sensor than with the front-facing 2-D LiDAR.

ICRA 2019 Conference Paper

Uncertainty-Aware Driver Trajectory Prediction at Urban Intersections

  • Xin Huang 0018
  • Stephen G. McGill
  • Brian Williams 0001
  • Luke Fletcher
  • Guy Rosman

Predicting the motion of a driver's vehicle is crucial for advanced driving systems, enabling detection of potential risks and supporting shared control between the driver and automation systems. In this paper, we propose a variational neural network approach that predicts future driver trajectory distributions for the vehicle based on multiple sensors. Our predictor generates both a conditional variational distribution of future trajectories, as well as a confidence estimate for different time horizons. Our approach allows us to handle inherently uncertain situations, and reason about information gain from each input, as well as combine our model with additional predictors, creating a mixture of experts. We show how to augment the variational predictor with a physics-based predictor, and based on their confidence estimations, improve overall system performance. The resulting combined model is aware of the uncertainty associated with its predictions, which can help the vehicle autonomy to make decisions with more confidence. The model is validated on real-world urban driving data collected in multiple locations. This validation demonstrates that our approach reduces the prediction error of a physics-based model by 25% while successfully identifying the uncertain cases with 82% accuracy.
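The confidence-based combination described above can be sketched as a two-expert mixture: each predictor emits a trajectory point and a confidence, and the fused output weights them by normalized confidence. This is purely illustrative; in the paper the confidence estimates come from the model itself per time horizon.

```python
import numpy as np

# Confidence-weighted fusion of a learned and a physics-based prediction.
def combine(pred_a, conf_a, pred_b, conf_b):
    w = conf_a / (conf_a + conf_b)
    return w * pred_a + (1 - w) * pred_b

learned = np.array([1.0, 2.0])      # learned-model trajectory point (x, y)
physics = np.array([3.0, 4.0])      # physics-model trajectory point (x, y)
fused = combine(learned, 0.9, physics, 0.1)
```

When the learned model reports low confidence (e.g. an unfamiliar intersection), the same rule shifts weight toward the physics-based expert.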

ICRA 2019 Conference Paper

Variational End-to-End Navigation and Localization

  • Alexander Amini
  • Guy Rosman
  • Sertac Karaman
  • Daniela Rus

Deep learning has revolutionized the ability to learn “end-to-end” autonomous vehicle control directly from raw sensory data. While there have been recent extensions to handle forms of navigation instruction, these works are unable to capture the full distribution of possible actions that could be taken and to reason about localization of the robot within the environment. In this paper, we extend end-to-end driving networks with the ability to perform point-to-point navigation as well as probabilistic localization using only noisy GPS data. We define a novel variational network capable of learning from raw camera data of the environment as well as higher level roadmaps to predict (1) a full probability distribution over the possible control commands; and (2) a deterministic control command capable of navigating on the route specified within the map. Additionally, we formulate how our model can be used to localize the robot according to correspondences between the map and the observed visual road topology, inspired by the rough localization that human drivers can perform. We test our algorithms on real-world driving data that the vehicle has never driven through before, and integrate our point-to-point navigation algorithms onboard a full-scale autonomous vehicle for real-time performance. Our localization algorithm is also evaluated over a new set of roads and intersections to demonstrate rough pose localization even in situations without any GPS prior.

ICRA 2018 Conference Paper

Task-Specific Sensor Planning for Robotic Assembly Tasks

  • Guy Rosman
  • Changhyun Choi
  • Mehmet Remzi Dogar
  • John W. Fisher III
  • Daniela Rus

When performing multi-robot tasks, sensory feedback is crucial in reducing uncertainty for correct execution. Yet the utilization of sensors should be planned as an integral part of task planning, taking into account several factors such as the tolerance of different inferred properties of the scene and interaction with different agents. In this paper we handle this complex problem in a principled, yet efficient way. We use surrogate predictors based on open-loop simulation to estimate and bound the probability of success for specific tasks. We reason about such task-specific uncertainty approximants and their effectiveness. We show how they can be incorporated into a multi-robot planner, and demonstrate results with a team of robots performing assembly tasks.
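A surrogate success predictor based on open-loop simulation reduces, in its simplest form, to Monte Carlo estimation. The sketch below is illustrative only: `insertion_succeeds` is a hypothetical stand-in for the paper's task simulations, with an assumed tolerance and noise model.

```python
import random

def estimate_success_prob(simulate_once, n_trials=1000, seed=0):
    # Surrogate predictor: repeatedly roll out the task open loop under
    # sampled uncertainty and report the empirical success rate.
    rng = random.Random(seed)
    successes = sum(1 for _ in range(n_trials) if simulate_once(rng))
    return successes / n_trials

def insertion_succeeds(rng, tolerance=1.0, noise_sigma=0.3):
    # Hypothetical assembly step: success if the sampled placement error
    # stays within the part's tolerance.
    return abs(rng.gauss(0.0, noise_sigma)) < tolerance
```

A planner can compare such estimates across candidate sensor placements and pick the one whose bound on success probability is highest.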

IROS Conference 2018 Conference Paper

Variational Autoencoder for End-to-End Control of Autonomous Driving with Novelty Detection and Training De-biasing

  • Alexander Amini
  • Wilko Schwarting
  • Guy Rosman
  • Brandon Araki
  • Sertac Karaman
  • Daniela Rus

This paper introduces a new method for end-to-end training of deep neural networks (DNNs) and evaluates it in the context of autonomous driving. DNN training has been shown to result in high accuracy for perception-to-action learning given sufficient training data. However, the trained models may fail without warning in situations with insufficient or biased training data. In this paper, we propose and evaluate a novel architecture for self-supervised learning of latent variables to detect the insufficiently trained situations. Our method also addresses training data imbalance, by learning a set of underlying latent variables that characterize the training data and evaluate potential biases. We show how these latent distributions can be leveraged to adapt and accelerate the training pipeline by training on only a fraction of the total dataset. We evaluate our approach on a challenging dataset for driving. The data is collected from a full-scale autonomous vehicle. Our method provides a qualitative explanation for the latent variables learned in the model. Finally, we show how our model can be additionally trained as an end-to-end controller, directly outputting a steering control command for an autonomous vehicle.
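The latent-variable novelty test can be caricatured by scoring inputs under a density fitted to training features. In this minimal sketch, a diagonal Gaussian stands in for the VAE's learned latent distribution, and the negative log-likelihood plays the role of the novelty signal.

```python
import math

def fit_gaussian(data):
    # Fit a diagonal Gaussian to training features (a stand-in for the
    # learned latent distribution).
    n, d = len(data), len(data[0])
    means = [sum(x[j] for x in data) / n for j in range(d)]
    vars_ = [max(sum((x[j] - means[j]) ** 2 for x in data) / n, 1e-8)
             for j in range(d)]
    return means, vars_

def novelty_score(x, means, vars_):
    # Negative log-likelihood under the fitted density: a high score flags
    # inputs the model was insufficiently trained on.
    return sum(0.5 * math.log(2 * math.pi * v) + (xi - m) ** 2 / (2 * v)
               for xi, m, v in zip(x, means, vars_))
```

The same density could also drive de-biasing: resampling training examples in inverse proportion to their likelihood upweights rare situations.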

ICRA Conference 2017 Conference Paper

Duckietown: An open, inexpensive and flexible platform for autonomy education and research

  • Liam Paull
  • Jacopo Tani
  • Heejin Ahn
  • Javier Alonso-Mora
  • Luca Carlone
  • Michal Cáp
  • Yu Fan Chen
  • Changhyun Choi

Duckietown is an open, inexpensive and flexible platform for autonomy education and research. The platform comprises small autonomous vehicles (“Duckiebots”) built from off-the-shelf components, and cities (“Duckietowns”) complete with roads, signage, traffic lights, obstacles, and citizens (duckies) in need of transportation. The Duckietown platform offers a wide range of functionalities at a low cost. Duckiebots sense the world with only one monocular camera and perform all processing onboard with a Raspberry Pi 2, yet are able to: follow lanes while avoiding obstacles, pedestrians (duckies) and other Duckiebots, localize within a global map, navigate a city, and coordinate with other Duckiebots to avoid collisions. Duckietown is a useful tool since educators and researchers can save money and time by not having to develop all of the necessary supporting infrastructure and capabilities. All materials are available as open source, and the hope is that others in the community will adopt the platform for education and research.

IROS Conference 2017 Conference Paper

Hybrid control and learning with coresets for autonomous vehicles

  • Guy Rosman
  • Liam Paull
  • Daniela Rus

Modern autonomous systems such as driverless vehicles need to safely operate in a wide range of conditions. A potential solution is to employ a hybrid systems approach, where safety is guaranteed in each individual mode within the system. This offsets complexity and responsibility from the individual controllers onto the complexity of determining discrete mode transitions. In this work we propose an efficient framework based on recursive neural networks and coreset data summarization to learn the transitions between an arbitrary number of controller modes that can have arbitrary complexity. Our approach allows us to efficiently gather annotation data from the large-scale datasets that are required to train such hybrid nonlinear systems to be safe under all operating conditions, favoring underexplored parts of the data. We demonstrate the construction of the embedding, and efficient detection of switching points for autonomous and non-autonomous car data. We further show how our approach enables efficient sampling of training data, to further improve either our embedding or the controllers.
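Detecting transitions between controller modes can be sketched with a simple two-window test on a scalar signal. This is a crude stand-in for the paper's learned embedding: the window size and threshold are arbitrary choices, and a real system would operate on the embedding rather than raw values.

```python
def switching_points(signal, window=5, threshold=1.0):
    # Flag indices where the mean over the trailing window differs sharply
    # from the mean over the leading window, suggesting a transition
    # between controller modes.
    switches = []
    for i in range(window, len(signal) - window):
        before = sum(signal[i - window:i]) / window
        after = sum(signal[i:i + window]) / window
        if abs(after - before) > threshold:
            switches.append(i)
    return switches
```

Candidate switching points like these are exactly the frames worth sending to annotators, which is how detection feeds the efficient data-gathering loop the abstract describes.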

ICRA Conference 2017 Conference Paper

Machine learning and coresets for automated real-time video segmentation of laparoscopic and robot-assisted surgery

  • Mikhail Volkov 0002
  • Daniel A. Hashimoto
  • Guy Rosman
  • Ozanan R. Meireles
  • Daniela Rus

Context-aware segmentation of laparoscopic and robot-assisted surgical video has been shown to improve performance and perioperative workflow efficiency, and can be used for education and time-critical consultation. Modern pressures on productivity preclude manual video analysis, and hospital policies and legacy infrastructure are often prohibitive of recording and storing large amounts of data. In this paper we present a system that automatically generates a video segmentation of laparoscopic and robot-assisted procedures according to their underlying surgical phases using minimal computational resources, and low amounts of training data. Our system uses an SVM and HMM in combination with an augmented feature space that captures the variability of these video streams without requiring analysis of the nonrigid and variable environment. By using the data reduction capabilities of online k-segment coreset algorithms we can efficiently produce results of approximately equal quality, in real time. We evaluate our system in cross-validation experiments and propose a blueprint for piloting such a system in a real operating room environment with minimal risk factors.
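The SVM-plus-HMM combination amounts to smoothing per-frame classifier scores with Viterbi decoding. The decoder below is a minimal sketch assuming log-probability inputs; the transition matrix is illustrative and not learned from surgical data.

```python
import math

def viterbi(log_emissions, log_trans):
    # Smooth per-frame classifier scores with an HMM: find the most likely
    # phase sequence given frame-level log-probabilities and a transition
    # matrix that penalizes rapid phase changes.
    n_states = len(log_trans)
    scores = list(log_emissions[0])
    back = []
    for t in range(1, len(log_emissions)):
        ptr, new = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: scores[p] + log_trans[p][s])
            ptr.append(best_prev)
            new.append(scores[best_prev] + log_trans[best_prev][s]
                       + log_emissions[t][s])
        scores = new
        back.append(ptr)
    path = [max(range(n_states), key=lambda s: scores[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]
```

Sticky self-transitions suppress single-frame misclassifications, so the decoded output respects the temporal continuity of surgical phases.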

ICRA Conference 2017 Conference Paper

Persistent surveillance of events with unknown, time-varying statistics

  • Cenk Baykal
  • Guy Rosman
  • Sebastian Claici
  • Daniela Rus

We consider the problem of monitoring stochastic, time-varying events occurring at discrete locations. Our problem formulation extends prior work in persistent surveillance by considering the objective of maximizing event detections in unknown, dynamic environments where the rates of events are time-inhomogeneous and may be subject to abrupt changes. We propose a novel monitoring algorithm that effectively strikes a balance between exploration and exploitation as well as a balance between remembering and discarding information to handle temporal variations in unknown environments. We present an analysis proving the long-run average optimality of the policies generated by our algorithm under the assumption that the total temporal variations are sub-linear. We present simulation results demonstrating the effectiveness of our algorithm in several monitoring scenarios inspired by real-world applications, and its robustness to both continuous-random and abrupt changes in the statistics of the observed processes.
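The balance between exploration, exploitation, and forgetting can be sketched as a sliding-window upper-confidence rule: only recent observations inform each location's estimated rate, so stale statistics of a time-varying process are discarded. This is an illustrative policy in the spirit of the abstract, not the paper's algorithm.

```python
import math

def sliding_window_ucb(history, window, t):
    # Pick the location with the highest upper confidence bound, computed
    # over only the last `window` visits to that location.
    best, best_score = None, float('-inf')
    for loc, observations in history.items():
        recent = observations[-window:]
        if not recent:
            return loc  # unvisited locations are explored first
        mean = sum(recent) / len(recent)
        bonus = math.sqrt(2 * math.log(t) / len(recent))  # exploration term
        if mean + bonus > best_score:
            best, best_score = loc, mean + bonus
    return best
```

The window length trades remembering against discarding: shorter windows react faster to abrupt rate changes but estimate each rate more noisily.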

ICRA Conference 2015 Conference Paper

Coresets for visual summarization with applications to loop closure

  • Mikhail Volkov 0002
  • Guy Rosman
  • Dan Feldman
  • John W. Fisher III
  • Daniela Rus

In continuously operating robotic systems, efficient representation of the previously seen camera feed is crucial. Using a highly efficient compression coreset method, we formulate a new method for hierarchical retrieval of frames from large video streams collected online by a moving robot. We demonstrate how to utilize the resulting structure for efficient loop-closure by a novel sampling approach that is adaptive to the structure of the video. The same structure also allows us to create a highly-effective search tool for large-scale videos, which we demonstrate in this paper. We show the efficiency of proposed approaches for retrieval and loop closure on standard datasets, and on a large-scale video from a mobile camera.
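Hierarchical retrieval over a summarized video stream can be sketched as descending a binary tree of frame-descriptor means. This is a toy stand-in for the coreset-based structure: descriptors are plain tuples, and the greedy coarse-to-fine descent is approximate rather than exact nearest-neighbor search.

```python
def build_tree(frames):
    # Each node stores the mean of its frames, so a query can descend
    # toward the closer half (coarse-to-fine retrieval).
    if len(frames) == 1:
        return {'mean': frames[0], 'frames': frames}
    mid = len(frames) // 2
    left, right = build_tree(frames[:mid]), build_tree(frames[mid:])
    n = len(frames)
    mean = tuple((left['mean'][j] * mid + right['mean'][j] * (n - mid)) / n
                 for j in range(len(frames[0])))
    return {'mean': mean, 'left': left, 'right': right}

def retrieve(node, query):
    # Descend greedily toward the child whose mean is closer to the query.
    while 'frames' not in node:
        dl = sum((a - b) ** 2 for a, b in zip(node['left']['mean'], query))
        dr = sum((a - b) ** 2 for a, b in zip(node['right']['mean'], query))
        node = node['left'] if dl <= dr else node['right']
    return node['frames'][0]
```

For loop closure, the current frame's descriptor is the query, and a retrieved frame that is close in appearance but far in time is a closure candidate.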

NeurIPS Conference 2014 Conference Paper

Coresets for k-Segmentation of Streaming Data

  • Guy Rosman
  • Mikhail Volkov
  • Dan Feldman
  • John Fisher III
  • Daniela Rus

Life-logging video streams, financial time series, and Twitter tweets are a few examples of high-dimensional signals over practically unbounded time. We consider the problem of computing optimal segmentation of such signals by a k-piecewise linear function, using only one pass over the data by maintaining a coreset for the signal. The coreset enables fast further analysis such as automatic summarization and analysis of such signals. A coreset (core-set) is a compact representation of the data seen so far, which approximates the data well for a specific task -- in our case, segmentation of the stream. We show that, perhaps surprisingly, the segmentation problem admits coresets of cardinality only linear in the number of segments k, independently of both the dimension d of the signal, and its number n of points. More precisely, we construct a representation of size O(k log n / eps^2) that provides a (1+eps)-approximation for the sum of squared distances to any given k-piecewise linear function. Moreover, such coresets can be constructed in a parallel streaming approach. Our results rely on a novel reduction of statistical estimations to problems in computational geometry. We empirically evaluate our algorithms on very large synthetic and real data sets from GPS, video and financial domains, using 255 machines in the Amazon cloud.
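The quantity the coreset approximates, the optimal sum of squared distances to a k-piecewise linear function, can be computed exactly on small inputs with dynamic programming. This is a reference sketch of the objective, not the streaming coreset construction itself, and its O(k n^2) cost is exactly what makes the coreset worthwhile at scale.

```python
def segment_cost(ys, i, j):
    # Least-squares cost of fitting one line to points (i..j-1, ys[i:j]).
    xs = list(range(i, j))
    n = j - i
    mx, my = sum(xs) / n, sum(ys[i:j]) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys[i:j]))
    slope = sxy / sxx if sxx else 0.0
    return sum((y - (my + slope * (x - mx))) ** 2
               for x, y in zip(xs, ys[i:j]))

def k_segmentation(ys, k):
    # Exact dynamic program: best[s][j] is the minimal cost of covering
    # the first j points with s linear segments.
    n = len(ys)
    best = [[float('inf')] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for s in range(1, k + 1):
        for j in range(s, n + 1):
            best[s][j] = min(best[s - 1][i] + segment_cost(ys, i, j)
                             for i in range(s - 1, j))
    return best[k][n]
```

A (1+eps)-coreset guarantees that evaluating this same objective on the compact summary stays within a (1+eps) factor of evaluating it on all n points.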