Author name cluster

Ming Lin 0003

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

50 papers

1 author row

ICML Conference 2025 Conference Paper

Adaptive Sensitivity Analysis for Robust Augmentation against Natural Corruptions in Image Segmentation

Laura Y. Zheng
Wenjie Wei
Tony Wu
Jacob Clements
Shreelekha Revankar
Andre Harrison
Yu Shen
Ming Lin 0003

Achieving robustness in image segmentation models is challenging due to the fine-grained nature of pixel-level classification. These models, which are crucial for many real-time perception applications, particularly struggle when faced with natural corruptions in the wild for autonomous systems. While sensitivity analysis can help us understand how input variables influence model outputs, its application to natural and uncontrollable corruptions in training data is computationally expensive. In this work, we present an adaptive, sensitivity-guided augmentation method to enhance robustness against natural corruptions. Our sensitivity analysis on average runs 10 times faster and requires about 200 times less storage than previous sensitivity analysis, enabling practical, on-the-fly estimation during training for a model-free augmentation policy. With minimal fine-tuning, our sensitivity-guided augmentation method achieves improved robustness on both real-world and synthetic datasets compared to state-of-the-art data augmentation techniques in image segmentation.

ICRA Conference 2025 Conference Paper

DISC: Dataset for Analyzing Driving Styles in Simulated Crashes for Mixed Autonomy

Sandip Sharan Senthil Kumar
Sandeep Thalapanane
Guru Nandhan Appiya Dilipkumar Peethambari
Sourang SriHari
Laura Zheng
Ming Lin 0003

Handling pre-crash scenarios is still a major challenge for self-driving cars due to limited practical data and human-driving behavior datasets. We introduce DISC (Driving Styles In Simulated Crashes), one of the first datasets designed to capture various driving styles and behaviors in pre-crash scenarios for mixed autonomy analysis. DISC includes over 8 classes of driving styles/behaviors from hundreds of drivers navigating a simulated vehicle through a virtual city, encountering rare-event traffic scenarios. This dataset enables the classification of pre-crash human driving behaviors in unsafe conditions, supporting individualized trajectory prediction based on observed driving patterns. By utilizing a custom-designed VR-based in-house driving simulator, TRAVERSE, data was collected through a driver-centric study involving human drivers encountering twelve simulated accident scenarios. This dataset fills a critical gap in human-centric driving data for rare events involving interactions with autonomous vehicles. It enables autonomous systems to better react to human drivers and optimize trajectory prediction in mixed autonomy environments involving both human-driven and self-driving cars. In addition, individual driving behaviors are classified through a set of standardized questionnaires, carefully designed to identify and categorize driving behavior traits. We correlate data features with driving behaviors, showing that the simulated environment reflects real-world driving styles. DISC is the first dataset to capture how various driving styles respond to accident scenarios, offering significant potential to enhance autonomous vehicle safety and driving behavior analysis in mixed autonomy environments.

ICRA Conference 2025 Conference Paper

Gradient-Based Trajectory Optimization with Parallelized Differentiable Traffic Simulation

Sanghyun Son 0003
Laura Zheng
Brian Clipp
Connor Greenwell
Sujin Philip
Ming Lin 0003

We present a parallelized differentiable traffic simulator based on the Intelligent Driver Model (IDM), a car-following framework that incorporates driver behavior as key variables. Our vehicle simulator efficiently models vehicle motion, generating trajectories that can be supervised to fit real-world data. By leveraging its differentiable nature, IDM parameters are optimized using gradient-based methods. With the capability to simulate up to 2 million vehicles in real time, the system is scalable for large-scale trajectory optimization. We show that we can use the simulator to filter noise in the input trajectories (trajectory filtering), reconstruct dense trajectories from sparse ones (trajectory reconstruction), and predict future trajectories (trajectory prediction), with all generated trajectories adhering to physical laws. We validate our simulator and algorithm on several datasets including NGSIM and Waymo Open Dataset. The code is publicly available at: https://github.com/SonSang/diffidm.
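As a rough illustration of the car-following rule named in the abstract above, the standard Intelligent Driver Model acceleration can be sketched in a few lines of plain Python. This is not the authors' simulator: it is neither parallelized nor differentiable, and the function names, the Euler integration, and all parameter values (desired speed `v0`, time headway `T`, `a_max`, comfortable braking `b`, standstill gap `s0`, exponent `delta`) are illustrative assumptions.

```python
import math


def idm_acceleration(v, gap, dv, v0=30.0, T=1.5, a_max=1.0, b=2.0, s0=2.0, delta=4.0):
    """Standard IDM acceleration (m/s^2) for one following vehicle.

    v   : follower speed (m/s)
    gap : bumper-to-bumper distance to the leader (m), must be > 0
    dv  : approach rate, follower speed minus leader speed (m/s)
    All default parameter values are illustrative assumptions.
    """
    # Desired dynamic gap: standstill distance plus headway and braking terms.
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    # Free-road term minus the interaction term with the leader.
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


def simulate_follower(leader_speed, steps, dt=0.1, v=0.0, gap=50.0):
    """Explicit-Euler rollout of one follower behind a constant-speed leader."""
    speeds = []
    for _ in range(steps):
        a = idm_acceleration(v, gap, v - leader_speed)
        new_v = max(0.0, v + a * dt)      # speeds are clamped at zero
        gap += (leader_speed - v) * dt    # gap changes with the speed difference
        v = new_v
        speeds.append(v)
    return speeds
```

Because the rule is a smooth function of its parameters, quantities such as `T` or `v0` could in principle be fitted to observed trajectories with gradient-based methods, which is the kind of optimization the abstract describes.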

IROS Conference 2025 Conference Paper

MMCD: Multi-Modal Collaborative Decision-Making for Connected Autonomy with Knowledge Distillation

Rui Liu 0040
Zikang Wang
Peng Gao 0007
Yu Shen
Pratap Tokekar
Ming Lin 0003

Autonomous systems have advanced significantly, but challenges persist in accident-prone environments where robust decision-making is crucial. A single vehicle’s limited sensor range and obstructed views increase the likelihood of accidents. Multi-vehicle connected systems and multi-modal approaches, leveraging RGB images and LiDAR point clouds, have emerged as promising solutions. However, existing methods often assume the availability of all data modalities and connected vehicles during both training and testing, which is impractical due to potential sensor failures or missing connected vehicles. To address these challenges, we introduce a novel framework MMCD (Multi-Modal Collaborative Decision-making) for connected autonomy. Our framework fuses multi-modal observations from ego and collaborative vehicles to enhance decision-making under challenging conditions. To ensure robust performance when certain data modalities are unavailable during testing, we propose an approach based on cross-modal knowledge distillation with a teacher-student model structure. The teacher model is trained with multiple data modalities, while the student model is designed to operate effectively with reduced modalities. In experiments on connected autonomous driving with ground vehicles and aerial-ground vehicles collaboration, our method improves driving safety by up to 20.7%, surpassing the best-existing baseline in detecting potential accidents and making safe driving decisions. More information can be found on our website: https://ruiiu.github.io/mmcd.

IROS Conference 2025 Conference Paper

Quantifying and Modeling Driving Styles in Trajectory Forecasting

Laura Zheng
Hamidreza Yaghoubi Araghi
Tony Wu
Sandeep Thalapanane
Tianyi Zhou 0001
Ming Lin 0003

Trajectory forecasting has become a popular deep learning task due to its relevance for scenario simulation for autonomous driving. Specifically, trajectory forecasting predicts the trajectory of a short-horizon future for specific human drivers in a particular traffic scenario. Robust and accurate future predictions can enable autonomous driving planners to optimize for low-risk and predictable outcomes for human drivers around them. Although some work has been done to model driving style in planning and personalized autonomous policies, a gap exists in explicitly modeling human driving styles for trajectory forecasting of human behavior. Human driving style is most certainly a correlating factor to decision making, especially in edge-case scenarios where risk is nontrivial, as justified by the large amount of traffic psychology literature on risky driving. So far, the current real-world datasets for trajectory forecasting lack insight on the variety of represented driving styles. While the datasets may represent real-world distributions of driving styles, we posit that fringe driving style types may also be correlated with edge-case safety scenarios. In this work, we conduct analyses on existing real-world trajectory datasets for driving and dissect these works through the lens of driving style, which is often intangible and non-standardized.

ICML Conference 2025 Conference Paper

Time-Aware World Model for Adaptive Prediction and Control

Anh N. Nhu
Sanghyun Son 0003
Ming Lin 0003

In this work, we introduce the Time-Aware World Model (TAWM), a model-based approach that explicitly incorporates temporal dynamics. By conditioning on the time-step size, $\Delta t$, and training over a diverse range of $\Delta t$ values – rather than sampling at a fixed time-step – TAWM learns both high- and low-frequency task dynamics across diverse control problems. Grounded in the information-theoretic insight that the optimal sampling rate depends on a system’s underlying dynamics, this time-aware formulation improves both performance and data efficiency. Empirical evaluations show that TAWM consistently outperforms conventional models across varying observation rates in a variety of control tasks, using the same number of training samples and iterations. Our code can be found online at: github.com/anh-nn01/Time-Aware-World-Model.

ICML Conference 2024 Conference Paper

An Intrinsic Vector Heat Network

Alexander Gao
Maurice Chu
Mubbasir Kapadia
Ming Lin 0003
Hsueh-Ti Derek Liu

Vector fields are widely used to represent and model flows for many science and engineering applications. This paper introduces a novel neural network architecture for learning tangent vector fields that are intrinsically defined on manifold surfaces embedded in 3D. Previous approaches to learning vector fields on surfaces treat vectors as multi-dimensional scalar fields, using traditional scalar-valued architectures to process channels individually, and thus fail to preserve fundamental intrinsic properties of the vector field. The core idea of this work is to introduce a trainable vector heat diffusion module to spatially propagate vector-valued feature data across the surface, which we incorporate into our proposed architecture that consists of vector-valued neurons. Our architecture is invariant to rigid motion of the input, isometric deformation, and choice of local tangent bases, and is robust to discretizations of the surface. We evaluate our Vector Heat Network on triangle meshes, and empirically validate its invariant properties. We also demonstrate the effectiveness of our method on the useful industrial application of quadrilateral mesh generation.

ICRA Conference 2024 Conference Paper

Collaborative Decision-Making Using Spatiotemporal Graphs in Connected Autonomy

Peng Gao 0007
Yu Shen
Ming Lin 0003

Collaborative decision-making is an essential capability for multi-robot systems, such as connected vehicles, to collaboratively control autonomous vehicles in accident-prone scenarios. Under limited communication bandwidth, capturing comprehensive situational awareness by integrating connected agents’ observations is very challenging. In this paper, we propose a novel collaborative decision-making method that efficiently and effectively integrates collaborators’ representations to control the ego vehicle in accident-prone scenarios. Our approach formulates collaborative decision-making as a classification problem. We first represent sequences of raw observations as spatiotemporal graphs, which significantly reduce the package size to share among connected vehicles. Then we design a novel spatiotemporal graph neural network based on heterogeneous graph learning, which analyzes spatial and temporal connections of objects in a unified way for collaborative decision-making. We evaluate our approach using a high-fidelity simulator that considers realistic traffic, communication bandwidth, and vehicle sensing among connected autonomous vehicles. The experimental results show that our representation achieves over 100x reduction in the shared data size, which meets the communication bandwidth requirements for connected autonomous driving. In addition, our approach achieves over 30% improvement in driving safety.

IROS Conference 2024 Conference Paper

Deep Stochastic Kinematic Models for Probabilistic Motion Forecasting in Traffic

Laura Zheng
Sanghyun Son 0003
Jing Liang 0006
Xijun Wang 0002
Brian Clipp
Ming Lin 0003

In trajectory forecasting tasks for traffic, future output trajectories can be computed by advancing the ego vehicle’s state with predicted actions according to a kinematics model. By unrolling predicted trajectories via time integration and models of kinematic dynamics, predicted trajectories should not only be kinematically feasible but also relate uncertainty from one timestep to the next. While current works in probabilistic prediction do incorporate kinematic priors for mean trajectory prediction, variance is often left as a learnable parameter, despite uncertainty in one time step being inextricably tied to uncertainty in the previous time step. In this paper, we show simple and differentiable analytical approximations describing the relationship between variance at one timestep and that at the next with the kinematic bicycle model. In our results, we find that encoding the relationship between variance across timesteps works especially well in suboptimal settings, such as with small or noisy datasets. We observe up to a 50% performance boost in partial dataset settings and up to an 8% performance boost in large-scale learning compared to previous kinematic prediction methods on SOTA trajectory forecasting architectures out-of-the-box, with no fine-tuning.
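To make the idea of unrolling a kinematic model concrete, the following is a minimal sketch of one explicit-Euler step of a rear-axle kinematic bicycle model, plus the simplest possible case of carrying variance from one timestep to the next. It is not the paper's formulation: the function names, the `wheelbase` and `dt` values, and the assumption that speed and acceleration errors are independent are all illustrative. Only the speed update is treated, because it is linear in its inputs and the variance rule is therefore exact under that independence assumption.

```python
import math


def bicycle_step(state, accel, steer, dt=0.1, wheelbase=2.5):
    """One explicit-Euler step of a rear-axle kinematic bicycle model.

    state : (x, y, theta, v) position (m), heading (rad), speed (m/s)
    accel : longitudinal acceleration (m/s^2)
    steer : front-wheel steering angle (rad)
    dt and wheelbase defaults are illustrative assumptions.
    """
    x, y, theta, v = state
    x_next = x + v * math.cos(theta) * dt
    y_next = y + v * math.sin(theta) * dt
    theta_next = theta + (v / wheelbase) * math.tan(steer) * dt
    v_next = v + accel * dt
    return (x_next, y_next, theta_next, v_next)


def propagate_speed_variance(var_v, var_accel, dt=0.1):
    """Variance of v_next = v + accel * dt.

    Assumes the errors in v and accel are independent, so the variances add
    with the dt^2 scaling from the linear update.
    """
    return var_v + dt * dt * var_accel
```

The position and heading updates are nonlinear in the state, so propagating their variance requires approximations of the kind the abstract refers to; this sketch deliberately stops short of those.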

ICRA Conference 2024 Conference Paper

HandyPriors: Physically Consistent Perception of Hand-Object Interactions with Differentiable Priors

Shutong Zhang
Yi-Ling Qiao
Guanglei Zhu
Eric Heiden
Dylan Turpin
Jingzhou Liu
Ming Lin 0003
Miles Macklin

Various heuristic objectives for modeling hand-object interaction have been proposed in past work. However, due to the lack of a cohesive framework, these objectives often possess a narrow scope of applicability and are limited by their efficiency or accuracy. In this paper, we propose HANDYPRIORS, a unified and general pipeline for pose estimation in human-object interaction scenes by leveraging recent advances in differentiable physics and rendering. Our approach employs rendering priors to align with input images and segmentation masks along with physics priors to mitigate penetration and relative-sliding across frames. Furthermore, we present two alternatives for hand and object pose estimation. The optimization-based pose estimation achieves higher accuracy, while the filtering-based tracking, which utilizes the differentiable priors as dynamics and observation models, executes faster. We demonstrate that HANDYPRIORS attains comparable or superior results in the pose estimation task, and that the differentiable physics module can predict contact information for pose refinement. We also show that our approach generalizes to perception tasks, including robotic hand manipulation and human-object pose estimation in the wild.

ICRA Conference 2024 Conference Paper

MTG: Mapless Trajectory Generator with Traversability Coverage for Outdoor Navigation

Jing Liang 0006
Peng Gao 0007
Xuesu Xiao
Adarsh Jagan Sathyamoorthy
Mohamed Elnoor
Ming Lin 0003
Dinesh Manocha

We present a novel learning-based trajectory generation algorithm for outdoor robot navigation. Our goal is to compute collision-free paths that also satisfy the environment-specific traversability constraints. Our approach is designed for global planning using limited onboard robot perception in mapless environments while ensuring comprehensive coverage of all traversable directions. Our formulation uses a Conditional Variational Autoencoder (CVAE) generative model that is enhanced with traversability constraints and an optimization formulation used for the coverage. We highlight the benefits of our approach over state-of-the-art trajectory generation approaches and demonstrate its performance in challenging and large outdoor environments, including around buildings, across intersections, along trails, and off-road terrain, using a Clearpath Husky and a Boston Dynamics Spot robot. In practice, our approach results in a 6% improvement in coverage of traversable areas and an 89% reduction in trajectory portions residing in non-traversable regions. Our video is here: https://youtu.be/3eJ2soAzXnU

ICRA Conference 2024 Conference Paper

Task-Driven Domain-Agnostic Learning with Information Bottleneck for Autonomous Steering

Yu Shen
Laura Zheng
Tianyi Zhou 0001
Ming Lin 0003

Environments for autonomous driving can vary from place to place, leading to challenges in designing a learning model for a new scene. Transfer learning can leverage knowledge from a learned domain to a new domain with limited data. In this work, we focus on end-to-end autonomous driving as the target task, consisting of both perception and control. We first utilize information bottleneck analysis to build a causal graph that defines our framework and the loss function; then we propose a novel domain-agnostic learning method for autonomous steering based on our analysis of training data, network architecture, and training paradigm. Experiments show that our method outperforms other SOTA methods.

IROS Conference 2024 Conference Paper

TRAVERSE: Traffic-Responsive Autonomous Vehicle Experience & Rare-event Simulation for Enhanced safety

Sandeep Thalapanane
Sandip Sharan Senthil Kumar
Guru Nandhan Appiya Dilipkumar Peethambari
Sourang SriHari
Laura Zheng
Julio Poveda
Ming Lin 0003

Data for training learning-enabled self-driving cars in the physical world are typically collected in a safe, normal environment. Such data distribution often engenders a strong bias towards safe driving, making self-driving cars unprepared when encountering adversarial scenarios like unexpected accidents. Due to a dearth of such adverse data that is unrealistic for drivers to collect, autonomous vehicles can perform poorly when experiencing such rare events. This work addresses much-needed research by having participants drive a VR vehicle simulator going through simulated traffic with various types of accidental scenarios. It aims to understand human responses and behaviors in simulated accidents, contributing to our understanding of driving dynamics and safety. The simulation framework adopts a robust traffic simulation and is rendered using the Unity Game Engine. Furthermore, the simulation framework is built with portable, light-weight immersive driving simulator hardware, lowering the resource barrier for studies in autonomous driving research.

ICRA Conference 2023 Conference Paper

A Framework for Active Haptic Guidance Using Robotic Haptic Proxies

Niall L. Williams
Nicholas Rewkowski
Jiasheng Li
Ming Lin 0003

Haptic feedback is an important component of creating an immersive mixed reality experience. Traditionally, haptic forces are rendered in response to the user's interactions with the virtual environment. In this work, we explore the idea of rendering haptic forces in a proactive manner, with the explicit intention to influence the user's behavior through compelling haptic forces. To this end, we present a framework for active haptic guidance in mixed reality, using one or more robotic haptic proxies to influence user behavior and deliver a safer and more immersive virtual experience. We provide details on common challenges that need to be overcome when implementing active haptic guidance, and discuss example applications that show how active haptic guidance can be used to influence the user's behavior. Finally, we apply active haptic guidance to a virtual reality navigation problem, and conduct a user study that demonstrates how active haptic guidance creates a safer and more immersive experience for users.

ICML Conference 2023 Conference Paper

Auxiliary Modality Learning with Generalized Curriculum Distillation

Yu Shen
Xijun Wang 0002
Peng Gao 0007
Ming Lin 0003

Driven by the need from real-world applications, Auxiliary Modality Learning (AML) offers the possibility to utilize more information from auxiliary data in training, while only requiring data from one or fewer modalities in test, to save the overall computational cost and reduce the amount of input data for inferencing. In this work, we formally define “Auxiliary Modality Learning” (AML), systematically classify types of auxiliary modality (in visual computing) and architectures for AML, and analyze their performance. We also analyze the conditions under which AML works well from the optimization and data distribution perspectives. To guide various choices to achieve optimal performance using AML, we propose a novel method to assist in choosing the best auxiliary modality and estimating an upper bound performance before executing AML. In addition, we propose a new AML method using generalized curriculum distillation to enable more effective curriculum learning. Our method achieves the best performance compared to other SOTA methods.

ICRA Conference 2023 Conference Paper

DifFAR: Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition

Divya Kothandaraman
Ming Lin 0003
Dinesh Manocha

We present a learning algorithm, DifFAR, for human activity recognition in videos. Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras that contain a human actor along with background motion. Typically, the human actors occupy less than one-tenth of the spatial resolution. DifFAR simultaneously harnesses the benefits of frequency domain representations, a classical analysis tool in signal processing, and data-driven neural networks. We build a differentiable static-dynamic frequency mask prior to model the salient static and dynamic pixels in the video, crucial for the underlying task of action recognition. We use this differentiable mask prior to enable the neural network to intrinsically learn disentangled feature representations via an identity loss function. Our formulation empowers the network to inherently compute disentangled salient features within its layers. Further, we propose a cost-function encapsulating temporal relevance and spatial content to sample the most important frame within uniformly spaced video segments. We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset and demonstrate relative improvements of 5.72% - 13.00% over the state-of-the-art and 14.28% - 38.05% over the corresponding baseline model.

ICLR Conference 2023 Conference Paper

PAC-NeRF: Physics Augmented Continuum Neural Radiance Fields for Geometry-Agnostic System Identification

Xuan Li 0015
Yi-Ling Qiao
Peter Yichen Chen
Krishna Murthy Jatavallabhula
Ming Lin 0003
Chenfanfu Jiang
Chuang Gan 0001

Existing approaches to system identification (estimating the physical parameters of an object) from videos assume known object geometries. This precludes their applicability in a vast majority of scenes where object geometries are complex or unknown. In this work, we aim to identify parameters characterizing a physical system from a set of multi-view videos without any assumption on object geometry or topology. To this end, we propose "Physics Augmented Continuum Neural Radiance Fields" (PAC-NeRF), to estimate both the unknown geometry and physical parameters of highly dynamic objects from multi-view videos. We design PAC-NeRF to only ever produce physically plausible states by enforcing the neural radiance field to follow the conservation laws of continuum mechanics. For this, we design a hybrid Eulerian-Lagrangian representation of the neural radiance field, i.e., we use the Eulerian grid representation for NeRF density and color fields, while advecting the neural radiance fields via Lagrangian particles. This hybrid Eulerian-Lagrangian representation seamlessly blends efficient neural rendering with the material point method (MPM) for robust differentiable physics simulation. We validate the effectiveness of our proposed framework on geometry and physical parameter estimation over a vast range of materials, including elastic bodies, plasticine, sand, Newtonian and non-Newtonian fluids, and demonstrate significant performance gain on most tasks.

ICRA Conference 2023 Conference Paper

Small-shot Multi-modal Distillation for Vision-based Autonomous Steering

Yu Shen
Luyu Yang
Xijun Wang 0002
Ming Lin 0003

In this paper, we propose a novel learning framework for autonomous systems that uses a small amount of “auxiliary information” that complements the learning of the main modality, called “small-shot auxiliary modality distillation network (AMD-S-Net)”. The AMD-S-Net contains a two-stream framework design that can fully extract information from different types of data (i.e., paired/unpaired multi-modality data) to distill knowledge more effectively. We also propose a novel training paradigm based on the “reset operation” that enables the teacher to explore the local loss landscape near the student domain iteratively, providing local landscape information and potential directions to discover better solutions by the student, thus achieving higher learning performance. Our experiments show that AMD-S-Net and our training paradigm outperform other SOTA methods by up to 12.7% and 18.1% improvement in autonomous steering, respectively.

ICRA Conference 2023 Conference Paper

Traffic-Aware Autonomous Driving with Differentiable Traffic Simulation

Laura Zheng
Sanghyun Son 0003
Ming Lin 0003

While there have been advancements in autonomous driving control and traffic simulation, there has been little to no work exploring their unification with deep learning. Works in both areas seem to focus on entirely different exclusive problems, yet traffic and driving are inherently related in the real world. In this paper, we present Traffic-Aware Autonomous Driving (TrAAD), a generalizable distillation-style method for traffic-informed imitation learning that directly optimizes for faster traffic flow and lower energy consumption. TrAAD focuses on the supervision of speed control in imitation learning systems, as most driving research focuses on perception and steering. Moreover, our method addresses the lack of co-simulation between traffic and driving simulators and provides a basis for directly involving traffic simulation with autonomous driving in future work. Our results show that, with information from traffic simulation involved in the supervision of imitation learning methods, an autonomous vehicle can learn how to accelerate in a fashion that is beneficial for traffic flow and overall energy consumption for all nearby vehicles.

IROS Conference 2023 Conference Paper

Visual, Spatial, Geometric-Preserved Place Recognition for Cross-View and Cross-Modal Collaborative Perception

Peng Gao 0007
Jing Liang 0006
Yu Shen
Sanghyun Son 0003
Ming Lin 0003

Place recognition plays an important role in multi-robot collaborative perception, such as aerial-ground search and rescue, in order to identify the same place they have visited. Recently, approaches based on semantics have shown promising performance in addressing cross-view and cross-modal challenges in place recognition; these can be further categorized as graph-based and geometric-based methods. However, both methods have shortcomings, including ignoring geometric cues and being affected by large non-overlapping regions between observations. In this paper, we introduce a novel approach that integrates semantic graph matching and distance fields (DF) matching for cross-view and cross-modal place recognition. Our method uses a graph representation to encode visual-spatial cues of semantics and uses a set of class-wise DFs to encode geometric cues of a scene. Then, we formulate place recognition as a two-step matching problem. We first perform semantic graph matching to identify the correspondence of semantic objects. Then, we estimate the overlapped regions based on the identified correspondences and further align these regions to compute their geometric-based DF similarity. Finally, we integrate graph-based similarity and geometry-based DF similarity to match places. We evaluate our approach over two public benchmark datasets, including KITTI and AirSim. Compared with the previous methods, our approach achieves around 10% improvement in ground-ground place recognition in KITTI and 35% improvement in aerial-ground place recognition in AirSim.

IROS Conference 2022 Conference Paper

Audio-Visual Depth and Material Estimation for Robot Navigation

Justin Wilson
Nicholas Rewkowski
Ming Lin 0003

Reflective and textureless surfaces such as windows, mirrors, and walls can be a challenge for scene reconstruction, due to depth discontinuities and holes. We propose an audio-visual method that uses the reflections of sound to aid in depth estimation and material classification for 3D scene reconstruction in robot navigation and AR/VR applications. The mobile phone prototype emits pulsed audio, while recording video for audio-visual classification for 3D scene reconstruction. Reflected sound and images from the video are input into our audio (EchoCNN-A) and audio-visual (EchoCNN-AV) convolutional neural networks for surface and sound source detection, depth estimation, and material classification. The inferences from these classifications enhance 3D scene reconstructions containing open spaces and reflective surfaces by depth filtering, inpainting, and placement of unmixed sound sources in the scene. Our prototype, demos, and experimental results from real-world with challenging surfaces and sound, also validated with virtual scenes, indicate high success rates on classification of material, depth estimation, and closed/open surfaces, leading to considerable improvement in 3D scene reconstruction for robot navigation.

IROS Conference 2022 Conference Paper

Inverse Reinforcement Learning with Hybrid-weight Trust-region Optimization and Curriculum Learning for Autonomous Maneuvering

Yu Shen
Weizi Li
Ming Lin 0003

Despite significant advancements, collision-free navigation in autonomous driving is still challenging, considering the navigation module needs to balance learning and planning to achieve efficient and effective control of the vehicle. We propose a novel framework of inverse reinforcement learning with hybrid-weight trust-region optimization and curriculum learning (IRL-HC) for autonomous maneuvering. Our method can incorporate both expert demonstration (from real driving) and domain knowledge (hard constraints such as collision avoidance, goal reaching, etc. encoded in reward functions) to learn an effective control policy. The hybrid-weight trust-region optimization is used to determine the difficulty of the task curriculum for fast incremental curriculum learning and improve the efficiency of inverse reinforcement learning by hybrid weight tuning of different sets of hyperparameters. IRL-HC is also compatible with domain-dependent techniques such as learn-from-accident, which can further boost performance. Overall, IRL-HC can reduce the number of collisions by up to 48%, increase the training efficiency by 2.8x, and enable the vehicle to drive 10x further compared to other methods.

ICRA Conference 2021 Conference Paper

Adversarial Differentiable Data Augmentation for Autonomous Systems

Manli Shu
Yu Shen
Ming Lin 0003
Tom Goldstein

Autonomous systems often rely on neural networks to achieve high performance on planning and control problems. Unfortunately, neural networks suffer severely when input images become degraded in ways that are not reflected in the training data. This is particularly problematic for robotic systems like autonomous vehicles (AV) for which reliability is paramount. In this work, we consider robust optimization methods for hardening control systems against image corruptions and other unexpected domain shifts. Recent work on robust optimization for neural nets has been focused largely on combating adversarial attacks. In this work, we borrow ideas from the adversarial training and data augmentation literature to enhance robustness to image corruptions and domain shifts. To this end, we train networks while augmenting image data with a battery of image degradations. Unlike traditional augmentation methods, we choose the parameters for each degradation adversarially so as to maximize system performance. By formulating image degradations in a way that is differentiable with respect to degradation parameters, we enable the use of efficient optimization methods (PGD) for choosing worst-case augmentation parameters. We demonstrate the efficacy of this method on the learning to steer task for AVs. By adversarially training against image corruptions, we produce networks that are highly robust to image corruptions. We show that the proposed differentiable augmentation schemes result in higher levels of robustness and accuracy for a range of settings as compared to baseline and state-of-the-art augmentation methods.

ICML Conference 2021 Conference Paper

Efficient Differentiable Simulation of Articulated Bodies

Yi-Ling Qiao
Junbang Liang
Vladlen Koltun
Ming Lin 0003

We present a method for efficient differentiable simulation of articulated bodies. This enables integration of articulated body dynamics into deep learning frameworks, and gradient-based optimization of neural networks that operate on articulated bodies. We derive the gradients of the forward dynamics using spatial algebra and the adjoint method. Our approach is an order of magnitude faster than autodiff tools. By only saving the initial states throughout the simulation process, our method reduces memory requirements by two orders of magnitude. We demonstrate the utility of efficient differentiable dynamics for articulated bodies in a variety of applications. We show that reinforcement learning with articulated systems can be accelerated using gradients provided by our method. In applications to control and inverse problems, gradient-based optimization enabled by our work accelerates convergence by more than an order of magnitude.

ICRA Conference 2021 Conference Paper

Multi-Agent Ergodic Coverage in Urban Environments

Shivang Patel
Senthil Hariharan Arul
Pranav Dhulipala
Ming Lin 0003
Dinesh Manocha
Huan Xu 0002
Michael W. Otte

An important aspect of dynamic urban coverage is how building collision avoidance is incorporated into the overall coverage mission. We consider a multi-agent urban dynamic coverage problem in which a team of flying agents uses downward facing cameras to observe the street-level environment outside of buildings. Cameras are assumed to be ineffective above a maximum altitude (lower than building height), such that agents must move around or over buildings to complete their mission. The main objective of this paper is to compare three different building avoidance strategies that are compatible with dynamic ergodic methods. To provide context for these results, we also compare our results to three other common coverage methods including: boustrophedon coverage (lawn-mower sweep), Voronoi region based coverage, and a naive grid method. All algorithms are evaluated in simulation with respect to four performance metrics (percent coverage, revisit count, revisit time, and the integral of area viewed over time), across team sizes ranging from 1 to 25 agents, and in five types of urban environments of varying density and height. We find that the relative performance of algorithms changes based on the ratio of team size to search area, as well as the height and density characteristics of the urban environment.

ICRA Conference 2020 Conference Paper

AVOT: Audio-Visual Object Tracking of Multiple Objects for Robotics

Justin Wilson
Ming Lin 0003

Existing state-of-the-art object tracking can run into challenges when objects collide, occlude, or come close to one another. These visually based trackers may also fail to differentiate between objects with the same appearance but different materials. Existing methods may stop tracking or incorrectly start tracking another object. These failures are difficult for trackers to recover from since they often use results from previous frames. By using audio of the impact sounds from object collisions, rolling, etc., our audio-visual object tracking (AVOT) neural network can reduce tracking error and drift. We train AVOT end to end and use audio-visual inputs over all frames. Our audio-based technique may be used in conjunction with other neural networks to augment visually based object detection and tracking methods. We evaluate its runtime frames-per-second (FPS) performance and intersection over union (IoU) performance against OpenCV object tracking implementations and a deep learning method. Our experiments, using the synthetic Sound-20K audio-visual dataset, demonstrate that AVOT outperforms single-modality deep learning methods when there is audio from object collisions. A proposed scheduler network to switch between AVOT and other methods based on audio onset maximizes accuracy and performance over all frames in multimodal object tracking.
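
For readers unfamiliar with the intersection over union (IoU) metric mentioned above, a minimal sketch follows. The `(x_min, y_min, x_max, y_max)` box layout and the function name `iou` are assumptions for this example; the paper's evaluation code may differ.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max); this layout is an
    assumption for the example.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle (zero if disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


overlap_score = iou((0, 0, 2, 2), (1, 1, 3, 3))   # overlap 1, union 7
disjoint_score = iou((0, 0, 1, 1), (2, 2, 3, 3))  # no overlap
```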

IROS Conference 2020 Conference Paper

Enhanced Transfer Learning for Autonomous Driving with Systematic Accident Simulation

Shivam Akhauri
Laura Y. Zheng
Ming Lin 0003

Simulation data can be utilized to extend real-world driving data in order to cover edge cases, such as vehicle accidents. The importance of handling edge cases can be observed in the high societal costs in handling car accidents, as well as potential dangers to human drivers. In order to cover a wide and diverse range of all edge cases, we systematically parameterize and simulate the most common accident scenarios. By applying this data to autonomous driving models, we show that transfer learning on simulated data sets provides better generalization and collision avoidance, as compared to random initialization methods. Our results illustrate that information from a model trained on simulated data can be transferred to a model trained on real-world data, indicating the potential influence of simulation data in real world models and advancements in handling of anomalous driving scenarios.

ICML Conference 2020 Conference Paper

Scalable Differentiable Physics for Learning and Control

Yi-Ling Qiao
Junbang Liang
Vladlen Koltun
Ming Lin 0003

Differentiable physics is a powerful approach to learning and control problems that involve physical objects and environments. While notable progress has been made, the capabilities of differentiable physics solvers remain limited. We develop a scalable framework for differentiable physics that can support a large number of objects and their interactions. To accommodate objects with arbitrary geometry and topology, we adopt meshes as our representation and leverage the sparsity of contacts for scalable differentiable collision handling. Collisions are resolved in localized regions to minimize the number of optimization variables even when the number of simulated objects is high. We further accelerate implicit differentiation of optimization with nonlinear constraints. Experiments demonstrate that the presented framework requires up to two orders of magnitude less memory and computation in comparison to recent particle-based methods. We further validate the approach on inverse problems and control scenarios, where it outperforms derivative-free and model-free baselines by at least an order of magnitude.

ICRA Conference 2019 Conference Paper

ADAPS: Autonomous Driving Via Principled Simulations

Weizi Li
David Wolinski
Ming Lin 0003

Autonomous driving has gained significant advancements in recent years. However, obtaining a robust control policy for driving remains challenging as it requires training data from a variety of scenarios, including rare situations (e.g., accidents), an effective policy architecture, and an efficient learning mechanism. We propose ADAPS for producing robust control policies for autonomous vehicles. ADAPS consists of two simulation platforms for generating and analyzing accidents to automatically produce labeled training data, and a memory-enabled hierarchical control policy. Additionally, ADAPS offers a more efficient online learning mechanism that reduces the number of iterations required in learning compared to existing methods such as DAGGER [1]. We present both theoretical and experimental results. The latter are produced in simulated environments, where qualitative and quantitative results are generated to demonstrate the benefits of ADAPS.

IROS Conference 2019 Conference Paper

Analyzing Liquid Pouring Sequences via Audio-Visual Neural Networks

Justin Wilson
Auston Sterling
Ming Lin 0003

Existing work to estimate the weight of a liquid poured into a target container often requires predefined source weights or visual data. We present novel audio-based and audio-augmented techniques, in the form of multimodal convolutional neural networks (CNNs), to estimate poured weight, perform overflow detection, and classify liquid and target container. Our audio-based neural network uses the sound from a pouring sequence, that is, a liquid being poured into a target container. Audio inputs are produced by converting raw audio into mel-scaled spectrograms. Our audio-augmented network fuses this audio with its corresponding visual data based on video images. Only a microphone and camera are required, which can be found in any modern smartphone or Microsoft Kinect. Our approach improves classification accuracy for different environments, containers, and contents of the robot pouring task. Our Pouring Sequence Neural Networks (PSNN) are trained and tested using the Rethink Robotics Baxter Research Robot. To the best of our knowledge, this is the first use of audio-visual neural networks to analyze liquid pouring sequences by classifying their weight, liquid, and receiving container.

ICRA Conference 2016 Conference Paper

Bayesian estimation of non-rigid mechanical parameters using temporal sequences of deformation samples

Shan Yang
Ming Lin 0003

Material property has great importance in medical robotics. The mechanical properties of human soft tissue are important to characterize the tissue deformation of each patient. The (recovered) elasticity parameters can assist surgeons to perform better pre-op surgical planning and enable medical robots to carry out personalized surgical procedures. In this paper, we present a novel algorithm for mechanical-property estimation from a temporal sequence of deformation samples. It does not require an external force-application measurement device or landmark-based displacement tracking. We test our approach on the reconstruction of the Young's modulus of a human heart and further validate the results derived from videos using known parameters of tennis and foam balls.

ICRA Conference 2010 Conference Paper

A fast n-dimensional ray-shooting algorithm for grasping force optimization

Yu Zheng 0001
Ming Lin 0003
Dinesh Manocha

We present an efficient algorithm for solving the ray-shooting problem on high dimensional sets. Our algorithm computes the intersection of the boundary of a compact convex set with a ray emanating from an interior point of the set and represents the intersection point as a convex combination of a set of affinely independent points. We use our intersection algorithm to compute two types of optimal grasping forces, where either the sum or the maximum of normal force components is minimized. In our simulation, the algorithm converges well and performs the computations in tens of milliseconds on a laptop.

IROS Conference 2010 Conference Paper

A walking pattern generator for biped robots on uneven terrains

Yu Zheng 0001
Ming Lin 0003
Dinesh Manocha
Albertus H. Adiwahono
Chee-Meng Chew

We present a new method to generate biped walking patterns for biped robots on uneven terrains. Our formulation uses a universal stability criterion that checks whether the resultant of the gravity wrench and the inertia wrench of a robot lies in the convex cone of the wrenches resulting from contacts between the robot and the environment. We present an algorithm to compute the feasible acceleration of the robot's CoM (center of mass) and use that algorithm to generate biped walking patterns. Our approach is more general and applicable to uneven terrains as compared with prior methods based on the ZMP (zero-moment point) criterion. We highlight its applications on some benchmarks.

ICRA Conference 2009 Conference Paper

Multi-robot coordination using generalized social potential fields

Russell Gayle
William Moss
Ming Lin 0003
Dinesh Manocha

We present a novel approach to compute collision-free paths for multiple robots subject to local coordination constraints. More specifically, given a set of robots, their initial and final configurations, and possibly some additional coordination constraints, our goal is to compute a collision-free path between the initial and final configuration that maintains the constraints. To solve this problem, our approach generalizes the social potential field method to be applicable to both convex and nonconvex polyhedra. Social potential fields are then integrated into a “physics-based motion planning” framework which uses constrained dynamics to solve the motion planning problem. Our approach is able to plan for over 200 robots while averaging about 110 ms per step in a variety of environments.

ICRA Conference 2008 Conference Paper

Reciprocal Velocity Obstacles for real-time multi-agent navigation

Jur van den Berg
Ming Lin 0003
Dinesh Manocha

In this paper, we propose a new concept — the ‘Reciprocal Velocity Obstacle’— for real-time multi-agent navigation. We consider the case in which each agent navigates independently without explicit communication with other agents. Our formulation is an extension of the Velocity Obstacle concept [3], which was introduced for navigation among (passively) moving obstacles. Our approach takes into account the reactive behavior of the other agents by implicitly assuming that the other agents make a similar collision-avoidance reasoning. We show that this method guarantees safe and oscillation-free motions for each of the agents. We apply our concept to navigation of hundreds of agents in densely populated environments containing both static and moving obstacles, and we show that real-time and scalable performance is achieved in such challenging scenarios.
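
The membership test behind this idea can be sketched for two disc-shaped agents in the plane. The sketch assumes the commonly cited definition in which a candidate velocity v' for agent A lies in the reciprocal velocity obstacle exactly when 2v' - v_A lies in the ordinary velocity obstacle induced by B, with an unbounded time horizon. Function names and the disc model are assumptions for illustration, not the authors' implementation.

```python
import math


def in_velocity_obstacle(p_a, p_b, v_b, radius_sum, v):
    """True if agent A moving with velocity v would eventually come within
    radius_sum of agent B, which moves with constant velocity v_b."""
    wx, wy = v[0] - v_b[0], v[1] - v_b[1]          # relative velocity
    dx, dy = p_b[0] - p_a[0], p_b[1] - p_a[1]      # offset from A to B
    if math.hypot(dx, dy) < radius_sum:
        return True                                 # already overlapping
    ww = wx * wx + wy * wy
    if ww == 0.0:
        return False                                # no relative motion
    # Time of closest approach along the ray, clamped to the future.
    t = max(0.0, (dx * wx + dy * wy) / ww)
    return math.hypot(dx - t * wx, dy - t * wy) < radius_sum


def in_reciprocal_velocity_obstacle(p_a, p_b, v_a, v_b, radius_sum, v_new):
    """v_new is in the RVO iff 2 * v_new - v_a is in the velocity obstacle."""
    reflected = (2.0 * v_new[0] - v_a[0], 2.0 * v_new[1] - v_a[1])
    return in_velocity_obstacle(p_a, p_b, v_b, radius_sum, reflected)


# Two agents approaching head-on along the x axis.
p_a, p_b = (0.0, 0.0), (4.0, 0.0)
v_a, v_b = (1.0, 0.0), (-1.0, 0.0)
keeps_course = in_reciprocal_velocity_obstacle(p_a, p_b, v_a, v_b, 1.0, (1.0, 0.0))
turns_aside = in_reciprocal_velocity_obstacle(p_a, p_b, v_a, v_b, 1.0, (0.0, 1.0))
```

Here continuing straight is rejected while the sideways candidate is admissible; a full planner would additionally select, among admissible velocities, one close to the agent's preferred velocity.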

ICRA Conference 2007 Conference Paper

Efficient Motion Planning of Highly Articulated Chains using Physics-based Sampling

Russell Gayle
Stephane Redon
Avneesh Sud
Ming Lin 0003
Dinesh Manocha

We present a novel motion planning algorithm that efficiently generates physics-based samples in a kinematically and dynamically constrained space of a highly articulated chain. Similar to prior kinodynamic planning methods, the sampled nodes in our roadmaps are generated based on dynamic simulation. Moreover, we bias these samples by using constraint forces designed to avoid collisions while moving toward the goal configuration. We adaptively reduce the complexity of the state space by determining a subset of joints that contribute most towards the motion and only simulate these joints. Based on these configurations, we compute a valid path that satisfies non-penetration, kinematic, and dynamics constraints. Our approach can be easily combined with a variety of motion planning algorithms including probabilistic roadmaps (PRMs) and rapidly-exploring random trees (RRTs) and applied to articulated robots with hundreds of joints. We demonstrate the performance of our algorithm on several challenging benchmarks.

IROS Conference 2007 Conference Paper

Reactive deformation roadmaps: motion planning of multiple robots in dynamic environments

Russell Gayle
Avneesh Sud
Ming Lin 0003
Dinesh Manocha

We present a novel algorithm for motion planning of multiple robots amongst dynamic obstacles. Our approach is based on a new roadmap representation that uses deformable links and dynamically retracts to capture the connectivity of the free space. We use Newtonian physics and Hooke's Law to update the position of the milestones and deform the links in response to the motion of other robots and the obstacles. Based on this roadmap representation, we describe our planning algorithms that can compute collision-free paths for tens of robots in complex dynamic environments.
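
To make the Hooke's Law update concrete, the following sketch advances two milestones joined by one spring-like link through a single time step using semi-implicit Euler integration. The stiffness, damping, mass, and step size values and all names are assumptions for illustration; the paper's roadmap also reacts to robots and obstacles, which is omitted here.

```python
import math


def spring_force(p, q, rest_length, stiffness):
    """Hooke's Law force acting on point p from a link joining p and q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return (0.0, 0.0)
    scale = stiffness * (dist - rest_length) / dist
    return (scale * dx, scale * dy)


def step_milestone(pos, vel, force, mass=1.0, damping=0.5, dt=0.01):
    """One semi-implicit Euler step: update velocity first, then position."""
    ax = (force[0] - damping * vel[0]) / mass
    ay = (force[1] - damping * vel[1]) / mass
    new_vel = (vel[0] + dt * ax, vel[1] + dt * ay)
    new_pos = (pos[0] + dt * new_vel[0], pos[1] + dt * new_vel[1])
    return new_pos, new_vel


# A link stretched to twice its rest length pulls both milestones together.
a, b = (0.0, 0.0), (2.0, 0.0)
f_a = spring_force(a, b, rest_length=1.0, stiffness=10.0)
f_b = spring_force(b, a, rest_length=1.0, stiffness=10.0)
a_next, _ = step_milestone(a, (0.0, 0.0), f_a)
b_next, _ = step_milestone(b, (0.0, 0.0), f_b)
link_length_after = math.hypot(b_next[0] - a_next[0], b_next[1] - a_next[1])
```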

ICRA Conference 2005 Conference Paper

Constraint-Based Motion Planning of Deformable Robots

Russell Gayle
Ming Lin 0003
Dinesh Manocha

We present a novel algorithm for motion planning of a deformable robot in a static environment. Given the initial and final configuration of the robot, our algorithm computes an approximate path using the probabilistic roadmap method. We use "constraint-based planning" to simulate robot deformation and make appropriate path adjustments and corrections to compute a collision-free path. Our algorithm takes into account geometric constraints like non-penetration and physical constraints like volume preservation. We highlight the performance of our planner on different scenarios of varying complexity.

ICRA Conference 2005 Conference Paper

Practical Local Planning in the Contact Space

Stephane Redon
Ming Lin 0003

Proximity query is an integral part of any motion planning algorithm and takes up the majority of planning time. Due to performance issues, most existing planners perform queries at fixed sampled configurations, sometimes resulting in missed collisions. Moreover, randomly determining collision-free configurations makes it difficult to obtain samples close to, or on, the surface of C-obstacles in the configuration space. In this paper, we present an efficient and practical local planning method in contact space which uses “continuous collision detection” (CCD). We show how, using the precise contact information provided by a CCD algorithm, a randomized planner can be enhanced by efficiently sampling the contact space, as well as by constraining the sampling when the roadmap is expanded. We have included our contact-space planning methods in a freely available state-of-the-art planning library - the Stanford MPK library. We have been able to observe that in complex scenarios involving cluttered and narrow passages, which are typically difficult for randomized planners, the enhanced planner offers up to 70 times performance improvement when our contact-space sampling and constrained sampling methods are enabled.

ICRA Conference 2002 Conference Paper

DEEP: Dual-Space Expansion for Estimating Penetration Depth Between Convex Polytopes

Young J. Kim
Ming Lin 0003
Dinesh Manocha

We present an incremental algorithm to estimate the penetration depth between convex polytopes in 3D. The algorithm incrementally seeks a "locally optimal solution" by walking on the surface of the Minkowski sums. The surface of the Minkowski sums is computed implicitly by constructing a local Gauss map. In practice, the algorithm works well when there is high motion coherence in the environment and is able to compute the optimal solution in most cases.

ICRA Conference 2002 Conference Paper

Haptic Interaction for Creative Processes with Simulated Media

Ming Lin 0003
William V. Baxter III
Mark Foskey
Miguel A. Otaduy
Vincent Scheib

We present a survey of our recent research on the development of haptic interfaces for simulating creative processes with digital media, including 3D multiresolution modeling and 2D and 3D painting. We discuss the design issues involved and lessons learned. Based on the preliminary user studies, we observe that haptic interfaces can improve the level of usability of digital design systems and assist in capturing the feel of creative processes.

IROS Conference 2001 Conference Paper

A Voronoi-based hybrid motion planner

Mark Foskey
Maxim Garber
Ming Lin 0003
Dinesh Manocha

We present a hybrid path planning algorithm for rigid and articulated bodies translating and rotating in a 3D workspace. Our approach generates a Voronoi roadmap in the workspace and combines it with "bridges" computed by a randomized path planner with Voronoi-biased sampling. The Voronoi roadmap is computed from a discrete approximation to the generalized Voronoi diagram (GVD) of the workspace, which is generated using graphics hardware. By using this GVD, portions of the path can be generated without random sampling, substantially reducing the number of random samples needed for the full query. The planner has been implemented and tested on a number of benchmarks. Some preliminary comparisons with a randomized motion planner indicate that our planner performs more than an order of magnitude faster in several challenging scenarios.

IROS Conference 2001 Conference Paper

Fast penetration depth estimation for elastic bodies using deformed distance fields

Susan Fisher
Ming Lin 0003

We present a fast penetration depth estimation algorithm between deformable polyhedral objects. We assume the continuum of non-rigid models are discretized using standard techniques, such as finite element or finite difference methods. As the objects deform, the pre-computed distance fields are deformed accordingly to estimate the penetration depth, allowing an enforcement of non-penetration constraints between two colliding elastic bodies. This approach can automatically handle self-penetration and inter-penetration in a uniform manner. We demonstrate its effectiveness on moderately complex simulation scenes.

IROS Conference 2000 Conference Paper

Accelerated proximity queries between convex polyhedra by multi-level Voronoi marching

Stephen A. Ehmann
Ming Lin 0003

We present an accelerated proximity query algorithm between moving convex polyhedra. The algorithm combines Voronoi-based feature tracking with a multi-level-of-detail representation, in order to adapt to the variation in levels of coherence and speed up the computation. It provides a progressive refinement framework for collision detection and distance queries. We have implemented our algorithm and have observed significant performance improvements in our experiments, especially on scenarios where the coherence is low.

ICRA Conference 2000 Conference Paper

Fast Distance Queries with Rectangular Swept Sphere Volumes

Eric Larsen
Stefan Gottschalk
Ming Lin 0003
Dinesh Manocha

We present new distance computation algorithms using hierarchies of rectangular swept spheres. Each bounding volume of the tree is described as the Minkowski sum of a rectangle and a sphere, and fits tightly to the underlying geometry. We present accurate and efficient algorithms to build the hierarchies and perform distance queries between the bounding volumes. We also present traversal techniques for accelerating distance queries using coherence and priority directed search. These algorithms have been used to perform proximity queries for applications including virtual prototyping, dynamic simulation, and motion planning on complex models. As compared to earlier algorithms based on bounding volume hierarchies for separation distance and approximate distance computation, our algorithms have achieved significant speedups on many benchmarks.

ICRA Conference 2000 Conference Paper

Interactive Motion Planning Using Hardware-Accelerated Computation of Generalized Voronoi Diagrams

Kenneth E. Hoff III
Tim Culver
John Keyser
Ming Lin 0003
Dinesh Manocha

We present techniques for fast motion planning by using discrete approximations of generalized Voronoi diagrams, computed with graphics hardware. Approaches based on this diagram computation are applicable to both static and dynamic environments of fairly high complexity. We compute a discrete Voronoi diagram by rendering a 3D distance mesh for each Voronoi site. The sites can be points, line segments, polygons, polyhedra, curves and surfaces. The computation of the generalized Voronoi diagram provides fast proximity query toolkits for motion planning. The tools provide the distance to the nearest obstacle stored in the Z-buffer, as well as the Voronoi boundaries, Voronoi vertices and weighted Voronoi graphs extracted from the frame buffer using continuation methods. We have implemented these algorithms and demonstrated their performance for path planning in a complex dynamic environment composed of more than 140,000 polygons.

ICRA Conference 1995 Conference Paper

Fast Algorithms for Penetration and Contact Determination between Non-Convex Polyhedral Models

Ming Lin 0003
Dinesh Manocha
Madhav K. Ponamgi

We present fast algorithms for penetration detection and contact determination between polyhedral models in dynamic environments. They are based on a distance computation algorithm for convex polytopes and a hierarchical coherence-based algorithm to compute contacts. In particular, we extend an earlier expected constant time algorithm for distance computation between convex polytopes to detect penetrations. The algorithm computes all the contacts between the convex hulls of the polytopes. After identifying the contact regions, it traverses the features lying beneath them to more precisely determine the contact regions. The traversal employs a dynamic technique, sweep and prune, to overcome the O(n^2) pairwise feature checks. The complexity of the overall algorithm is output sensitive. We demonstrate its performance on the dynamic simulation of a threaded insertion.
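
The general idea of sweep and prune can be sketched as a one-shot, single-axis pass over axis-aligned bounding boxes in 2D. This is a simplified variant written for illustration: it re-sorts from scratch rather than exploiting frame-to-frame coherence, and the box layout and function name are assumptions rather than the paper's data structures.

```python
def sweep_and_prune(boxes):
    """Return index pairs of overlapping axis-aligned boxes.

    boxes: list of (min_x, min_y, max_x, max_y) tuples (layout assumed).
    Boxes are swept in order of min_x, and only boxes whose x intervals
    still overlap the current one are tested on the y axis.
    """
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0])
    active = []   # indices whose x interval may still overlap later boxes
    pairs = set()
    for i in order:
        min_x, min_y, max_x, max_y = boxes[i]
        # Prune boxes that end before the current box begins on x.
        active = [j for j in active if boxes[j][2] >= min_x]
        for j in active:
            if boxes[j][1] <= max_y and min_y <= boxes[j][3]:
                pairs.add((min(i, j), max(i, j)))
        active.append(i)
    return pairs


boxes = [
    (0.0, 0.0, 2.0, 2.0),
    (1.0, 1.0, 3.0, 3.0),
    (5.0, 5.0, 6.0, 6.0),
    (2.5, 0.0, 4.0, 0.5),
]
overlapping = sweep_and_prune(boxes)
```

Only the first two boxes overlap on both axes; the fourth box overlaps the second on x but is pruned by the y test.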

ICRA Conference 1994 Conference Paper

Fast Contact Determination in Dynamic Environments

Ming Lin 0003
Dinesh Manocha
John F. Canny

We present an efficient contact determination algorithm for objects undergoing rigid motion. The environment consists of polytopes and models described by algebraic sets. We extend an expected constant time collision detection algorithm between convex polytopes to concave polytopes and curved models. The algorithm makes use of hierarchical representations for concave polytopes and local, global methods for solving polynomial equations to determine possible contact points. We also propose techniques to reduce O(n^2) pairwise intersection tests for a large environment of n objects. These algorithms work well in practice and give real time performance for most environments.

ICRA Conference 1991 Conference Paper

A fast algorithm for incremental distance calculation

Ming Lin 0003
John F. Canny

A simple and efficient algorithm for finding the closest points between two convex polyhedra is described. Data from numerous experiments tested on a broad set of convex polyhedra in R^3 show that the running time is roughly constant for finding closest points when nearest points are approximately known and is linear in total number of vertices if no special initialization is done. This algorithm can be used for collision detection, computation of the distance between two polyhedra in three-dimensional space, and other robotics problems. It forms the heart of the motion planning algorithm previously presented by the authors (Proc. IEEE ICRA, p. 1554-9, 1990).

ICRA Conference 1990 Conference Paper

An opportunistic global path planner

John F. Canny
Ming Lin 0003

A robot planning algorithm that constructs a global skeleton of free-space by incremental local methods is described. The curves of the skeleton are the loci of maxima of an artificial potential field that is directly proportional to the distance of the robot from obstacles. The method has the advantage of fast convergence of local methods in uncluttered environments, but it also has a deterministic and efficient method of escaping local extremal points of the potential function. The authors present a general algorithm, for configuration spaces of any dimension, and describe instantiations of the algorithm for robots with two and three degrees of freedom.