Arrow Research search

Author name cluster

Wei Zhan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

48 papers
2 author rows

Possible papers

48

ICRA Conference 2025 Conference Paper

Cohere3D: Exploiting Temporal Coherence for Unsupervised Representation Learning of Vision-Based Autonomous Driving

  • Yichen Xie 0002
  • Hongge Chen
  • Gregory P. Meyer
  • Yong Jae Lee
  • Eric M. Wolff
  • Masayoshi Tomizuka
  • Wei Zhan
  • Yuning Chai

Multi-frame temporal inputs are important for vision-based autonomous driving. Observations from different angles enable the recovery of 3D object states from 2D images as long as we can identify the same instance across different input frames. However, the dynamic nature of driving scenes leads to significant variance in the instance appearance and shape captured by the cameras at different time steps. To this end, we propose a novel contrastive learning algorithm, Cohere3D, to learn coherent instance representations robust to changes in distance and perspective over a long-term temporal sequence, without any human annotations. In the pretraining stage, raw point clouds from LiDAR sensors are utilized to construct the instance-wise long-term temporal correspondence, which serves as guidance for the extraction of instance-level representations from the vision-based bird's-eye-view (BEV) feature map. Cohere3D encourages consistent representations for the same instance at different frames but distinguishes between different instances. We validate the effectiveness and generalizability of our algorithm by finetuning the pretrained model on key downstream autonomous driving tasks: perception, mapping, prediction, and planning. Results show a notable improvement in both data efficiency and final performance across all these tasks.
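The instance-level objective described above resembles a standard InfoNCE contrastive loss over matched instances across frames: same-instance pairs are pulled together, all other pairs pushed apart. A minimal NumPy sketch of that generic loss (illustrative only, not the paper's actual training code; the function name and shapes are assumptions):

```python
import numpy as np

def instance_contrastive_loss(feats_t0, feats_t1, temperature=0.1):
    """InfoNCE-style loss: row i of feats_t0 and row i of feats_t1 are the
    same instance observed at two time steps; all other rows act as negatives."""
    # L2-normalize so dot products are cosine similarities.
    a = feats_t0 / np.linalg.norm(feats_t0, axis=1, keepdims=True)
    b = feats_t1 / np.linalg.norm(feats_t1, axis=1, keepdims=True)
    logits = a @ b.T / temperature               # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal.
    return -np.mean(np.diag(log_probs))
```

Correctly matched cross-frame features yield a lower loss than mismatched ones, which is the behavior the LiDAR-derived correspondence is meant to supervise.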

NeurIPS Conference 2025 Conference Paper

DrivingRecon: Large 4D Gaussian Reconstruction Model For Autonomous Driving

  • Hao Lu
  • Tianshuo Xu
  • Wenzhao Zheng
  • Yunpeng Zhang
  • Wei Zhan
  • Dalong Du
  • Masayoshi Tomizuka
  • Kurt Keutzer

Large reconstruction models have made remarkable progress: they can directly predict 3D or 4D representations for unseen scenes and objects. However, current work has not systematically explored the potential of large reconstruction models in the field of autonomous driving. To achieve this, we introduce the Large 4D Gaussian Reconstruction Model (DrivingRecon). With a simple yet carefully designed framework, it not only ensures efficient and high-quality reconstruction but also provides potential for downstream tasks. There are two core contributions: first, the Prune and Dilate Block (PD-Block) is proposed to prune redundant and overlapping Gaussian points and dilate Gaussian points for complex objects. Second, dynamic and static decoupling is tailored to better learn temporally consistent geometry across time. Experimental results demonstrate that DrivingRecon significantly improves scene reconstruction quality compared to existing methods. Furthermore, we explore applications of DrivingRecon in model pre-training, vehicle type adaptation, and scene editing. Our code will be available.

ICRA Conference 2025 Conference Paper

Embodiment-agnostic Action Planning via Object-Part Scene Flow

  • Weiliang Tang
  • Jia-Hui Pan
  • Wei Zhan
  • Jianshu Zhou
  • Huaxiu Yao
  • Yun-Hui Liu 0001
  • Masayoshi Tomizuka
  • Mingyu Ding

Observing that the key for robotic action planning is to understand the target-object motion when its associated part is manipulated by the end effector, we propose to generate the 3D object-part scene flow and extract its transformations to solve the action trajectories for diverse embodiments. The advantage of our approach is that it derives the robot action explicitly from object motion prediction, yielding a more robust policy through its understanding of object motion. Moreover, unlike policies trained on embodiment-centric data, our method is embodiment-agnostic, generalizes across diverse embodiments, and can learn from human demonstrations. Our method comprises three components: an object-part predictor to locate the part for the end effector to manipulate, an RGBD video generator to predict future RGBD videos, and a trajectory planner to extract embodiment-agnostic transformation sequences and solve the trajectory for diverse embodiments. Trained on videos even without trajectory data, our method still outperforms existing works significantly, by 27.7% and 26.2% on the prevailing virtual environments MetaWorld and Franka-Kitchen, respectively. Furthermore, we conducted real-world experiments showing that our policy, trained only with human demonstrations, can be deployed to various embodiments.

IROS Conference 2025 Conference Paper

P2Explore: Efficient Exploration in Unknown Cluttered Environment with Floor Plan Prediction

  • Kun Song
  • Gaoming Chen
  • Masayoshi Tomizuka
  • Wei Zhan
  • Zhenhua Xiong 0001
  • Mingyu Ding

Robot exploration aims at the reconstruction of unknown environments, and it is important to achieve this with shorter paths. Traditional methods focus on optimizing the visiting order of frontiers based on current observations, which may lead to locally minimal results. Recently, exploration efficiency has been further improved by predicting the structure of the unseen environment. However, in a cluttered environment, the randomness of obstacles makes such predictions less reliable, and this inaccuracy limits the resulting improvement in exploration. Therefore, we propose FPUNet, which efficiently predicts the layout of noisy indoor environments. We then extract the segmentation of rooms and construct their topological connectivity based on the predicted map. The visiting order of these predicted rooms is optimized to provide high-level guidance for exploration. Comparisons with other network architectures demonstrate that FPUNet is the state-of-the-art method for this task. Extensive experiments in simulation show that our method can shorten the path length by 2.18% to 34.60% compared to the baselines.

IROS Conference 2025 Conference Paper

PhyGrasp: Generalizing Robotic Grasping with Physics-informed Large Multimodal Models

  • Dingkun Guo
  • Yuqi Xiang
  • Shuqi Zhao
  • Xinghao Zhu
  • Masayoshi Tomizuka
  • Mingyu Ding
  • Wei Zhan

Robotic grasping, crucial for robot interaction with objects, still struggles with counter-intuitive or long-tailed scenarios like uncommon materials and shapes. Humans, however, intuitively adjust grasps using their physics-informed interpretations of the object, drawing on visual and linguistic cues. This work introduces PhyGrasp, a large multimodal model and dataset that enhance robotic manipulation by combining natural language and 3D point clouds through a bridge module that integrates these inputs. The language modality exhibits robust reasoning capabilities concerning the impacts of diverse physical properties on grasping, while the 3D modality comprehends object shapes and parts. With these two capabilities, PhyGrasp is able to accurately assess the physical properties of object parts and determine optimal grasping poses. Additionally, the model's language comprehension enables human instruction interpretation, generating grasping poses that align with human preferences. To train PhyGrasp, we construct a dataset, PhyPartNet, with 195K object instances with varying physical properties and human preferences, alongside their corresponding language descriptions. Extensive experiments conducted in simulation and on real robots demonstrate that PhyGrasp achieves state-of-the-art performance, particularly in long-tailed cases, e.g., about 10% improvement in success rate over GraspNet. More demos and information are available at https://sites.google.com/view/phygrasp.

ICRA Conference 2025 Conference Paper

Physics-Aware Robotic Palletization With Online Masking Inference

  • Tianqi Zhang
  • Zheng Wu 0002
  • Yuxin Chen
  • Yixiao Wang
  • Boyuan Liang
  • Scott Moura
  • Masayoshi Tomizuka
  • Mingyu Ding

The efficient planning of stacking boxes, especially in the online setting where the sequence of item arrivals is unpredictable, remains a critical challenge in modern warehouse and logistics management. Existing solutions often address box size variations but overlook intrinsic physical properties, such as density and rigidity, which are crucial for real-world applications. We use reinforcement learning (RL) to solve this problem, employing action space masking to direct the RL policy toward valid actions. Unlike previous methods that rely on heuristic stability assessments, which are difficult to verify in physical scenarios, our framework uses online learning to dynamically train the action space mask, eliminating the need for manual heuristic design. Extensive experiments demonstrate that our proposed method outperforms existing state-of-the-art methods. Furthermore, we deploy our learned task planner on a real-world robotic palletizer, validating its practical applicability in operational settings. The code is available at https://github.com/tianqi-zh/palletization.
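In its generic form, the action-space masking mechanism mentioned above zeroes out the probability of invalid actions before the policy samples one. A minimal sketch of that mechanism (illustrative only; the paper's contribution is learning the mask online rather than the masking arithmetic itself):

```python
import numpy as np

def masked_policy_probs(logits, mask):
    """Apply a binary action mask before the softmax: invalid actions
    (mask == 0) get -inf logits, so they receive exactly zero probability
    and can never be sampled by the policy."""
    masked = np.where(mask.astype(bool), logits, -np.inf)
    z = masked - masked.max()   # numerically stable softmax
    e = np.exp(z)               # exp(-inf) == 0, so masked entries vanish
    return e / e.sum()
```

In an online-masking setup, `mask` would itself be the output of a learned model predicting which placements are physically stable.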

ICLR Conference 2025 Conference Paper

Residual-MPPI: Online Policy Customization for Continuous Control

  • Pengcheng Wang 0004
  • Chenran Li
  • Catherine Weaver
  • Kenta Kawamoto
  • Masayoshi Tomizuka
  • Chen Tang 0001
  • Wei Zhan

Policies developed through Reinforcement Learning (RL) and Imitation Learning (IL) have shown great potential in continuous control tasks, but real-world applications often require adapting trained policies to unforeseen requirements. While fine-tuning can address such needs, it typically requires additional data and access to the original training metrics and parameters. In contrast, an online planning algorithm, if capable of meeting the additional requirements, can eliminate the necessity for extensive training phases and customize the policy without knowledge of the original training scheme or task. In this work, we propose a generic online planning algorithm for customizing continuous-control policies at execution time, which we call Residual-MPPI. It can customize a given prior policy on new performance metrics in few-shot and even zero-shot online settings, given access to the prior action distribution alone. Through our experiments, we demonstrate that the proposed Residual-MPPI algorithm can accomplish the few-shot/zero-shot online policy customization task effectively, including customizing the champion-level racing agent Gran Turismo Sophy (GT Sophy) 1.0 in the challenging Gran Turismo Sport (GTS) racing environment. Code for the MuJoCo experiments is included in the supplementary material and will be open-sourced upon acceptance. Demo videos are available on our website: https://sites.google.com/view/residual-mppi.
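At a high level, an MPPI-style planner perturbs the prior policy's proposed action with sampled noise and averages the candidates, weighted by an exponential of their cost; in a residual setup, only the add-on objective is scored. A hedged NumPy sketch of one such step (all names and the single-step simplification are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def residual_mppi_step(prior_action, residual_cost_fn, n_samples=256,
                       noise_std=0.1, temperature=1.0, seed=0):
    """One planning step of an MPPI-style update around a prior policy's
    action: sample perturbations, score them only on the add-on (residual)
    cost, and return the importance-weighted average action."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, noise_std,
                       size=(n_samples,) + np.shape(prior_action))
    candidates = prior_action + noise
    costs = np.array([residual_cost_fn(a) for a in candidates])
    # Subtract the minimum cost for numerical stability before exponentiating.
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    return (weights[:, None] * candidates).sum(axis=0)
```

Because only the residual cost is evaluated, the prior policy's original objective is preserved implicitly through the action it proposes.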

ICRA Conference 2025 Conference Paper

TrajSSL: Trajectory-Enhanced Semi-Supervised 3D Object Detection

  • Philip L. Jacobson
  • Yichen Xie 0002
  • Mingyu Ding
  • Chenfeng Xu
  • Masayoshi Tomizuka
  • Wei Zhan
  • Ming C. Wu

Semi-supervised 3D object detection is a common strategy employed to circumvent the challenge of manually labeling large-scale autonomous driving perception datasets. Pseudo-labeling approaches to semi-supervised learning adopt a teacher-student framework in which machine-generated pseudo-labels on a large unlabeled dataset are used in combination with a small manually-labeled dataset for training. In this work, we address the problem of improving pseudo-label quality through leveraging long-term temporal information captured in driving scenes. More specifically, we leverage pre-trained motion-forecasting models to generate object trajectories on pseudo-labeled data to further enhance the student model training. Our approach improves pseudo-label quality in two distinct manners: first, we suppress false positive pseudo-labels through establishing consistency across multiple frames of motion forecasting outputs. Second, we compensate for false negative detections by directly inserting predicted object tracks into the pseudo-labeled scene. Experiments on the nuScenes dataset demonstrate the effectiveness of our approach, improving the performance of standard semi-supervised approaches in a variety of settings.

ICML Conference 2025 Conference Paper

WOMD-Reasoning: A Large-Scale Dataset for Interaction Reasoning in Driving

  • Yiheng Li
  • Cunxin Fan
  • Chongjian Ge
  • Seth Z. Zhao
  • Chenran Li
  • Chenfeng Xu
  • Huaxiu Yao
  • Masayoshi Tomizuka

Language models uncover unprecedented abilities in analyzing driving scenarios, owing to their limitless knowledge accumulated from text-based pre-training. Naturally, they should particularly excel in analyzing rule-based interactions, such as those triggered by traffic laws, which are well documented in texts. However, such interaction analysis remains underexplored due to the lack of dedicated language datasets that address it. Therefore, we propose Waymo Open Motion Dataset-Reasoning (WOMD-Reasoning), a comprehensive large-scale Q&A dataset built on WOMD, focusing on describing and reasoning about traffic-rule-induced interactions in driving scenarios. WOMD-Reasoning also presents by far the largest multi-modal Q&A dataset, with 3 million Q&As on real-world driving scenarios, covering a wide range of driving topics from map descriptions and motion status descriptions to narratives and analyses of agents' interactions, behaviors, and intentions. To showcase the applications of WOMD-Reasoning, we design Motion-LLaVA, a motion-language model fine-tuned on WOMD-Reasoning. Quantitative and qualitative evaluations are performed on the WOMD-Reasoning dataset as well as the outputs of Motion-LLaVA, supporting the data quality and wide applications of WOMD-Reasoning in interaction prediction, traffic-rule-compliant planning, etc. The dataset and its vision modal extension are available at https://waymo.com/open/download/. The code and prompts to build it are available at https://github.com/yhli123/WOMD-Reasoning.

ICLR Conference 2025 Conference Paper

X-Drive: Cross-modality Consistent Multi-Sensor Data Synthesis for Driving Scenarios

  • Yichen Xie 0002
  • Chenfeng Xu
  • Chensheng Peng
  • Shuqi Zhao
  • Nhat Ho
  • Alexander T. Pham
  • Mingyu Ding
  • Masayoshi Tomizuka

Recent advancements have exploited diffusion models for the synthesis of either LiDAR point clouds or camera image data in driving scenarios. Despite their success in modeling the marginal distributions of single-modality data, the mutual reliance between different modalities in describing complex driving scenes remains under-explored. To fill in this gap, we propose a novel framework, X-DRIVE, to model the joint distribution of point clouds and multi-view images via a dual-branch latent diffusion model architecture. Considering the distinct geometrical spaces of the two modalities, X-DRIVE conditions the synthesis of each modality on the corresponding local regions from the other modality, ensuring better alignment and realism. To further handle the spatial ambiguity during denoising, we design a cross-modality condition module based on epipolar lines to adaptively learn the cross-modality local correspondence. In addition, X-DRIVE allows for controllable generation through multi-level input conditions, including text, bounding boxes, images, and point clouds. Extensive results demonstrate the high-fidelity synthetic results of X-DRIVE for both point clouds and multi-view images, adhering to input conditions while ensuring reliable cross-modality consistency. Our code will be made publicly available at https://github.com/yichen928/X-Drive.

IROS Conference 2024 Conference Paper

DSLO: Deep Sequence LiDAR Odometry Based on Inconsistent Spatio-temporal Propagation

  • Huixin Zhang
  • Guangming Wang 0001
  • Xinrui Wu
  • Chenfeng Xu
  • Mingyu Ding
  • Masayoshi Tomizuka
  • Wei Zhan
  • Hesheng Wang 0001

This paper introduces a 3D point cloud sequence learning model based on inconsistent spatio-temporal propagation for LiDAR odometry, termed DSLO. It consists of a pyramid structure with a spatial information reuse strategy, a sequential pose initialization module, a gated hierarchical pose refinement module, and a temporal feature propagation module. First, spatial features are encoded using a point feature pyramid, with features reused in successive pose estimations to reduce computational overhead. Second, a sequential pose initialization method is introduced, leveraging the high-frequency sampling characteristic of LiDAR to initialize the LiDAR pose. Then, a gated hierarchical pose refinement mechanism refines poses from coarse to fine by selectively retaining or discarding motion information from different layers based on gate estimations. Finally, temporal feature propagation is proposed to incorporate the historical motion information from point cloud sequences, and address the spatial inconsistency issue when transmitting motion information embedded in point clouds between frames. Experimental results on the KITTI odometry dataset and Argoverse dataset demonstrate that DSLO outperforms state-of-the-art methods, achieving at least a 15.67% improvement on RTE and a 12.64% improvement on RRE, while also achieving a 34.69% reduction in runtime compared to baseline methods. Our implementation will be available at https://github.com/IRMVLab/DSLO.

ICRA Conference 2024 Conference Paper

Guided Online Distillation: Promoting Safe Reinforcement Learning by Offline Demonstration

  • Jinning Li 0002
  • Xinyi Liu
  • Banghua Zhu
  • Jiantao Jiao
  • Masayoshi Tomizuka
  • Chen Tang 0001
  • Wei Zhan

Safe Reinforcement Learning (RL) aims to find a policy that achieves high rewards while satisfying cost constraints. When learning from scratch, safe RL agents tend to be overly conservative, which impedes exploration and restrains the overall performance. In many realistic tasks, e.g., autonomous driving, large-scale expert demonstration data are available. We argue that extracting expert policy from offline data to guide online exploration is a promising solution to mitigate the conservativeness issue. Large-capacity models, e.g., decision transformers (DT), have been proven to be competent in offline policy learning. However, data collected in real-world scenarios rarely contain dangerous cases (e.g., collisions), which makes it prohibitive for the policies to learn safety concepts. Besides, these bulky policy networks cannot meet the computation speed requirements at inference time on real-world tasks such as autonomous driving. To this end, we propose Guided Online Distillation (GOLD), an offline-to-online safe RL framework. GOLD distills an offline DT policy into a lightweight policy network through guided online safe RL training, which outperforms both the offline DT policy and online safe RL algorithms. Experiments in both benchmark safe RL tasks and real-world driving tasks based on the Waymo Open Motion Dataset (WOMD) [1] demonstrate that GOLD can successfully distill lightweight policies and solve decision-making problems in challenging safety-critical scenarios.

ICRA Conference 2024 Conference Paper

Open X-Embodiment: Robotic Learning Datasets and RT-X Models: Open X-Embodiment Collaboration

  • Abby O'Neill
  • Abdul Rehman
  • Abhiram Maddukuri
  • Abhishek Gupta 0004
  • Abhishek Padalkar
  • Abraham Lee
  • Acorn Pooley
  • Agrim Gupta

Large, high-capacity models trained on diverse datasets have shown remarkable successes in efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train a "generalist" X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. The project website is robotics-transformer-x.github.io.

IROS Conference 2024 Conference Paper

Pre-training on Synthetic Driving Data for Trajectory Prediction

  • Yiheng Li
  • Seth Z. Zhao
  • Chenfeng Xu
  • Chen Tang 0001
  • Chenran Li
  • Mingyu Ding
  • Masayoshi Tomizuka
  • Wei Zhan

Accumulating substantial volumes of real-world driving data proves pivotal in the realm of trajectory forecasting for autonomous driving. Given the heavy reliance of current trajectory forecasting models on data-driven methodologies, we aim to tackle the challenge of learning general trajectory forecasting representations under limited data availability. We propose a pipeline-level solution to mitigate the issue of data scarcity in trajectory forecasting. The solution is composed of two parts: first, we adopt HD map augmentation and trajectory synthesis for generating driving data, and then we learn representations by pre-training on them. Specifically, we apply vector transformations to reshape the maps, and then employ a rule-based model to generate trajectories on both original and augmented scenes, thus enlarging the driving data without collecting additional real data. To foster the learning of general representations within this augmented dataset, we comprehensively explore different pre-training strategies, including extending the concept of a Masked AutoEncoder (MAE) to trajectory forecasting. Without bells and whistles, our proposed pipeline-level solution is general, simple, yet effective: we conduct extensive experiments to demonstrate the effectiveness of our data expansion and pre-training strategies, which outperform the baseline prediction model by large margins, e.g., 5.04%, 3.84%, and 8.30% in terms of MR6, minADE6, and minFDE6. The pre-training dataset and the code for pre-training and fine-tuning are released at https://github.com/yhli123/Pretraining_on_Synthetic_Driving_Data_for_Trajectory_Prediction.

AAMAS Conference 2024 Conference Paper

Quantifying Agent Interaction in Multi-agent Reinforcement Learning for Cost-efficient Generalization

  • Yuxin Chen
  • Chen Tang
  • Ran Tian
  • Chenran Li
  • Jinning Li
  • Masayoshi Tomizuka
  • Wei Zhan

Generalization in Multi-agent Reinforcement Learning (MARL) is challenging. Introducing a diverse set of co-play agents typically boosts the agent's generalization to unseen co-players. However, the extent to which an agent is influenced by co-players varies across scenarios and environments; thus, the improvement in generalization introduced by diversifying co-players also varies. In this work, we introduce Level of Influence (LoI), a novel metric measuring the interaction intensity among agents within a given scenario and environment. We show that LoI can effectively predict the disparities in the benefits of diversifying co-player distribution across scenarios, offering insights into optimizing training cost for varied situations. The code is available at: https://github.com/ThomasChen98/Level-of-Influence.

RLJ Journal 2024 Journal Article

Quantifying Interaction Level Between Agents Helps Cost-efficient Generalization in Multi-agent Reinforcement Learning

  • Yuxin Chen
  • Chen Tang
  • Thomas Tian
  • Chenran Li
  • Jinning Li
  • Masayoshi Tomizuka
  • Wei Zhan

Generalization poses a significant challenge in Multi-agent Reinforcement Learning (MARL). The extent to which unseen co-players influence an agent depends on the agent's policy and the specific scenario. A quantitative examination of this relationship sheds light on how to effectively train agents for diverse scenarios. In this study, we present the Level of Influence (LoI), a metric quantifying the interaction intensity among agents within a given scenario and environment. We observe that, generally, a more diverse set of co-play agents during training enhances the generalization performance of the ego agent; however, this improvement varies across distinct scenarios and environments. LoI proves effective in predicting these improvement disparities within specific scenarios. Furthermore, we introduce a LoI-guided resource allocation method tailored to train a set of policies for diverse scenarios under a constrained budget. Our results demonstrate that strategic resource allocation based on LoI can achieve higher performance than uniform allocation under the same computation budget. The code is available at: https://github.com/ThomasChen98/Level-of-Influence.

RLC Conference 2024 Conference Paper

Quantifying Interaction Level Between Agents Helps Cost-efficient Generalization in Multi-agent Reinforcement Learning

  • Yuxin Chen
  • Chen Tang
  • Thomas Tian
  • Chenran Li
  • Jinning Li
  • Masayoshi Tomizuka
  • Wei Zhan

Generalization poses a significant challenge in Multi-agent Reinforcement Learning (MARL). The extent to which unseen co-players influence an agent depends on the agent's policy and the specific scenario. A quantitative examination of this relationship sheds light on how to effectively train agents for diverse scenarios. In this study, we present the Level of Influence (LoI), a metric quantifying the interaction intensity among agents within a given scenario and environment. We observe that, generally, a more diverse set of co-play agents during training enhances the generalization performance of the ego agent; however, this improvement varies across distinct scenarios and environments. LoI proves effective in predicting these improvement disparities within specific scenarios. Furthermore, we introduce a LoI-guided resource allocation method tailored to train a set of policies for diverse scenarios under a constrained budget. Our results demonstrate that strategic resource allocation based on LoI can achieve higher performance than uniform allocation under the same computation budget. The code is available at: https://github.com/ThomasChen98/Level-of-Influence.

ICLR Conference 2024 Conference Paper

UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling

  • Haoyu Lu
  • Yuqi Huo
  • Guoxing Yang
  • Zhiwu Lu 0001
  • Wei Zhan
  • Masayoshi Tomizuka
  • Mingyu Ding

Large-scale vision-language pre-trained models have shown promising transferability to various downstream tasks. As the size of these foundation models and the number of downstream tasks grow, the standard full fine-tuning paradigm becomes unsustainable due to heavy computational and storage costs. This paper proposes UniAdapter, which unifies unimodal and multimodal adapters for parameter-efficient cross-modal adaptation on pre-trained vision-language models. Specifically, adapters are distributed to different modalities and their interactions, with the total number of tunable parameters reduced by partial weight sharing. The unified and knowledge-sharing design enables powerful cross-modal representations that can benefit various downstream tasks, requiring only 1.0%-2.0% tunable parameters of the pre-trained model. Extensive experiments on 7 cross-modal downstream benchmarks (including video-text retrieval, image-text retrieval, VideoQA, VQA and Caption) show that in most cases, UniAdapter not only outperforms the state of the art, but even beats the full fine-tuning strategy. Particularly, on the MSRVTT retrieval task, UniAdapter achieves 49.7% recall@1 with 2.2% model parameters, outperforming the latest competitors by 2.0%. The code and models are available at https://github.com/RERV/UniAdapter.
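The adapters referred to above build on the standard bottleneck pattern: down-project, nonlinearity, up-project, plus a residual connection, so that hidden_dim << dim keeps the tunable parameter count small relative to the frozen backbone. A generic NumPy sketch of that pattern (this shows a single adapter only; the paper's cross-modal partial weight sharing is not reproduced here, and all names are illustrative):

```python
import numpy as np

class Adapter:
    """Bottleneck adapter: x -> x + ReLU(x @ W_down) @ W_up.
    Zero-initializing the up-projection makes the adapter start as an
    exact identity, so inserting it does not perturb the frozen model."""
    def __init__(self, dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.down = rng.normal(0.0, 0.02, size=(dim, hidden_dim))
        self.up = np.zeros((hidden_dim, dim))  # zero-init: identity at start

    def __call__(self, x):
        h = np.maximum(x @ self.down, 0.0)     # ReLU bottleneck
        return x + h @ self.up                 # residual connection
```

During adaptation only `down` and `up` would be trained, which is where the 1.0%-2.0% tunable-parameter figure comes from.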

ICRA Conference 2023 Conference Paper

Center Feature Fusion: Selective Multi-Sensor Fusion of Center-based Objects

  • Philip L. Jacobson
  • Yiyang Zhou
  • Wei Zhan
  • Masayoshi Tomizuka
  • Ming C. Wu

Leveraging multi-modal fusion, especially between camera and LiDAR, has become essential for building accurate and robust 3D object detection systems for autonomous vehicles. Until recently, point decorating approaches, in which point clouds are augmented with camera features, have been the dominant approach in the field. However, these approaches fail to utilize the higher resolution images from cameras. Recent works projecting camera features to the bird's-eye-view (BEV) space for fusion have also been proposed; however, they require projecting millions of pixels, most of which only contain background information. In this work, we propose a novel approach, Center Feature Fusion (CFF), in which we leverage center-based detection networks in both the camera and LiDAR streams to identify relevant object locations. We then use the center-based detections to identify the locations of pixel features relevant to objects, a small fraction of the total number in the image. These are then projected and fused in the BEV frame. On the nuScenes dataset, we outperform the LiDAR-only baseline by 4.9% mAP while fusing up to 100x fewer features than other fusion methods.

FOCS Conference 2023 Conference Paper

Certified Hardness vs. Randomness for Log-Space

  • Edward Pyne
  • Ran Raz
  • Wei Zhan

Let $\mathcal{L}$ be a language that can be decided in linear space and let $\epsilon > 0$ be any constant. Let $\mathcal{A}$ be the exponential hardness assumption that for every $n$, membership in $\mathcal{L}$ for inputs of length $n$ cannot be decided by circuits of size smaller than $2^{\epsilon n}$. We prove that for every function $f: \{0, 1\}^{*} \rightarrow \{0, 1\}$, computable by a randomized logspace algorithm R, there exists a deterministic logspace algorithm D (attempting to compute $f$), such that on every input $x$ of length $n$, the algorithm D outputs one of the following: (1) the correct value $f(x)$; (2) the string "I am unable to compute $f(x)$ because the hardness assumption $\mathcal{A}$ is false", followed by a (provenly correct) circuit of size smaller than $2^{\epsilon n^{\prime}}$ for membership in $\mathcal{L}$ for inputs of length $n^{\prime}$, for some $n^{\prime}=\Theta(\log n)$; that is, a circuit that refutes $\mathcal{A}$. Moreover, D is explicitly constructed, given R. We note that previous works on the hardness-versus-randomness paradigm give derandomized algorithms that rely blindly on the hardness assumption. If the hardness assumption is false, the algorithms may output incorrect values, and thus a user cannot trust that an output given by the algorithm is correct. Instead, our algorithm D verifies the computation so that it never outputs an incorrect value. Thus, if D outputs a value for $f(x)$, that value is certified to be correct. Moreover, if D does not output a value for $f(x)$, it alerts that the hardness assumption was found to be false, and refutes the assumption. Our next result is a universal derandomizer for BPL (the class of problems solvable by bounded-error randomized logspace algorithms): we give a deterministic algorithm U that takes as input a randomized logspace algorithm R and an input $x$, and simulates the computation of R on $x$ deterministically.
Under the widely believed assumption $\mathbf{BPL}=\mathbf{L}$, the space used by U is at most $C_{R} \cdot \log n$ (where $C_{R}$ is a constant depending on R). Moreover, for every constant $c \geq 1$, if $\mathbf{BPL} \subseteq \operatorname{SPACE}\left[(\log n)^{c}\right]$ then the space used by U is at most $C_{R} \cdot(\log n)^{c}$. Finally, we prove that if optimal hitting sets for ordered branching programs exist, then there is a deterministic logspace algorithm that, given black-box access to an ordered branching program B of size $n$, estimates the probability that B accepts on a uniformly random input. This extends the result of (Cheng and Hoza, CCC 2020), who proved that an optimal hitting set implies a white-box two-sided derandomization. (Our results are stated and proved for promise-BPL, but we ignore this difference in the abstract.)

NeurIPS Conference 2023 Conference Paper

Doubly-Robust Self-Training

  • Banghua Zhu
  • Mingyu Ding
  • Philip Jacobson
  • Ming Wu
  • Wei Zhan
  • Michael Jordan
  • Jiantao Jiao

Self-training is a well-established technique in semi-supervised learning, which leverages unlabeled data by generating pseudo-labels and incorporating them with a limited labeled dataset for training. The effectiveness of self-training heavily relies on the accuracy of these pseudo-labels. In this paper, we introduce doubly-robust self-training, an innovative semi-supervised algorithm that provably balances between two extremes. When pseudo-labels are entirely incorrect, our method reduces to a training process solely using labeled data. Conversely, when pseudo-labels are completely accurate, our method transforms into a training process utilizing all pseudo-labeled data and labeled data, thus increasing the effective sample size. Through empirical evaluations on both the ImageNet dataset for image classification and the nuScenes autonomous driving dataset for 3D object detection, we demonstrate the superiority of the doubly-robust loss over the self-training baseline.
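The balancing behavior described in the abstract can be illustrated with a small numerical sketch. This is a hypothetical simplification, not the authors' exact loss: the pseudo-label loss is averaged over all data, and a correction term on the labeled subset swaps pseudo-labels for true labels, so the correction vanishes exactly when pseudo-labels agree with the true labels.

```python
import numpy as np

def doubly_robust_loss(loss_pseudo_all, loss_pseudo_labeled, loss_true_labeled):
    """Hypothetical doubly-robust combination of per-sample losses.

    loss_pseudo_all:     loss w.r.t. pseudo-labels on ALL samples
    loss_pseudo_labeled: loss w.r.t. pseudo-labels on the labeled subset
    loss_true_labeled:   loss w.r.t. true labels on the labeled subset
    """
    correction = np.mean(loss_true_labeled) - np.mean(loss_pseudo_labeled)
    return np.mean(loss_pseudo_all) + correction

# Accurate pseudo-labels: the correction cancels, so all data contribute.
perfect = doubly_robust_loss([0.2, 0.4, 0.6], [0.2], [0.2])  # -> 0.4
# Constant (uninformative) pseudo-label losses: the correction re-centers
# the objective on the labeled data alone.
useless = doubly_robust_loss([1.0, 1.0, 1.0], [1.0], [0.3])  # -> 0.3
```

The two calls illustrate the two extremes the abstract describes: with accurate pseudo-labels the effective sample size grows, and with uninformative ones the objective falls back to the labeled data.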

NeurIPS Conference 2023 Conference Paper

Residual Q-Learning: Offline and Online Policy Customization without Value

  • Chenran Li
  • Chen Tang
  • Haruki Nishimura
  • Jean Mercat
  • Masayoshi Tomizuka
  • Wei Zhan

Imitation Learning (IL) is a widely used framework for learning imitative behavior from demonstrations. It is especially appealing for solving complex real-world tasks where handcrafting a reward function is difficult, or when the goal is to mimic human expert behavior. However, the learned imitative policy can only follow the behavior in the demonstration. When applying the imitative policy, we may need to customize the policy behavior to meet different requirements coming from diverse downstream tasks. Meanwhile, we still want the customized policy to maintain its imitative nature. To this end, we formulate a new problem setting called policy customization. It defines the learning task as training a policy that inherits the characteristics of the prior policy while satisfying some additional requirements imposed by a target downstream task. We propose a novel and principled approach to interpret and determine the trade-off between the two task objectives. Specifically, we formulate the customization problem as a Markov Decision Process (MDP) with a reward function that combines 1) the inherent reward of the demonstration; and 2) the add-on reward specified by the downstream task. We propose a novel framework, Residual Q-learning, which can solve the formulated MDP by leveraging the prior policy without knowing the inherent reward or value function of the prior policy. We derive a family of residual Q-learning algorithms that can realize offline and online policy customization, and show that the proposed algorithms can effectively accomplish policy customization tasks in various environments. Demo videos and code are available on our website: https://sites.google.com/view/residualq-learning.
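The combined-reward MDP above can be sketched with a closely related, well-known construction — soft Q-learning on an augmented reward that adds the downstream add-on reward to the log-probability of the prior policy. This is a stand-in of my own choosing, not the paper's exact Residual Q-learning update; the toy MDP, rewards, and prior policy below are all hypothetical.

```python
import numpy as np

# Toy 2-state, 2-action MDP (all values hypothetical).
n_states, n_actions, gamma, alpha = 2, 2, 0.9, 1.0
r_add = np.array([[0.0, 1.0],
                  [1.0, 0.0]])            # add-on reward from the downstream task
P = np.array([[0, 1], [1, 0]])            # deterministic next state per (s, a)
log_prior = np.log(np.array([[0.8, 0.2],  # prior (imitative) policy, assumed given
                             [0.5, 0.5]]))

# Soft value iteration on the augmented reward r_add + alpha * log_prior:
# the log-prior term pulls the customized policy toward the prior without
# requiring access to the prior's inherent reward.
Q = np.zeros((n_states, n_actions))
for _ in range(500):
    V = alpha * np.log(np.exp(Q / alpha).sum(axis=1))  # soft state value
    Q = r_add + alpha * log_prior + gamma * V[P]

policy = np.exp(Q / alpha)
policy /= policy.sum(axis=1, keepdims=True)
```

In state 0 the prior prefers action 0 while the add-on reward favors action 1, so the resulting policy trades the two objectives off, which is the qualitative behavior policy customization aims for.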

ICLR Conference 2023 Conference Paper

Time Will Tell: New Outlooks and A Baseline for Temporal Multi-View 3D Object Detection

  • Jinhyung Park
  • Chenfeng Xu
  • Shijia Yang
  • Kurt Keutzer
  • Kris Kitani
  • Masayoshi Tomizuka
  • Wei Zhan

While recent camera-only 3D detection methods leverage multiple timesteps, the limited history they use significantly hampers the extent to which temporal fusion can improve object perception. Observing that existing works' fusion of multi-frame images is an instance of temporal stereo matching, we find that performance is hindered by the interplay between 1) the low granularity of matching resolution and 2) the sub-optimal multi-view setup produced by limited history usage. Our theoretical and empirical analysis demonstrates that the optimal temporal difference between views varies significantly for different pixels and depths, making it necessary to fuse many timesteps over long-term history. Building on our investigation, we propose to generate a cost volume from a long history of image observations, compensating for the coarse but efficient matching resolution with a more optimal multi-view matching setup. Further, we augment the per-frame monocular depth predictions used for long-term, coarse matching with short-term, fine-grained matching and find that long and short term temporal fusion are highly complementary. While maintaining high efficiency, our framework sets a new state of the art on nuScenes, achieving first place on the test set and outperforming the previous best by 5.2% mAP and 3.7% NDS on the validation set. Code will be released here: https://github.com/Divadi/SOLOFusion.

NeurIPS Conference 2023 Conference Paper

Towards Free Data Selection with General-Purpose Models

  • Yichen Xie
  • Mingyu Ding
  • Masayoshi Tomizuka
  • Wei Zhan

A desirable data selection algorithm can efficiently choose the most informative samples to maximize the utility of limited annotation budgets. However, current approaches, represented by active learning methods, typically follow a cumbersome pipeline that iterates the time-consuming model training and batch data selection repeatedly. In this paper, we challenge this status quo by designing a distinct data selection pipeline that utilizes existing general-purpose models to select data from various datasets with a single-pass inference without the need for additional training or supervision. A novel free data selection (FreeSel) method is proposed following this new pipeline. Specifically, we define semantic patterns extracted from intermediate features of the general-purpose model to capture subtle local information in each image. We then enable the selection of all data samples in a single pass through distance-based sampling at the fine-grained semantic pattern level. FreeSel bypasses the heavy batch selection process, achieving a significant improvement in efficiency and being 530x faster than existing active learning methods. Extensive experiments verify the effectiveness of FreeSel on various computer vision tasks.
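The single-pass, distance-based selection can be sketched with plain farthest-point sampling over feature vectors. This is a generic stand-in: FreeSel's actual semantic-pattern features and distance measure are not reproduced here, and the toy features are invented for illustration.

```python
import numpy as np

def farthest_point_sampling(features, k, seed=0):
    """Greedy distance-based selection: repeatedly pick the sample farthest
    (in Euclidean distance) from everything selected so far."""
    rng = np.random.default_rng(seed)
    features = np.asarray(features, dtype=float)
    n = len(features)
    selected = [int(rng.integers(n))]          # random seed point
    dist = np.linalg.norm(features - features[selected[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))             # farthest from the selected set
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(features - features[nxt], axis=1))
    return selected

# Two tight clusters: the two picks land in different clusters,
# i.e. the selection covers the feature space instead of duplicating it.
feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
picked = farthest_point_sampling(feats, 2)
```

The single pass over distances is what makes this family of methods cheap compared to retraining-based active learning loops.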

ICRA Conference 2022 Conference Paper

Causal-based Time Series Domain Generalization for Vehicle Intention Prediction

  • Yeping Hu
  • Xiaogang Jia
  • Masayoshi Tomizuka
  • Wei Zhan

Accurately predicting the possible behaviors of traffic participants is an essential capability for autonomous vehicles. Since autonomous vehicles need to navigate in dynamically changing environments, they are expected to make accurate predictions regardless of where they are and what driving circumstances they encounter. Therefore, generalization capability to unseen domains is crucial for prediction models when autonomous vehicles are deployed in the real world. In this paper, we aim to address the domain generalization problem for vehicle intention prediction tasks, and a causal-based time series domain generalization (CTSDG) model is proposed. We construct a structural causal model for vehicle intention prediction tasks to learn an invariant representation of input driving data for domain generalization. We further integrate a recurrent latent variable model into our structural causal model to better capture temporal latent dependencies from time-series input data. The effectiveness of our approach is evaluated via real-world driving data. We demonstrate that our proposed method consistently improves prediction accuracy compared to other state-of-the-art domain generalization and behavior prediction methods.

IROS Conference 2022 Conference Paper

Domain Knowledge Driven Pseudo Labels for Interpretable Goal-Conditioned Interactive Trajectory Prediction

  • Lingfeng Sun
  • Chen Tang 0001
  • Yaru Niu
  • Enna Sachdeva
  • Chiho Choi
  • Teruhisa Misu
  • Masayoshi Tomizuka
  • Wei Zhan

Motion forecasting in highly interactive scenarios is a challenging problem in autonomous driving. In such scenarios, we need to accurately predict the joint behavior of interacting agents to ensure the safe and efficient navigation of autonomous vehicles. Recently, goal-conditioned methods have gained increasing attention due to their advantage in performance and their ability to capture the multimodality in trajectory distribution. In this work, we study the joint trajectory prediction problem with the goal-conditioned framework. In particular, we introduce a conditional-variational-autoencoder-based (CVAE) model to explicitly encode different interaction modes into the latent space. However, we discover that the vanilla model suffers from posterior collapse and cannot induce an informative latent space as desired. To address these issues, we propose a novel approach to avoid KL vanishing and induce an interpretable interactive latent space with pseudo labels. The proposed pseudo labels allow us to incorporate domain knowledge on interaction in a flexible manner. We motivate the proposed method using an illustrative toy example. In addition, we validate our framework on the Waymo Open Motion Dataset with both quantitative and qualitative evaluations.

IROS Conference 2022 Conference Paper

Generalizability Analysis of Graph-based Trajectory Predictor with Vectorized Representation

  • Juanwu Lu
  • Wei Zhan
  • Masayoshi Tomizuka
  • Yeping Hu

Trajectory prediction is one of the essential tasks for autonomous vehicles. Recent progress in machine learning gave birth to a series of advanced trajectory prediction algorithms. Lately, the effectiveness of using graph neural networks (GNNs) with vectorized representations for trajectory prediction has been demonstrated by many researchers. Nonetheless, these algorithms either pay little attention to models' generalizability across various scenarios or simply assume training and test data follow similar statistics. In fact, when test scenarios are unseen or Out-of-Distribution (OOD), the resulting train-test domain shift usually leads to significant degradation in prediction performance, which will impact downstream modules and eventually lead to severe accidents. Therefore, it is of great importance to thoroughly investigate the generalizability of prediction models, which can not only help identify their weaknesses but also provide insights on how to improve these models. This paper proposes a generalizability analysis framework using feature attribution methods to help interpret black-box models. For the case study, we provide an in-depth generalizability analysis of one of the state-of-the-art graph-based trajectory predictors that utilize vectorized representation. Results show significant performance degradation due to domain shift, and feature attribution provides insights to identify potential causes of these problems. Finally, we summarize the common prediction challenges and show how weighting biases induced by the training process can degrade accuracy.

IROS Conference 2022 Conference Paper

Interventional Behavior Prediction: Avoiding Overly Confident Anticipation in Interactive Prediction

  • Chen Tang 0001
  • Wei Zhan
  • Masayoshi Tomizuka

Conditional behavior prediction (CBP) builds up the foundation for a coherent interactive prediction and planning framework that can enable more efficient and less conservative maneuvers in interactive scenarios. In the CBP task, we train a prediction model approximating the posterior distribution of target agents' future trajectories conditioned on the future trajectory of an assigned ego agent. However, we argue that CBP may provide overly confident anticipation of how the autonomous agent may influence the target agents' behavior. Consequently, it is risky for the planner to query a CBP model. Instead, we should treat the planned trajectory as an intervention and let the model learn the trajectory distribution under intervention. We refer to it as the interventional behavior prediction (IBP) task. Moreover, to properly evaluate an IBP model with offline datasets, we propose a Shapley-value-based metric to verify if the prediction model satisfies the inherent temporal independence of an interventional distribution. We show that the proposed metric can effectively identify a CBP model violating the temporal independence, which plays an important role when establishing IBP benchmarks.

ICRA Conference 2021 Conference Paper

A Safe Hierarchical Planning Framework for Complex Driving Scenarios based on Reinforcement Learning

  • Jinning Li 0002
  • Liting Sun
  • Jianyu Chen 0002
  • Masayoshi Tomizuka
  • Wei Zhan

Autonomous vehicles need to handle various traffic conditions and make safe and efficient decisions and maneuvers. However, on the one hand, a single optimization/sampling-based motion planner cannot efficiently generate safe trajectories in real time, particularly when there are many interactive vehicles nearby. On the other hand, end-to-end learning methods cannot assure the safety of the outcomes. To address this challenge, we propose a hierarchical behavior planning framework with a set of low-level safe controllers and a high-level reinforcement learning algorithm (H-CtRL) as a coordinator for the low-level controllers. Safety is guaranteed by the low-level optimization/sampling-based controllers, while the high-level reinforcement learning algorithm makes H-CtRL an adaptive and efficient behavior planner. To train and test our proposed algorithm, we built a simulator that can reproduce traffic scenes using real-world datasets. The proposed H-CtRL is proved to be effective in various realistic simulation scenarios, with satisfactory performance in terms of both safety and efficiency.

IROS Conference 2021 Conference Paper

A Simple and Efficient Multi-task Network for 3D Object Detection and Road Understanding

  • Di Feng
  • Yiyang Zhou
  • Chenfeng Xu
  • Masayoshi Tomizuka
  • Wei Zhan

Detecting dynamic objects and predicting static road information such as drivable areas and ground heights are crucial for safe autonomous driving. Previous works studied each perception task separately, and lacked a collective quantitative analysis. In this work, we show that it is possible to perform all perception tasks via a simple and efficient multi-task network. Our proposed network, LidarMTL, takes raw LiDAR point cloud as inputs, and predicts six perception outputs for 3D object detection and road understanding. The network is based on an encoder-decoder architecture with 3D sparse convolution and deconvolution operations. Extensive experiments verify the proposed method with competitive accuracies compared to state-of-the-art object detectors and other task-specific networks. LidarMTL is also leveraged for online localization. Code and pre-trained model have been made available at https://github.com/frankfengdi/LidarMTL.

IROS Conference 2021 Conference Paper

Automatic Construction of Lane-level HD Maps for Urban Scenes

  • Yiyang Zhou
  • Yuichi Takeda
  • Masayoshi Tomizuka
  • Wei Zhan

High definition (HD) maps have demonstrated their essential roles in enabling full autonomy, especially in complex urban scenarios. As a crucial layer of the HD map, lane-level maps are particularly useful: they contain geometrical and topological information for both lanes and intersections. However, large scale construction of HD maps is limited by tedious human labeling and high maintenance costs, especially for urban scenarios with complicated road structures and irregular markings. This paper proposes an approach based on a semantic particle filter to tackle the automatic lane-level mapping problem in urban scenes. The map skeleton is first structured as a directed cyclic graph from the online mapping database OpenStreetMap. Our proposed method then performs semantic segmentation on 2D front-view images from ego vehicles and explores the lane semantics on a bird's-eye-view domain with true topographical projection. Exploiting OpenStreetMap, we further infer lane topology and reference trajectory at intersections with the aforementioned lane semantics. The proposed algorithm has been tested in densely urbanized areas, and the results demonstrate accurate and robust reconstruction of the lane-level HD map.

IROS Conference 2021 Conference Paper

Constrained Iterative LQG for Real-Time Chance-Constrained Gaussian Belief Space Planning

  • Jianyu Chen 0002
  • Yutaka Shimizu
  • Liting Sun
  • Masayoshi Tomizuka
  • Wei Zhan

Motion planning under uncertainty is of significant importance for safety-critical systems such as autonomous vehicles. Such systems have to satisfy necessary constraints (e.g., collision avoidance) with potential uncertainties coming from either disturbed system dynamics or noisy sensor measurements. However, existing motion planning methods cannot efficiently find the robust optimal solutions under general nonlinear and non-convex settings. In this paper, we formulate such a problem as chance-constrained Gaussian belief space planning and propose the constrained iterative Linear Quadratic Gaussian (CILQG) algorithm as a real-time solution. In this algorithm, we iteratively calculate a Gaussian approximation of the belief and transform the chance constraints. We evaluate the effectiveness of our method in simulations of autonomous driving planning tasks with static and dynamic obstacles. Results show that CILQG can handle uncertainties more appropriately and has faster computation time than baseline methods.

IROS Conference 2021 Conference Paper

Diverse Critical Interaction Generation for Planning and Planner Evaluation

  • Zhao-Heng Yin
  • Lingfeng Sun
  • Liting Sun
  • Masayoshi Tomizuka
  • Wei Zhan

Generating diverse and comprehensive interacting agents to evaluate the decision-making modules is essential for the safe and robust planning of autonomous vehicles (AV). Due to efficiency and safety concerns, most researchers choose to train interactive adversary (competitive or weakly competitive) agents in simulators and generate test cases to interact with evaluated AVs. However, most existing methods fail to provide both natural and critical interaction behaviors in various traffic scenarios. To tackle this problem, we propose a styled generative model, RouteGAN, that generates diverse interactions by controlling the vehicles separately with desired styles. By altering its style coefficients, the model can generate trajectories with different safety levels and serve as an online planner. Experiments show that our model can generate diverse interactions in various scenarios. We evaluate different planners with our model by testing their collision rate in interaction with RouteGAN planners of multiple criticality levels.

NeurIPS Conference 2021 Conference Paper

Exploring Social Posterior Collapse in Variational Autoencoder for Interaction Modeling

  • Chen Tang
  • Wei Zhan
  • Masayoshi Tomizuka

Multi-agent behavior modeling and trajectory forecasting are crucial for the safe navigation of autonomous agents in interactive scenarios. Variational Autoencoder (VAE) has been widely applied in multi-agent interaction modeling to generate diverse behavior and learn a low-dimensional representation for interacting systems. However, existing literature did not formally discuss if a VAE-based model can properly encode interaction into its latent space. In this work, we argue that one of the typical formulations of VAEs in multi-agent modeling suffers from an issue we refer to as social posterior collapse, i.e., the model is prone to ignoring historical social context when predicting the future trajectory of an agent. It could cause significant prediction errors and poor generalization performance. We analyze the reason behind this under-explored phenomenon and propose several measures to tackle it. Afterward, we implement the proposed framework and experiment on real-world datasets for multi-agent trajectory prediction. In particular, we propose a novel sparse graph attention message-passing (sparse-GAMP) layer, which helps us detect social posterior collapse in our experiments. In the experiments, we verify that social posterior collapse indeed occurs. Also, the proposed measures are effective in alleviating the issue. As a result, the model attains better generalization performance when historical social context is informative for prediction.

ICRA Conference 2021 Conference Paper

Prediction-Based Reachability for Collision Avoidance in Autonomous Driving

  • Anjian Li
  • Liting Sun
  • Wei Zhan
  • Masayoshi Tomizuka
  • Mo Chen 0001

Safety is an important topic in autonomous driving since any collision may cause serious injury to people and damage to property. Hamilton-Jacobi (HJ) Reachability is a formal method that verifies safety in multi-agent interaction and provides a safety controller for collision avoidance. However, due to the worst-case assumption on the car’s future behaviours, reachability might result in too much conservatism such that the normal operation of the vehicle is badly hindered. In this paper, we leverage the power of trajectory prediction and propose a prediction-based reachability framework to compute safety controllers. Instead of always assuming the worst case, we cluster the car’s behaviors into multiple driving modes, e. g. left turn or right turn. Under each mode, a reachability-based safety controller is designed based on a less conservative action set. For online implementation, we first utilize the trajectory prediction and our proposed mode classifier to predict the possible modes, and then deploy the corresponding safety controller. Through simulations in a T-intersection and an 8-way roundabout, we demonstrate that our prediction-based reachability method largely avoids collision between two interacting cars and reduces the conservatism that the safety controller brings to the car’s original operation.

IROS Conference 2021 Conference Paper

You Only Group Once: Efficient Point-Cloud Processing with Token Representation and Relation Inference Module

  • Chenfeng Xu
  • Bohan Zhai
  • Bichen Wu
  • Tian Li
  • Wei Zhan
  • Peter Vajda
  • Kurt Keutzer
  • Masayoshi Tomizuka

3D perception on point-clouds is a challenging and crucial computer vision task. A point-cloud consists of a sparse, unstructured, and unordered set of points. To understand a point-cloud, previous point-based methods, such as PointNet++, extract visual features through the hierarchical aggregation of local features. However, such methods have several critical limitations: 1) They require considerable sampling and grouping operations, which leads to low inference speed. 2) Despite redundancy among adjacent points, they treat all points alike with an equal amount of computation. 3) They aggregate local features together through downsampling, which causes information loss and hurts perception capability. To overcome these challenges, we propose a novel, simple, and elegant deep learning model called YOGO (You Only Group Once). YOGO divides a point-cloud into a small number of parts and extracts a high-dimensional token to represent the points within each sub-region. Next, we use self-attention to capture token-to-token relations, and project the token features back to the point features. We formulate this series of operations as a relation inference module (RIM). Compared with previous methods, YOGO is very efficient because it only needs to sample and group a point-cloud once. Instead of operating on points, YOGO operates on a small number of tokens, each of which summarizes the point features in a sub-region. This allows us to avoid redundant computation and thus boosts efficiency. Moreover, YOGO preserves point-wise features by projecting token features back to point features, although the RIM computes on tokens. This avoids information loss and enhances point-wise perception capability. We conduct thorough experiments to demonstrate that YOGO achieves at least a 3.0x speedup over point-based baselines while delivering competitive classification and segmentation performance on a classification dataset and a segmentation dataset derived from 3D Warehouse, and on the S3DIS dataset. The code is available at https://github.com/chenfengxu714/YOGO.git.

IROS Conference 2020 Conference Paper

A Game-Theoretic Strategy-Aware Interaction Algorithm with Validation on Real Traffic Data

  • Liting Sun
  • Mu Cai
  • Wei Zhan
  • Masayoshi Tomizuka

Interactive decision-making and motion planning are important to safety-critical autonomous agents, particularly when they interact with humans. Many different interaction strategies can be exploited by humans. For instance, they might ignore the autonomous agents, or might behave as selfish optimizers by treating the autonomous agents as opponents, or might assume themselves as leaders and the autonomous agents as followers who should take responsive actions. Different interaction strategies can lead to quite different closed-loop dynamics, and misalignment between the human's policy and the autonomous agent's belief over the policy will severely impact both safety and efficiency. Moreover, a human's interaction policy can change as interaction goes on. Hence, autonomous agents need to be aware of such uncertainties on the human policy, and integrate such information into their decision-making and motion planning algorithms. In this paper, we propose a policy-aware interaction strategy based on game theory. The goal is to allow autonomous agents to estimate humans' interactive policies and respond accordingly. We validate the proposed algorithm with a roundabout scenario with real traffic data. The results show that the proposed algorithm can yield trajectories that are more similar to the ground truth than those with fixed policies. Also, based on the proposed algorithm, we statistically estimate how humans adjust their interaction strategies.

ICRA Conference 2020 Conference Paper

Analyzing the Suitability of Cost Functions for Explaining and Imitating Human Driving Behavior based on Inverse Reinforcement Learning

  • Maximilian Naumann
  • Liting Sun
  • Wei Zhan
  • Masayoshi Tomizuka

Autonomous vehicles are sharing the road with human drivers. In order to facilitate interactive driving and cooperative behavior in dense traffic, a thorough understanding and representation of other traffic participants' behavior are necessary. Cost functions (or reward functions) have been widely used to describe the behavior of human drivers since they can not only explicitly incorporate the rationality of human drivers and the theory of mind (ToM), but also share similarity with the motion planning problem of autonomous vehicles. Hence, more human-like driving behavior and comprehensible trajectories can be generated to enable safer interaction and cooperation. However, the selection of cost functions in different driving scenarios is not trivial, and there is no systematic summary and analysis for cost function selection and learning from a variety of driving scenarios. In this work, we aim to investigate to what extent cost functions are suitable for explaining and imitating human driving behavior. Further, we focus on how cost functions differ from each other in different driving scenarios. Towards this goal, we first comprehensively review existing cost function structures in the literature. Based on that, we point out required conditions for demonstrations to be suitable for inverse reinforcement learning (IRL). Finally, we use IRL to explore suitable features and learn cost function weights from human-driven trajectories in three different scenarios.

IROS Conference 2020 Conference Paper

Inferring Spatial Uncertainty in Object Detection

  • Zining Wang
  • Di Feng
  • Yiyang Zhou
  • Lars Rosenbaum
  • Fabian Timm
  • Klaus Dietmayer
  • Masayoshi Tomizuka
  • Wei Zhan

The availability of real-world datasets is the prerequisite for developing object detection methods for autonomous driving. While ambiguity exists in object labels due to the error-prone annotation process or sensor observation noise, current object detection datasets only provide deterministic annotations without considering their uncertainty. This precludes an in-depth evaluation among different object detection methods, especially for those that explicitly model predictive probability. In this work, we propose a generative model to estimate bounding box label uncertainties from LiDAR point clouds, and define a new representation of the probabilistic bounding box through spatial distribution. Comprehensive experiments show that the proposed model represents uncertainties commonly seen in driving scenarios. Based on the spatial distribution, we further propose an extension of IoU, called the Jaccard IoU (JIoU), as a new evaluation metric that incorporates label uncertainty. Experiments on the KITTI and the Waymo Open Datasets show that JIoU is superior to IoU when evaluating probabilistic object detectors.
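The JIoU idea generalizes IoU from hard boxes to spatial distributions. A minimal discretized sketch (the grid, maps, and normalization here are illustrative assumptions, not the paper's exact construction) replaces set intersection and union with the pointwise minimum and maximum of the two probability maps:

```python
import numpy as np

def jiou(p, q):
    """Jaccard IoU between two non-negative spatial maps on the same grid."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.minimum(p, q).sum() / np.maximum(p, q).sum()

# Two hard (binary) boxes recover ordinary IoU:
# intersection covers 1 cell, union covers 7 cells.
a = np.zeros((4, 4)); a[0:2, 0:2] = 1.0
b = np.zeros((4, 4)); b[1:3, 1:3] = 1.0
hard = jiou(a, b)        # -> 1/7, same as box IoU
# Soft maps get graded credit: down-weighting one box changes the score
# smoothly instead of the all-or-nothing behavior of hard IoU.
soft = jiou(a * 0.5, b)
```

For binary maps this reduces exactly to the usual intersection-over-union, which is what makes it a natural uncertainty-aware extension.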

ICRA Conference 2020 Conference Paper

UrbanLoco: A Full Sensor Suite Dataset for Mapping and Localization in Urban Scenes

  • Weisong Wen
  • Yiyang Zhou
  • Guohao Zhang
  • Saman Fahandezh-Saadi
  • Xiwei Bai
  • Wei Zhan
  • Masayoshi Tomizuka
  • Li-Ta Hsu

Mapping and localization is a critical module of autonomous driving, and significant achievements have been reached in this field. Beyond Global Navigation Satellite System (GNSS), research in point cloud registration, visual feature matching, and inertial navigation has greatly enhanced the accuracy and robustness of mapping and localization in different scenarios. However, highly urbanized scenes are still challenging: LIDAR- and camera-based methods perform poorly with numerous dynamic objects; the GNSS-based solutions experience signal loss and multi-path problems; the inertial measurement units (IMUs) suffer from drift. Unfortunately, current public datasets either do not adequately address this urban challenge or do not provide enough sensor information related to mapping and localization. Here we present UrbanLoco: a mapping/localization dataset collected in highly-urbanized environments with a full sensor suite. The dataset includes 13 trajectories collected in San Francisco and Hong Kong, covering a total length of over 40 kilometers. Our dataset includes a wide variety of urban terrains: urban canyons, bridges, tunnels, sharp turns, etc. More importantly, our dataset includes information from LIDAR, cameras, IMU, and GNSS receivers. The dataset is now publicly available.

IROS Conference 2019 Conference Paper

Constructing a Highly Interactive Vehicle Motion Dataset

  • Wei Zhan
  • Liting Sun
  • Di Wang 0028
  • Yinghan Jin
  • Masayoshi Tomizuka

Research in the areas related to driving behavior, e.g., behavior modeling and prediction, requires datasets with highly interactive vehicle motions. Existing public vehicle motion datasets emphasize increasing the number of vehicles and time duration, but behavior-related researchers are suffering from two factors. First, strong interactions among vehicles are not well addressed, and datasets are of relatively low density to observe meaningful interactions. Second, most of the existing datasets are missing the map information with reference paths, which is essential for driving-behavior-related research. To address these issues, a dataset with highly interactive vehicle motions is constructed in this paper. A variety of challenging driving scenarios such as unsignalized intersections and roundabouts are included. Reference paths are also constructed from motion data along with high-definition maps so that key features can be generated for both prediction and planning algorithms. Moreover, we propose a set of metrics to extract the interactive motions in different maps, including the minimum difference of time to collision point (MDTTC) and duration of waiting period. Such metrics are used to quantify the interaction density of the dataset. We also give several representative results on prediction and motion generation utilizing the constructed dataset to demonstrate how the dataset can facilitate research in the area of driving behavior.
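The MDTTC metric can be sketched under a simple constant-speed reading of the abstract (this construction is my interpretation, not the paper's exact definition): each vehicle's time to the shared conflict point is remaining distance over speed, and MDTTC is the minimum absolute difference of those times over the observed track.

```python
import numpy as np

def mdttc(dists_a, speeds_a, dists_b, speeds_b):
    """Minimum difference of time-to-collision-point over a trajectory.

    dists_*:  remaining distance to the conflict point at each timestep
    speeds_*: speed at each timestep (assumed > 0)
    """
    ttc_a = np.asarray(dists_a) / np.asarray(speeds_a)
    ttc_b = np.asarray(dists_b) / np.asarray(speeds_b)
    return float(np.min(np.abs(ttc_a - ttc_b)))

# A small MDTTC means the two vehicles would reach the conflict point at
# nearly the same time, i.e. a strongly interactive episode.
strong = mdttc([20.0, 10.0], [10.0, 10.0], [22.0, 11.0], [10.0, 10.0])  # -> 0.1
```

Thresholding such a value is one way a dataset could separate strongly interactive episodes from free-flowing traffic.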

IROS Conference 2019 Conference Paper

Precise Correntropy-based 3D Object Modelling With Geometrical Traffic Prior

  • Di Wang 0028
  • Jianru Xue
  • Wei Zhan
  • Yinghan Jin
  • Nanning Zheng 0001
  • Masayoshi Tomizuka

Robust 3D perception using LiDAR is of prime importance for robotics, and its fundamental core lies in precise object modelling that is resistant to noise and outliers. In this paper, a precise 3D object modelling algorithm is designed specifically for intelligent vehicles. The proposed algorithm leverages the crucial geometrical traffic prior of the road surface profile, and both noise and outliers are elegantly handled by a robust correntropy-based metric. More specifically, the road surface correction (RSC) method transforms each individual LiDAR measurement from its locally planar road surface to a globally ideal plane. This procedure essentially reduces the vehicle's motion from arbitrary 3D motion to physically feasible 2D motion. To deal with noise and outliers, a correntropy-based multi-frame matching (CorrMM) algorithm is proposed, which has an objective function that is robust with respect to the point-to-plane residual error. An efficient solver inspired by M-estimators and the retraction technique on Lie groups is developed, which elegantly converts the optimization of the highly non-linear objective function into a simple quadratic programming (QP) problem. Extensive experimental results validate that the proposed algorithm attains crisper 3D object models than several state-of-the-art algorithms on a challenging real traffic dataset.
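The robustness property of a correntropy-based metric can be illustrated with a generic Gaussian-kernel correntropy on residuals. This is a sketch of the general idea only, not the paper's CorrMM formulation: the kernel bandwidth and the function names are assumptions made here.

```python
import math

def correntropy(residual, sigma=0.5):
    """Gaussian-kernel correntropy of a single residual.

    Near-zero residuals score close to 1; gross outliers score close to 0,
    so they contribute almost nothing to the objective.
    """
    return math.exp(-residual ** 2 / (2 * sigma ** 2))

def objective(residuals, sigma=0.5):
    # Maximizing total correntropy acts as a robust alternative to
    # minimizing a sum of squared (e.g. point-to-plane) residuals.
    return sum(correntropy(r, sigma) for r in residuals)

inliers = [0.01, -0.02, 0.03]
with_outlier = inliers + [5.0]  # one gross outlier
print(objective(inliers), objective(with_outlier))
```

Under a squared-error objective the single 5.0 residual would dominate; under the correntropy objective it is effectively ignored, which is the behavior the abstract attributes to the robust metric.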

IROS Conference 2018 Conference Paper

Courteous Autonomous Cars

  • Liting Sun
  • Wei Zhan
  • Masayoshi Tomizuka
  • Anca D. Dragan

Typically, autonomous cars optimize for a combination of safety, efficiency, and driving quality. But as we get better at this optimization, we start seeing behavior go from too conservative to too aggressive. The car's behavior exposes the incentives we provide in its cost function. In this work, we argue for cars that do not optimize a purely selfish cost, but also try to be courteous to other interactive drivers. We formalize courtesy as a term in the objective that measures the increase in another driver's cost induced by the autonomous car's behavior. Such a courtesy term enables the robot car to be aware of possible irrationality in human behavior, and to plan accordingly. We analyze the effect of courtesy in a variety of scenarios. We find, for example, that courteous robot cars leave more space when merging in front of a human driver. Moreover, we find that such a courtesy term can help explain real human driver behavior on the NGSIM dataset.
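The courtesy term described in the abstract can be sketched as a penalty on the increase in the human driver's cost relative to a baseline. This is a minimal illustration under assumptions made here: the function name, the `max(0, ...)` form, and the weighting parameter `lam` are not taken from the paper.

```python
def total_cost(robot_cost, human_cost_given_robot, human_cost_baseline, lam=1.0):
    """Robot objective with a courtesy term.

    The courtesy term penalizes only the *increase* in the human's cost
    caused by the robot's plan, relative to what the human could achieve
    without the robot's interference; lam trades off selfishness vs. courtesy.
    """
    courtesy = max(0.0, human_cost_given_robot - human_cost_baseline)
    return robot_cost + lam * courtesy

# Aggressively merging in front of a human raises the human's cost (5.0 vs. 3.0),
# so the robot's total cost includes a courtesy penalty.
print(total_cost(robot_cost=2.0, human_cost_given_robot=5.0,
                 human_cost_baseline=3.0, lam=0.5))  # 3.0
```

With a larger `lam`, the planner prefers plans that leave the human's cost unchanged, which matches the observed behavior of leaving more space when merging.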

AAMAS Conference 2017 Conference Paper

Stability of Generalized Two-sided Markets with Transaction Thresholds

  • Zhiyuan Li
  • Yicheng Liu
  • Pingzhong Tang
  • Tingting Xu
  • Wei Zhan

We study a type of generalized two-sided market where a set of sellers, each with multiple units of the same divisible good, trade with a set of buyers. A possible trade of each unit between a buyer and a seller generates a given welfare (to be split among them), indicated by the weight of the edge between them. What makes these markets interesting is a special type of constraint, called a transaction threshold constraint, which essentially means that the amount of goods traded between a pair of agents can either be zero or above a certain edge-specific threshold. These constraints were originally motivated from the water-right market domain by Liu et al., where minimum thresholds must be imposed to mitigate administrative and other costs; the same constraints have been witnessed in several other market domains. Without the threshold constraints, the seminal result by Shapley and Shubik holds: the social-welfare-maximizing assignments between buyers and sellers are in the core. In other words, by algorithmically optimizing the market, one obtains desirable incentive properties for free. This is no longer the case for markets with threshold constraints, the model considered in this paper. We first demonstrate a counterexample where no optimal assignment (with respect to any way of splitting the trade welfare) is in the core. Motivated by this observation, we study the stability of the optimal assignments from two perspectives: 1) relaxing the definition of the core; 2) restricting the graph structure. For the first line, we show that the optimal assignments are pairwise stable, and that no deviating coalition can improve its welfare by more than a factor of two. For the second line, we exactly characterize the graph structure for the nonemptiness of the core: the core is nonempty if and only if the market graph is a tree. Last but not least, we complement our previous results by quantitatively measuring the welfare loss caused by the threshold constraints: the optimal welfare without transaction thresholds is within a constant factor of the optimal welfare with transaction thresholds. We evaluate and confirm our theoretical results using real data from a water-right market.
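The transaction threshold constraint can be made concrete with a small check-and-score sketch. This is an illustration only, under assumptions made here: the dictionary-based edge representation, the seller/buyer labels, and the function name are hypothetical.

```python
def welfare(assignment, weights, thresholds):
    """Total welfare of a trade assignment under transaction thresholds.

    For every (seller, buyer) edge, the traded amount must be either zero
    or at least the edge-specific threshold; welfare is the weighted sum
    of traded amounts.
    """
    total = 0.0
    for edge, amount in assignment.items():
        if amount != 0 and amount < thresholds[edge]:
            raise ValueError(f"trade on {edge} is below its threshold")
        total += weights[edge] * amount
    return total

weights = {("s1", "b1"): 2.0, ("s1", "b2"): 3.0}
thresholds = {("s1", "b1"): 1.0, ("s1", "b2"): 2.0}

# Trading 1.5 units on a threshold-1.0 edge is feasible; the other edge is idle.
print(welfare({("s1", "b1"): 1.5, ("s1", "b2"): 0.0}, weights, thresholds))  # 3.0
```

Note that an assignment trading, say, 0.5 units on the first edge would be rejected even though it adds positive welfare; this all-or-nothing structure is exactly what breaks the Shapley-Shubik core result in this setting.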