Author name cluster

Diange Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers

2 author rows

NeurIPS Conference 2025 Conference Paper

COME: Adding Scene-Centric Forecasting Control to Occupancy World Model

Yining Shi
Kun Jiang
Qiang Meng
Ke Wang
Jiabao Wang
Wenchao Sun
Tuopu Wen
MengMeng Yang

World models are critical for autonomous driving to simulate environmental dynamics and generate synthetic data. Existing methods struggle to disentangle ego-vehicle motion (perspective shifts) from scene evolvement (agent interactions), leading to suboptimal predictions. Instead, we propose to separate environmental changes from ego-motion by leveraging the scene-centric coordinate systems. In this paper, we introduce COME: a framework that integrates scene-centric forecasting Control into the Occupancy world ModEl. Specifically, COME first generates ego-irrelevant, spatially consistent future features through a scene-centric prediction branch, which are then converted into scene condition using a tailored ControlNet. These condition features are subsequently injected into the occupancy world model, enabling more accurate and controllable future occupancy predictions. Experimental results on the nuScenes-Occ3D dataset show that COME achieves consistent and significant improvements over state-of-the-art (SOTA) methods across diverse configurations, including different input sources (ground-truth, camera-based, fusion-based occupancy) and prediction horizons (3s and 8s). For example, under the same settings, COME achieves 26.3% better mIoU metric than DOME and 23.7% better mIoU metric than UniScene. These results highlight the efficacy of disentangled representation learning in enhancing spatio-temporal prediction fidelity for world models. Code is available at https://github.com/synsin0/COME.


IROS Conference 2025 Conference Paper

DRARL: Disengagement-Reason-Augmented Reinforcement Learning for Efficient Improvement of Autonomous Driving Policy

Weitao Zhou
Bo Zhang 0106
Zhong Cao
Xiang Li 0001
Qian Cheng
Chunyang Liu
Yaqin Zhang
Diange Yang

With the increasing presence of automated vehicles on open roads under driver supervision, disengagement cases are becoming more prevalent. While some data-driven planning systems attempt to directly utilize these disengagement cases for policy improvement, the inherent scarcity of disengagement data (often occurring as a single instance) restricts training effectiveness. Furthermore, some disengagement data should be excluded since the disengagement may not always come from the failure of driving policies, e.g., the driver may casually intervene for a while. To this end, this work proposes disengagement-reason-augmented reinforcement learning (DRARL), which enhances the driving policy improvement process according to the reason for each disengagement case. Specifically, the reason for a disengagement is identified by an out-of-distribution (OOD) state estimation model. When no such reason exists, the case is identified as a casual disengagement case, which doesn’t require additional policy adjustment. Otherwise, the policy can be updated under a reason-augmented imagination environment, improving the policy performance on disengagement cases with similar reasons. The method is evaluated using real-world disengagement cases collected by autonomous driving robotaxi. Experimental results demonstrate that the method accurately identifies policy-related disengagement reasons, allowing the agent to handle both original and semantically similar cases through reason-augmented training. Furthermore, the approach prevents the agent from becoming overly conservative after policy adjustments. Overall, this work provides an efficient way to improve driving policy performance with disengagement cases.


IROS Conference 2025 Conference Paper

Efficient End-to-end Visual Localization for Autonomous Driving with Decoupled BEV Neural Matching

Jinyu Miao
Tuopu Wen
Ziang Luo
Kangan Qian
Zheng Fu
Yunlong Wang 0009
Kun Jiang 0002
Mengmeng Yang 0001

Accurate localization plays an important role in high-level autonomous driving systems. Conventional map matching-based localization methods solve the poses by explicitly matching map elements with sensor observations; they are generally sensitive to perception noise and therefore require costly hyperparameter tuning. In this paper, we propose an end-to-end localization neural network which directly estimates vehicle poses from surrounding images, without explicitly matching perception results with HD maps. To ensure efficiency and interpretability, a decoupled BEV neural matching-based pose solver is proposed, which estimates poses in a differentiable sampling-based matching module. Moreover, the sampling space is hugely reduced by decoupling the feature representation affected by each DoF of poses. The experimental results demonstrate that the proposed network is capable of performing decimeter-level localization with mean absolute errors of 0.19 m, 0.13 m, and 0.39° in longitudinal position, lateral position, and yaw angle, respectively, while exhibiting a 68.8% reduction in inference memory usage.
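The abstract does not state how large the sampling space is before or after decoupling. As a rough illustration only, the sketch below assumes that a joint search scores every combination of longitudinal, lateral, and yaw candidates, while a decoupled search scores each DoF separately. The function names and the grid size of 21 candidates per DoF are hypothetical and are not taken from the paper.

```python
def joint_candidates(n_lon, n_lat, n_yaw):
    """Pose hypotheses when all three DoFs are sampled jointly (assumed: full grid)."""
    return n_lon * n_lat * n_yaw


def decoupled_candidates(n_lon, n_lat, n_yaw):
    """Pose hypotheses when each DoF is sampled on its own (assumed: independent 1D searches)."""
    return n_lon + n_lat + n_yaw


# Hypothetical grid: 21 candidates per DoF (not a value from the paper).
n_lon, n_lat, n_yaw = 21, 21, 21
joint = joint_candidates(n_lon, n_lat, n_yaw)
decoupled = decoupled_candidates(n_lon, n_lat, n_yaw)
print(f"joint: {joint}, decoupled: {decoupled}")
```

Under these assumptions the candidate count grows as a sum rather than a product of the per-DoF grid sizes, which is one plausible reading of why decoupling shrinks the sampling space.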


IROS Conference 2025 Conference Paper

EFFOcc: Learning Efficient Occupancy Networks from Minimal Labels for Autonomous Driving

Yining Shi 0002
Kun Jiang 0002
Jinyu Miao
Ke Wang 0002
Kangan Qian
Yunlong Wang 0009
Jiusi Li
Tuopu Wen

3D occupancy prediction (3DOcc) is a rapidly rising and challenging perception task in the field of autonomous driving. Existing 3D occupancy networks (OccNets) are both computationally heavy and label-hungry. In terms of model complexity, OccNets are commonly composed of heavy Conv3D modules or transformers at the voxel level. Moreover, OccNets are supervised with expensive large-scale dense voxel labels. Model and label inefficiencies, caused by excessive network parameters and label annotation requirements, severely hinder the onboard deployment of OccNets. This paper proposes an EFFicient Occupancy learning framework, EFFOcc, that targets minimal network complexity and label requirements while achieving state-of-the-art accuracy. We first propose an efficient fusion-based OccNet that only uses simple 2D operators and improves accuracy to the state-of-the-art on three large-scale benchmarks: Occ3D-nuScenes, Occ3D-Waymo, and OpenOccupancy-nuScenes. On the Occ3D-nuScenes benchmark, the fusion-based model with ResNet-18 as the image backbone has 21.35M parameters and achieves 51.49 in terms of mean Intersection over Union (mIoU). Furthermore, we propose a multi-stage occupancy-oriented distillation to efficiently transfer knowledge to vision-only OccNet. Extensive experiments on occupancy benchmarks show state-of-the-art precision for both fusion-based and vision-based OccNets. For the demonstration of learning with limited labels, we achieve 94.38% of the performance (mIoU = 28.38) of a 100% labeled vision OccNet (mIoU = 30.07) using the same OccNet trained with only 40% labeled sequences and distillation from the fusion-based OccNet. Code is available at https://github.com/synsin0/EFFOcc.
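For readers unfamiliar with the metric, the following is a minimal sketch of mean Intersection over Union over flat label lists, followed by a check of the 94.38% figure quoted in the abstract. It is a generic textbook formulation, not the benchmark's evaluation code; the function names and the toy labels are illustrative assumptions, and real benchmarks may additionally mask unobserved voxels or ignore certain classes.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU, skipping classes absent from both pred and target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union == 0:
            continue  # class never appears; it does not contribute to the mean
        ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


def relative_performance(miou_limited, miou_full):
    """Limited-label mIoU expressed as a percentage of the fully labeled mIoU."""
    return 100.0 * miou_limited / miou_full


# Toy voxel labels (hypothetical, three classes).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))

# Figures quoted in the abstract: 28.38 mIoU (40% labels) vs. 30.07 mIoU (100% labels).
print(round(relative_performance(28.38, 30.07), 2))
```

The second call reproduces the ratio reported in the abstract from the two mIoU values it gives.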


IROS Conference 2025 Conference Paper

LEGO-Motion: Learning-Enhanced Grids with Occupancy Instance Modeling for Class-Agnostic Motion Prediction

Kangan Qian
Jinyu Miao
Ziang Luo
Zheng Fu
Jinchen Li
Yining Shi 0002
Yunlong Wang 0009
Kun Jiang 0002

Accurate spatial and motion understanding is critical for autonomous driving systems. While object-level perception models excel in structured environments, they struggle with open-set categories and often lack precise geometric representation. Occupancy-based, class-agnostic methods offer better scene expressiveness but typically ignore inter-agent interactions and fail to ensure physical consistency in motion predictions, limiting their reliability in complex traffic scenarios. In this paper, we propose LEGO-Motion, a novel class-agnostic motion prediction framework that bridges the gap between instance-level reasoning and occupancy-based modeling. Unlike conventional grid-based methods that treat each cell independently, LEGO-Motion introduces two key components: (1) the Interaction-Augmented Instance Encoder (IaIE), which models interactions among dynamic agents via cross-attention, and (2) the Instance-Enhanced BEV Encoder (IeBE), which improves motion consistency across instances through multi-stage feature fusion. These components enable our model to learn semantically coherent and physically plausible motion fields. Extensive experiments on the nuScenes dataset show that LEGO-Motion achieves an improvement of around 6% in motion prediction accuracy over the previous state-of-the-art, while maintaining real-time inference at 21 ms. Moreover, our method demonstrates strong generalization on a proprietary FMCW LiDAR benchmark. These results validate LEGO-Motion's effectiveness in capturing both global scene structure and fine-grained motion dynamics, making it a promising foundation for next-generation perception systems.
