Arrow Research search

Author name cluster

Hai Wu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
1 author row

Possible papers


AAAI Conference 2026 Conference Paper

OWL: Unsupervised 3D Object Detection by Occupancy Guided Warm-up and Large Model Priors Reasoning

  • Xusheng Guo
  • Wanfa Zhang
  • Shijia Zhao
  • Qiming Xia
  • Xiaolong Xie
  • Mingming Wang
  • Hai Wu
  • Chenglu Wen

Unsupervised 3D object detection leverages heuristic algorithms to discover potential objects, offering a promising route to reducing annotation costs in autonomous driving. Existing approaches mainly generate pseudo-labels and refine them through self-training iterations. However, these pseudo-labels are often incorrect at the beginning of training, misleading the optimization process. Moreover, effectively filtering and refining them remains a critical challenge. In this paper, we propose OWL for unsupervised 3D object detection by occupancy-guided warm-up and large-model priors reasoning. OWL first employs an Occupancy Guided Warm-up (OGW) strategy to initialize the backbone weights with spatial perception capabilities, mitigating the interference of incorrect pseudo-labels on network convergence. Furthermore, OWL introduces an Instance-Cued Reasoning (ICR) module that leverages the prior knowledge of large models to assess pseudo-label quality, enabling precise filtering and refinement. Finally, we design a Weight-Adapted Self-training (WAS) strategy to dynamically re-weight pseudo-labels, improving performance through self-training. Extensive experiments on the Waymo Open Dataset (WOD) and KITTI demonstrate that OWL outperforms state-of-the-art unsupervised methods by over 15.0% mAP, revealing the effectiveness of our method.
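The abstract does not give the WAS weighting rule, so purely as an illustration of confidence-based pseudo-label re-weighting in self-training, a minimal sketch might look like the following. Every name and the power-law weighting rule here are assumptions for illustration, not details from the paper:

```python
# Hypothetical sketch of one confidence-weighted self-training round.
# The detector interface, threshold, and weighting rule are all assumed.

def reweight_pseudo_labels(labels, gamma=2.0):
    """Down-weight low-confidence pseudo-boxes with an assumed power rule."""
    return [(box, conf ** gamma) for box, conf in labels]

def self_training_round(detector, unlabeled_clouds, conf_threshold=0.3):
    """One illustrative round: predict, filter unreliable labels, re-weight."""
    kept = []
    for cloud in unlabeled_clouds:
        for box, conf in detector.predict(cloud):   # (pseudo-box, confidence)
            if conf >= conf_threshold:              # drop unreliable labels
                kept.append((box, conf))
    # The retained, re-weighted labels would then supervise the next
    # training iteration of the detector.
    return reweight_pseudo_labels(kept)
```

With `gamma > 1`, confident pseudo-labels dominate the training loss while borderline ones contribute little, which is one common way such re-weighting schemes are motivated.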

AAAI Conference 2025 Conference Paper

L4DR: LiDAR-4DRadar Fusion for Weather-Robust 3D Object Detection

  • Xun Huang
  • Ziyu Xu
  • Hai Wu
  • Jinlong Wang
  • Qiming Xia
  • Yan Xia
  • Jonathan Li
  • Kyle Gao

LiDAR-based 3D object detection is crucial for autonomous driving. However, due to the quality deterioration of LiDAR point clouds, it suffers from performance degradation in adverse weather conditions. Fusing LiDAR with the weather-robust 4D radar sensor is expected to solve this problem; however, it faces the challenge of significant differences between the two sensors in data quality and degree of degradation in adverse weather. To address these issues, we introduce L4DR, a weather-robust 3D object detection method that effectively fuses LiDAR and 4D radar. Our L4DR proposes Multi-Modal Encoding (MME) and Foreground-Aware Denoising (FAD) modules to reconcile sensor gaps, which is the first exploration of the complementarity of early fusion between LiDAR and 4D radar. Additionally, we design an Inter-Modal and Intra-Modal (IM²) parallel feature extraction backbone coupled with a Multi-Scale Gated Fusion (MSGF) module to counteract the varying degrees of sensor degradation under adverse weather conditions. Experimental evaluation on the VoD dataset with simulated fog shows that L4DR is more adaptable to changing weather conditions. It delivers a significant performance increase under different fog levels, improving 3D mAP by up to 20.0% over the traditional LiDAR-only approach. Moreover, results on the K-Radar dataset validate the consistent performance improvement of L4DR in real-world adverse weather conditions.
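The abstract does not specify the MSGF gating form. As a generic illustration of gated fusion between two modality features (the sigmoid per-channel gate and all parameter names below are assumptions, not the paper's design), one channel-wise variant is:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(lidar_feat, radar_feat, w, b):
    """Blend two modality feature vectors with a learned per-channel gate.

    Illustrative only: the gate is computed from the concatenated features
    via an assumed linear layer (w: shape (2d, d), b: shape (d,)).
    """
    gate = sigmoid(np.concatenate([lidar_feat, radar_feat]) @ w + b)
    # gate -> 1 trusts LiDAR; gate -> 0 trusts radar (e.g. in heavy fog).
    return gate * lidar_feat + (1.0 - gate) * radar_feat
```

The appeal of such a gate is that the network can learn, per channel, when to lean on the degraded LiDAR signal versus the weather-robust radar signal.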

AAAI Conference 2025 Conference Paper

Seg2Box: 3D Object Detection by Point-Wise Semantics Supervision

  • Maoji Zheng
  • Ziyu Xu
  • Qiming Xia
  • Hai Wu
  • Chenglu Wen
  • Cheng Wang

LiDAR-based 3D object detection and semantic segmentation are critical tasks in 3D scene understanding. Traditional detection and segmentation methods supervise their models through bounding box labels and semantic mask labels. However, these two independent labels inherently contain significant redundancy. This paper aims to eliminate the redundancy by supervising 3D object detection using only semantic labels. However, the challenge arises due to the incomplete geometric structure and boundary ambiguity of point cloud instances, leading to inaccurate pseudo-labels and poor detection results. To address these challenges, we propose a novel method, named Seg2Box. We first introduce a Multi-Frame Multi-Scale Clustering (MFMS-C) module, which leverages the spatio-temporal consistency of point clouds to generate accurate box-level pseudo-labels. Additionally, the Semantic-Guiding Iterative-Mining Self-Training (SGIM-ST) module is proposed to enhance performance by progressively refining the pseudo-labels and mining instances that lack pseudo-labels. Experiments on the Waymo Open Dataset and nuScenes Dataset show that our method significantly outperforms other competitive methods by 23.7% and 10.3% in mAP, respectively. The results demonstrate the great label-efficiency potential and advancement of our method.

AAAI Conference 2024 Conference Paper

Sunshine to Rainstorm: Cross-Weather Knowledge Distillation for Robust 3D Object Detection

  • Xun Huang
  • Hai Wu
  • Xin Li
  • Xiaoliang Fan
  • Chenglu Wen
  • Cheng Wang

LiDAR-based 3D object detection models inevitably struggle under rainy conditions due to degraded and noisy scanning signals. Previous research has attempted to address this by simulating rain-induced noise to improve the robustness of detection models. However, significant disparities exist between simulated and actual rain-impacted data points. In this work, we propose a novel rain simulation method, termed DRET, that unifies Dynamics and Rainy Environment Theory to provide a cost-effective means of expanding the available realistic rain data for 3D detection training. Furthermore, we present a Sunny-to-Rainy Knowledge Distillation (SRKD) approach to enhance 3D detection under rainy conditions. Extensive experiments on the Waymo Open Dataset show that, when combined with the state-of-the-art DSVT model and other classical 3D detectors, our proposed framework delivers significant detection accuracy improvements without losing efficiency. Remarkably, our framework also improves detection capabilities under sunny conditions, thus offering a robust solution for 3D detection regardless of whether the weather is rainy or sunny.

AAAI Conference 2023 Conference Paper

Transformation-Equivariant 3D Object Detection for Autonomous Driving

  • Hai Wu
  • Chenglu Wen
  • Wei Li
  • Xin Li
  • Ruigang Yang
  • Cheng Wang

3D object detection has recently received increasing attention in autonomous driving. Objects in 3D scenes are distributed with diverse orientations, yet ordinary detectors do not explicitly model the variations of rotation and reflection transformations. Consequently, large networks and extensive data augmentation are required for robust detection. Recent equivariant networks explicitly model these transformation variations by applying shared networks to multiple transformed point clouds, showing great potential in object geometry modeling. However, it is difficult to apply such networks to 3D object detection in autonomous driving due to their large computation cost and slow inference speed. In this work, we present TED, an efficient Transformation-Equivariant 3D Detector that overcomes these computation cost and speed issues. TED first applies a sparse convolution backbone to extract multi-channel transformation-equivariant voxel features, and then aligns and aggregates these equivariant features into lightweight and compact representations for high-performance 3D object detection. On the highly competitive KITTI 3D car detection leaderboard, TED ranked 1st among all submissions with competitive efficiency. Code is available at https://github.com/hailanyi/TED.
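The core idea the abstract describes, running the same encoder on several transformed copies of the input and then aggregating, can be sketched generically. This is a toy illustration of the transformation-equivariance principle, not TED itself: the encoder below is a stand-in pooling function, and the yaw angles and max-aggregation are assumptions:

```python
import numpy as np

def yaw_rotation(theta):
    """3x3 rotation about the vertical (z) axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def shared_encoder(points):
    """Stand-in for a shared backbone: a fixed permutation-invariant pooling."""
    return np.concatenate([points.mean(axis=0), np.abs(points).max(axis=0)])

def equivariant_features(points, n_rot=4):
    """Encode rotated copies with the *same* encoder, then aggregate.

    Because the encoder weights are shared across all n_rot rotations,
    rotating the input by a multiple of 2*pi/n_rot only permutes the
    per-rotation features, so the max-aggregated result is unchanged.
    """
    feats = [shared_encoder(points @ yaw_rotation(2 * np.pi * k / n_rot).T)
             for k in range(n_rot)]
    return np.max(np.stack(feats), axis=0)
```

The same weight-sharing argument is what makes equivariant backbones attractive for objects with diverse orientations, at the price of encoding the input once per transformation.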

IJCAI Conference 2021 Conference Paper

Tracklet Proposal Network for Multi-Object Tracking on Point Clouds

  • Hai Wu
  • Qing Li
  • Chenglu Wen
  • Xin Li
  • Xiaoliang Fan
  • Cheng Wang

This paper proposes the first tracklet proposal network, named PC-TCNN, for Multi-Object Tracking (MOT) on point clouds. Our pipeline first generates tracklet proposals, then refines these tracklets and associates them to generate long trajectories. Specifically, object proposal generation and motion regression are first performed on a point cloud sequence to generate tracklet candidates. Then, spatio-temporal features of each tracklet are exploited and their consistency is used to refine the tracklet proposal. Finally, the refined tracklets across multiple frames are associated to perform MOT on the point cloud sequence. PC-TCNN significantly improves MOT performance by introducing the tracklet proposal design. On the KITTI tracking benchmark, it attains a MOTA of 91.75%, outperforming all submitted results on the online leaderboard.
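For context, the MOTA score cited above is the standard CLEAR MOT accuracy metric, which penalizes misses, false positives, and identity switches relative to the number of ground-truth objects:

```python
def mota(num_gt, false_negatives, false_positives, id_switches):
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / total ground-truth objects.

    num_gt counts ground-truth object instances summed over all frames.
    A perfect tracker scores 1.0; heavy errors can drive MOTA below 0.
    """
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt
```

For example, 50 misses, 20 false positives, and 12 identity switches over 1000 ground-truth objects give a MOTA of 0.918, i.e. 91.8%.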

AAAI Conference 2020 Conference Paper

CircleNet for Hip Landmark Detection

  • Hai Wu
  • Hongtao Xie
  • Chuanbin Liu
  • Zheng-Jun Zha
  • Jun Sun
  • Yongdong Zhang

Landmark detection plays a critical role in the diagnosis of Developmental Dysplasia of the Hip (DDH). Heatmap- and anchor-based object detection techniques can obtain reasonable results. However, they have limitations in both robustness and precision given the complexity and inhomogeneity of hip X-ray images. In this paper, we propose a much simpler and more efficient framework called CircleNet, which improves the accuracy of landmark detection by predicting each landmark together with a corresponding radius. Using CircleNet, we not only constrain the relationships between landmarks but also integrate landmark detection and object detection into an end-to-end framework. To capture the long-range dependencies of landmarks in DDH images, we further propose a new context modeling framework, named the Local Non-Local (LNL) block, which combines the benefits of the non-local block with lightweight computation. We construct a professional DDH dataset for the first time and evaluate CircleNet on it; to our knowledge, the dataset contains the largest number of DDH X-ray images in the world. Our results show that CircleNet achieves state-of-the-art results for landmark detection on the dataset, outperforming current methods by a large margin of 1.8 pixels on average. The dataset and source code will be publicly available.