Author name cluster

Chenglu Wen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers

2 author rows

AAAI Conference 2026 Conference Paper

GAGS: Granularity-Aware Feature Distillation for Language Gaussian Splatting

Yuning Peng
Haiping Wang
Yuan Liu
Chenglu Wen
Zhen Dong
Bisheng Yang

3D open-vocabulary scene understanding, which accurately perceives complex semantic properties of objects in space, has gained significant attention in recent years. In this paper, we propose GAGS, a framework that distills 2D CLIP features into 3D Gaussian splatting, enabling open-vocabulary queries for renderings on arbitrary viewpoints. The main challenge of distilling 2D features for 3D fields lies in the multiview inconsistency of extracted 2D features, which provides unstable supervision for the 3D feature field. GAGS addresses this challenge with two novel strategies. First, GAGS associates the prompt point density of SAM with the camera distances to scene objects, which significantly improves the multiview consistency of segmentation results. Second, GAGS further decodes a granularity factor to guide the distillation process and this granularity factor can be learned in a unsupervised manner to only select the multiview consistent 2D features in the distillation process. Experimental results on two datasets show that GAGS improves visual grounding accuracy by an average of 10.9% and semantic segmentation accuracy by an average of 7.0%, with an inference speed 2× faster than baseline methods.

PDF Details DOI

AAAI Conference 2026 Conference Paper

OWL: Unsupervised 3D Object Detection by Occupancy Guided Warm-up and Large Model Priors Reasoning

Xusheng Guo
Wanfa Zhang
Shijia Zhao
Qiming Xia
Xiaolong Xie
Mingming Wang
Hai Wu
Chenglu Wen

Unsupervised 3D object detection leverages heuristic algorithms to discover potential objects, offering a promising route to reduce annotation costs in autonomous driving. Existing approaches mainly generate pseudo labels and refine them through self-training iterations. However, these pseudo-labels are often incorrect at the beginning of training, resulting in misleading the optimization process. Moreover, effectively filtering and refining them remains a critical challenge. In this paper, we propose $\textbf{OWL}$ for unsupervised 3D object detection by occupancy guided warm-up and large-model priors reasoning. OWL first employs an Occupancy Guided Warm-up (OGW) strategy to initialize the backbone weight with spatial perception capabilities, mitigating the interference of incorrect pseudo-labels on network convergence. Furthermore, OWL introduces an Instance-Cued Reasoning (ICR) module that leverages the prior knowledge of large models to assess pseudo-label quality, enabling precise filtering and refinement. Finally, we design a WAS (Weight-adapted Self-training) strategy to dynamically re-weight pseudo-labels, improving the performance through self-training. Extensive experiments on Waymo Open Dataset (WOD) and KITTI demonstrate that OWL outperforms state-of-the-art unsupervised methods by over 15.0\% mAP, revealing the effectiveness of our method.

PDF Details DOI

AAAI Conference 2026 Conference Paper

V2VLoc: Robust GNSS-Free Collaborative Perception via LiDAR Localization

Wenkai Lin
Qiming Xia
Wen Li
Xun Huang
Chenglu Wen

Multi-agents rely on accurate poses to share and align observations, enabling a collaborative perception of the environment. However, traditional GNSS-based localization often fails in GNSS-denied environments, making consistent feature alignment difficult in collaboration. To tackle this challenge, we propose a robust GNSS-free collaborative perception framework based on LiDAR localization. Specifically, we propose a lightweight Pose Generator with Confidence (PGC) to estimate compact pose and confidence representations. To alleviate the effects of localization errors, we further develop the Pose-Aware Spatio-Temporal Alignment Transformer (PASTAT), which performs confidence-aware spatial alignment while capturing essential temporal context. Additionally, we present a new simulation dataset, V2VLoc, which can be adapted for both LiDAR localization and collaborative detection tasks. V2VLoc comprises three subsets: Town1Loc, Town4Loc, and V2VDet. Town1Loc and Town4Loc offer multi-traversal sequences for training in localization tasks, whereas V2VDet is specifically intended for the collaborative detection task. Extensive experiments conducted on the V2VLoc dataset demonstrate that our approach achieves state-of-the-art performance under GNSS-denied conditions. We further conduct extended experiments on the real-world V2V4Real dataset to validate the effectiveness and generalizability of PASTAT.

PDF Details DOI

AAAI Conference 2026 Conference Paper

WinMamba: Multi-Scale Shifted Windows in State Space Model for 3D Object Detection

Longhui Zheng
Qiming Xia
Xiaolu Chen
Zhaoliang Liu
Chenglu Wen

3D object detection is critical for autonomous driving, yet it remains fundamentally challenging to simultaneously maximize computational efficiency and capture long-range spatial dependencies.We observed that Mamba-based models, with their linear state-space design, capture long-range dependencies at lower cost, offering a promising balance between efficiency and accuracy.However, existing methods rely on axis-aligned scanning within a fixed window, inevitably discarding spatial information. To address this problem, we propose WinMamba, a novel Mamba-based 3D feature-encoding backbone composed of stacked WinMamba blocks. To enhance the backbone with robust multi-scale representation, the WinMamba block incorporates a window-scale-adaptive module that compensates voxel features across varying resolutions during sampling. Meanwhile, to obtain rich contextual cues within the linear state space, we equip the WinMamba layer with a learnable positional encoding and a window-shift strategy.Extensive experiments on the KITTI and Waymo datasets demonstrate that WinMamba significantly outperforms the baseline. Ablation studies further validate the individual contributions of the WSF and AWF modules in improving detection accuracy. The code will be made publicly available.

PDF Details DOI

AAAI Conference 2025 Conference Paper

AdaCo: Overcoming Visual Foundation Model Noise in 3D Semantic Segmentation via Adaptive Label Correction

Pufan Zou
Shijia Zhao
Weijie Huang
Qiming Xia
Chenglu Wen
Wei Li
Cheng Wang

Recently, Visual Foundation Models (VFMs) have shown a remarkable generalization performance in 3D perception tasks. However, their effectiveness in large-scale outdoor datasets remains constrained by the scarcity of accurate supervision signals, the extensive noise caused by variable outdoor conditions, and the abundance of unknown objects. In this work, we propose a novel label-free learning method, Adaptive Label Correction (AdaCo), for 3D semantic segmentation. AdaCo first introduces the Cross-modal Label Generation Module (CLGM), providing cross-modal supervision with the formidable interpretive capabilities of the VFMs. Subsequently, AdaCo incorporates the Adaptive Noise Corrector (ANC), updating and adjusting the noisy samples within this supervision iteratively during training. Moreover, we develop an Adaptive Robust Loss (ARL) function to modulate each sample's sensitivity to noisy supervision, preventing potential underfitting issues associated with robust loss. Our proposed AdaCo can effectively mitigate the performance limitations of label-free learning networks in 3D semantic segmentation tasks. Extensive experiments on two outdoor benchmark datasets highlight the superior performance of our method.

PDF Details DOI

ICLR Conference 2025 Conference Paper

DoF: A Diffusion Factorization Framework for Offline Multi-Agent Reinforcement Learning

Chao Li
Ziwei Deng
Chenxing Lin
Wenqi Chen
Yongquan Fu
Weiquan Liu
Chenglu Wen
Cheng Wang 0003

Diffusion models have been widely adopted in image and language generation and are now being applied to reinforcement learning. However, the application of diffusion models in offline cooperative Multi-Agent Reinforcement Learning (MARL) remains limited. Although existing studies explore this direction, they suffer from scalability or poor cooperation issues due to the lack of design principles for diffusion-based MARL. The Individual-Global-Max (IGM) principle is a popular design principle for cooperative MARL. By satisfying this principle, MARL algorithms achieve remarkable performance with good scalability. In this work, we extend the IGM principle to the Individual-Global-identically-Distributed (IGD) principle. This principle stipulates that the generated outcome of a multi-agent diffusion model should be identically distributed as the collective outcomes from multiple individual-agent diffusion models. We propose DoF, a diffusion factorization framework for Offline MARL. It uses noise factorization function to factorize a centralized diffusion model into multiple diffusion models. We theoretically show that the noise factorization functions satisfy the IGD principle. Furthermore, DoF uses data factorization function to model the complex relationship among data generated by multiple diffusion models. Through extensive experiments, we demonstrate the effectiveness of DoF. The source code is available at [https://github.com/xmu-rl-3dv/DoF](https://github.com/xmu-rl-3dv/DoF).

Details

AAAI Conference 2025 Conference Paper

L4DR: LiDAR-4DRadar Fusion for Weather-Robust 3D Object Detection

Xun Huang
Ziyu Xu
Hai Wu
Jinlong Wang
Qiming Xia
Yan Xia
Jonathan Li
Kyle Gao

LiDAR-based 3D object detection is crucial for autonomous driving. However, due to the quality deterioration of LiDAR point clouds, it suffers from performance degradation in adverse weather conditions. Fusing LiDAR with the weatherrobust 4D radar sensor is expected to solve this problem; however, it faces challenges of significant differences in terms of data quality and the degree of degradation in adverse weather. To address these issues, we introduce L4DR, a weather-robust 3D object detection method that effectively achieves LiDAR and 4D Radar fusion. Our L4DR proposes Multi-Modal Encoding (MME) and Foreground-Aware Denoising (FAD) modules to reconcile sensor gaps, which is the first exploration of the complementarity of early fusion between LiDAR and 4D radar. Additionally, we design an Inter-Modal and IntraModal ({IM}2) parallel feature extraction backbone coupled with a Multi-Scale Gated Fusion (MSGF) module to counteract the varying degrees of sensor degradation under adverse weather conditions. Experimental evaluation on a VoD dataset with simulated fog proves that L4DR is more adaptable to changing weather conditions. It delivers a significant performance increase under different fog levels, improving the 3D mAP by up to 20.0% over the traditional LiDAR-only approach. Moreover, the results on the K-Radar dataset validate the consistent performance improvement of L4DR in realworld adverse weather conditions.

PDF Details DOI

AAAI Conference 2025 Conference Paper

Seg2Box: 3D Object Detection by Point-Wise Semantics Supervision

Maoji Zheng
Ziyu Xu
Qiming Xia
Hai Wu
Chenglu Wen
Cheng Wang

LIDAR-based 3D object detection and semantic segmentation are critical tasks in 3D scene understanding. Traditional detection and segmentation methods supervise their models through bounding box labels and semantic mask labels. However, these two independent labels inherently contain significant redundancy. This paper aims to eliminate the redundancy by supervising 3D object detection using only semantic labels. However, the challenge arises due to the incomplete geometry structure and boundary ambiguity of point cloud instances, leading to inaccurate pseudo-labels and poor detection results. To address these challenges, we propose a novel method, named Seg2Box. We first introduce a Multi-Frame Multi-Scale Clustering (MFMS-C) module, which leverages the spatio-temporal consistency of point clouds to generate accurate box-level pseudo-labels. Additionally, the Semantic-Guiding Iterative-Mining Self-Training (SGIM-ST) module is proposed to enhance the performance by progressively refining the pseudo-labels and mining the instances without generating pseudo-labels. Experiments on the Waymo Open Dataset and nuScenes Dataset show that our method significantly outperforms other competitive methods by 23.7% and 10.3% in mAP, respectively. The results demonstrate the great label-efficient potential and advancement of our method.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

Federated Graph Learning for Cross-Domain Recommendation

Ziqi Yang
Zhaopeng Peng
Zihui Wang
Jianzhong Qi
Chaochao Chen
Weike Pan
Chenglu Wen
Cheng Wang

Cross-domain recommendation (CDR) offers a promising solution to the data sparsity problem by enabling knowledge transfer across source and target domains. However, many recent CDR models overlook crucial issues such as privacy as well as the risk of negative transfer (which negatively impact model performance), especially in multi-domain settings. To address these challenges, we propose FedGCDR, a novel federated graph learning framework that securely and effectively leverages positive knowledge from multiple source domains. First, we design a positive knowledge transfer module that ensures privacy during inter-domain knowledge transmission. This module employs differential privacy-based knowledge extraction combined with a feature mapping mechanism, transforming source domain embeddings from federated graph attention networks into reliable domain knowledge. Second, we design a knowledge activation module to filter out potential harmful or conflicting knowledge from source domains, addressing the issues of negative transfer. This module enhances target domain training by expanding the graph of the target domain to generate reliable domain attentions and fine-tunes the target model for improved negative knowledge filtering and more accurate predictions. We conduct extensive experiments on 16 popular domains of the Amazon dataset, demonstrating that FedGCDR significantly outperforms state-of-the-art methods.

PDF Details DOI

IJCAI Conference 2024 Conference Paper

FedPFT: Federated Proxy Fine-Tuning of Foundation Models

Zhaopeng Peng
Xiaoliang Fan
Yufan Chen
Zheng Wang
Shirui Pan
Chenglu Wen
Ruisheng Zhang
Cheng Wang

Adapting Foundation Models (FMs) for down- stream tasks through Federated Learning (FL) emerges a promising strategy for protecting data privacy and valuable FMs. Existing methods fine- tune FM by allocating sub-FM to clients in FL, however, leading to suboptimal performance due to insufficient tuning and inevitable error accumula- tions of gradients. In this paper, we propose Feder- ated Proxy Fine-Tuning (FedPFT), a novel method enhancing FMs adaptation in downstream tasks through FL by two key modules. First, the sub-FM construction module employs a layer-wise com- pression approach, facilitating comprehensive FM fine-tuning across all layers by emphasizing those crucial neurons. Second, the sub-FM alignment module conducts a two-step distillations—layer- level and neuron-level—before and during FL fine- tuning respectively, to reduce error of gradient by accurately aligning sub-FM with FM under theo- retical guarantees. Experimental results on seven commonly used datasets (i. e. , four text and three vi- sion) demonstrate the superiority of FedPFT. Our code is available at https: //github. com/pzp-dzd/FedPFT.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

Mining and Transferring Feature-Geometry Coherence for Unsupervised Point Cloud Registration

Kezheng Xiong
Haoen Xiang
Qingshan Xu
Chenglu Wen
Siqi Shen
Jonathan Li
Cheng Wang

Point cloud registration, a fundamental task in 3D vision, has achieved remarkable success with learning-based methods in outdoor environments. Unsupervised outdoor point cloud registration methods have recently emerged to circumvent the need for costly pose annotations. However, they fail to establish reliable optimization objectives for unsupervised training, either relying on overly strong geometric assumptions, or suffering from poor-quality pseudo-labels due to inadequate integration of low-level geometric and high-level contextual information. We have observed that in the feature space, latent new inlier correspondences tend to clusteraround respective positive anchors that summarize features of existing inliers. Motivated by this observation, we propose a novel unsupervised registration method termed INTEGER to incorporate high-level contextual information for reliable pseudo-label mining. Specifically, we propose the Feature-Geometry Coherence Mining module to dynamically adapt the teacher for each mini-batch of data during training and discover reliable pseudo-labels by considering both high-level feature representations and low-level geometric cues. Furthermore, we propose Anchor-Based Contrastive Learning to facilitate contrastive learning with anchors for a robust feature space. Lastly, we introduce a Mixed-Density Student to learn density-invariant features, addressing challenges related to density variation and low overlap in the outdoor scenario. Extensive experiments on KITTI and nuScenes datasets demonstrate that our INTEGER achieves competitive performance in terms of accuracy and generalizability.

PDF Details DOI

AAAI Conference 2024 Conference Paper

SPEAL: Skeletal Prior Embedded Attention Learning for Cross-Source Point Cloud Registration

Kezheng Xiong
Maoji Zheng
Qingshan Xu
Chenglu Wen
Siqi Shen
Cheng Wang

Point cloud registration, a fundamental task in 3D computer vision, has remained largely unexplored in cross-source point clouds and unstructured scenes. The primary challenges arise from noise, outliers, and variations in scale and density. However, neglected geometric natures of point clouds restricts the performance of current methods. In this paper, we propose a novel method termed SPEAL to leverage skeletal representations for effective learning of intrinsic topologies of point clouds, facilitating robust capture of geometric intricacy. Specifically, we design the Skeleton Extraction Module to extract skeleton points and skeletal features in an unsupervised manner, which is inherently robust to noise and density variances. Then, we propose the Skeleton-Aware GeoTransformer to encode high-level skeleton-aware features. It explicitly captures the topological natures and inter-point-cloud skeletal correlations with the noise-robust and density-invariant skeletal representations. Next, we introduce the Correspondence Dual-Sampler to facilitate correspondences by augmenting the correspondence set with skeletal correspondences. Furthermore, we construct a challenging novel cross-source point cloud dataset named KITTI CrossSource for benchmarking cross-source point cloud registration methods. Extensive quantitative and qualitative experiments are conducted to demonstrate our approach’s superiority and robustness on both cross-source and same-source datasets. To the best of our knowledge, our approach is the first to facilitate point cloud registration with skeletal geometric priors.

PDF Details DOI

AAAI Conference 2024 Conference Paper

Sunshine to Rainstorm: Cross-Weather Knowledge Distillation for Robust 3D Object Detection

Xun Huang
Hai Wu
Xin Li
Xiaoliang Fan
Chenglu Wen
Cheng Wang

LiDAR-based 3D object detection models inevitably struggle under rainy conditions due to the degraded and noisy scanning signals. Previous research has attempted to address this by simulating the noise from rain to improve the robustness of detection models. However, significant disparities exist between simulated and actual rain-impacted data points. In this work, we propose a novel rain simulation method, termed DRET, that unifies Dynamics and Rainy Environment Theory to provide a cost-effective means of expanding the available realistic rain data for 3D detection training. Furthermore, we present a Sunny-to-Rainy Knowledge Distillation (SRKD) approach to enhance 3D detection under rainy conditions. Extensive experiments on the Waymo-Open-Dataset show that, when combined with the state-of-the-art DSVT model and other classical 3D detectors, our proposed framework demonstrates significant detection accuracy improvements, without losing efficiency. Remarkably, our framework also improves detection capabilities under sunny conditions, therefore offering a robust solution for 3D detection regardless of whether the weather is rainy or sunny.

PDF Details DOI

AAAI Conference 2023 Conference Paper

Transformation-Equivariant 3D Object Detection for Autonomous Driving

Hai Wu
Chenglu Wen
Wei Li
Xin Li
Ruigang Yang
Cheng Wang

3D object detection received increasing attention in autonomous driving recently. Objects in 3D scenes are distributed with diverse orientations. Ordinary detectors do not explicitly model the variations of rotation and reflection transformations. Consequently, large networks and extensive data augmentation are required for robust detection. Recent equivariant networks explicitly model the transformation variations by applying shared networks on multiple transformed point clouds, showing great potential in object geometry modeling. However, it is difficult to apply such networks to 3D object detection in autonomous driving due to its large computation cost and slow reasoning speed. In this work, we present TED, an efficient Transformation-Equivariant 3D Detector to overcome the computation cost and speed issues. TED first applies a sparse convolution backbone to extract multi-channel transformation-equivariant voxel features; and then aligns and aggregates these equivariant features into lightweight and compact representations for high-performance 3D object detection. On the highly competitive KITTI 3D car detection leaderboard, TED ranked 1st among all submissions with competitive efficiency. Code is available at https://github.com/hailanyi/TED.

PDF Details DOI

IJCAI Conference 2021 Conference Paper

Federated Learning with Fair Averaging

Zheng Wang
Xiaoliang Fan
Jianzhong Qi
Chenglu Wen
Cheng Wang
Rongshan Yu

Fairness has emerged as a critical problem in federated learning (FL). In this work, we identify a cause of unfairness in FL -- conflicting gradients with large differences in the magnitudes. To address this issue, we propose the federated fair averaging (FedFV) algorithm to mitigate potential conflicts among clients before averaging their gradients. We first use the cosine similarity to detect gradient conflicts, and then iteratively eliminate such conflicts by modifying both the direction and the magnitude of the gradients. We further show the theoretical foundation of FedFV to mitigate the issue conflicting gradients and converge to Pareto stationary solutions. Extensive experiments on a suite of federated datasets confirm that FedFV compares favorably against state-of-the-art methods in terms of fairness, accuracy and efficiency. The source code is available at https: //github. com/WwZzz/easyFL.

PDF Details DOI

IJCAI Conference 2021 Conference Paper

Tracklet Proposal Network for Multi-Object Tracking on Point Clouds

Hai Wu
Qing Li
Chenglu Wen
Xin Li
Xiaoliang Fan
Cheng Wang

This paper proposes the first tracklet proposal network, named PC-TCNN, for Multi-Object Tracking (MOT) on point clouds. Our pipeline first generates tracklet proposals, then refines these tracklets and associates them to generate long trajectories. Specifically, object proposal generation and motion regression are first performed on a point cloud sequence to generate tracklet candidates. Then, spatial-temporal features of each tracklet are exploited and their consistency is used to refine the tracklet proposal. Finally, the refined tracklets across multiple frames are associated to perform MOT on the point cloud sequence. The PC-TCNN significantly improves the MOT performance by introducing the tracklet proposal design. On the KITTI tracking benchmark, it attains an MOTA of 91. 75%, outperforming all submitted results on the online leaderboard.

PDF Details DOI

AAAI Conference 2020 Conference Paper

Point2Node: Correlation Learning of Dynamic-Node for Point Cloud Feature Modeling

Wenkai Han
Chenglu Wen
Cheng Wang
Xin Li
Qing Li

Fully exploring correlation among points in point clouds is essential for their feature modeling. This paper presents a novel end-to-end graph model, named Point2Node, to represent a given point cloud. Point2Node can dynamically explore correlation among all graph nodes from different levels, and adaptively aggregate the learned features. Speciﬁcally, ﬁrst, to fully explore the spatial correlation among points for enhanced feature description, in a high-dimensional node graph, we dynamically integrate the node’s correlation with self, local, and non-local nodes. Second, to more effectively integrate learned features, we design a data-aware gate mechanism to self-adaptively aggregate features at the channel level. Extensive experiments on various point cloud benchmarks demonstrate that our method outperforms the state-ofthe-art.

PDF Details

IJCAI Conference 2018 Conference Paper

H-Net: Neural Network for Cross-domain Image Patch Matching

Weiquan Liu
Xuelun Shen
Cheng Wang
Zhihong Zhang
Chenglu Wen
Jonathan Li

Describing the same scene with different imaging style or rendering image from its 3D model gives us different domain images. Different domain images tend to have a gap and different local appearances, which raise the main challenge on the cross-domain image patch matching. In this paper, we propose to incorporate AutoEncoder into the Siamese network, named as H-Net, of which the structural shape resembles the letter H. The H-Net achieves state-of-the-art performance on the cross-domain image patch matching. Furthermore, we improved H-Net to H-Net++. The H-Net++ extracts invariant feature descriptors in cross-domain image patches and achieves state-of-the-art performance by feature retrieval in Euclidean space. As there is no benchmark dataset including cross-domain images, we made a cross-domain image dataset which consists of camera images, rendering images from UAV 3D model, and images generated by CycleGAN algorithm. Experiments show that the proposed H-Net and H-Net++ outperform the existing algorithms. Our code and cross-domain image dataset are available at https: //github. com/Xylon-Sean/H-Net.

PDF Details