Arrow Research search

Author name cluster

Chenglu Wen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers
2 author rows

Possible papers

18

AAAI Conference 2026 Conference Paper

GAGS: Granularity-Aware Feature Distillation for Language Gaussian Splatting

  • Yuning Peng
  • Haiping Wang
  • Yuan Liu
  • Chenglu Wen
  • Zhen Dong
  • Bisheng Yang

3D open-vocabulary scene understanding, which accurately perceives complex semantic properties of objects in space, has gained significant attention in recent years. In this paper, we propose GAGS, a framework that distills 2D CLIP features into 3D Gaussian splatting, enabling open-vocabulary queries for renderings on arbitrary viewpoints. The main challenge of distilling 2D features for 3D fields lies in the multiview inconsistency of extracted 2D features, which provides unstable supervision for the 3D feature field. GAGS addresses this challenge with two novel strategies. First, GAGS associates the prompt point density of SAM with the camera distances to scene objects, which significantly improves the multiview consistency of segmentation results. Second, GAGS further decodes a granularity factor to guide the distillation process and this granularity factor can be learned in a unsupervised manner to only select the multiview consistent 2D features in the distillation process. Experimental results on two datasets show that GAGS improves visual grounding accuracy by an average of 10.9% and semantic segmentation accuracy by an average of 7.0%, with an inference speed 2× faster than baseline methods.

AAAI Conference 2026 Conference Paper

OWL: Unsupervised 3D Object Detection by Occupancy Guided Warm-up and Large Model Priors Reasoning

  • Xusheng Guo
  • Wanfa Zhang
  • Shijia Zhao
  • Qiming Xia
  • Xiaolong Xie
  • Mingming Wang
  • Hai Wu
  • Chenglu Wen

Unsupervised 3D object detection leverages heuristic algorithms to discover potential objects, offering a promising route to reduce annotation costs in autonomous driving. Existing approaches mainly generate pseudo labels and refine them through self-training iterations. However, these pseudo-labels are often incorrect at the beginning of training, resulting in misleading the optimization process. Moreover, effectively filtering and refining them remains a critical challenge. In this paper, we propose $\textbf{OWL}$ for unsupervised 3D object detection by occupancy guided warm-up and large-model priors reasoning. OWL first employs an Occupancy Guided Warm-up (OGW) strategy to initialize the backbone weight with spatial perception capabilities, mitigating the interference of incorrect pseudo-labels on network convergence. Furthermore, OWL introduces an Instance-Cued Reasoning (ICR) module that leverages the prior knowledge of large models to assess pseudo-label quality, enabling precise filtering and refinement. Finally, we design a WAS (Weight-adapted Self-training) strategy to dynamically re-weight pseudo-labels, improving the performance through self-training. Extensive experiments on Waymo Open Dataset (WOD) and KITTI demonstrate that OWL outperforms state-of-the-art unsupervised methods by over 15.0\% mAP, revealing the effectiveness of our method.

AAAI Conference 2026 Conference Paper

V2VLoc: Robust GNSS-Free Collaborative Perception via LiDAR Localization

  • Wenkai Lin
  • Qiming Xia
  • Wen Li
  • Xun Huang
  • Chenglu Wen

Multi-agents rely on accurate poses to share and align observations, enabling a collaborative perception of the environment. However, traditional GNSS-based localization often fails in GNSS-denied environments, making consistent feature alignment difficult in collaboration. To tackle this challenge, we propose a robust GNSS-free collaborative perception framework based on LiDAR localization. Specifically, we propose a lightweight Pose Generator with Confidence (PGC) to estimate compact pose and confidence representations. To alleviate the effects of localization errors, we further develop the Pose-Aware Spatio-Temporal Alignment Transformer (PASTAT), which performs confidence-aware spatial alignment while capturing essential temporal context. Additionally, we present a new simulation dataset, V2VLoc, which can be adapted for both LiDAR localization and collaborative detection tasks. V2VLoc comprises three subsets: Town1Loc, Town4Loc, and V2VDet. Town1Loc and Town4Loc offer multi-traversal sequences for training in localization tasks, whereas V2VDet is specifically intended for the collaborative detection task. Extensive experiments conducted on the V2VLoc dataset demonstrate that our approach achieves state-of-the-art performance under GNSS-denied conditions. We further conduct extended experiments on the real-world V2V4Real dataset to validate the effectiveness and generalizability of PASTAT.

AAAI Conference 2026 Conference Paper

WinMamba: Multi-Scale Shifted Windows in State Space Model for 3D Object Detection

  • Longhui Zheng
  • Qiming Xia
  • Xiaolu Chen
  • Zhaoliang Liu
  • Chenglu Wen

3D object detection is critical for autonomous driving, yet it remains fundamentally challenging to simultaneously maximize computational efficiency and capture long-range spatial dependencies.We observed that Mamba-based models, with their linear state-space design, capture long-range dependencies at lower cost, offering a promising balance between efficiency and accuracy.However, existing methods rely on axis-aligned scanning within a fixed window, inevitably discarding spatial information. To address this problem, we propose WinMamba, a novel Mamba-based 3D feature-encoding backbone composed of stacked WinMamba blocks. To enhance the backbone with robust multi-scale representation, the WinMamba block incorporates a window-scale-adaptive module that compensates voxel features across varying resolutions during sampling. Meanwhile, to obtain rich contextual cues within the linear state space, we equip the WinMamba layer with a learnable positional encoding and a window-shift strategy.Extensive experiments on the KITTI and Waymo datasets demonstrate that WinMamba significantly outperforms the baseline. Ablation studies further validate the individual contributions of the WSF and AWF modules in improving detection accuracy. The code will be made publicly available.

AAAI Conference 2025 Conference Paper

AdaCo: Overcoming Visual Foundation Model Noise in 3D Semantic Segmentation via Adaptive Label Correction

  • Pufan Zou
  • Shijia Zhao
  • Weijie Huang
  • Qiming Xia
  • Chenglu Wen
  • Wei Li
  • Cheng Wang

Recently, Visual Foundation Models (VFMs) have shown a remarkable generalization performance in 3D perception tasks. However, their effectiveness in large-scale outdoor datasets remains constrained by the scarcity of accurate supervision signals, the extensive noise caused by variable outdoor conditions, and the abundance of unknown objects. In this work, we propose a novel label-free learning method, Adaptive Label Correction (AdaCo), for 3D semantic segmentation. AdaCo first introduces the Cross-modal Label Generation Module (CLGM), providing cross-modal supervision with the formidable interpretive capabilities of the VFMs. Subsequently, AdaCo incorporates the Adaptive Noise Corrector (ANC), updating and adjusting the noisy samples within this supervision iteratively during training. Moreover, we develop an Adaptive Robust Loss (ARL) function to modulate each sample's sensitivity to noisy supervision, preventing potential underfitting issues associated with robust loss. Our proposed AdaCo can effectively mitigate the performance limitations of label-free learning networks in 3D semantic segmentation tasks. Extensive experiments on two outdoor benchmark datasets highlight the superior performance of our method.

ICLR Conference 2025 Conference Paper

DoF: A Diffusion Factorization Framework for Offline Multi-Agent Reinforcement Learning

  • Chao Li
  • Ziwei Deng
  • Chenxing Lin
  • Wenqi Chen
  • Yongquan Fu
  • Weiquan Liu
  • Chenglu Wen
  • Cheng Wang 0003

Diffusion models have been widely adopted in image and language generation and are now being applied to reinforcement learning. However, the application of diffusion models in offline cooperative Multi-Agent Reinforcement Learning (MARL) remains limited. Although existing studies explore this direction, they suffer from scalability or poor cooperation issues due to the lack of design principles for diffusion-based MARL. The Individual-Global-Max (IGM) principle is a popular design principle for cooperative MARL. By satisfying this principle, MARL algorithms achieve remarkable performance with good scalability. In this work, we extend the IGM principle to the Individual-Global-identically-Distributed (IGD) principle. This principle stipulates that the generated outcome of a multi-agent diffusion model should be identically distributed as the collective outcomes from multiple individual-agent diffusion models. We propose DoF, a diffusion factorization framework for Offline MARL. It uses noise factorization function to factorize a centralized diffusion model into multiple diffusion models. We theoretically show that the noise factorization functions satisfy the IGD principle. Furthermore, DoF uses data factorization function to model the complex relationship among data generated by multiple diffusion models. Through extensive experiments, we demonstrate the effectiveness of DoF. The source code is available at [https://github.com/xmu-rl-3dv/DoF](https://github.com/xmu-rl-3dv/DoF).

AAAI Conference 2025 Conference Paper

L4DR: LiDAR-4DRadar Fusion for Weather-Robust 3D Object Detection

  • Xun Huang
  • Ziyu Xu
  • Hai Wu
  • Jinlong Wang
  • Qiming Xia
  • Yan Xia
  • Jonathan Li
  • Kyle Gao

LiDAR-based 3D object detection is crucial for autonomous driving. However, due to the quality deterioration of LiDAR point clouds, it suffers from performance degradation in adverse weather conditions. Fusing LiDAR with the weatherrobust 4D radar sensor is expected to solve this problem; however, it faces challenges of significant differences in terms of data quality and the degree of degradation in adverse weather. To address these issues, we introduce L4DR, a weather-robust 3D object detection method that effectively achieves LiDAR and 4D Radar fusion. Our L4DR proposes Multi-Modal Encoding (MME) and Foreground-Aware Denoising (FAD) modules to reconcile sensor gaps, which is the first exploration of the complementarity of early fusion between LiDAR and 4D radar. Additionally, we design an Inter-Modal and IntraModal ({IM}2) parallel feature extraction backbone coupled with a Multi-Scale Gated Fusion (MSGF) module to counteract the varying degrees of sensor degradation under adverse weather conditions. Experimental evaluation on a VoD dataset with simulated fog proves that L4DR is more adaptable to changing weather conditions. It delivers a significant performance increase under different fog levels, improving the 3D mAP by up to 20.0% over the traditional LiDAR-only approach. Moreover, the results on the K-Radar dataset validate the consistent performance improvement of L4DR in realworld adverse weather conditions.

AAAI Conference 2025 Conference Paper

Seg2Box: 3D Object Detection by Point-Wise Semantics Supervision

  • Maoji Zheng
  • Ziyu Xu
  • Qiming Xia
  • Hai Wu
  • Chenglu Wen
  • Cheng Wang

LIDAR-based 3D object detection and semantic segmentation are critical tasks in 3D scene understanding. Traditional detection and segmentation methods supervise their models through bounding box labels and semantic mask labels. However, these two independent labels inherently contain significant redundancy. This paper aims to eliminate the redundancy by supervising 3D object detection using only semantic labels. However, the challenge arises due to the incomplete geometry structure and boundary ambiguity of point cloud instances, leading to inaccurate pseudo-labels and poor detection results. To address these challenges, we propose a novel method, named Seg2Box. We first introduce a Multi-Frame Multi-Scale Clustering (MFMS-C) module, which leverages the spatio-temporal consistency of point clouds to generate accurate box-level pseudo-labels. Additionally, the Semantic-Guiding Iterative-Mining Self-Training (SGIM-ST) module is proposed to enhance the performance by progressively refining the pseudo-labels and mining the instances without generating pseudo-labels. Experiments on the Waymo Open Dataset and nuScenes Dataset show that our method significantly outperforms other competitive methods by 23.7% and 10.3% in mAP, respectively. The results demonstrate the great label-efficient potential and advancement of our method.

NeurIPS Conference 2024 Conference Paper

Federated Graph Learning for Cross-Domain Recommendation

  • Ziqi Yang
  • Zhaopeng Peng
  • Zihui Wang
  • Jianzhong Qi
  • Chaochao Chen
  • Weike Pan
  • Chenglu Wen
  • Cheng Wang

Cross-domain recommendation (CDR) offers a promising solution to the data sparsity problem by enabling knowledge transfer across source and target domains. However, many recent CDR models overlook crucial issues such as privacy as well as the risk of negative transfer (which negatively impact model performance), especially in multi-domain settings. To address these challenges, we propose FedGCDR, a novel federated graph learning framework that securely and effectively leverages positive knowledge from multiple source domains. First, we design a positive knowledge transfer module that ensures privacy during inter-domain knowledge transmission. This module employs differential privacy-based knowledge extraction combined with a feature mapping mechanism, transforming source domain embeddings from federated graph attention networks into reliable domain knowledge. Second, we design a knowledge activation module to filter out potential harmful or conflicting knowledge from source domains, addressing the issues of negative transfer. This module enhances target domain training by expanding the graph of the target domain to generate reliable domain attentions and fine-tunes the target model for improved negative knowledge filtering and more accurate predictions. We conduct extensive experiments on 16 popular domains of the Amazon dataset, demonstrating that FedGCDR significantly outperforms state-of-the-art methods.

IJCAI Conference 2024 Conference Paper

FedPFT: Federated Proxy Fine-Tuning of Foundation Models

  • Zhaopeng Peng
  • Xiaoliang Fan
  • Yufan Chen
  • Zheng Wang
  • Shirui Pan
  • Chenglu Wen
  • Ruisheng Zhang
  • Cheng Wang

Adapting Foundation Models (FMs) for down- stream tasks through Federated Learning (FL) emerges a promising strategy for protecting data privacy and valuable FMs. Existing methods fine- tune FM by allocating sub-FM to clients in FL, however, leading to suboptimal performance due to insufficient tuning and inevitable error accumula- tions of gradients. In this paper, we propose Feder- ated Proxy Fine-Tuning (FedPFT), a novel method enhancing FMs adaptation in downstream tasks through FL by two key modules. First, the sub-FM construction module employs a layer-wise com- pression approach, facilitating comprehensive FM fine-tuning across all layers by emphasizing those crucial neurons. Second, the sub-FM alignment module conducts a two-step distillations—layer- level and neuron-level—before and during FL fine- tuning respectively, to reduce error of gradient by accurately aligning sub-FM with FM under theo- retical guarantees. Experimental results on seven commonly used datasets (i. e. , four text and three vi- sion) demonstrate the superiority of FedPFT. Our code is available at https: //github. com/pzp-dzd/FedPFT.

NeurIPS Conference 2024 Conference Paper

Mining and Transferring Feature-Geometry Coherence for Unsupervised Point Cloud Registration

  • Kezheng Xiong
  • Haoen Xiang
  • Qingshan Xu
  • Chenglu Wen
  • Siqi Shen
  • Jonathan Li
  • Cheng Wang

Point cloud registration, a fundamental task in 3D vision, has achieved remarkable success with learning-based methods in outdoor environments. Unsupervised outdoor point cloud registration methods have recently emerged to circumvent the need for costly pose annotations. However, they fail to establish reliable optimization objectives for unsupervised training, either relying on overly strong geometric assumptions, or suffering from poor-quality pseudo-labels due to inadequate integration of low-level geometric and high-level contextual information. We have observed that in the feature space, latent new inlier correspondences tend to clusteraround respective positive anchors that summarize features of existing inliers. Motivated by this observation, we propose a novel unsupervised registration method termed INTEGER to incorporate high-level contextual information for reliable pseudo-label mining. Specifically, we propose the Feature-Geometry Coherence Mining module to dynamically adapt the teacher for each mini-batch of data during training and discover reliable pseudo-labels by considering both high-level feature representations and low-level geometric cues. Furthermore, we propose Anchor-Based Contrastive Learning to facilitate contrastive learning with anchors for a robust feature space. Lastly, we introduce a Mixed-Density Student to learn density-invariant features, addressing challenges related to density variation and low overlap in the outdoor scenario. Extensive experiments on KITTI and nuScenes datasets demonstrate that our INTEGER achieves competitive performance in terms of accuracy and generalizability.

AAAI Conference 2024 Conference Paper

SPEAL: Skeletal Prior Embedded Attention Learning for Cross-Source Point Cloud Registration

  • Kezheng Xiong
  • Maoji Zheng
  • Qingshan Xu
  • Chenglu Wen
  • Siqi Shen
  • Cheng Wang

Point cloud registration, a fundamental task in 3D computer vision, has remained largely unexplored in cross-source point clouds and unstructured scenes. The primary challenges arise from noise, outliers, and variations in scale and density. However, neglected geometric natures of point clouds restricts the performance of current methods. In this paper, we propose a novel method termed SPEAL to leverage skeletal representations for effective learning of intrinsic topologies of point clouds, facilitating robust capture of geometric intricacy. Specifically, we design the Skeleton Extraction Module to extract skeleton points and skeletal features in an unsupervised manner, which is inherently robust to noise and density variances. Then, we propose the Skeleton-Aware GeoTransformer to encode high-level skeleton-aware features. It explicitly captures the topological natures and inter-point-cloud skeletal correlations with the noise-robust and density-invariant skeletal representations. Next, we introduce the Correspondence Dual-Sampler to facilitate correspondences by augmenting the correspondence set with skeletal correspondences. Furthermore, we construct a challenging novel cross-source point cloud dataset named KITTI CrossSource for benchmarking cross-source point cloud registration methods. Extensive quantitative and qualitative experiments are conducted to demonstrate our approach’s superiority and robustness on both cross-source and same-source datasets. To the best of our knowledge, our approach is the first to facilitate point cloud registration with skeletal geometric priors.

AAAI Conference 2024 Conference Paper

Sunshine to Rainstorm: Cross-Weather Knowledge Distillation for Robust 3D Object Detection

  • Xun Huang
  • Hai Wu
  • Xin Li
  • Xiaoliang Fan
  • Chenglu Wen
  • Cheng Wang

LiDAR-based 3D object detection models inevitably struggle under rainy conditions due to the degraded and noisy scanning signals. Previous research has attempted to address this by simulating the noise from rain to improve the robustness of detection models. However, significant disparities exist between simulated and actual rain-impacted data points. In this work, we propose a novel rain simulation method, termed DRET, that unifies Dynamics and Rainy Environment Theory to provide a cost-effective means of expanding the available realistic rain data for 3D detection training. Furthermore, we present a Sunny-to-Rainy Knowledge Distillation (SRKD) approach to enhance 3D detection under rainy conditions. Extensive experiments on the Waymo-Open-Dataset show that, when combined with the state-of-the-art DSVT model and other classical 3D detectors, our proposed framework demonstrates significant detection accuracy improvements, without losing efficiency. Remarkably, our framework also improves detection capabilities under sunny conditions, therefore offering a robust solution for 3D detection regardless of whether the weather is rainy or sunny.

AAAI Conference 2023 Conference Paper

Transformation-Equivariant 3D Object Detection for Autonomous Driving

  • Hai Wu
  • Chenglu Wen
  • Wei Li
  • Xin Li
  • Ruigang Yang
  • Cheng Wang

3D object detection received increasing attention in autonomous driving recently. Objects in 3D scenes are distributed with diverse orientations. Ordinary detectors do not explicitly model the variations of rotation and reflection transformations. Consequently, large networks and extensive data augmentation are required for robust detection. Recent equivariant networks explicitly model the transformation variations by applying shared networks on multiple transformed point clouds, showing great potential in object geometry modeling. However, it is difficult to apply such networks to 3D object detection in autonomous driving due to its large computation cost and slow reasoning speed. In this work, we present TED, an efficient Transformation-Equivariant 3D Detector to overcome the computation cost and speed issues. TED first applies a sparse convolution backbone to extract multi-channel transformation-equivariant voxel features; and then aligns and aggregates these equivariant features into lightweight and compact representations for high-performance 3D object detection. On the highly competitive KITTI 3D car detection leaderboard, TED ranked 1st among all submissions with competitive efficiency. Code is available at https://github.com/hailanyi/TED.

IJCAI Conference 2021 Conference Paper

Federated Learning with Fair Averaging

  • Zheng Wang
  • Xiaoliang Fan
  • Jianzhong Qi
  • Chenglu Wen
  • Cheng Wang
  • Rongshan Yu

Fairness has emerged as a critical problem in federated learning (FL). In this work, we identify a cause of unfairness in FL -- conflicting gradients with large differences in the magnitudes. To address this issue, we propose the federated fair averaging (FedFV) algorithm to mitigate potential conflicts among clients before averaging their gradients. We first use the cosine similarity to detect gradient conflicts, and then iteratively eliminate such conflicts by modifying both the direction and the magnitude of the gradients. We further show the theoretical foundation of FedFV to mitigate the issue conflicting gradients and converge to Pareto stationary solutions. Extensive experiments on a suite of federated datasets confirm that FedFV compares favorably against state-of-the-art methods in terms of fairness, accuracy and efficiency. The source code is available at https: //github. com/WwZzz/easyFL.

IJCAI Conference 2021 Conference Paper

Tracklet Proposal Network for Multi-Object Tracking on Point Clouds

  • Hai Wu
  • Qing Li
  • Chenglu Wen
  • Xin Li
  • Xiaoliang Fan
  • Cheng Wang

This paper proposes the first tracklet proposal network, named PC-TCNN, for Multi-Object Tracking (MOT) on point clouds. Our pipeline first generates tracklet proposals, then refines these tracklets and associates them to generate long trajectories. Specifically, object proposal generation and motion regression are first performed on a point cloud sequence to generate tracklet candidates. Then, spatial-temporal features of each tracklet are exploited and their consistency is used to refine the tracklet proposal. Finally, the refined tracklets across multiple frames are associated to perform MOT on the point cloud sequence. The PC-TCNN significantly improves the MOT performance by introducing the tracklet proposal design. On the KITTI tracking benchmark, it attains an MOTA of 91. 75%, outperforming all submitted results on the online leaderboard.

AAAI Conference 2020 Conference Paper

Point2Node: Correlation Learning of Dynamic-Node for Point Cloud Feature Modeling

  • Wenkai Han
  • Chenglu Wen
  • Cheng Wang
  • Xin Li
  • Qing Li

Fully exploring correlation among points in point clouds is essential for their feature modeling. This paper presents a novel end-to-end graph model, named Point2Node, to represent a given point cloud. Point2Node can dynamically explore correlation among all graph nodes from different levels, and adaptively aggregate the learned features. Specifically, first, to fully explore the spatial correlation among points for enhanced feature description, in a high-dimensional node graph, we dynamically integrate the node’s correlation with self, local, and non-local nodes. Second, to more effectively integrate learned features, we design a data-aware gate mechanism to self-adaptively aggregate features at the channel level. Extensive experiments on various point cloud benchmarks demonstrate that our method outperforms the state-ofthe-art.

IJCAI Conference 2018 Conference Paper

H-Net: Neural Network for Cross-domain Image Patch Matching

  • Weiquan Liu
  • Xuelun Shen
  • Cheng Wang
  • Zhihong Zhang
  • Chenglu Wen
  • Jonathan Li

Describing the same scene with different imaging style or rendering image from its 3D model gives us different domain images. Different domain images tend to have a gap and different local appearances, which raise the main challenge on the cross-domain image patch matching. In this paper, we propose to incorporate AutoEncoder into the Siamese network, named as H-Net, of which the structural shape resembles the letter H. The H-Net achieves state-of-the-art performance on the cross-domain image patch matching. Furthermore, we improved H-Net to H-Net++. The H-Net++ extracts invariant feature descriptors in cross-domain image patches and achieves state-of-the-art performance by feature retrieval in Euclidean space. As there is no benchmark dataset including cross-domain images, we made a cross-domain image dataset which consists of camera images, rendering images from UAV 3D model, and images generated by CycleGAN algorithm. Experiments show that the proposed H-Net and H-Net++ outperform the existing algorithms. Our code and cross-domain image dataset are available at https: //github. com/Xylon-Sean/H-Net.