Arrow Research search

Author name cluster

Hongkai Yu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
2 author rows

Possible papers (13)

AAAI Conference 2025 Conference Paper

An Efficient and Accurate Dynamic Sparse Training Framework Based on Parameter-Freezing

  • Lei Li
  • Haochen Yang
  • Jiacheng Guo
  • Hongkai Yu
  • Minghai Qin
  • Tianyun Zhang

Federated learning is a decentralized machine learning approach that consists of servers and clients. It protects data privacy during model training by keeping the training data locally in each client. However, the requirement for the server and clients to frequently synchronize the parameters of the model places a heavy burden on the communication links, especially as model sizes have grown drastically in recent years. Several methods have been proposed to compress the model by sparsification to reduce the communication overhead, albeit with significant accuracy degradation. In this work, we propose methods to achieve a better trade-off between model accuracy and training efficiency in federated learning. Our first proposed method is a novel sparse mask readjustment rule on the server, and the second is a parameter-freezing method during training on the clients. Experimental results show that model accuracy improves significantly when our proposed methods are combined. For example, compared with the previous state-of-the-art methods under the same total communication cost and computation FLOPs, our methods increase accuracy on average by 4% and 6% on the CIFAR-10 and CIFAR-100 datasets with ResNet-18, respectively. On the other hand, when targeting the same accuracy, the proposed method can reduce the communication cost by 4-8 times for different datasets with different sparsity levels.
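
As a rough illustration of the client-side parameter-freezing idea described above, the following is a minimal sketch assuming a PyTorch-style training loop; the mask construction and all names (make_freeze_masks, local_train_step) are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (not the paper's implementation): freeze a subset of
# parameters on a client by zeroing their gradients with a boolean mask.
import torch

def make_freeze_masks(model, freeze_ratio=0.5):
    """Randomly mark a fraction of each parameter tensor's entries as frozen (illustrative)."""
    return {name: (torch.rand_like(p) < freeze_ratio)
            for name, p in model.named_parameters()}

def local_train_step(model, batch, loss_fn, optimizer, freeze_masks):
    x, y = batch
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    with torch.no_grad():
        for name, p in model.named_parameters():
            if p.grad is not None:
                p.grad[freeze_masks[name]] = 0.0  # frozen entries receive no update
    optimizer.step()
    return loss.item()
```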

IROS Conference 2025 Conference Paper

CoMamba: Real-time Cooperative Perception Unlocked with State-Space Models

  • Jinlong Li
  • Xinyu Liu 0009
  • Baolu Li
  • Runsheng Xu
  • Jiachen Li 0001
  • Hongkai Yu
  • Zhengzhong Tu

Cooperative perception systems play a vital role in enhancing the safety and efficiency of vehicular autonomy. Although recent studies have highlighted the efficacy of vehicle-to-everything (V2X) communication techniques in autonomous driving, a significant challenge persists: how to efficiently integrate multiple high-bandwidth features across an expanding network of connected agents such as vehicles and infrastructure. In this paper, we introduce CoMamba, a novel cooperative 3D detection framework designed to leverage state-space models for real-time onboard vehicle perception. Compared to prior state-of-the-art transformer-based models, CoMamba is a more scalable 3D model: it uses bidirectional state-space models and thus bypasses the quadratic-complexity pain point of attention mechanisms. Through extensive experimentation on V2X/V2V datasets, CoMamba achieves superior performance compared to existing methods while maintaining real-time processing capabilities. The proposed framework not only enhances object detection accuracy but also significantly reduces processing time, making it a promising solution for next-generation cooperative perception systems in intelligent transportation networks.
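
To illustrate why state-space models avoid the quadratic cost of attention, here is a minimal sketch of a generic diagonal linear SSM scan in NumPy; it is not CoMamba's actual layer, and the bidirectional variant shown is only a stand-in under that assumption.

```python
# Minimal sketch: a diagonal linear SSM recurrence runs in O(L*D) over a length-L
# sequence, in contrast to the O(L^2) pairwise interactions of self-attention.
import numpy as np

def ssm_scan(x, A, B, C):
    """x: (L, D) inputs; A, B, C: (D,) diagonal SSM parameters.
    Runs h_t = A*h_{t-1} + B*x_t and y_t = C*h_t in one linear pass."""
    L, D = x.shape
    h = np.zeros(D)
    y = np.empty_like(x)
    for t in range(L):
        h = A * h + B * x[t]
        y[t] = C * h
    return y

def bidirectional_ssm(x, A, B, C):
    """Forward and backward scans, summed, as a stand-in for a bidirectional SSM."""
    return ssm_scan(x, A, B, C) + ssm_scan(x[::-1], A, B, C)[::-1]
```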

ICRA Conference 2025 Conference Paper

V2X-DG: Domain Generalization for Vehicle-to-Everything Cooperative Perception

  • Baolu Li
  • Zongzhe Xu
  • Jinlong Li
  • Xinyu Liu 0009
  • Jianwu Fang
  • Xiaopeng Li 0020
  • Hongkai Yu

LiDAR-based Vehicle-to-Everything (V2X) cooperative perception has demonstrated its impact on the safety and effectiveness of autonomous driving. Since current cooperative perception algorithms are trained and tested on the same dataset, the generalization ability of cooperative perception systems remains underexplored. This paper is the first work to study the Domain Generalization problem of LiDAR-based V2X cooperative perception (V2X-DG) for 3D detection, based on four widely used open-source datasets: OPV2V, V2XSet, V2V4Real and DAIR-V2X. Our research seeks to sustain high performance not only within the source domain but also across other unseen domains, achieved solely through training on the source domain. To this end, we propose Cooperative Mixup Augmentation based Generalization (CMAG) to improve the model's generalization capability by simulating unseen cooperation, a compact design targeting the domain gaps in cooperative perception. Furthermore, we propose a constraint that regularizes the learning of robust, generalized feature representations: Cooperation Feature Consistency (CFC), which aligns the intermediately fused features of the generalized cooperation produced by CMAG with the early fused features of the original cooperation in the source domain. Extensive experiments demonstrate that our approach achieves significant performance gains when generalizing to other unseen datasets while also maintaining strong performance on the source dataset.
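
A minimal sketch of the two ideas named above, under the assumption of simple feature-level operations: a mixup-style blending of cooperating agents' features to simulate unseen cooperation, and a consistency loss pulling the fused features of the augmented and original cooperation together. Function names and shapes are illustrative, not from the paper.

```python
# Illustrative only: mixup-style cooperative augmentation plus a fused-feature
# consistency penalty.
import torch
import torch.nn.functional as F

def mixup_cooperation(agent_feats, alpha=0.5):
    """agent_feats: (N_agents, C, H, W). Blend each agent's features with a random peer's."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(agent_feats.size(0))
    return lam * agent_feats + (1.0 - lam) * agent_feats[perm]

def cooperation_consistency_loss(fused_aug, fused_orig):
    """Keep the fused features of the augmented cooperation close to those of the original."""
    return F.mse_loss(fused_aug, fused_orig.detach())
```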

ICRA Conference 2025 Conference Paper

V2X-DGW: Domain Generalization for Multi-Agent Perception Under Adverse Weather Conditions

  • Baolu Li
  • Jinlong Li
  • Xinyu Liu 0009
  • Runsheng Xu
  • Zhengzhong Tu
  • Jiacheng Guo
  • Qin Zou 0001
  • Xiaopeng Li 0020

Current LiDAR-based Vehicle-to-Everything (V2X) multi-agent perception systems have shown significant success in 3D object detection. While these models perform well in the clean weather they were trained on, they struggle in unseen adverse weather conditions because of the domain gap. In this paper, we propose a Domain Generalization based approach, named V2X-DGW, for LiDAR-based 3D object detection in multi-agent perception systems under adverse weather conditions. Our research aims not only to maintain favorable multi-agent performance in clean weather but also to promote performance in unseen adverse weather conditions by learning only on clean-weather data. To realize the Domain Generalization, we first introduce the Adaptive Weather Augmentation (AWA) to mimic unseen adverse weather conditions, and then propose two alignments for generalizable representation learning: Trust-region Weather-invariant Alignment (TWA) and Agent-aware Contrastive Alignment (ACA). To evaluate this research, we add Fog, Rain, and Snow conditions to two public multi-agent datasets using physics-based models, resulting in two new datasets: OPV2V-w and V2XSet-w. Extensive experiments demonstrate that our V2X-DGW achieves significant improvements under the unseen adverse weather conditions. The code is available at https://github.com/Baolu1998/V2X-DGW.
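
As a loose illustration of weather augmentation on LiDAR data, the sketch below applies a Beer-Lambert style attenuation to a point cloud to mimic fog. The paper's physics-based models are more elaborate; this is only an assumed, simplified example.

```python
# Heavily simplified fog augmentation: drop and attenuate returns with range.
import numpy as np

def simple_fog_augmentation(points, beta=0.05, rng=None):
    """points: (N, 4) array of x, y, z, intensity. beta: assumed extinction coefficient."""
    rng = rng or np.random.default_rng()
    r = np.linalg.norm(points[:, :3], axis=1)
    transmittance = np.exp(-beta * r)          # Beer-Lambert style attenuation
    keep = rng.random(len(points)) < transmittance
    fogged = points[keep].copy()
    fogged[:, 3] *= transmittance[keep]        # attenuate intensity of kept points
    return fogged
```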

ICRA Conference 2024 Conference Paper

AdvGPS: Adversarial GPS for Multi-Agent Perception Attack

  • Jinlong Li
  • Baolu Li
  • Xinyu Liu 0009
  • Jianwu Fang
  • Felix Juefei-Xu
  • Qing Guo 0005
  • Hongkai Yu

The multi-agent perception system collects visual data from sensors located on various agents and leverages their relative poses, determined by GPS signals, to effectively fuse information, mitigating the limitations of single-agent sensing such as occlusion. However, the precision of GPS signals can be influenced by a range of factors, including wireless transmission and obstructions like buildings. Given the pivotal role of GPS signals in perception fusion and the potential for various forms of interference, it becomes imperative to investigate whether specific GPS signals can easily mislead the multi-agent perception system. To address this concern, we frame the task as an adversarial attack challenge and introduce ADVGPS, a method capable of generating adversarial GPS signals that are also stealthy for individual agents within the system, significantly reducing object detection accuracy. To enhance the success rates of these attacks in a black-box scenario, we introduce three types of statistically sensitive natural discrepancies: appearance-based discrepancy, distribution-based discrepancy, and task-aware discrepancy. Our extensive experiments on the OPV2V dataset demonstrate that these attacks substantially undermine the performance of state-of-the-art methods, showcasing remarkable transferability across different point cloud based 3D detection systems. This alarming revelation underscores the pressing need to address security implications within multi-agent perception systems, highlighting a critical area of research. The code is available at https://github.com/jinlong17/AdvGPS.
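
A minimal sketch of the general attack setup, assuming a detector that consumes point clouds plus relative poses: the GPS-derived pose is treated as the attack variable and a small, bounded perturbation is optimized to increase the detection loss. The detector interface and detection_loss are assumptions, and this is not the AdvGPS algorithm itself.

```python
# Illustrative gradient-based pose attack; detector and its loss are assumed.
import torch

def attack_relative_pose(detector, points, pose, labels, eps=0.5, steps=10, lr=0.1):
    """pose: (N_agents, 3) x, y, yaw offsets. Returns an adversarially perturbed pose."""
    delta = torch.zeros_like(pose, requires_grad=True)
    for _ in range(steps):
        preds = detector(points, pose + delta)        # fusion uses perturbed poses
        loss = detector.detection_loss(preds, labels) # higher loss = worse detection
        loss.backward()
        with torch.no_grad():
            delta += lr * delta.grad.sign()           # FGSM-style ascent step
            delta.clamp_(-eps, eps)                   # keep the perturbation small/stealthy
            delta.grad.zero_()
    return pose + delta.detach()
```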

ICRA Conference 2024 Conference Paper

Breaking Data Silos: Cross-Domain Learning for Multi-Agent Perception from Independent Private Sources

  • Jinlong Li
  • Baolu Li
  • Xinyu Liu 0009
  • Runsheng Xu
  • Jiaqi Ma 0003
  • Hongkai Yu

The diverse agents in multi-agent perception systems may come from different companies. Each company might use an identical, classic neural-network architecture as the encoder for feature extraction. However, the data used to train each company's agents is independent and private, leading to a Distribution Gap between the private data used to train the distinct agents in a multi-agent perception system. The data silos caused by this Distribution Gap could result in a significant performance decline in multi-agent perception. In this paper, we thoroughly examine the impact of the distribution gap on existing multi-agent perception systems. To break the data silos, we introduce the Feature Distribution-aware Aggregation (FDA) framework for cross-domain learning to mitigate the above Distribution Gap in multi-agent perception. FDA comprises two key components: a Learnable Feature Compensation Module and a Distribution-aware Statistical Consistency Module, both aimed at enhancing intermediate features to minimize the distribution gap among multi-agent features. Intensive experiments on the public OPV2V and V2XSet datasets underscore FDA's effectiveness in point cloud-based 3D object detection, presenting it as an invaluable augmentation to existing multi-agent perception systems. The code is available at https://github.com/jinlong17/BDS-V2V.
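
One simple way to penalize a feature-level distribution gap, sketched under the assumption of channel-wise first- and second-moment matching; this is illustrative and not the FDA modules themselves.

```python
# Illustrative statistical-consistency penalty between ego and peer features.
import torch

def statistical_consistency_loss(ego_feat, other_feats):
    """ego_feat: (C, H, W); other_feats: list of (C, H, W) tensors from other agents."""
    ego_mu = ego_feat.mean(dim=(1, 2))
    ego_std = ego_feat.std(dim=(1, 2))
    loss = 0.0
    for f in other_feats:
        loss = loss + (f.mean(dim=(1, 2)) - ego_mu).abs().mean() \
                    + (f.std(dim=(1, 2)) - ego_std).abs().mean()
    return loss / max(len(other_feats), 1)
```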

TIST Journal 2024 Journal Article

Learning Cross-modality Interaction for Robust Depth Perception of Autonomous Driving

  • Yunji Liang
  • Nengzhen Chen
  • Zhiwen Yu
  • Lei Tang
  • Hongkai Yu
  • Bin Guo
  • Daniel Dajun Zeng

As one of the fundamental tasks of autonomous driving, depth perception aims to perceive physical objects in three dimensions and to judge their distances from the ego vehicle. Although great efforts have been made toward depth perception, LiDAR-based and camera-based solutions suffer from low accuracy and poor robustness to noisy input. With the integration of monocular cameras and LiDAR sensors in autonomous vehicles, in this article we introduce a two-stream architecture that learns the modality-interaction representation under the guidance of an image reconstruction task to compensate for the deficiencies of each modality in a parallel manner. Specifically, in the two-stream architecture, the multi-scale cross-modality interactions are preserved via a cascading interaction network under the guidance of the reconstruction task. Next, the shared representation of the modality interaction is integrated to infer the dense depth map, exploiting the complementarity and heterogeneity of the two modalities. We evaluated the proposed solution on the KITTI dataset and the CALAR synthetic dataset. Our experimental results show that learning the coupled interaction of modalities under the guidance of an auxiliary task can lead to significant performance improvements. Furthermore, our approach is competitive against state-of-the-art models and robust against noisy input. The source code is available at https://github.com/tonyFengye/Code/tree/master.
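
A minimal sketch of a two-stream design with an auxiliary reconstruction head, assuming a toy convolutional architecture; it only illustrates how a shared fused representation can be supervised by both a depth loss and an image-reconstruction loss, and is not the paper's network.

```python
# Toy two-stream fusion with depth and auxiliary reconstruction heads.
import torch
import torch.nn as nn

class TwoStreamDepthNet(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.img_stream = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU())
        self.lidar_stream = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU())
        self.fuse = nn.Conv2d(2 * ch, ch, 1)
        self.depth_head = nn.Conv2d(ch, 1, 3, padding=1)   # dense depth prediction
        self.recon_head = nn.Conv2d(ch, 3, 3, padding=1)   # auxiliary image reconstruction

    def forward(self, image, sparse_depth):
        shared = self.fuse(torch.cat([self.img_stream(image),
                                      self.lidar_stream(sparse_depth)], dim=1))
        return self.depth_head(shared), self.recon_head(shared)

# Training would combine both objectives, e.g.
# loss = depth_loss(pred_depth, gt_depth) + lam * recon_loss(recon, image)
```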

ICRA Conference 2024 Conference Paper

S2R-ViT for Multi-Agent Cooperative Perception: Bridging the Gap from Simulation to Reality

  • Jinlong Li
  • Runsheng Xu
  • Xinyu Liu 0009
  • Baolu Li
  • Qin Zou 0001
  • Jiaqi Ma 0003
  • Hongkai Yu

Due to the lack of sufficient real multi-agent data and the time-consuming nature of labeling, existing multi-agent cooperative perception algorithms usually use simulated sensor data for training and validation. However, the perception performance degrades when these simulation-trained models are deployed to the real world, because of the significant domain gap between the simulated and real data. In this paper, we propose the first Simulation-to-Reality transfer learning framework for multi-agent cooperative perception using a novel Vision Transformer, named S2R-ViT, which considers both the Deployment Gap and the Feature Gap between simulated and real data. We investigate the effects of these two types of domain gaps and propose a novel uncertainty-aware vision transformer to effectively relieve the Deployment Gap, and an agent-based feature adaptation module with inter-agent and ego-agent discriminators to reduce the Feature Gap. Our intensive experiments on the public multi-agent cooperative perception datasets OPV2V and V2V4Real demonstrate that the proposed S2R-ViT can effectively bridge the gap from simulation to reality and outperforms other methods significantly for point cloud-based 3D object detection.
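
As an illustration of adversarial feature adaptation between simulated and real domains, here is a minimal gradient-reversal discriminator sketch; the actual S2R-ViT modules (uncertainty-aware transformer, inter-agent and ego-agent discriminators) are more involved, so treat this purely as an assumed example.

```python
# Gradient reversal + domain discriminator: a common adversarial adaptation pattern.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None  # reverse gradients flowing into the encoder

class DomainDiscriminator(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.net = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(ch, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, feat, lam=1.0):
        return self.net(GradReverse.apply(feat, lam))  # logit: simulated vs. real
```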

AAAI Conference 2024 Conference Paper

SQLdepth: Generalizable Self-Supervised Fine-Structured Monocular Depth Estimation

  • Youhong Wang
  • Yunji Liang
  • Hao Xu
  • Shaohui Jiao
  • Hongkai Yu

Recently, self-supervised monocular depth estimation has gained popularity with numerous applications in autonomous driving and robotics. However, existing solutions primarily seek to estimate depth from immediate visual features and struggle to recover fine-grained scene details. In this paper, we introduce SQLdepth, a novel approach that can effectively learn fine-grained scene structure priors from ego-motion. In SQLdepth, we propose a novel Self Query Layer (SQL) to build a self-cost volume and infer depth from it, rather than inferring depth from feature maps. We show that the self-cost volume is an effective inductive bias for geometry learning, which implicitly models the single-frame scene geometry, with each slice of it indicating a relative distance map between points and objects in a latent space. Experimental results on KITTI and Cityscapes show that our method attains remarkable state-of-the-art performance and showcases computational efficiency, reduced training complexity, and the ability to recover fine-grained scene details. Moreover, the self-matching-oriented relative distance querying in SQL improves the robustness and zero-shot generalization capability of SQLdepth. Code is available at https://github.com/hisfog/SfMNeXt-Impl.
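
A rough sketch of the general self-query idea, under strong simplifying assumptions: per-pixel features are scored against queries drawn from the same frame, and depth is read out as a softmax-weighted sum over depth bins. This is not the actual SQL layer design.

```python
# Assumed toy version of a self-query depth head built on a self-cost volume.
import torch
import torch.nn as nn

class SelfQueryDepthHead(nn.Module):
    def __init__(self, ch=64, n_bins=64, max_depth=80.0):
        super().__init__()
        self.query_proj = nn.Linear(ch, ch)
        self.register_buffer("bins", torch.linspace(1.0, max_depth, n_bins))

    def forward(self, feat):                      # feat: (B, C, H, W)
        B, C, H, W = feat.shape
        pix = feat.flatten(2).transpose(1, 2)     # (B, HW, C) per-pixel features
        # queries taken from the frame itself (here: coarse 8x8 average pooling)
        q = nn.functional.adaptive_avg_pool2d(feat, (8, 8)).flatten(2).transpose(1, 2)
        q = self.query_proj(q)                    # (B, 64, C)
        volume = pix @ q.transpose(1, 2)          # (B, HW, 64) "self-cost volume"
        prob = volume.softmax(dim=-1)             # treat the 64 scores as depth-bin weights
        depth = (prob * self.bins).sum(-1)        # (B, HW) expected depth
        return depth.view(B, 1, H, W)
```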

ICRA Conference 2024 Conference Paper

Vehicle Behavior Prediction by Episodic-Memory Implanted NDT

  • Peining Shen
  • Jianwu Fang
  • Hongkai Yu
  • Jianru Xue

In autonomous driving, predicting the behavior (turning left, stopping, etc.) of target vehicles is crucial for the self-driving vehicle to make safe decisions and avoid accidents. Existing deep learning-based methods have shown excellent and accurate performance, but their black-box nature makes them untrustworthy for practical use. In this work, we explore the interpretability of behavior prediction for target vehicles with an Episodic Memory implanted Neural Decision Tree (abbrev. eMem-NDT). The structure of eMem-NDT is constructed by hierarchically clustering the text embeddings of vehicle behavior descriptions. eMem-NDT is a neural-backed component of a pre-trained deep learning model, obtained by replacing the softmax layer of the deep model with eMem-NDT, in order to group and align the memory prototypes of the historical vehicle behavior features in the training data on a neural decision tree. Each leaf node of eMem-NDT is modeled by a neural network for aligning the behavior memory prototypes. With eMem-NDT, we infer each instance in vehicle behavior prediction by bottom-up Memory Prototype Matching (MPM), which searches for the appropriate leaf node and the links to the root node, and top-down Leaf Link Aggregation (LLA), which obtains the probability of future behaviors of vehicles for a given instance. We validate eMem-NDT on the BLVD and LOKI datasets, and the results show that our model obtains superior performance to other methods with clear explainability. The code is available at https://github.com/JWFangit/eMem-NDT.
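
A minimal sketch of bottom-up prototype matching and top-down aggregation on a decision tree, with assumed data structures (Leaf, path_to_root); it mirrors the MPM/LLA description above only loosely.

```python
# Illustrative prototype matching in a tree of behavior memories.
import numpy as np

class Leaf:
    def __init__(self, prototype, behavior_probs, path_to_root):
        self.prototype = prototype            # (D,) memory prototype feature
        self.behavior_probs = behavior_probs  # (K,) probabilities over behaviors
        self.path_to_root = path_to_root      # list of (K,) probabilities at ancestor nodes

def predict_behavior(feature, leaves):
    """feature: (D,) instance feature; leaves: list of Leaf."""
    dists = [np.linalg.norm(feature - leaf.prototype) for leaf in leaves]
    best = leaves[int(np.argmin(dists))]                  # bottom-up: match nearest prototype
    stacked = np.stack([best.behavior_probs] + best.path_to_root)
    probs = stacked.mean(axis=0)                          # top-down: aggregate along the path
    return probs / probs.sum()
```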

ICRA Conference 2023 Conference Paper

Bridging the Domain Gap for Multi-Agent Perception

  • Runsheng Xu
  • Jinlong Li
  • Xiaoyu Dong
  • Hongkai Yu
  • Jiaqi Ma 0003

Existing multi-agent perception algorithms usually choose to share deep neural features extracted from raw sensing data between agents, striking a trade-off between accuracy and the communication bandwidth limit. However, these methods assume all agents have identical neural networks, which might not be practical in the real world. The transmitted features can have a large domain gap when the models differ, leading to a dramatic performance drop in multi-agent perception. In this paper, we propose the first lightweight framework to bridge such domain gaps for multi-agent perception, which can be a plug-in module for most existing systems while maintaining confidentiality. Our framework consists of a learnable feature resizer to align features in multiple dimensions and a sparse cross-domain transformer for domain adaptation. Extensive experiments on the public multi-agent perception dataset V2XSet demonstrate that our method can effectively bridge the gap for features from different domains and outperforms other baseline methods significantly, by at least 8%, for point-cloud-based 3D object detection.
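
The learnable feature resizer can be pictured as a small module that adapts another agent's feature map to the ego agent's channel count and spatial resolution before fusion; the sketch below is an assumed minimal version, not the paper's module.

```python
# Assumed minimal learnable feature resizer for cross-agent feature alignment.
import torch
import torch.nn as nn

class FeatureResizer(nn.Module):
    def __init__(self, in_ch, out_ch, out_hw):
        super().__init__()
        self.out_hw = out_hw
        self.proj = nn.Sequential(nn.Conv2d(in_ch, out_ch, 1),
                                  nn.BatchNorm2d(out_ch), nn.ReLU())

    def forward(self, feat):                     # feat: (B, in_ch, H, W) from another agent
        feat = nn.functional.interpolate(feat, size=self.out_hw,
                                         mode="bilinear", align_corners=False)
        return self.proj(feat)                   # (B, out_ch, *out_hw), ready to fuse
```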

AAAI Conference 2020 Conference Paper

Multi-Spectral Salient Object Detection by Adversarial Domain Adaptation

  • Shaoyue Song
  • Hongkai Yu
  • Zhenjiang Miao
  • Jianwu Fang
  • Kang Zheng
  • Cong Ma
  • Song Wang

Although there are many existing research works on salient object detection (SOD) in RGB images, there are still many complex situations in which regular RGB images cannot provide enough cues for accurate SOD, such as shadow effects, similar appearance between background and foreground, and strong or insufficient illumination. Because of the success of the near-infrared spectrum in many computer vision tasks, we explore multi-spectral SOD on synchronized RGB and near-infrared (NIR) images for both simple and complex situations. We assume that RGB SOD on existing RGB image datasets can provide references for the multi-spectral SOD problem. In this paper, we first collect, and will publicize, a large multi-spectral dataset including 780 synchronized RGB and NIR image pairs for the multi-spectral SOD problem in simple and complex situations. We model this research problem as an adversarial domain adaptation from the existing RGB image dataset (source domain) to the collected multi-spectral dataset (target domain). Experimental results show the effectiveness and accuracy of the proposed adversarial domain adaptation for multi-spectral SOD.
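
A minimal sketch of output-space adversarial domain adaptation, assuming a saliency network that produces per-pixel logits: a discriminator tries to tell source (RGB) predictions from target (multi-spectral) predictions, and the saliency network is trained to fool it on target data. All modules and loss terms are illustrative assumptions, not the paper's training scheme.

```python
# Illustrative output-space adversarial adaptation for saliency maps.
import torch
import torch.nn as nn
import torch.nn.functional as F

disc = nn.Sequential(nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                     nn.Conv2d(32, 1, 4, stride=2, padding=1))

def adaptation_losses(src_pred, src_gt, tgt_pred):
    """src_pred/tgt_pred: (B, 1, H, W) saliency logits; src_gt: (B, 1, H, W) labels."""
    sal_loss = F.binary_cross_entropy_with_logits(src_pred, src_gt)
    # generator side: make target-domain predictions look like source-domain ones
    d_tgt = disc(torch.sigmoid(tgt_pred))
    adv_loss = F.binary_cross_entropy_with_logits(d_tgt, torch.ones_like(d_tgt))
    # discriminator side: source -> 1, target -> 0 (detached predictions)
    d_src = disc(torch.sigmoid(src_pred).detach())
    d_tgt_det = disc(torch.sigmoid(tgt_pred).detach())
    d_loss = F.binary_cross_entropy_with_logits(d_src, torch.ones_like(d_src)) + \
             F.binary_cross_entropy_with_logits(d_tgt_det, torch.zeros_like(d_tgt_det))
    return sal_loss, adv_loss, d_loss
```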

AAAI Conference 2018 Conference Paper

Co-Saliency Detection Within a Single Image

  • Hongkai Yu
  • Kang Zheng
  • Jianwu Fang
  • Hao Guo
  • Wei Feng
  • Song Wang

Recently, saliency detection in a single image and co-saliency detection in multiple images have drawn extensive research interest in the vision community. In this paper, we investigate a new problem of co-saliency detection within a single image, i.e., detecting within-image co-saliency. By identifying common saliency within an image, e.g., highlighting multiple occurrences of an object class with similar appearance, this work can benefit many important applications, such as the detection of objects of interest, more robust object recognition, reduction of information redundancy, and animation synthesis. We propose a new bottom-up method to address this problem. Specifically, a large number of object proposals are first detected from the image. Then we develop an optimization algorithm to derive a set of proposal groups, each of which contains multiple proposals showing good common saliency in the original image. For each proposal group, we calculate a co-saliency map and then use a low-rank based algorithm to fuse the maps calculated from all the proposal groups into the final co-saliency map for the image. In the experiments, we collect a new dataset of 364 color images with within-image co-saliency. Experimental results show that the proposed method can better detect within-image co-saliency than existing algorithms.
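
As an assumed illustration of the low-rank fusion step, the sketch below stacks the per-group co-saliency maps and keeps their dominant SVD component; the paper's fusion algorithm may differ.

```python
# Illustrative rank-1 fusion of co-saliency maps from multiple proposal groups.
import numpy as np

def low_rank_fuse(cosal_maps):
    """cosal_maps: list of (H, W) maps in [0, 1]. Returns a fused (H, W) map."""
    H, W = cosal_maps[0].shape
    M = np.stack([m.ravel() for m in cosal_maps])      # (n_maps, H*W)
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    fused = np.abs(S[0] * Vt[0]).reshape(H, W)         # dominant shared component
    return (fused - fused.min()) / (fused.ptp() + 1e-8)
```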