Author name cluster

Rui Ding

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers

2 author rows

AAAI Conference 2026 Conference Paper

RayD3D: Distilling Depth Knowledge Along the Ray for Robust Multi-View 3D Object Detection

Rui Ding
Zhaonian Kuang
Zongwei Zhou
Meng Yang
Xinhu Zheng
Gang Hua

Multi-view 3D detection with bird’s eye view (BEV) is crucial for autonomous driving and robotics, but its robustness in the real world is limited because it struggles to predict accurate depth values. A mainstream solution, cross-modal distillation, transfers depth information from LiDAR to camera models but also unintentionally transfers depth-irrelevant information (e.g., LiDAR density). To mitigate this issue, we propose RayD3D, which transfers crucial depth knowledge along the ray: a line projecting from the camera to the true location of an object. It is based on the fundamental imaging principle that the predicted location of this object can only vary along this ray, with its position on the ray determined by the predicted depth value. Therefore, distilling along the ray enables more effective depth information transfer. More specifically, we design two ray-based distillation modules. Ray-based Contrastive Distillation (RCD) incorporates contrastive learning into distillation by sampling along the ray to learn how LiDAR accurately locates objects. Ray-based Weighted Distillation (RWD) adaptively adjusts the distillation weight based on the ray to minimize the interference of depth-irrelevant information in LiDAR. For validation, we apply RayD3D to three representative types of BEV-based models: BEVDet, BEVDepth4D, and BEVFormer. Our method is trained on clean nuScenes and tested on both clean nuScenes and RoboBEV with a variety of data corruptions. Our method significantly improves the robustness of all three base models in all scenarios without increasing inference costs, and achieves the best results compared with recently released multi-view and distillation models.
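The imaging principle mentioned in the abstract above can be illustrated with a minimal sketch, assuming a standard pinhole camera. The function names and the intrinsics (`fx`, `fy`, `cx`, `cy`) are hypothetical and chosen only for illustration; this is not RayD3D's code, and the paper's actual sampling strategy is not described on this page.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Pinhole back-projection: pixel (u, v) at a given depth -> 3D point in the camera frame."""
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)

def sample_along_ray(u: float, v: float, depths: List[float],
                     fx: float, fy: float, cx: float, cy: float) -> List[Vec3]:
    """Candidate 3D locations for one pixel: changing the depth only moves the point along its ray."""
    return [backproject(u, v, d, fx, fy, cx, cy) for d in depths]

# Hypothetical intrinsics and pixel, for illustration only.
points = sample_along_ray(700.0, 400.0, [10.0, 20.0, 30.0],
                          fx=1000.0, fy=1000.0, cx=640.0, cy=360.0)
```

Every point in `points` has the same `x / z` and `y / z` ratios, which is what it means for the predicted location to vary only along the ray.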


IROS Conference 2025 Conference Paper

Consistent Feature Alignment for Cross-Modal Knowledge Distillation in Monocular 3D Object Detection

Fan Li
Rui Ding
Meng Yang 0002
Xuguang Lan

Cross-modal knowledge distillation (CMKD) in monocular 3D object detection transfers LiDAR’s accurate depth information to compensate for the limitations of the camera model. However, current methods directly align the intermediate features of the teacher and student networks, and the modality gap between LiDAR and camera hinders their effectiveness. To mitigate this issue, we design two modules, the Consistent Alignment Module (CAM) and the Deformable Adapter Module (DAM), to reduce the modality gap of CMKD. The CAM transforms intermediate features of LiDAR and camera into consistent features through a lightweight Target Head. It is based on the observation that some high-level features, such as heatmaps and depths, are highly correlated in CMKD despite the modality gap between LiDAR and camera. Therefore, these features can be effectively transferred from teacher to student in CMKD. The DAM introduces a deformable adapter for the intermediate features of the student network to reduce background noise in CMKD. This helps to dynamically align its intermediate features with the teacher network. We then propose a Consistent Feature Alignment network (MonoCFA) for CMKD to boost monocular 3D object detection. Our network integrates the two designed modules at different levels of the teacher and student networks to align the intermediate features of LiDAR and camera more accurately and reliably. Our model can be widely applied to existing monocular 3D object detection models. For validation, we choose the representative MonoDLE, GUPNet, and DID-M3D as base models. Experiments on the KITTI benchmark show that our method significantly outperforms the three base models by 39%, 15.5%, and 15%, respectively, and achieves state-of-the-art results compared with other CMKD models.


JBHI Journal 2025 Journal Article

MIT-SAM: Medical Image-Text SAM With Mutually Enhanced Heterogeneous Features Fusion for Medical Image Segmentation

Xichuan Zhou
Lingfeng Yan
Rui Ding
Chukwuemeka Clinton Atabansi
Jing Nie
Lihui Chen
Yujie Feng
Haijun Liu

Recently, leveraging lesion text as supplementary data to enhance the performance of medical image segmentation models has garnered attention. Previous approaches used only attention mechanisms to integrate image and text features and did not effectively use the highly condensed textual semantic information to improve the fused features, resulting in inaccurate lesion segmentation. This paper introduces a novel approach, the Medical Image-Text Segment Anything Model (MIT-SAM), for text-assisted medical image segmentation. Specifically, we introduce a SAM-enhanced image encoder and a BERT-based text encoder to extract heterogeneous features. To better leverage the highly condensed textual semantic information, such as crucial details like position and quantity, for heterogeneous feature fusion, we propose the image-text interactive fusion (ITIF) block and the self-supervised text reconstruction (SSTR) method. The ITIF block facilitates the mutual enhancement of homogeneous information among heterogeneous features, and the SSTR method enables the model to capture crucial details in the lesion text, including location, quantity, and other key aspects. Experimental results demonstrate that our proposed model achieves state-of-the-art performance on the QaTa-COV19 and MosMedData+ datasets.


JBHI Journal 2025 Journal Article

MuST-GCN: Multiscale and Hybrid Spatial Temporal Graph Convolutional Network for Accurate Identification of Alcohol Abuse and Alcohol Dependence Within Alcohol Use Disorder

Yule Sun
Rui Ding
Rusdi Bin Abd Rashid
Ming Ma
Muhammad Umer Farooq
Jingxu Chen
Shuangjiang Zhou
Jinhong Ding

Alcohol Use Disorder (AUD), encompassing Alcohol Abuse (AA) and Alcohol Dependence (AD), is a chronic, relapsing brain condition that can cause mental and physical issues. Confusion between AA and AD can lead to ineffective or even excessively aggressive treatments, but distinguishing them is difficult due to similar symptoms and physiological indicators. Fortunately, from a psychological perspective, AA is closely linked to poor behavioral control, while AD is associated with affective lability. These psychological mechanisms manifest differently in the activity of patients' brain regions. Inspired by this, we propose a model called Multiscale and Hybrid Spatial Temporal Graph Convolutional Network (MuST-GCN) that extracts features from electroencephalogram (EEG) signals for accurate identification of AA and AD within AUD. It includes two modules. The Multiscale Feature Extraction (MSFE) module uses a GCN to concurrently analyze inter-regional and intra-regional connectivity of brain regions and extract EEG features, exploring functional connectivity differences in AA and AD patients. The Hybrid Spatial Temporal Memory (HSTM) module integrates spatial and temporal attention mechanisms to refine the significant features extracted by MSFE, targeting the key brain regions and temporal dynamics. The HSTM module yields a refined feature representation that reduces overfitting and improves multiclass classification accuracy. MuST-GCN is evaluated using five-fold cross-validation on two datasets, achieving classification accuracies of 90.74% and 99.99%, respectively, and demonstrating superior performance in identifying AA, AD, and AUD compared to existing methods.


AAAI Conference 2024 Conference Paper

Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries

Xinyi He
Mengyu Zhou
Xinrun Xu
Xiaojun Ma
Rui Ding
Lun Du
Yan Gao
Ran Jia

Tabular data analysis is crucial in various fields, and large language models show promise in this area. However, current research mostly focuses on rudimentary tasks like Text2SQL and TableQA, neglecting advanced analysis like forecasting and chart generation. To address this gap, we develop the Text2Analysis benchmark, incorporating advanced analysis tasks that go beyond SQL-compatible operations and require more in-depth analysis. We also develop five innovative and effective annotation methods, harnessing the capabilities of large language models to enhance data quality and quantity. Additionally, we include unclear queries that resemble real-world user questions to test how well models can understand and tackle such challenges. Finally, we collect 2249 query-result pairs with 347 tables. We evaluate five state-of-the-art models using three different metrics, and the results show that our benchmark presents a considerable challenge in the field of tabular data analysis, paving the way for more advanced research opportunities.


NeurIPS Conference 2024 Conference Paper

VeXKD: The Versatile Integration of Cross-Modal Fusion and Knowledge Distillation for 3D Perception

Yuzhe Ji
Yijie Chen
Liuqing Yang
Rui Ding
Meng Yang
Xinhu Zheng

Recent advancements in 3D perception have led to a proliferation of network architectures, particularly those involving multi-modal fusion algorithms. While these fusion algorithms improve accuracy, their complexity often impedes real-time performance. This paper introduces VeXKD, an effective and Versatile framework that integrates Cross-Modal Fusion with Knowledge Distillation. VeXKD applies knowledge distillation exclusively to the Bird's Eye View (BEV) feature maps, enabling the transfer of cross-modal insights to single-modal students without additional inference time overhead. It avoids volatile components that can vary across various 3D perception tasks and student modalities, thus improving versatility. The framework adopts a modality-general cross-modal fusion module to bridge the modality gap between the multi-modal teachers and single-modal students. Furthermore, leveraging byproducts generated during fusion, our BEV query guided mask generation network identifies crucial spatial locations across different BEV feature maps in a data-driven manner, significantly enhancing the effectiveness of knowledge distillation. Extensive experiments on the nuScenes dataset demonstrate notable improvements, with up to 6.9%/4.2% increase in mAP and NDS for 3D detection tasks and up to 4.3% rise in mIoU for BEV map segmentation tasks, narrowing the performance gap with multi-modal models.
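The idea of distilling only at crucial BEV locations can be sketched as a mask-weighted feature loss. This is a minimal illustration under stated assumptions: the mean-squared-error form, the function name `masked_distill_loss`, and the hand-written mask are all hypothetical, since the abstract does not give VeXKD's loss or describe how its mask generation network produces the mask.

```python
from typing import List

Grid = List[List[float]]

def masked_distill_loss(student: Grid, teacher: Grid, mask: Grid) -> float:
    """Mask-weighted mean squared error between two single-channel BEV grids.

    Cells with a larger mask value contribute more; cells with mask 0 are ignored.
    """
    weighted_error = 0.0
    total_weight = 0.0
    for s_row, t_row, m_row in zip(student, teacher, mask):
        for s, t, m in zip(s_row, t_row, m_row):
            weighted_error += m * (s - t) ** 2
            total_weight += m
    # Avoid division by zero when the mask selects nothing.
    return weighted_error / total_weight if total_weight > 0 else 0.0

# Toy 2x2 grids: only the top row is marked as important.
student = [[1.0, 2.0], [3.0, 4.0]]
teacher = [[1.0, 0.0], [3.0, 0.0]]
mask = [[1.0, 1.0], [0.0, 0.0]]
loss = masked_distill_loss(student, teacher, mask)
```

In this toy case the large mismatch in the bottom-right cell does not affect `loss`, because the mask assigns it zero weight.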


AAAI Conference 2023 Conference Paper

Gradient-Based Graph Attention for Scene Text Image Super-resolution

Xiangyuan Zhu
Kehua Guo
Hui Fang
Rui Ding
Zheng Wu
Gerald Schaefer

Scene text image super-resolution (STISR) in the wild has been shown to be beneficial to support improved vision-based text recognition from low-resolution imagery. An intuitive way to enhance STISR performance is to explore the well-structured and repetitive layout characteristics of text and exploit these as prior knowledge to guide model convergence. In this paper, we propose a novel gradient-based graph attention method to embed patch-wise text layout contexts into image feature representations for high-resolution text image reconstruction in an implicit and elegant manner. We introduce a non-local group-wise attention module to extract text features which are then enhanced by a cascaded channel attention module and a novel gradient-based graph attention module in order to obtain more effective representations by exploring correlations of regional and local patch-wise text layout properties. Extensive experiments on the benchmark TextZoom dataset convincingly demonstrate that our method supports excellent text recognition and outperforms the current state-of-the-art in STISR. The source code is available at https://github.com/xyzhu1/TSAN.


NeurIPS Conference 2022 Conference Paper

ComGAN: Unsupervised Disentanglement and Segmentation via Image Composition

Rui Ding
Kehua Guo
Xiangyuan Zhu
Zheng Wu
Liwei Wang

We propose ComGAN, a simple unsupervised generative model, which simultaneously generates realistic images and highly semantic masks under an adversarial loss and a binary regularization. In this paper, we first investigate two kinds of trivial solutions in the compositional generation process and demonstrate that their source is vanishing gradients on the mask. Then, we avoid these trivial solutions from the perspective of architecture. Furthermore, we redesign two fully unsupervised modules based on ComGAN (DS-ComGAN), where the disentanglement module associates the foreground, background, and mask with three independent variables, and the segmentation module learns object segmentation. Experimental results show that (i) ComGAN's network architecture effectively avoids trivial solutions without any supervised information or regularization; (ii) DS-ComGAN achieves remarkable results and outperforms existing semi-supervised and weakly supervised methods by a large margin in both the image disentanglement and unsupervised segmentation tasks. This suggests that the redesign of ComGAN is a possible direction for future unsupervised work.
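Compositional generation of this kind is commonly written as a mask-weighted blend of a foreground and a background. The sketch below assumes that common form and a simple `min(m, 1 - m)` penalty that pushes mask values toward 0 or 1; both are illustrative assumptions, not ComGAN's stated equations, and the function names are hypothetical.

```python
from typing import List

def compose(foreground: List[float], background: List[float],
            mask: List[float]) -> List[float]:
    """Blend per pixel: mask * foreground + (1 - mask) * background."""
    return [m * f + (1.0 - m) * b
            for f, b, m in zip(foreground, background, mask)]

def binary_penalty(mask: List[float]) -> float:
    """Mean of min(m, 1 - m): zero for a fully binary mask, largest at m = 0.5."""
    return sum(min(m, 1.0 - m) for m in mask) / len(mask)

# Flattened toy "images" with two pixels each.
image = compose([1.0, 1.0], [0.0, 0.0], [1.0, 0.0])

# A mask that is all zeros or all ones is binary but trivial:
# the composed image then ignores the foreground or the background entirely.
trivial = compose([1.0, 1.0], [0.0, 0.0], [0.0, 0.0])
```

The `trivial` case shows why a binary penalty alone cannot rule out degenerate masks, which is the kind of failure the abstract refers to as a trivial solution.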


IS Journal 2021 Journal Article

Adversarial Path Sampling for Recommender Systems

Rui Ding
Bowei Chen
Guibing Guo
Xiaochun Yang

Generative adversarial networks (GANs) have achieved great success in collaborative filtering (CF). However, existing GAN-based methods in CF still suffer from the high-sparsity and cold-start problems; in addition, they also face excessive space complexity or inadequate training. In this article, we propose path2rec, a novel adversarial path-based recommendation model that addresses these limitations of existing GAN-based methods in the recommendation task by naturally incorporating auxiliary information (e.g., social networks and item attributes). It is composed of two modules: 1) pathGAN and 2) path2vec. In pathGAN, we consider both explicit and implicit friends, as well as item attributes, by regarding them as the source of graph construction. Then, we propose a smart walk strategy to automatically generate an optimizing path, which can effectively learn the semantic distribution of users and items. In path2vec, to fully exploit context features of the generated path, we use the Continuous Bag of Words (CBOW) model to fine-tune the node representations learned by pathGAN. Through extensive experiments on real-world datasets, we demonstrate the effectiveness of the proposed path2rec by applying it to top-n item recommendation, where it reaches better performance than its counterparts.
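CBOW predicts a target token from its surrounding context, so applying it to a generated path amounts to treating each node as a target and its neighbors on the path as context. The sketch below shows only that pair-extraction step; the function name `cbow_pairs`, the window size, and the node labels are hypothetical, and the embedding training that path2vec performs is not shown.

```python
from typing import List, Tuple

def cbow_pairs(path: List[str], window: int = 1) -> List[Tuple[List[str], str]]:
    """For each node on a path, collect up to `window` neighbors on each side as its context."""
    pairs = []
    for i, target in enumerate(path):
        left = path[max(0, i - window):i]
        right = path[i + 1:i + 1 + window]
        pairs.append((left + right, target))
    return pairs

# A toy path alternating between user and item nodes (labels are illustrative).
pairs = cbow_pairs(["u1", "i3", "u2"], window=1)
```

Each `(context, target)` pair would then serve as one training example for a CBOW-style model over node embeddings.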


AAAI Conference 2020 Conference Paper

Reliable and Efficient Anytime Skeleton Learning

Rui Ding
Yanzhi Liu
Jingjing Tian
Zhouyu Fu
Shi Han
Dongmei Zhang

Skeleton Learning (SL) is the task of learning an undirected graph from the input data that captures their dependency relations. SL plays a pivotal role in causal learning and has attracted growing attention in the research community lately. Due to the high time complexity of SL, anytime SL has emerged, which learns a skeleton incrementally and improves it over time. In this paper, we first propose and advocate the reliability requirement for anytime SL to be practically useful. Reliability requires the intermediately learned skeleton to have precision and persistency. We also present REAL, a novel Reliable and Efficient Anytime Learning algorithm of skeleton. Specifically, we point out that the commonly existing Functional Dependency (FD) among variables could make the learned skeleton violate the faithfulness assumption, and we propose a theory to resolve this incompatibility. Based on this, REAL conducts SL on a reduced set of variables with guaranteed correctness and thus drastically improves efficiency. Furthermore, it employs a novel edge-insertion and best-first strategy in an anytime fashion for skeleton growing to achieve high reliability and efficiency. We prove that the skeleton learned by REAL converges to the correct skeleton under standard assumptions. Thorough experiments on both benchmark and real-world datasets demonstrate that REAL significantly outperforms the other state-of-the-art algorithms.
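A functional dependency X -> Y holds in a dataset when every value of X always appears with the same value of Y. The sketch below checks that textbook definition on two columns; the function name is hypothetical, and this is not REAL's procedure, whose theory for handling FDs and reducing the variable set is not described in the abstract.

```python
from typing import Hashable, Sequence

def functionally_determines(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> bool:
    """Return True if each value in xs is always paired with the same value in ys."""
    seen = {}
    for x, y in zip(xs, ys):
        # setdefault stores y the first time x is seen and returns the stored value afterwards.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Toy columns: zip code determines city, but city does not determine zip code.
zip_codes = ["10001", "10001", "94105", "10002"]
cities = ["New York", "New York", "San Francisco", "New York"]
zip_to_city = functionally_determines(zip_codes, cities)
city_to_zip = functionally_determines(cities, zip_codes)
```

When such a deterministic relation exists, one variable carries no information beyond the other, which is the kind of situation the abstract says can conflict with the faithfulness assumption.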
