Arrow Research search

Author name cluster

Rui Zhou

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

17 papers
2 author rows

Possible papers

17

NeurIPS Conference 2025 Conference Paper

CoC-VLA: Delving into Adversarial Domain Transfer for Explainable Autonomous Driving via Chain-of-Causality Visual-Language-Action Model

  • Dapeng Zhang
  • Fei Shen
  • Rui Zhao
  • Yinda Chen
  • Peng Zhi
  • Chenyang Li
  • Rui Zhou
  • Qingguo Zhou

Autonomous driving represents a prominent application of artificial intelligence. Recent approaches have shifted from focusing solely on common scenarios to addressing complex, long-tail situations such as subtle human behaviors, traffic accidents, and non-compliant driving patterns. Given the demonstrated capabilities of large language models (LLMs) in understanding visual and natural language inputs and following instructions, recent methods have integrated LLMs into autonomous driving systems to enhance reasoning, interpretability, and performance across diverse scenarios. However, existing methods typically rely either on real-world data, which is suitable for industrial deployment, or on simulation data tailored to rare or hard-case scenarios. Few approaches effectively integrate the complementary advantages of both data sources. To address this limitation, we propose CoC-VLA, a novel VLM-guided, end-to-end adversarial transfer framework for autonomous driving that transfers long-tail handling capabilities from simulation to real-world deployment. The framework comprises a teacher VLM, a student VLM, and a discriminator. Both the teacher and student VLMs use a shared base architecture, termed the Chain-of-Causality Visual–Language Model (CoC VLM), which integrates temporal information via an end-to-end text adapter. This architecture supports chain-of-thought reasoning to infer complex driving logic. The teacher and student VLMs are pre-trained separately on simulated and real-world datasets, respectively. The discriminator is trained adversarially, using a novel backpropagation strategy, so that the student VLM acquires the long-tail handling capabilities learned in simulation and applies them in real-world environments. Experimental results show that our method effectively bridges the gap between simulation and real-world autonomous driving, indicating a promising direction for future research.

ICML Conference 2025 Conference Paper

COSDA: Counterfactual-based Susceptibility Risk Framework for Open-Set Domain Adaptation

  • Wenxu Wang
  • Rui Zhou
  • Jing Wang
  • Yun Zhou
  • Cheng Zhu
  • Ruichun Tang
  • Bo Han
  • Nevin L. Zhang

Open-Set Domain Adaptation (OSDA) aims to transfer knowledge from the labeled source domain to the unlabeled target domain that contains unknown categories, thus facing the challenges of domain shift and unknown category recognition. While recent works have demonstrated the potential of causality for domain alignment, little exploration has been conducted on causal-inspired theoretical frameworks for OSDA. To fill this gap, we introduce the concept of Susceptibility and propose a novel Counterfactual-based susceptibility risk framework for OSDA, termed COSDA. Specifically, COSDA consists of three novel components: (i) a Susceptibility Risk Estimator (SRE) for capturing causal information, along with comprehensive derivations of the computable theoretical upper bound, forming a risk minimization framework under the OSDA paradigm; (ii) a Contrastive Feature Alignment (CFA) module, which is theoretically proven based on mutual information to satisfy the Exogeneity assumption and facilitate cross-domain feature alignment; (iii) a Virtual Multi-unknown-categories Prototype (VMP) pseudo-labeling strategy, providing label information by measuring how similar samples are to known and multiple virtual unknown category prototypes, thereby assisting in open-set recognition and intra-class discriminative feature learning. Extensive experiments demonstrate that our approach achieves state-of-the-art performance.
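
The VMP strategy above labels a target sample by comparing it with known-class prototypes and several virtual unknown-class prototypes. The abstract does not say how the prototypes are built or which similarity measure is used, so the following is only a minimal sketch assuming cosine similarity and fixed prototype vectors; `pseudo_label`, `cosine`, and `UNKNOWN` are hypothetical names, not COSDA's code.

```python
import math
from typing import List, Sequence

UNKNOWN = -1  # hypothetical label id shared by all virtual unknown prototypes


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def pseudo_label(
    feature: Sequence[float],
    known_protos: List[Sequence[float]],
    unknown_protos: List[Sequence[float]],
) -> int:
    """Label a target feature by its most similar prototype.

    Returns the index of the closest known-class prototype, or UNKNOWN when
    some virtual unknown prototype is strictly more similar than every known one.
    """
    known_sims = [cosine(feature, p) for p in known_protos]
    best_known = max(range(len(known_sims)), key=known_sims.__getitem__)
    best_unknown_sim = max(cosine(feature, p) for p in unknown_protos)
    return best_known if known_sims[best_known] >= best_unknown_sim else UNKNOWN
```

For example, with known prototypes `[[1, 0], [0, 1]]` and virtual unknown prototypes `[[-1, 0], [0, -1]]`, a feature pointing roughly along the first axis gets label 0, while one pointing away from both known prototypes is labeled `UNKNOWN`.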

AAAI Conference 2025 Conference Paper

MapExpert: Online HD Map Construction with Simple and Efficient Sparse Map Element Expert

  • Dapeng Zhang
  • Dayu Chen
  • Peng Zhi
  • Yinda Chen
  • Zhenlong Yuan
  • Chenyang Li
  • Sunjing
  • Rui Zhou

Constructing online High-Definition (HD) maps is crucial for the static environment perception of autonomous driving systems (ADS). Existing solutions typically attempt to detect vectorized HD map elements with unified models; however, these methods often overlook the distinct characteristics of different non-cubic map elements, making accurate distinction challenging. To address these issues, we introduce an expert-based online HD map method, termed MapExpert. MapExpert utilizes sparse experts, distributed by our routers, to describe various non-cubic map elements accurately. Additionally, we propose an auxiliary balance loss function to distribute the load evenly across experts. Furthermore, we theoretically analyze the limitations of prevalent bird's-eye view (BEV) feature temporal fusion methods and introduce an efficient temporal fusion module called Learnable Weighted Moving Descentage. This module effectively integrates relevant historical information into the final BEV features. Combined with an enhanced slice head branch, the proposed MapExpert achieves state-of-the-art performance and maintains good efficiency on both nuScenes and Argoverse2 datasets.
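
MapExpert's auxiliary balance loss is not spelled out in the abstract. As an assumption, the sketch below uses the widely used Switch Transformer load-balancing loss with top-1 routing as a stand-in, to show how such a term penalizes routers that send most inputs to a few experts; `load_balance_loss` is a hypothetical name and the actual MapExpert loss may differ.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one router output row."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def load_balance_loss(router_logits: List[List[float]]) -> float:
    """Switch-Transformer-style auxiliary loss: N * sum_i f_i * P_i.

    router_logits[t][i] is the router score of input t for expert i.
    f_i is the fraction of inputs whose top-1 expert is i, and
    P_i is the mean router probability assigned to expert i.
    """
    num_inputs = len(router_logits)
    num_experts = len(router_logits[0])
    probs = [softmax(row) for row in router_logits]

    # Count top-1 dispatch decisions per expert.
    counts = [0] * num_experts
    for p in probs:
        counts[max(range(num_experts), key=p.__getitem__)] += 1

    f = [c / num_inputs for c in counts]
    mean_p = [sum(p[i] for p in probs) / num_inputs for i in range(num_experts)]
    return num_experts * sum(fi * pi for fi, pi in zip(f, mean_p))
```

The loss is 1.0 when both the dispatch fractions and the mean router probabilities are uniform (1/N for each of N experts), and it rises when one expert receives most inputs with high probability.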

ICRA Conference 2025 Conference Paper

OoDIS: Anomaly Instance Segmentation and Detection Benchmark

  • Alexey Nekrasov 0001
  • Rui Zhou
  • Miriam Ackermann
  • Alexander Hermans
  • Bastian Leibe
  • Matthias Rottmann

Safe navigation of self-driving cars and robots requires a precise understanding of their environment. Training data for perception systems cannot cover the wide variety of objects that may appear during deployment. Thus, reliable identification of unknown objects, such as wild animals and atypical obstacles, is critical due to their potential to cause serious accidents. Significant progress in semantic segmentation of anomalies has been facilitated by the availability of out-of-distribution (OOD) benchmarks. However, a comprehensive understanding of scene dynamics requires the segmentation of individual objects, and thus the segmentation of instances is essential. Development in this area has been lagging, largely due to the lack of dedicated benchmarks. The situation is similar in object detection. While there is interest in detecting and potentially tracking every anomalous object, the availability of dedicated benchmarks is clearly limited. To address this gap, this work extends some commonly used anomaly segmentation benchmarks to include the instance segmentation and object detection tasks. Our evaluation of anomaly instance segmentation and object detection methods shows that both of these challenges remain unsolved problems. We provide a competition and benchmark website at https://vision.rwth-aachen.de/oodis.

ICRA Conference 2025 Conference Paper

Self-Sufficient 5-DoF Discrete Global Localization for Magnetically-Actuated Endoscope in Bronchoscopy

  • Jiewen Tan
  • Da Zhao
  • Rui Zhou
  • Wenxuan Xie
  • Shing Shin Cheng

Existing sensor-based global localization methods limit the miniaturization potential of magnetically-actuated endoscopes (MAE), while localization based on external medical imaging demands accurate registration and imposes a variety of modality-specific challenges during continuous image acquisition. This work proposes a novel self-sufficient method for discrete (one-time) global localization of an MAE based solely on inherent endoscopic images, without any prior MAE pose information. More specifically, it adopts a model-free control approach to determine five different external magnet (EM) poses (corresponding to five independent nonlinear equations) that align the MAE image center with the lumen center while the MAE maintains the same pose. The five-degree-of-freedom (DoF) global pose of the MAE can then be estimated by minimizing the root mean square of the MAE's torque balance residuals under these EM poses. In experiments, our proposed method achieves accuracy comparable to other sensor-based methods for permanent magnet-driven MAEs, with a position error of $6.7 \pm 2.1$ mm and an orientation error of $9.5 \pm 2.9^{\circ}$. Compared to existing methods, our approach does not require physical sensor integration, enabling a more compact endoscope design for exploration in narrower respiratory tracts. It also offers a critical step toward sensorless and continuous global localization of the permanent magnet-driven MAE during its autonomous navigation.

NeurIPS Conference 2024 Conference Paper

Breaking Determinism: Fuzzy Modeling of Sequential Recommendation Using Discrete State Space Diffusion Model

  • Wenjia Xie
  • Hao Wang
  • Luankang Zhang
  • Rui Zhou
  • Defu Lian
  • Enhong Chen

Sequential recommendation (SR) aims to predict items that users may be interested in based on their historical behavior sequences. We revisit SR from a novel information-theoretic perspective and find that conventional sequential modeling methods fail to adequately capture the randomness and unpredictability of user behavior. Inspired by fuzzy information processing theory, this paper introduces the DDSR model, which uses fuzzy sets of interaction sequences to overcome these limitations and better capture the evolution of users' real interests. DDSR is formally based on diffusion transition processes in discrete state spaces, unlike common diffusion models such as DDPM that operate in continuous domains. This makes it better suited to discrete data: it uses structured transitions instead of arbitrary noise introduction to avoid information loss. Additionally, to address the inefficiency of matrix transformations caused by the vast discrete space, we replace item IDs with semantic labels derived from quantization or RQ-VAE, which improves efficiency and alleviates cold-start issues. Tests on three public benchmark datasets show that DDSR outperforms existing state-of-the-art methods in various settings, demonstrating its potential and effectiveness in handling SR tasks.
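
The abstract contrasts DDSR's discrete-state transitions with DDPM's continuous Gaussian noise but does not give its transition matrices. As an illustrative assumption, the sketch uses the uniform transition matrix from D3PM-style discrete diffusion, $Q_t = (1-\beta_t) I + \frac{\beta_t}{K}\mathbf{1}\mathbf{1}^\top$ over $K$ tokens (for example, item semantic IDs), and computes the $t$-step marginal both by multiplying the matrices and by the closed form $\bar\alpha_t \mathbf{e}_{x_0} + \frac{1-\bar\alpha_t}{K}\mathbf{1}$ with $\bar\alpha_t = \prod_s (1-\beta_s)$. All function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def uniform_transition(k: int, beta: float) -> Matrix:
    """One-step matrix Q_t = (1 - beta) I + (beta / K) 11^T.

    Row x is q(x_t = . | x_{t-1} = x); every row sums to 1.
    """
    return [[(1 - beta) * (i == j) + beta / k for j in range(k)] for i in range(k)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [
        [sum(a[i][m] * b[m][j] for m in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def forward_marginal(x0: int, k: int, betas: List[float]) -> List[float]:
    """q(x_t | x_0): row x0 of the cumulative product Q_1 Q_2 ... Q_t."""
    q_bar = [[float(i == j) for j in range(k)] for i in range(k)]
    for beta in betas:
        q_bar = matmul(q_bar, uniform_transition(k, beta))
    return q_bar[x0]


def closed_form_marginal(x0: int, k: int, betas: List[float]) -> List[float]:
    """Same marginal via alpha_bar * onehot(x0) + (1 - alpha_bar) / K."""
    alpha_bar = 1.0
    for beta in betas:
        alpha_bar *= 1 - beta
    return [alpha_bar * (j == x0) + (1 - alpha_bar) / k for j in range(k)]
```

The closed form holds because every uniform $Q_t$ mixes the identity with the same averaging projector, so products of such matrices stay in that family; this is what makes sampling $x_t$ directly from $x_0$ cheap in this particular construction.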

ICLR Conference 2024 Conference Paper

Internal Cross-layer Gradients for Extending Homogeneity to Heterogeneity in Federated Learning

  • Yun-Hin Chan
  • Rui Zhou
  • Running Zhao
  • Zhihan Jiang
  • Edith C. H. Ngai

Federated learning (FL) inevitably confronts the challenge of system heterogeneity in practical scenarios. To enhance the capabilities of most model-homogeneous FL methods in handling system heterogeneity, we propose a training scheme that extends their capabilities to cope with this challenge. In this paper, we begin with a detailed exploration of homogeneous and heterogeneous FL settings and make three key observations: (1) a positive correlation between client performance and layer similarities, (2) higher similarities in the shallow layers than in the deep layers, and (3) smoother gradient distributions indicate higher layer similarities. Building upon these observations, we propose InCo Aggregation, which leverages internal cross-layer gradients, a mixture of gradients from shallow and deep layers within a server model, to augment the similarity in the deep layers without requiring additional communication between clients. Furthermore, our method can be tailored to accommodate model-homogeneous FL methods such as FedAvg, FedProx, FedNova, Scaffold, and MOON, expanding their capabilities to handle system heterogeneity. Extensive experimental results validate the effectiveness of InCo Aggregation, spotlighting internal cross-layer gradients as a promising avenue for enhancing performance in heterogeneous FL.
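
InCo Aggregation mixes gradients from shallow and deep layers on the server, but the abstract does not give the mixing rule. The sketch below is a hypothetical illustration, not the paper's formula: it rescales a shallow-layer gradient to the norm of a same-shaped deep-layer gradient and takes a convex combination controlled by an assumed coefficient `alpha`. Gradients are flat lists for simplicity, and `mix_cross_layer` is a hypothetical name.

```python
import math
from typing import List


def l2_norm(v: List[float]) -> float:
    """Euclidean norm of a flat gradient vector."""
    return math.sqrt(sum(x * x for x in v))


def mix_cross_layer(
    g_shallow: List[float], g_deep: List[float], alpha: float = 0.5
) -> List[float]:
    """Return (1 - alpha) * g_deep + alpha * (shallow gradient rescaled to g_deep's norm).

    Assumes both layers have the same shape. If the shallow gradient is all
    zeros, only the (1 - alpha) * g_deep term remains.
    """
    if len(g_shallow) != len(g_deep):
        raise ValueError("shallow and deep gradients must have the same shape")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    n_shallow, n_deep = l2_norm(g_shallow), l2_norm(g_deep)
    scale = n_deep / n_shallow if n_shallow > 0 else 0.0
    return [(1 - alpha) * d + alpha * scale * s for s, d in zip(g_shallow, g_deep)]
```

Norm matching is a design choice made here so that the shallow gradient influences the direction of the deep-layer update without dominating it by magnitude; with `alpha = 0.0` the deep gradient is returned unchanged.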

ICRA Conference 2024 Conference Paper

MATRIX: Multi-Agent Trajectory Generation with Diverse Contexts

  • Zhuo Xu
  • Rui Zhou
  • Yida Yin
  • Huidong Gao
  • Masayoshi Tomizuka
  • Jiachen Li 0001

Data-driven methods have great advantages in modeling complicated human behavioral dynamics and dealing with many human-robot interaction applications. However, collecting massive and annotated real-world human datasets has been a laborious task, especially for highly interactive scenarios. On the other hand, algorithmic data generation methods are usually limited by their model capacities, making them unable to offer realistic and diverse data needed by various application users. In this work, we study trajectory-level data generation for multi-human or human-robot interaction scenarios and propose a learning-based automatic trajectory generation model, which we call Multi-Agent TRajectory generation with dIverse conteXts (MATRIX). MATRIX is capable of generating interactive human behaviors in realistic diverse contexts. We achieve this goal by modeling the explicit and interpretable objectives so that MATRIX can generate human motions based on diverse destinations and heterogeneous behaviors. We carried out extensive comparison and ablation studies to illustrate the effectiveness of our approach across various metrics. We also presented experiments that demonstrate the capability of MATRIX to serve as data augmentation for imitation-based motion planning.

ICRA Conference 2022 Conference Paper

Grouptron: Dynamic Multi-Scale Graph Convolutional Networks for Group-Aware Dense Crowd Trajectory Forecasting

  • Rui Zhou
  • Hongyu Zhou
  • Huidong Gao
  • Masayoshi Tomizuka
  • Jiachen Li 0001
  • Zhuo Xu

Accurate, long-term forecasting of pedestrian trajectories in highly dynamic and interactive scenes is a longstanding challenge. Recent data-driven approaches have achieved significant improvements in prediction accuracy. However, the lack of group-aware analysis has limited the performance of forecasting models. This is especially significant in highly crowded scenes, where pedestrians move in groups and the interactions between groups are extremely complex and dynamic. In this paper, we present Grouptron, a multi-scale dynamic forecasting framework that leverages pedestrian group detection and utilizes individual-level, group-level, and scene-level information for better understanding and representation of the scenes. Our approach employs spatio-temporal clustering algorithms to identify pedestrian groups and creates spatio-temporal graphs at the individual, group, and scene levels. It then uses graph neural networks to encode dynamics at different scales and aggregates the embeddings for trajectory prediction. We conducted extensive comparisons and ablation experiments to demonstrate the effectiveness of our approach. Our method achieves a 9.3% decrease in final displacement error (FDE) compared with state-of-the-art methods on the ETH/UCY benchmark datasets, and a 16.1% decrease in FDE in more crowded scenes where extensive human group interactions are more frequent.
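
FDE, the metric behind the reported percentage decreases, is the Euclidean distance between each agent's predicted and ground-truth final positions, averaged over agents. A minimal sketch for a single deterministic prediction per agent follows; many ETH/UCY evaluations instead report a best-of-K variant over sampled trajectories, and the abstract does not say which protocol was used. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def final_displacement_error(
    preds: List[List[Point]], truths: List[List[Point]]
) -> float:
    """Mean Euclidean distance between predicted and true final positions."""
    if not preds or len(preds) != len(truths):
        raise ValueError("need the same, non-zero number of predicted and true tracks")
    total = 0.0
    for pred, truth in zip(preds, truths):
        (px, py), (gx, gy) = pred[-1], truth[-1]  # only the last step counts
        total += math.hypot(px - gx, py - gy)
    return total / len(preds)


def relative_decrease(baseline: float, ours: float) -> float:
    """Percentage reduction of an error metric relative to a baseline."""
    return 100.0 * (baseline - ours) / baseline
```

A "9.3% decrease in FDE" is read here as `relative_decrease(baseline_fde, new_fde) == 9.3`, that is, a relative rather than absolute reduction.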

IJCAI Conference 2022 Conference Paper

MetaER-TTE: An Adaptive Meta-learning Model for En Route Travel Time Estimation

  • Yu Fan
  • Jiajie Xu
  • Rui Zhou
  • Jianxin Li
  • Kai Zheng
  • Lu Chen
  • Chengfei Liu

En route travel time estimation (ER-TTE) aims to predict the travel time on the remaining route. Since the traveled and remaining parts of a trip usually share some characteristics, such as driving speed, it is desirable to exploit these characteristics through effective adaptation to improve performance. However, this faces a severe data sparsity problem, because a traveled partial trajectory contains only a few sampled points. Since trajectories with different contextual information tend to have different characteristics, the existing meta-learning method for ER-TTE cannot fit each trajectory well, because it uses the same model for all trajectories. To this end, we propose a novel adaptive meta-learning model called MetaER-TTE. In particular, we use soft clustering and derive cluster-aware initialized parameters to better transfer shared knowledge across trajectories with similar contextual information. In addition, we adopt a distribution-aware approach for adaptive learning rate optimization, to avoid the task overfitting that occurs when the initial parameters are guided with a fixed learning rate for tasks under an imbalanced distribution. Finally, we conduct comprehensive experiments to demonstrate the superiority of MetaER-TTE.
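
Cluster-aware initialization can be pictured as a soft-assignment-weighted average of per-cluster parameter vectors. The abstract does not specify the clustering or weighting, so this minimal sketch assumes a softmax over negative squared distances between a trajectory's context embedding and the cluster centroids, with an assumed `temperature`; `soft_assign` and `cluster_aware_init` are hypothetical names, and the result would only serve as the starting point for per-trajectory adaptation.

```python
import math
from typing import List, Sequence


def soft_assign(
    embedding: Sequence[float], centroids: List[Sequence[float]], temperature: float = 1.0
) -> List[float]:
    """Soft cluster memberships: softmax of -||e - c_k||^2 / temperature."""
    scores = [
        -sum((e - c) ** 2 for e, c in zip(embedding, centroid)) / temperature
        for centroid in centroids
    ]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [x / z for x in exps]


def cluster_aware_init(
    embedding: Sequence[float],
    centroids: List[Sequence[float]],
    cluster_params: List[Sequence[float]],
    temperature: float = 1.0,
) -> List[float]:
    """Initial parameters = membership-weighted average of per-cluster parameters."""
    weights = soft_assign(embedding, centroids, temperature)
    dim = len(cluster_params[0])
    return [sum(w * p[d] for w, p in zip(weights, cluster_params)) for d in range(dim)]
```

A trajectory equidistant from two clusters starts from the average of their parameters, while a low temperature makes the initialization snap to the nearest cluster's parameters.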

TIST Journal 2021 Journal Article

TAML: A Traffic-aware Multi-task Learning Model for Estimating Travel Time

  • Jiajie Xu
  • Saijun Xu
  • Rui Zhou
  • Chengfei Liu
  • An Liu
  • Lei Zhao

Travel time estimation has been recognized as an important research topic with broad applications. Existing approaches aim to explore mobility patterns via trajectory embedding for travel time estimation. Although state-of-the-art methods use the estimated traffic condition (via explicit features such as average traffic speed) as auxiliary supervision for travel time estimation, they fail to model the mutual influence between the two, which leads to inaccurate estimates. To this end, in this article, we propose an improved traffic-aware model, called TAML, which adopts a multi-task learning network to integrate a travel time estimator and a traffic estimator in a shared space and improves estimation accuracy through an enhanced representation of the traffic condition, so that more meaningful implicit features are captured. In TAML, multi-task learning is further applied for travel time estimation at multiple granularities (road segment, sub-path, and entire path). The multiple loss functions are combined by considering the homoscedastic uncertainty of each task. Extensive experiments on two real trajectory datasets demonstrate the effectiveness of our proposed methods.
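
A common way to combine task losses by homoscedastic uncertainty is the formulation of Kendall, Gal, and Cipolla (2018): each task $i$ gets a learned log-variance $s_i = \log\sigma_i^2$, and the total loss is $\sum_i \tfrac{1}{2} e^{-s_i} L_i + \tfrac{1}{2} s_i$. The abstract does not give TAML's exact expression, so the sketch below assumes this regression form; `uncertainty_weighted_loss` and `optimal_log_var` are hypothetical names.

```python
import math
from typing import Sequence


def uncertainty_weighted_loss(
    task_losses: Sequence[float], log_vars: Sequence[float]
) -> float:
    """Total = sum_i 0.5 * exp(-s_i) * L_i + 0.5 * s_i, with s_i = log(sigma_i^2).

    In training, the s_i would be learnable parameters updated jointly
    with the network weights.
    """
    if len(task_losses) != len(log_vars):
        raise ValueError("one log-variance per task is required")
    return sum(0.5 * math.exp(-s) * loss + 0.5 * s for loss, s in zip(task_losses, log_vars))


def optimal_log_var(task_loss: float) -> float:
    """Minimizer of 0.5 * exp(-s) * L + 0.5 * s for fixed L > 0: s* = log(L)."""
    return math.log(task_loss)
```

Because the optimum is $s_i = \log L_i$, tasks with larger losses end up with smaller effective weights $\tfrac{1}{2}e^{-s_i} = \tfrac{1}{2L_i}$, while the $\tfrac{1}{2}s_i$ term keeps the variances from growing without bound.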

ICRA Conference 2020 Conference Paper

Bioinspired object motion filters as the basis of obstacle negotiation in micro aerial systems

  • Rui Zhou
  • Huai-Ti Lin

All animals and robots that move in the world must navigate to a goal while clearing obstacles. Using vision to accomplish such a task has several advantages in cost and payload, which explains the prevalence of biological visual guidance. However, computational overhead becomes an obvious concern for a machine vision system as the number of pixels and frames that must be analyzed in real time increases. Motion vision and optic flow have been a popular bio-inspired solution to this problem. However, many early-stage motion detection approaches rely on special hardware (e.g., event cameras) or extensive computation (e.g., dense optic flow maps). Here we demonstrate a method that combines an insect-vision-inspired object motion filter model with simple visual guidance rules to fly through a cluttered environment. We have implemented a complete feedback control loop on a micro racing drone and achieved proximal-distal object separation with only two object motion filters. We discuss the key constraints and the scalability of this approach for future development.

IJCAI Conference 2020 Conference Paper

Pivot-based Maximal Biclique Enumeration

  • Aman Abidi
  • Rui Zhou
  • Lu Chen
  • Chengfei Liu

Enumerating maximal bicliques in a bipartite graph is an important problem in data mining, with numerous real-world applications across domains such as web communities and bioinformatics. Although substantial research has been conducted on this problem, we find, surprisingly, that pivot-based search space pruning, which is quite effective in clique enumeration, has not been exploited in the biclique setting. Therefore, in this paper, we explore pivot-based pruning for biclique enumeration. We propose an algorithm that implements pivot-based pruning, powered by an effective index structure, the Containment Directed Acyclic Graph (CDAG). Meanwhile, the existing literature reports contradictory findings on the order of vertex selection in biclique enumeration. We therefore re-examine the problem and suggest an offline vertex ordering that expedites pivot pruning. We conduct an extensive performance study using real-world datasets from a wide range of domains. The experimental results demonstrate that our algorithm is more scalable, outperforms all existing algorithms on all datasets, and achieves a significant speedup over previous algorithms.
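
To make the enumeration target concrete, the sketch below lists all maximal bicliques by brute force: for each non-empty subset of left vertices it takes their common right neighbours $R$, then closes the left side to every vertex adjacent to all of $R$, which yields a biclique that cannot be extended on either side. This is an exponential-time baseline for illustration only; it is neither the pivot-based algorithm nor the CDAG index described above. The function name and adjacency format are assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

Biclique = Tuple[FrozenSet[str], FrozenSet[str]]


def maximal_bicliques(adj: Dict[str, Set[str]]) -> Set[Biclique]:
    """Enumerate maximal bicliques (L, R) of a bipartite graph by brute force.

    adj maps each left vertex to the set of its right neighbours. For every
    non-empty left subset S, R = common neighbours of S and L = all left
    vertices adjacent to every vertex of R; (L, R) is then maximal, and every
    maximal biclique with non-empty sides arises this way.
    """
    left = sorted(adj)
    found: Set[Biclique] = set()
    for size in range(1, len(left) + 1):
        for subset in combinations(left, size):
            common = set.intersection(*(adj[u] for u in subset))
            if not common:
                continue  # no right vertex is shared by the whole subset
            closure = frozenset(u for u in left if common <= adj[u])
            found.add((closure, frozenset(common)))
    return found
```

For `adj = {"a": {"x", "y"}, "b": {"y", "z"}, "c": {"y"}}` this returns three maximal bicliques: `({a}, {x, y})`, `({b}, {y, z})`, and `({a, b, c}, {y})`. Many subsets map to the same biclique, which is exactly the redundant search that pruning strategies aim to avoid.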