Arrow Research

Author name cluster

Rong Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers
1 author row

Possible papers (10)

JBHI 2026 Journal Article

Subject-Adaptive EEG Decoding via Filter-Bank Neural Architecture Search for BCI Applications

  • Chong Wang
  • Li Yang
  • Bingfan Yuan
  • Jiafan Zhang
  • Chen Jin
  • Rong Li
  • Junjie Bu

Individual differences pose a significant challenge in brain-computer interface (BCI) research. Designing a universally applicable network architecture is impractical due to the variability in human brain structure and function. We propose Filter-Bank Neural Architecture Search (FBNAS), an EEG decoding framework that automates network architecture design for individuals. FBNAS uses three temporal cells to process EEG signals in different frequency bands, with dilated convolution kernels in their search spaces. A multi-path NAS algorithm determines optimal architectures for multi-scale feature extraction. We benchmarked FBNAS on three EEG datasets across two BCI paradigms, comparing it to six state-of-the-art deep learning algorithms. FBNAS achieved cross-session decoding accuracies of 79.78%, 70.66%, and 68.38% on the BCIC-IV-2a, OpenBMI, and SEED datasets, respectively, outperforming the other methods. Our results show that FBNAS customizes decoding models to address individual differences, enhancing decoding performance and shifting model design from expert-driven to machine-aided. The source code can be found at https://github.com/wang1239435478/FBNAS-master.
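The filter-bank idea lends itself to a compact illustration. Below is a minimal PyTorch sketch of a DARTS-style relaxation over dilated temporal convolutions, one cell per frequency band, with outputs concatenated for multi-scale feature extraction; all names, kernel sizes, and dilations are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
# Illustrative sketch only -- names, kernel sizes, and dilations are
# assumptions in the spirit of FBNAS, not the authors' released code.
import torch
import torch.nn as nn

class TemporalCell(nn.Module):
    """One band-specific cell: a softmax-weighted mix over candidate
    dilated temporal convolutions (a DARTS-style relaxation)."""
    def __init__(self, channels: int, dilations=(1, 2, 4)):
        super().__init__()
        self.ops = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=(1, 15),
                      dilation=(1, d), padding=(0, 7 * d))
            for d in dilations)
        self.alpha = nn.Parameter(torch.zeros(len(dilations)))  # op logits

    def forward(self, x):
        w = torch.softmax(self.alpha, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

class FilterBankNet(nn.Module):
    """Three temporal cells, one per frequency band, concatenated."""
    def __init__(self, n_bands=3, channels=8, n_classes=4):
        super().__init__()
        self.cells = nn.ModuleList(TemporalCell(channels) for _ in range(n_bands))
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(n_bands * channels, n_classes))

    def forward(self, bands):
        # bands: one (batch, channels, electrodes, time) tensor per
        # band-pass-filtered copy of the EEG signal.
        feats = [cell(x) for cell, x in zip(self.cells, bands)]
        return self.head(torch.cat(feats, dim=1))

bands = [torch.randn(2, 8, 22, 256) for _ in range(3)]
print(FilterBankNet()(bands).shape)  # torch.Size([2, 4])
```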

NeurIPS 2025 Conference Paper

3EED: Ground Everything Everywhere in 3D

  • Rong Li
  • Yuhao Dong
  • Tianshuai Hu
  • Alan Liang
  • Youquan Liu
  • Dongyue Lu
  • Liang Pan
  • Lingdong Kong

Visual grounding in 3D is key for embodied agents to localize language-referred objects in open-world environments. However, existing benchmarks are limited by their indoor focus, single-platform constraints, and small scale. We introduce 3EED, a multi-platform, multi-modal 3D grounding benchmark featuring RGB and LiDAR data from vehicle, drone, and quadruped platforms. We provide over 128,000 objects and 22,000 validated referring expressions across diverse outdoor scenes -- 10x larger than existing datasets. We develop a scalable annotation pipeline combining vision-language model prompting with human verification to ensure high-quality spatial grounding. To support cross-platform learning, we propose platform-aware normalization and cross-modal alignment techniques, and establish benchmark protocols for in-domain and cross-platform evaluations. Our findings reveal significant performance gaps, highlighting the challenges and opportunities of generalizable 3D grounding. The 3EED dataset and benchmark toolkit are released to advance future research in language-driven 3D embodied perception.
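As a rough illustration of what "platform-aware normalization" might involve, here is a hedged NumPy sketch that re-centers each platform's point cloud to ground level and rescales its horizontal extent so vehicle, drone, and quadruped data become comparable; the sensor heights and the normalization scheme are placeholder assumptions, not the released 3EED toolkit.

```python
# Illustrative sketch only; the actual 3EED toolkit may normalize
# differently. Sensor heights below are made-up placeholder values.
import numpy as np

SENSOR_HEIGHT = {"vehicle": 1.8, "drone": 30.0, "quadruped": 0.5}  # meters, assumed

def platform_aware_normalize(points: np.ndarray, platform: str) -> np.ndarray:
    """Map an (N, 3) LiDAR point cloud into a shared ego-centric frame.

    Shifts the z-origin to ground level using the platform's assumed
    sensor height, then scales x/y by the scene's horizontal extent so
    ranges are comparable across platforms.
    """
    pts = points.copy()
    pts[:, 2] += SENSOR_HEIGHT[platform]      # put the ground at z = 0
    extent = max(np.abs(pts[:, :2]).max(), 1e-6)
    pts[:, :2] /= extent                      # horizontal extent -> [-1, 1]
    return pts

cloud = np.random.randn(1024, 3) * 10.0
print(platform_aware_normalize(cloud, "drone").shape)  # (1024, 3)
```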

AAAI 2025 Conference Paper

ConSense: Continually Sensing Human Activity with WiFi via Growing and Picking

  • Rong Li
  • Tao Deng
  • Siwei Feng
  • Mingjie Sun
  • Juncheng Jia

WiFi-based human activity recognition (HAR) holds significant application potential across various fields. To handle dynamic environments where new activities are continuously introduced, WiFi-based HAR systems must adapt by learning new concepts without forgetting previously learned ones. Furthermore, retaining knowledge from old activities by storing historical exemplars is impractical for WiFi-based HAR due to privacy concerns and the limited storage capacity of edge devices. In this work, we propose ConSense, a lightweight and fast-adapting exemplar-free class-incremental learning framework for WiFi-based HAR. The framework leverages the transformer architecture and involves dynamic model expansion and selective retraining to preserve previously learned knowledge while integrating new information. Specifically, during incremental sessions, small-scale trainable parameters trained specifically on the data of each task are added to the multi-head self-attention layer. In addition, a selective retraining strategy dynamically adjusts the weights in the multilayer perceptron based on the performance stability of neurons across tasks. Rather than training the entire model, the proposed strategies of dynamic model expansion and selective retraining reduce the overall computational load while balancing stability on previous tasks and plasticity on new tasks. Evaluation results on three public WiFi datasets demonstrate that ConSense not only outperforms several competitive approaches but also requires fewer parameters, highlighting its practical utility in class-incremental scenarios for HAR.
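A hypothetical PyTorch sketch of the two mechanisms as described: per-task trainable parameters added inside multi-head self-attention (modeled here as learnable key/value prefixes, an assumption on my part) with earlier tasks frozen, plus a stability-based mask that selects which MLP neurons to retrain. None of this is the authors' code.

```python
# Hypothetical sketch; the prefix mechanism and all names are my
# assumptions, not the ConSense implementation.
import torch
import torch.nn as nn

class ExpandableAttention(nn.Module):
    """Self-attention whose per-task parameters are small learnable
    key/value prefixes; prefixes from earlier tasks stay frozen."""
    def __init__(self, dim=64, heads=4, prefix_len=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.prefixes = nn.ParameterList()
        self.prefix_len, self.dim = prefix_len, dim

    def add_task(self):
        for p in self.prefixes:                # freeze earlier tasks
            p.requires_grad_(False)
        self.prefixes.append(
            nn.Parameter(torch.randn(self.prefix_len, self.dim) * 0.02))

    def forward(self, x):                      # x: (batch, seq, dim)
        b = x.size(0)
        prefix = torch.cat([p.expand(b, -1, -1) for p in self.prefixes], dim=1)
        kv = torch.cat([prefix, x], dim=1)     # prepend task prefixes to keys/values
        out, _ = self.attn(x, kv, kv)
        return out

def selective_retrain_mask(stability: torch.Tensor, frac=0.5) -> torch.Tensor:
    """Mark the least-stable MLP neurons (lowest scores) as trainable."""
    k = int(frac * stability.numel())
    return stability < stability.sort().values[k]  # True -> retrain this neuron

layer = ExpandableAttention()
layer.add_task()
print(layer(torch.randn(2, 16, 64)).shape)     # torch.Size([2, 16, 64])
print(selective_retrain_mask(torch.rand(8)))   # boolean retrain mask
```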

AAAI 2025 Conference Paper

Region-aware Difference Distilling with Attribute-guided Contrastive Regularization for Change Captioning

  • Rong Li
  • Liang Li
  • Jiehua Zhang
  • Qiang Zhao
  • Hongkui Wang
  • Chenggang Yan

Change captioning aims to describe the differences between two similar images in natural language, significantly aiding the understanding and monitoring of changes. This challenging task requires a fine-grained understanding of subtle changes while resisting disturbances such as viewpoint shifts and illumination variations. Existing methods often rely solely on global difference features and lack comprehensive alignment of linguistic and visual information, leading them to overlook fine-grained details and generate semantically hallucinated sentences. To address these limitations, we propose the region-aware difference distilling (RDD) network with attribute-guided contrastive regularization (ACR). The RDD uses global difference features to progressively distill regional difference features using learnable vectors, allowing for more precise identification of changed regions. The ACR enhances comprehensive alignment between linguistic and visual information by formulating Nouns-to-Objects (N2O) and Verbs-to-Actions (V2A) alignment losses to regularize the regional difference features. Promising results on three datasets demonstrate that our method outperforms state-of-the-art change captioning methods.
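The N2O and V2A losses are described as contrastive alignment terms; the following InfoNCE-style sketch shows one plausible reading, pairing text embeddings (nouns or verbs) with visual features (objects or actions). The exact formulation in the paper may differ; every name here is an assumption.

```python
# InfoNCE-style illustration; the paper's exact N2O/V2A losses may
# differ, and every name here is an assumption.
import torch
import torch.nn.functional as F

def alignment_loss(text_emb, vis_emb, tau=0.07):
    """Pull each text token (noun/verb) toward its matching visual
    feature (object/action) and away from the rest of the batch.

    text_emb, vis_emb: (batch, dim); row i of each is a matched pair.
    """
    t = F.normalize(text_emb, dim=-1)
    v = F.normalize(vis_emb, dim=-1)
    logits = t @ v.t() / tau                   # (batch, batch) similarities
    targets = torch.arange(t.size(0))          # diagonal entries are positives
    # Symmetric: text -> visual and visual -> text.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

nouns, objects = torch.randn(8, 256), torch.randn(8, 256)
verbs, actions = torch.randn(8, 256), torch.randn(8, 256)
loss = alignment_loss(nouns, objects) + alignment_loss(verbs, actions)
print(loss.item())
```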

AAAI 2025 Conference Paper

Structure Balance and Gradient Matching-Based Signed Graph Condensation

  • Rong Li
  • Long Xu
  • Songbai Liu
  • Junkai Ji
  • Lingjie Li
  • Qiuzhen Lin
  • Lijia Ma

Training graph neural networks (GNNs) for graph representation has received increasing attention due to their outstanding performance in link prediction and node classification tasks, but it incurs substantial time and storage costs on large-scale graphs. To alleviate this issue, graph condensation has emerged to condense a large graph into a small but highly informative one, such that GNNs trained on the small graph achieve performance comparable to those trained on the large graph. However, existing works mainly focus on gradient or distribution matching under GNN training trajectories to condense simple link structures, while overlooking structure matching for condensing signed graphs, which contain conflicting links and structural balance among nodes. To bridge this gap, we propose a novel Structure Balance and Gradient Matching-Based Signed Graph Condensation (SGSGC) method for condensing a signed graph with node attributes, conflicting links, and structural balance into an informative smaller one. Specifically, we first propose structure-balanced matching to match the structural balance between the original and condensed signed graphs, and then combine it with gradient matching to condense the signed graph for the link sign prediction task, while preserving both conflicting link structures and node attributes. Moreover, we use feature smoothing and graph sparsification techniques to improve the robustness of GNN training. Finally, a bi-level optimization technique is proposed to simultaneously find the optimal node attributes and conflict structure of the condensed graph. Experiments on six datasets demonstrate that SGSGC achieves excellent performance: on Epinions, it reaches 94% of the test accuracy of training on the original signed graph while reducing graph size by 99.95%-99.99%, and it yields 2.24%-6.26% accuracy improvements for link sign prediction over the state-of-the-art.
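Gradient matching, the backbone of the condensation step, can be sketched generically: compute a model's loss gradients on the original data and on the learnable synthetic data, and minimize the distance between them with respect to the synthetic data. In this hedged sketch a linear layer stands in for the GNN and the structure-balance term is omitted; all names are illustrative.

```python
# Generic gradient-matching sketch: a linear layer stands in for the
# GNN and the structure-balance term is omitted. Names are illustrative.
import torch
import torch.nn.functional as F

def grad_match_loss(model, real_batch, synth_batch):
    """Cosine distance between gradients on real vs. synthetic data."""
    def grads(x, y):
        loss = F.cross_entropy(model(x), y)
        return torch.autograd.grad(loss, tuple(model.parameters()),
                                   create_graph=True)
    g_real = grads(*real_batch)
    g_syn = grads(*synth_batch)
    return sum(1 - F.cosine_similarity(a.flatten(), b.flatten(), dim=0)
               for a, b in zip(g_real, g_syn))

model = torch.nn.Linear(16, 2)
x_real, y_real = torch.randn(64, 16), torch.randint(0, 2, (64,))
x_syn = torch.randn(8, 16, requires_grad=True)   # synthetic features: the variables
y_syn = torch.randint(0, 2, (8,))
opt = torch.optim.Adam([x_syn], lr=0.01)

loss = grad_match_loss(model, (x_real, y_real), (x_syn, y_syn))
opt.zero_grad()
loss.backward()   # gradients flow into x_syn through the matched gradients
opt.step()
print(float(loss))
```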

NeurIPS 2025 Conference Paper

Talk2Event: Grounded Understanding of Dynamic Scenes from Event Cameras

  • Lingdong Kong
  • Dongyue Lu
  • Alan Liang
  • Rong Li
  • Yuhao Dong
  • Tianshuai Hu
  • Lai Xing Ng
  • Wei Ooi

Event cameras offer microsecond-level latency and robustness to motion blur, making them ideal for understanding dynamic environments. Yet, connecting these asynchronous streams to human language remains an open challenge. We introduce Talk2Event, the first large-scale benchmark for language-driven object grounding in event-based perception. Built from real-world driving data, Talk2Event provides over 30,000 validated referring expressions, each enriched with four grounding attributes -- appearance, status, relation to viewer, and relation to other objects -- bridging spatial, temporal, and relational reasoning. To fully exploit these cues, we propose EventRefer, an attribute-aware grounding framework that dynamically fuses multi-attribute representations through a Mixture of Event-Attribute Experts (MoEE). Our method adapts to different modalities and scene dynamics, achieving consistent gains over state-of-the-art baselines in event-only, frame-only, and event-frame fusion settings. We hope our dataset and approach will establish a foundation for advancing multimodal, temporally-aware, and language-driven perception in real-world robotics and autonomy.
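The Mixture of Event-Attribute Experts suggests a gated combination of per-attribute representations. Here is a hedged sketch of that pattern: one small expert per grounding attribute and a learned gate that weights their outputs per sample; dimensions, expert design, and names are assumptions, not the EventRefer implementation.

```python
# Hypothetical sketch of a mixture over attribute experts; the actual
# EventRefer/MoEE design may differ. All names and sizes are assumed.
import torch
import torch.nn as nn

ATTRIBUTES = ["appearance", "status", "relation_to_viewer", "relation_to_others"]

class MoEE(nn.Module):
    """One small expert per grounding attribute; a gate conditioned on
    the fused input weights the experts' outputs per sample."""
    def __init__(self, dim=128):
        super().__init__()
        self.experts = nn.ModuleDict({
            a: nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for a in ATTRIBUTES})
        self.gate = nn.Linear(dim, len(ATTRIBUTES))

    def forward(self, x):                          # x: (batch, dim) fused feature
        w = torch.softmax(self.gate(x), dim=-1)    # (batch, n_experts)
        outs = torch.stack([self.experts[a](x) for a in ATTRIBUTES], dim=1)
        return (w.unsqueeze(-1) * outs).sum(dim=1) # weighted expert mixture

x = torch.randn(2, 128)
print(MoEE()(x).shape)  # torch.Size([2, 128])
```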