Arrow Research search

Author name cluster

Rui Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

73 papers
2 author rows

Possible papers


JBHI Journal 2026 Journal Article

A Novel Grasping Robot Control Method Using Motion Execution BCI Combining Knowledge Reasoning

  • Rui Li
  • Jing Liu
  • Jinli Liu
  • Shiqiang Yang
  • Weiping Liu
  • Ke Deng
  • Wen Wang

Recently, with the growing number of disabled people, brain-controlled technology offers a novel way to help patients restore their daily abilities. However, the conventional brain-controlled system based on the motion-related task lacks intelligence in real-world environments. To address the above problem, this study proposed a shared-control system combining a precise hand movement (PHM)-based brain-computer interface (BCI) system and a knowledge-driven reasoning method. Six types of precise hand movements were selected to design a novel motion execution paradigm for the BCI system. A feature intermediate fusion convolutional neural network was employed to accurately decode electroencephalogram signals. Furthermore, a shared control grasping technology combining knowledge-based reasoning with the PHM-based BCI system was designed for the grasping robot, which enhances the system's intelligence and versatility in selecting objects. To verify the improvement of the proposed method, experiments were conducted with 15 healthy subjects and 2 patients. The proposed method achieved an average accuracy of 82.80 ± 6.08%, with the highest accuracy reaching 94.27%. All the experimental results demonstrate the effectiveness of the proposed shared control method.

AAAI Conference 2026 Conference Paper

Content Diversity-guided Ambiguity Mitigation for Open-Set Noisy Label Learning

  • Zhihao Zhou
  • Rui Li
  • Xueying Li

Open-set noisy label learning faces a critical challenge in maintaining robust DNN performance when training data contain both in-distribution noisy (IDN) and out-of-distribution (OOD) samples. These noisy samples induce overconfident but erroneous predictions due to their ambiguous positions relative to category boundaries. Although current methods address this by filtering noisy samples based on visual features alone, they fail to resolve the semantic ambiguity near decision boundaries, where limited visual cues lead to unreliable sample purification. To this end, we propose Content Diversity-guided Ambiguity Mitigation (CDgAM), a novel framework that leverages diverse contents to mitigate visual ambiguity in open-set noisy label learning. CDgAM leverages textual descriptions of intra-class commonality and inter-class disparity to dynamically refine semantic boundaries, reducing bias in prototype learning. To further suppress early-stage uncertainty in visual representations, we design a region-sensitive distillation regularization that transfers boundary-aware knowledge from a multimodal large language model to the target DNN. Extensive experiments conducted on various datasets with different noise levels demonstrate the effectiveness of our CDgAM, outperforming state-of-the-art methods for open-set noisy label learning.

EAAI Journal 2026 Journal Article

Efficient transformer tracking with multi-axis mixed attention

  • Shaofeng Liang
  • Rui Li
  • Yingjing Shi
  • Zhi Deng

Transformer trackers have achieved impressive advancements recently, where the attention mechanism plays an important role. However, the current Transformer trackers’ networks have huge parameter counts and heavy computation, which limits their applications. To solve the above problems, in this paper, we propose a Multi-axis Mixed Attention Module (Max-MAM) for joint feature extraction and information integration of the target template and search region. Based on Max-MAM, we build a one-stage tracking pipeline called MaxTrack, stacking Max-MAM in the early stages with high input resolution and standard Transformer encoder layers in the third stage, and finally placing a fully convolutional localization head at the top of the backbone. In addition, to further reduce the model’s computational cost, we adopt an asymmetric cross-attention scheme and design an in-network adaptive background pruning based on the similarity prior in the cross-attention calculation. Our MaxTrack has been extensively evaluated on multiple benchmarks to validate the effectiveness of our method. In particular, the AO (average overlap) score of our MaxTrack-B on GOT-10k reaches 67.9%, and the success score on TrackingNet reaches 80.7%, with only 31.4M parameters and 13.7G FLOPs (Floating Point Operations), achieving performance close to the current state-of-the-art trackers while significantly reducing the computational burden. Our method achieves a good balance between accuracy and computational cost.

AAAI Conference 2026 Conference Paper

FedP²EFT: Federated Learning to Personalize PEFT for Multilingual LLMs

  • Royson Lee
  • Minyoung Kim
  • Fady Rezk
  • Rui Li
  • Stylianos I. Venieris
  • Timothy Hospedales

Federated learning (FL) has enabled training of multilingual large language models (LLMs) on diverse and decentralized multilingual data, especially on low-resource languages. To improve client-specific performance, personalization via the use of parameter-efficient fine-tuning (PEFT) modules such as LoRA is common. This involves a personalization strategy (PS), such as the design of the PEFT adapter structures (e.g., in which layers to add LoRAs and what ranks) and choice of hyperparameters (e.g., learning rates) for fine-tuning. Instead of manual PS configuration, we propose FedP²EFT, a federated learning-to-personalize method for multilingual LLMs in cross-device FL settings. Unlike most existing PEFT structure selection methods, which are prone to overfitting low-data regimes, FedP²EFT collaboratively learns the optimal personalized PEFT structure for each client via Bayesian sparse rank selection. Evaluations on both simulated and real-world multilingual FL benchmarks demonstrate that FedP²EFT largely outperforms existing personalized fine-tuning methods, while complementing other existing FL methods.

AAAI Conference 2026 Conference Paper

From Diagnosis to Generalization: A Cognitive Approach to Data Selection for Educational LLMs

  • Yuxiang Guo
  • Yan Zhuang
  • Qi Liu
  • Zhenya Huang
  • Xianquan Wang
  • Liyang He
  • Jiatong Li
  • Rui Li

Specializing Large Language Models for educational domains is a key frontier in creating personalized learning tools. The central challenge is not data scarcity but its abundance: efficiently selecting a curated data subset from vast corpora to enhance specialized skills and foster generalization, without degrading existing abilities. Existing data selection paradigms, relying on superficial semantic similarity or model training dynamics, often lack a principled framework to identify data that promotes true cognitive growth. Our work proposes a paradigm shift from leveraging indirect proxies of learning value, such as semantic similarity and training dynamics, towards a framework that performs a direct, cognitive-level modeling of the learner's state. We introduce CASS, a novel framework that implements this cognitive approach through a clear pipeline, moving from an initial Diagnosis to the ultimate goal of expanding the model's cognitive frontier. First, CASS diagnoses the LLM's cognitive frontier using Multidimensional Item Response Theory. Leveraging this diagnosis, it then employs Fisher Information to select a data subset situated at LLM's cognitive frontier that offers maximum informational gain. Finally, the model is fine-tuned on this curated data using a structured, easy-to-hard curriculum to ensure effective learning. Experiments on our new multi-subject dataset show that models trained with CASS not only achieve superior accuracy in the target domain but also exhibit enhanced generalization. CASS provides a more efficient, effective, and theoretically-grounded paradigm for building expert educational LLMs.
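
As a rough illustration of the Fisher-information selection step described in the abstract above, the sketch below ranks candidate items by how informative they are at an estimated ability level. It assumes a unidimensional two-parameter logistic (2PL) IRT model, which is simpler than the multidimensional model the abstract names, and the function names and item fields (`id`, `a`, `b`) are hypothetical. It is not the CASS implementation.

```python
import math


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response at ability theta
    (a = discrimination, b = difficulty)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def fisher_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def select_most_informative(theta: float, items: list, k: int) -> list:
    """Return the k items with the highest Fisher information at theta."""
    ranked = sorted(
        items,
        key=lambda it: fisher_information(theta, it["a"], it["b"]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical item pool: information peaks where difficulty b is near theta.
pool = [
    {"id": "easy", "a": 1.0, "b": -0.5},
    {"id": "matched", "a": 1.0, "b": 0.0},
    {"id": "hard", "a": 1.0, "b": 3.0},
]
chosen = select_most_informative(0.0, pool, 2)
```

Under this simplified model, items whose difficulty sits close to the current ability estimate carry the most information, which is one way to read "data situated at the cognitive frontier".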

AAAI Conference 2026 Conference Paper

HierarchicalPrune: Position-Aware Compression for Large-Scale Diffusion Models

  • Young D. Kwon
  • Rui Li
  • Sijia Li
  • Da Li
  • Sourav Bhattacharya
  • Stylianos I. Venieris

State-of-the-art text-to-image diffusion models (DMs) achieve remarkable quality, yet their massive parameter scale (8-11B) poses significant challenges for inferences on resource-constrained devices. In this paper, we present HierarchicalPrune, a novel compression framework grounded in a key observation: DM blocks exhibit distinct functional hierarchies, where early blocks establish semantic structures while later blocks handle texture refinements. HierarchicalPrune synergistically combines three techniques: (1) Hierarchical Position Pruning, which identifies and removes less essential later blocks based on position hierarchy; (2) Positional Weight Preservation, which systematically protects early model portions that are essential for semantic structural integrity; and (3) Sensitivity-Guided Distillation, which adjusts knowledge-transfer intensity based on our discovery of block-wise sensitivity variations. As a result, our framework brings billion-scale diffusion models into a range more suitable for on-device inference, while preserving the quality of the output images. Specifically, combined with INT4 weight quantisation, HierarchicalPrune achieves 77.5-80.4% memory footprint reduction (e.g., from 15.8 GB to 3.2 GB) and 27.9-38.0% latency reduction, measured on server and consumer grade GPUs, with the minimum drop of 2.6% in GenEval score and 7% in HPSv2 score compared to the original model. Finally, our comprehensive user study with 85 participants demonstrates that HierarchicalPrune maintains perceptual quality comparable to the original model while significantly outperforming prior works.

YNIMG Journal 2026 Journal Article

Hippocampal-parietal directed connectivity mediates brain network reconfiguration between internal and external attention: An intracranial EEG study

  • Lizhi Yang
  • Huimin Huang
  • Xiaojun Qiao
  • ShengTeng Ong
  • Xiaoran Li
  • Ziyue Li
  • Siyi Chen
  • Huiqing Jia

Attention is a cornerstone of cognitive function, and understanding its neural mechanisms is of great significance for both cognitive science and clinical applications. A critical aspect of this endeavor involves elucidating how the brain's network architecture shifts between internally- and externally-directed states. However, the distinct organizational principles of neural networks in these states, as well as the pivotal brain regions and connections that mediate such transitions, remain largely unclear. To investigate these network dynamics, this study analyzed stereo-electroencephalography (SEEG) data from 17 patients with refractory epilepsy performing a modified gradual-onset continuous performance task (gradCPT) designed to induce distinct internal and external attention states. High-frequency broadband (HFB, 70-170 Hz) signals were extracted as indicators of neural activity, and neural Granger causality analysis was employed to construct effective connectivity networks between brain regions. For the effective connectivity networks, we systematically applied modular analysis to quantify network segregation, node role classification to identify hub regions, and machine learning methods to evaluate the discriminative power of the identified connectivity differences. The results showed that the external attention state exhibited significantly stronger global causal connectivity and a topological profile dominated by connector hubs. In contrast, the internal attention state displayed higher modularity and a prevalence of peripheral nodes, reflecting a segregated network architecture. Eight pairs of brain region connections showed significant differences between the two states, primarily involving the parietal-temporal network. 
A support vector machine (SVM) classifier achieved 77.8% accuracy in distinguishing attention states under cross-subject conditions using the identified directed connectivity features, demonstrating the discriminative power of network differences. Feature importance analysis identified the intrinsic dynamics of the hippocampus (HIP) and its directed outflow to the middle temporal gyrus (MTG) as the most significant discriminative features. Consequently, the hippocampus operates in concert with temporal and parietal regions to mediate these transitions, suggesting that flexible cognitive control depends on the dynamic coupling between memory systems and cortical networks. These results provide potential neural biomarkers for attention-related disorders and advance our mechanistic understanding of how the brain adaptively organizes information flow to meet varying cognitive demands.

AAAI Conference 2026 Conference Paper

Large Language Models Struggle with Unreasonability in Math Problems

  • Jingyuan Ma
  • Damai Dai
  • Zihang Yuan
  • Rui Li
  • Weilin Luo
  • Bin Wang
  • Qun Liu
  • Lei Sha

Large Language Models (LLMs) have shown remarkable success on a wide range of math and reasoning benchmarks. However, we observe that they often struggle when faced with unreasonable math problems. Instead of recognizing these issues, models frequently proceed as if the problem is well-posed, producing incorrect answers or falling into overthinking and verbose self-correction. To systematically investigate this overlooked vulnerability, we propose the Unreasonable Math Problems (UMP) benchmark, designed to evaluate LLMs' ability to detect and respond to unreasonable math problem statements. Based on extensive experiments covering 19 LLMs, we find that even state-of-the-art general models like GPT-4o struggle on UMP. While reasoning models such as DeepSeek-R1 demonstrate a higher sensitivity to unreasonable inputs, this often comes at the cost of generating overly long and meaningless responses that fail to converge. We further find that prompting and fine-tuning enhance the detection of unreasonable inputs, with minor and acceptable trade-offs, making them practical solutions in this challenging setting.

JBHI Journal 2026 Journal Article

LELN: A Large Language Model-Dynamically Enhanced Learning Network for Patient Similarity Calculation

  • Zhichao Zhu
  • Bo Bai
  • Jianqiang Li
  • Han Wang
  • Rui Li
  • Lan Lan

The rapid expansion of Electronic Medical Record (EMR) data has advanced AI-driven patient similarity computation, a key technology for intelligent healthcare. However, the handling of heterogeneous EMR formats and the integration of domain knowledge constrain existing methods. While graph-based approaches show promise, they still struggle with these issues. To address this, we propose a Large Language Model-Dynamically Enhanced Learning Network (LELN), leveraging LLMs' commonsense knowledge and reasoning to dynamically structure EMR data and enhance medical knowledge integration. LELN integrates two LLM-based modules: DS-EE (DeepSeek-Event Extraction) extracts medical events to construct structured EMR event graphs, and DS-KB (DeepSeek-Knowledge Base) infers disease-relevant knowledge to augment feature representations. The model employs a dual-stage spatial-temporal feature aggregation strategy: a Graph Attention Network captures intra- and inter-event dependencies, followed by a Bidirectional Long-Short Term Memory (BiLSTM) with attention to model temporal disease progression. Additionally, a clinical prior-guided attention mechanism emphasizes discriminative diagnostic features, improving clinical relevance. Extensive experiments on heterogeneous datasets (a real-world Chinese dataset and the public MIMIC-III) show LELN outperforms baselines, achieving F1 scores of 87.66% and 85.95%, demonstrating robustness and accuracy.

AAAI Conference 2026 Conference Paper

LogicCat: A Chain-of-Thought Text-to-SQL Benchmark for Complex Reasoning

  • Liutao
  • Xutao Mao
  • Dixuan Zhang
  • Yifan Li
  • LiuHaixin
  • KongLulu
  • Jiaming Hou
  • Rui Li

Text-to-SQL is a critical task in natural language processing that aims to transform natural language questions into accurate and executable SQL queries. In real-world scenarios, these reasoning tasks are often accompanied by complex mathematical computations, domain knowledge, and hypothetical reasoning scenarios. However, existing large-scale Text-to-SQL datasets typically focus on business logic and task logic, neglecting critical factors such as vertical domain knowledge, complex mathematical reasoning, and hypothetical reasoning, which are essential for realistically reflecting the reasoning demands in practical applications and completing data querying and analysis. To bridge this gap, we introduce LogicCat, the first Text-to-SQL benchmark dataset specifically designed for complex reasoning and chain-of-thought parsing, encompassing physics, arithmetic, commonsense, and hypothetical reasoning scenarios. LogicCat comprises 4,038 English questions paired with 12,114 detailed chain-of-thought reasoning steps, spanning 45 databases across diverse domains, significantly surpassing existing datasets in complexity. Experimental results demonstrate that LogicCat substantially increases the task difficulty for current state-of-the-art models, which reach at most 33.20% execution accuracy, indicating that this task remains exceptionally challenging. The advancement of LogicCat represents a crucial step toward developing systems suitable for real-world enterprise data analysis and autonomous query generation.

JBHI Journal 2026 Journal Article

Self-Supervised Contrastive Learning With Attention Fusion for Enhanced Breast Cancer Diagnosis From Mammography

  • Xiaohong Lyu
  • Liang Dong
  • Sihan Wang
  • Rui Li
  • Yanhong Feng

Screening mammography presents complementary craniocaudal and mediolateral oblique views whose joint interpretation hinges on view-invariance for the same breast and sensitivity to contralateral asymmetry. We propose a self-supervised anatomy-aware with attention fusion framework (SCL-AF) that couples contrastive pretraining with cross-view positives and contralateral hard negatives, a lesion-guided tokenization that distills high-resolution images into a compact set of clinically meaningful tokens, and a geometry-biased, bidirectional attention fusion that reconciles evidence across views. Supervised fine-tuning uses a class-imbalance–aware objective together with view consistency and contralateral symmetry regularizers. Evaluated on the public CBIS-DDSM dataset, SCL-AF achieves ROC-AUC 0.942, PR-AUC 0.692, and SEN 0.631, which outperform strong baselines. Gains concentrate in the clinically relevant high-specificity regime with particularly large improvements on calcification-dominant breasts. Ablations show that removing cross-view positives or contralateral negatives substantially degrades high-specificity sensitivity and calibration, lesion-guided tokens with diversity priors outperform global or randomly sampled tokens, and two layers of bidirectional attention offer the best accuracy and latency trade-off. These results suggest that encoding mammographic anatomy directly into representation learning and fusion yields significant improvements at operating points suitable for screening triage.

TIST Journal 2025 Journal Article

A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness

  • Fali Wang
  • Zhiwei Zhang
  • Xianren Zhang
  • Zongyu Wu
  • TzuHao Mo
  • Qiuhao Lu
  • Wanjing Wang
  • Rui Li

Large language models (LLMs) have demonstrated emergent abilities in text generation, question answering, and reasoning, facilitating various tasks and domains. Despite their proficiency in various tasks, LLMs like PaLM 540B and Llama-3.1 405B face limitations due to large parameter sizes and computational demands, often requiring cloud API use, which raises privacy concerns, limits real-time applications on edge devices, and increases fine-tuning costs. Additionally, LLMs often underperform in specialized domains such as healthcare and law due to insufficient domain-specific knowledge, necessitating specialized models. Therefore, Small Language Models (SLMs) are increasingly favored for their low inference latency, cost-effectiveness, efficient development, and easy customization and adaptability. These models are particularly well-suited for resource-limited environments and domain knowledge acquisition, addressing LLMs’ challenges and proving ideal for applications that require localized data handling for privacy, minimal inference latency for efficiency, and domain knowledge acquisition through lightweight fine-tuning. The rising demand for SLMs has spurred extensive research and development. However, a comprehensive survey investigating issues related to the definition, acquisition, application, enhancement, and reliability of SLM remains lacking, prompting us to conduct a detailed survey on these topics. The definition of SLMs varies widely; thus, to standardize, we propose defining SLMs by their capability to perform specialized tasks and suitability for resource-constrained settings, setting boundaries based on the minimal size for emergent abilities and the maximum size sustainable under resource constraints. For other aspects, we provide a taxonomy of relevant models/methods and develop general frameworks for each category to enhance and utilize SLMs effectively. 
We have compiled the collected SLM models and related methods on GitHub: https://github.com/FairyFali/SLMs-Survey.

EAAI Journal 2025 Journal Article

A dual-branch convolutional neural network with domain-informed attention for arrhythmia classification of 12-lead electrocardiograms

  • Rucheng Jiang
  • Bin Fu
  • Renfa Li
  • Rui Li
  • Danny Z. Chen
  • Yan Liu
  • Guoqi Xie
  • Keqin Li

The automatic classification of arrhythmia is an important task in the intelligent auxiliary diagnosis of an electrocardiogram. Its efficiency and accuracy are vital for practical deployment and applications in the medical field. For the 12-lead electrocardiogram, we know that the comprehensive utilization of lead characteristics is key to enhancing diagnostic accuracy. However, existing classification methods (1) neglect the similarities and differences between the limb lead group and the precordial lead group; (2) the commonly adopted attention mechanisms struggle to capture the domain characteristics in an electrocardiogram. To address these issues, we propose a new dual-branch convolutional neural network with domain-informed attention, which is novel in two ways. First, it adopts a dual-branch network to extract intra-group similarities and inter-group differences of limb and precordial leads. Second, it proposes a domain-informed attention mechanism to embed the critical domain knowledge of electrocardiogram, multiple RR (R wave to R wave) intervals, into coordinated attention to adaptively assign attention weights to key segments, thereby effectively capturing the characteristics of the electrocardiogram domain. Experimental results show that our method achieves an F1-score of 0.905 and a macro area under the curve of 0.936 on two widely used large-scale datasets, respectively. Compared to state-of-the-art methods, our method shows significant performance improvements with a drastic reduction in model parameters.

EAAI Journal 2025 Journal Article

A non-negative garrote shrinkage network with adaptive Swish for rotating machinery fault diagnosis under noisy environment

  • Pengcheng Zhong
  • Zhenyu Liu
  • Rui Li
  • Hui Liu
  • Xiaoqi Yang
  • Zihan Dong
  • Jianrong Tan

The strong noise existing in the vibration signals has a negative impact on rotating machinery fault diagnosis. To solve the noise problem in the engineering applications of fault diagnosis, a deep residual network, named non-negative garrote shrinkage network with adaptive Swish (NNGSN-AS), is proposed for rotating machinery fault diagnosis under noisy environment. In the NNGSN-AS, the non-negative garrote shrinkage function (NNGSF) is integrated into residual building blocks as nonlinear transformation layers, and the residual building block is named the non-negative garrote shrinkage building unit (NNGSBU). In the NNGSBU, the threshold of the NNGSF is adaptively learned by the thresholding module, so that different thresholds can be assigned to different data samples. The thresholding module is close to the NNGSBU input, enabling early noise handling. The depthwise convolutions with wide kernels in the thresholding module increase the receptive field and lead to a one-to-one correspondence between the learnable threshold and the input feature elements of the NNGSBU, reducing the negative influence of the noise. Additionally, an adaptive Swish (ASwish) activation function module is developed, enabling adaptive nonlinear transformation of each feature channel. The experimental results on a public dataset and our laboratory dataset indicate that the NNGSN-AS is superior to the existing methods for rotating machinery fault diagnosis under noisy environment. Given that the performance gain of the NNGSN-AS stems from multiple components for deep feature extraction, the ablation experiments are conducted to demonstrate the improvement effect of each component.
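
The non-negative garrote shrinkage function named in the abstract above is commonly written as y = 0 when |x| <= tau and y = x - tau^2 / x otherwise. The sketch below shows that standard scalar form only, assuming a fixed non-negative threshold `tau`; in the paper the threshold is learned per sample by a thresholding module, which is not modeled here, and the function name is hypothetical.

```python
def nn_garrote_shrink(x: float, tau: float) -> float:
    """Non-negative garrote shrinkage with threshold tau >= 0.

    Values with magnitude at or below tau are zeroed (treated as noise);
    larger values are shrunk toward zero by tau**2 / x, so the shrinkage
    fades as |x| grows.
    """
    if abs(x) <= tau:
        return 0.0
    return x - tau * tau / x


# Small inputs are removed, large inputs are kept with mild shrinkage.
samples = [nn_garrote_shrink(v, 1.0) for v in (0.5, 2.0, -2.0, 10.0)]
```

Compared with soft thresholding, which subtracts a constant tau from every surviving value, this form leaves large-magnitude features almost unchanged.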

TMLR Journal 2025 Journal Article

Bayesian Neighborhood Adaptation for Graph Neural Networks

  • Paribesh Regmi
  • Rui Li
  • Kishan K C

The neighborhood scope (i.e., number of hops) where graph neural networks (GNNs) aggregate information to characterize a node's statistical property is critical to GNNs' performance. The two-stage approach of training and validating GNNs for every pre-specified neighborhood scope to search for the best setting is time-consuming and tends to be biased due to the search space design. How to adaptively determine proper neighborhood scopes for the aggregation process for both homophilic and heterophilic graphs remains largely unexplored. We thus propose to model the GNNs' message-passing behavior on a graph as a stochastic process by treating the number of hops as a beta process. This Bayesian framework allows us to infer the most plausible neighborhood scope for message aggregation simultaneously with the optimization of GNN parameters. Our theoretical analysis shows that the scope inference improves the expressivity of a GNN. Experiments on benchmark homophilic and heterophilic datasets show that the proposed method is compatible with state-of-the-art GNN variants, achieving competitive or superior performance on the node classification task, and providing well-calibrated predictions.

NeurIPS Conference 2025 Conference Paper

CAM: A Constructivist View of Agentic Memory for LLM-Based Reading Comprehension

  • Rui Li
  • Zeyu Zhang
  • Xiaohe Bo
  • Zihang Tian
  • Xu Chen
  • Quanyu Dai
  • Zhenhua Dong
  • Ruiming Tang

Current Large Language Models (LLMs) are confronted with overwhelming information volume when comprehending long-form documents. This challenge raises the imperative of a cohesive memory module, which can elevate vanilla LLMs into autonomous reading agents. Despite the emergence of some heuristic approaches, a systematic design principle remains absent. To fill this void, we draw inspiration from Jean Piaget's Constructivist Theory, illuminating three traits of the agentic memory: structured schemata, flexible assimilation, and dynamic accommodation. This blueprint forges a clear path toward a more robust and efficient memory system for LLM-based reading comprehension. To this end, we develop CAM, a prototype implementation of Constructivist Agentic Memory that simultaneously embodies the structurality, flexibility, and dynamicity. At its core, CAM is endowed with an incremental overlapping clustering algorithm for structured memory development, supporting both coherent hierarchical summarization and online batch integration. During inference, CAM adaptively explores the memory structure to activate query-relevant information for contextual response, akin to the human associative process. Compared to existing approaches, our design demonstrates dual advantages in both performance and efficiency across diverse long-text reading comprehension tasks, including question answering, query-based summarization, and claim verification.

AAAI Conference 2025 Conference Paper

Distribution-Driven Dense Retrieval: Modeling Many-to-One Query-Document Relationship

  • Junfeng Kang
  • Rui Li
  • Qi Liu
  • Zhenya Huang
  • Zheng Zhang
  • Yanjiang Chen
  • Linbo Zhu
  • Yu Su

Dense retrieval has emerged as the leading approach in information retrieval, aiming to find semantically relevant documents based on natural language queries. Given that a single document can be retrieved by multiple distinct queries, existing methods aim to represent a document with multiple vectors. Each vector is aligned with a different query to model the many-to-one relationship between queries and documents. However, these multiple vector-based approaches encounter challenges such as Increased Storage, Vector Collapse, and Search Efficiency. To address these issues, we introduce the Distribution-Driven Dense Retrieval framework (DDR). Specifically, we use vectors to represent queries and distributions to represent documents. This approach not only captures the relationships between multiple queries corresponding to the same document but also avoids the need to use multiple vectors to represent the document. Furthermore, to ensure search efficiency for DDR, we propose a dot product-based computation method to calculate the similarity between documents represented by distributions and queries represented by vectors. This allows for seamless integration with existing approximate nearest neighbor (ANN) search algorithms for efficient search. Finally, we conduct extensive experiments on real-world datasets, which demonstrate that our method significantly outperforms traditional dense retrieval methods.

YNIMG Journal 2025 Journal Article

Expertise-related functional connectivity changes in Chinese calligraphy linked to flow experience

  • Qingyan Kong
  • Yue Wang
  • Min Li
  • Buxin Han
  • Rui Li

Flow is a deeply immersive state that supports optimal performance, yet its neural basis under conditions of real-world expertise remains poorly understood. Using functional MRI, this study investigated how long-term Chinese calligraphy expertise relates to flow in a culturally meaningful setting. Expert and novice participants performed imagined embodied handwriting of Kai-Shu and Cao-Shu, which differ in motor and cognitive challenges. Expert calligraphers reported significantly higher flow than novices across both scripts, including in the more challenging Cao-Shu style despite having no formal training in it. Functional connectivity analyses were performed on background task-residual BOLD signals to assess intrinsic coupling that persists during performance. In Kai-Shu, experts showed stronger ventral anterior insula (vAI)-superior parietal lobule (SPL) connectivity and weaker vAI-ventral striatum (VS) connectivity, suggesting enhanced perception-action coupling and reduced task-irrelevant processing. In Cao-Shu, experts exhibited reduced anterior medial prefrontal cortex (aMPFC) connectivity with default mode network (DMN) regions, suggesting reduced self-referential processing under higher task challenges. These connectivity patterns were significantly associated with reported flow ratings and together suggest a flexible neural adaptation supporting task-focused engagement in familiar contexts and reduced introspection when demands increase. To further examine whether these effects form an integrated mechanism linking proficiency and flow, Bayesian network (BN) modeling revealed a directional dependency from expertise to functional connectivity to flow, suggesting that long-term practice contributes to a proficient neural mechanism that supports higher flow experiences during task engagement. 
These findings extend current accounts of flow by delineating how sustained expertise is associated with neural processing patterns that are linked to higher flow across varying task challenges.

ICML Conference 2025 Conference Paper

Harnessing Heterogeneous Statistical Strength for Personalized Federated Learning via Hierarchical Bayesian Inference

  • Mahendra Singh Thapa
  • Rui Li

Personalized federated learning (PFL) based on a Bayesian approach tackles the challenges arising from the statistical heterogeneity of client data by computing a personalized posterior distribution over the parameters of each client’s local model and constructing a global distribution by aggregating the parameters of these personalized posteriors. However, the heuristic aggregation methods introduce strong biases and result in global models with poor generalization. We thus propose a novel hierarchical Bayesian inference framework for PFL by specifying a conjugate hyper-prior over the parameters of the personalized posteriors. This allows us to jointly compute a global posterior distribution for aggregation and the personalized ones at the local level. This hierarchical Bayesian inference framework achieves an elegant balance between local personalization and global model robustness. An extensive empirical study shows that by effectively sharing the heterogeneous statistical strength across the local models while retaining their distinctive characteristics, our framework yields state-of-the-art performance. We also show that existing Bayesian PFL methods are special cases of our framework.

NeurIPS Conference 2025 Conference Paper

LabUtopia: High-Fidelity Simulation and Hierarchical Benchmark for Scientific Embodied Agents

  • Rui Li
  • Zixuan Hu
  • Wenxi Qu
  • Jinouwen Zhang
  • Zhenfei Yin
  • Sha Zhang
  • Xuantuo Huang
  • Hanqing Wang

Scientific embodied agents play a crucial role in modern laboratories by automating complex experimental workflows. Compared to typical household environments, laboratory settings impose significantly higher demands on perception of physical-chemical transformations and long-horizon planning, making them an ideal testbed for advancing embodied intelligence. However, its development has long been hampered by the lack of suitable simulators and benchmarks. In this paper, we address this gap by introducing LabUtopia, a comprehensive simulation and benchmarking suite designed to facilitate the development of generalizable, reasoning-capable embodied agents in laboratory settings. Specifically, it integrates i) LabSim, a high-fidelity simulator supporting multi-physics and chemically meaningful interactions; ii) LabScene, a scalable procedural generator for diverse scientific scenes; and iii) LabBench, a hierarchical benchmark spanning five levels of complexity from atomic actions to long-horizon mobile manipulation. LabUtopia supports 30 distinct tasks and includes more than 200 scene and instrument assets, enabling large-scale training and principled evaluation in high-complexity environments. We demonstrate that LabUtopia offers a powerful platform for advancing the integration of perception, planning, and control in scientific-purpose agents and provides a rigorous testbed for exploring the practical capabilities and generalization limits of embodied intelligence in future research. Project web page: https://rui-li023.github.io/labutopia-site/

YNIMG Journal 2025 Journal Article

LUMEN–A deep learning pipeline for analysis of the 3D morphology of the cerebral lenticulostriate arteries from time-of-flight 7T MRI

  • Rui Li
  • Soumick Chatterjee
  • Yeerfan Jiaerken
  • Xia Zhou
  • Chethan Radhakrishna
  • Philip Benjamin
  • Stefania Nannoni
  • Daniel J. Tozer

The lenticulostriate arteries (LSAs) supply critical subcortical brain structures and are affected in cerebral small vessel disease (CSVD). Changes in their morphology are linked to cardiovascular risk factors and may indicate early pathology. 7T Time-of-Flight MR angiography (TOF-MRA) enables clear LSA visualisation. We aimed to develop a semi-automated pipeline for quantifying 3D LSA morphology from 7T TOF-MRA in CSVD patients. We used data from a local 7T CSVD study to create a pipeline, LUMEN, comprising two stages: vessel segmentation and LSA quantification. For segmentation, we fine-tuned a deep learning model, DS6, and compared it against nnU-Net and a Frangi-filter pipeline, MSFDF. For quantification, centrelines of LSAs within basal ganglia were extracted to compute branch counts, length, tortuosity, and maximum curvature. This pipeline was applied to 69 subjects, with results compared to traditional analysis measuring LSA morphology on 2D coronal maximum intensity projection (MIP) images. For vessel segmentation, fine-tuned DS6 achieved the highest test Dice score (0.814±0.029) and sensitivity, whereas nnU-Net achieved the best balanced average Hausdorff distance and precision. Visual inspection confirmed that DS6 was most sensitive in detecting LSAs with weak signals. Across 69 subjects, the pipeline with DS6 identified 23.5 ± 8.5 LSA branches. Branch length inside the basal ganglia was 26.4 ± 3.5 mm, and tortuosity was 1.5 ± 0.1. Extracted LSA metrics from 2D MIP analysis and our 3D analysis showed fair-to-moderate correlations. Outliers highlighted the added value of 3D analysis. This open-source deep-learning-based pipeline offers a validated tool quantifying 3D LSA morphology in CSVD patients from 7T-TOF-MRA for clinical research.
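
As a rough illustration of one of the centreline metrics named above, the sketch below computes tortuosity as the path length divided by the straight-line distance between the endpoints. The abstract does not spell out the formula LUMEN uses, so this common definition, the function names, and the sample points are assumptions for illustration only.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def path_length(centreline: Sequence[Point]) -> float:
    """Sum of Euclidean distances between consecutive centreline points."""
    return sum(math.dist(a, b) for a, b in zip(centreline, centreline[1:]))


def tortuosity(centreline: Sequence[Point]) -> float:
    """Path length over endpoint-to-endpoint distance (assumed definition)."""
    if len(centreline) < 2:
        raise ValueError("a centreline needs at least two points")
    chord = math.dist(centreline[0], centreline[-1])
    if chord == 0:
        raise ValueError("endpoints coincide; tortuosity is undefined")
    return path_length(centreline) / chord


# Hypothetical branch with one right-angle bend: path length 2, chord sqrt(2).
branch = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
print(tortuosity(branch))
```

Under this definition a perfectly straight branch has tortuosity 1, and larger values indicate a more winding vessel.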

NeurIPS Conference 2025 Conference Paper

MemSim: A Bayesian Simulator for Evaluating Memory of LLM-based Personal Assistants

  • Zeyu Zhang
  • Quanyu Dai
  • Luyu Chen
  • Zeren Jiang
  • Rui Li
  • Jieming Zhu
  • Xu Chen
  • Yi Xie

LLM-based agents have been widely applied as personal assistants, capable of memorizing information from user messages and responding to personal queries. However, an objective and automatic evaluation of their memory capability is still lacking, largely due to the challenges in constructing reliable questions and answers (QAs) according to user messages. In this paper, we propose MemSim, a Bayesian simulator designed to automatically construct reliable QAs from generated user messages while maintaining their diversity and scalability. Specifically, we introduce the Bayesian Relation Network (BRNet) and a causal generation mechanism to mitigate the impact of LLM hallucinations on factual information, facilitating the automatic creation of an evaluation dataset. Based on MemSim, we generate a dataset in the daily-life scenario, named MemDaily, and conduct extensive experiments to assess the effectiveness of our approach. We also provide a benchmark for evaluating different memory mechanisms in LLM-based agents with the MemDaily dataset.

IJCAI Conference 2025 Conference Paper

Progressive Prefix-Memory Tuning for Complex Logical Query Answering on Knowledge Graphs

  • Xingrui Zhuo
  • Shirui Pan
  • Jiapu Wang
  • Gongqing Wu
  • Zan Zhang
  • Rui Li
  • Zizhong Wei
  • Xindong Wu

Conducting complex logical queries over knowledge graphs remains a significant challenge. Recent research has successfully leveraged Pre-trained Language Models (PLMs) to tackle Knowledge Graph Complex Query Answering (KGCQA) tasks, which is attributed to PLMs' ability to comprehend logical semantics of queries through context learning. However, existing PLM-based KGCQA methods usually overlook the harm of disordered syntax or fragmented contexts within a serialized query, which poses the problem of “impossible language” and limits PLMs in grasping the logical semantics. To address this problem, we propose a Progressive Prefix-Memory Tuning (PPMT) framework for KGCQA tasks, which effectively rectifies erroneous segments in serialized queries to assist PLMs in query answering. First, we propose a prefix-memory rectification mechanism embedded in a PLM module. This mechanism assigns rectification parameters in memory stores to polish the language segments of entities, relations, and queries through specific prefixes. To further capture the logical semantics in queries, we design a progressive fine-tuning strategy, which optimizes our model through a conditional gradient update process guided by knowledge translation constraints. Extensive experiments on widely used KGCQA benchmarks demonstrate the significant superiority of PPMT in terms of HR@3 and MRR. Our code is available at https://github.com/lazyloafer/PPMT.

EAAI Journal 2025 Journal Article

Real-time and explainable rock mass classification under imbalanced tunnel boring machine data using hybrid resampling and ensemble learning

  • Rui Li
  • Junlong Yan
  • Yueji He
  • Shaoxuan Guo
  • Qingsong Zhang
  • Rentai Liu
  • Yanyi Liu
  • Xuanyue Feng

The construction safety and efficiency of Tunnel Boring Machines (TBMs) are highly dependent on the accurate identification of surrounding rock mass grades. This study develops a data-driven rock mass prediction model using tunneling parameters collected from the Yinchao Jiliao diversion project. Mutual information coefficient, Spearman correlation analysis, and kernel density estimation were comprehensively applied to identify the most relevant statistical features derived from key tunneling parameters that are associated with surrounding rock classes. Seven individual models and three ensemble learning models were established, with hyperparameters optimized via a Tree-structured Parzen Estimator (TPE) based Bayesian algorithm and stratified five-fold cross-validation. To address the core challenges of highly imbalanced sample distribution and inter-class feature overlap, this study introduced Synthetic Minority Over-sampling Technique (SMOTE) and SMOTE-Tomek for data preprocessing. Considering the asymmetric risk associated with misclassification of different rock mass grades in practical tunneling engineering, a risk preference metric termed High-Risk Average Recall (HRAR) was proposed to evaluate the models, prioritizing the prevention of misclassifying high-risk rock masses (Class IV and V) as low-risk rock masses (Class II and III). Based on comprehensive metrics, the SMOTE-Tomek-preprocessed Soft-Voting ensemble model achieved superior macro-average performance and a high HRAR value. To enhance model transparency and credibility, SHapley Additive exPlanations (SHAP) was employed for explainability analysis. This method elucidated the contribution and influence of key features (thrust, torque, advance rate) on rock mass classification across different models. This study provides a systematic solution and technical foundation for geological perception and risk early-warning in intelligent TBM tunneling.
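
The abstract names the HRAR metric without giving its formula. The sketch below shows one plausible reading, assuming HRAR is the mean of the per-class recalls for the high-risk classes (IV and V). That definition, the function names, and the toy labels are all assumptions made here for illustration, not the paper's stated method.

```python
from typing import List, Optional, Sequence

HIGH_RISK_CLASSES = ("IV", "V")  # high-risk grades named in the abstract


def class_recall(y_true: Sequence[str], y_pred: Sequence[str], cls: str) -> Optional[float]:
    """Recall for one class, or None if the class never occurs in y_true."""
    total = sum(1 for t in y_true if t == cls)
    if total == 0:
        return None
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    return hits / total


def high_risk_average_recall(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    high_risk: Sequence[str] = HIGH_RISK_CLASSES,
) -> float:
    """Mean recall over the high-risk classes present in y_true (assumed definition)."""
    recalls: List[float] = []
    for cls in high_risk:
        r = class_recall(y_true, y_pred, cls)
        if r is not None:
            recalls.append(r)
    if not recalls:
        raise ValueError("no high-risk samples in y_true")
    return sum(recalls) / len(recalls)


# Toy labels: one Class IV sample is misread as Class III.
y_true = ["II", "III", "IV", "IV", "V", "V"]
y_pred = ["II", "III", "IV", "III", "V", "V"]
print(high_risk_average_recall(y_true, y_pred))
```

Averaging per class rather than pooling all high-risk samples keeps a rare grade from being swamped by a more common one, which seems to fit the imbalance the abstract describes.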

NeurIPS Conference 2025 Conference Paper

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

  • Xeron Du
  • Yifan Yao
  • Kaijing Ma
  • Bingli Wang
  • Tianyu Zheng
  • Minghao Liu
  • Yiming Liang
  • Xiaolong Jin

Large language models (LLMs) have demonstrated remarkable proficiency in mainstream academic disciplines such as mathematics, physics, and computer science. However, human knowledge encompasses over 200 specialized disciplines, far exceeding the scope of existing benchmarks. The capabilities of LLMs in many of these specialized fields, particularly in light industry, agriculture, and service-oriented disciplines, remain inadequately evaluated. To address this gap, we present SuperGPQA, a comprehensive benchmark that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines. Our benchmark employs a novel Human-LLM collaborative filtering mechanism to eliminate trivial or ambiguous questions through iterative refinement based on both LLM responses and expert feedback. Our experimental results reveal significant room for improvement in the performance of current state-of-the-art LLMs across diverse knowledge domains (e.g., the reasoning-focused model Gemini-2.5-Pro achieved the highest accuracy of 63.56% on SuperGPQA), highlighting the considerable gap between current model capabilities and artificial general intelligence. Additionally, we present comprehensive insights from our management of a large-scale annotation process, involving over 80 expert annotators and an interactive Human-LLM collaborative system, offering valuable methodological guidance for future research initiatives of comparable scope.

NeurIPS Conference 2025 Conference Paper

Towards Straggler-Resilient Split Federated Learning: An Unbalanced Update Approach

  • Dandan Liang
  • Jianing Zhang
  • Evan Chen
  • Zhe Li
  • Rui Li
  • Haibo Yang

Split Federated Learning (SFL) enables scalable training on edge devices by combining the parallelism of Federated Learning (FL) with the computational offloading of Split Learning (SL). Despite its great success, SFL suffers significantly from the well-known straggler issue in distributed learning systems. This problem is exacerbated by the dependency between the Split Server and clients: the Split Server side model update relies on receiving activations from clients. This synchronization requirement introduces significant latency, making stragglers a critical bottleneck to the scalability and efficiency of the system. To mitigate this problem, we propose *MU-SplitFed*, a straggler-resilient SFL algorithm that decouples training progress from straggler delays via a simple yet effective unbalanced update mechanism. By enabling the server to perform $\tau$ local updates per client round, *MU-SplitFed* achieves a convergence rate of $\mathcal{O}(\sqrt{d/(\tau T)})$, showing a linear reduction in communication rounds by a factor of $\tau$. Experiments demonstrate that *MU-SplitFed* consistently outperforms baseline methods in the presence of stragglers and effectively mitigates their impact through adaptive tuning of $\tau$.
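
To make the "linear reduction" concrete, one can invert the stated rate. This is a reading of the abstract's bound with constants suppressed and a target accuracy $\epsilon$ introduced here for illustration, not a result quoted from the paper:

```latex
\sqrt{\frac{d}{\tau T}} \le \epsilon
\quad\Longleftrightarrow\quad
T \ge \frac{d}{\tau\,\epsilon^{2}}
```

Under this reading, doubling $\tau$ halves the number of communication rounds $T$ needed to reach the same $\epsilon$.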

EAAI Journal 2025 Journal Article

U-bilateral attention gate nested U-transformers for medical image segmentation

  • Wenkang Fan
  • Haichao Peng
  • Rui Li
  • Yong Peng
  • Jie Luo
  • Xiongbiao Luo

Automatic medical image segmentation facilitates accurate computer-aided diagnosis and computer-assisted surgery. Current segmentation approaches, predominantly based on convolutional neural networks, often struggle with capturing long-range or global features effectively. We propose a compact yet powerful deep learning model, the U-bilateral attention gate (U-BAG) nested U-transformers (UnUFormer), specifically designed for abdominal image segmentation. In this model, we introduce the U-bilateral attention gate module, which combines an encoder–decoder U-structure with a bilateral attention gate. Additionally, we develop a novel convolutional transformer module. This module replaces the conventional multihead self-attention (token mixer) with one-dimensional convolution for more efficient feature fusion. This convolutional transformer module, working in tandem with skip connections at multiple stages, adeptly identifies both global and local features. Crucially, it also reduces the computational overhead associated with multihead self-attention. We conducted evaluations of our proposed method on two publicly available clinical datasets. The experimental results demonstrate that UnUFormer significantly outperforms other network methods. It notably improves the average dice similarity coefficient and average symmetric surface distance in abdominal multiorgan segmentation from (94.5%, 0.49) to (95.2%, 0.36).

AAAI Conference 2025 Conference Paper

VERSE: Verification-based Self-Play for Code Instructions

  • Hao Jiang
  • Qi Liu
  • Rui Li
  • Yuze Zhao
  • Yixiao Ma
  • Shengyu Ye
  • Junyu Lu
  • Yu Su

Instruction-tuned Code Large Language Models (Code LLMs) have excelled in diverse code-related tasks, such as program synthesis, automatic program repair, and code explanation. To collect training datasets for instruction-tuning, a popular method involves having models autonomously generate instructions and corresponding responses. However, the direct generation of responses does not ensure functional correctness, a crucial requirement for generating responses to code instructions. To overcome this, we present Verification-Based Self-Play (VERSE), aiming to enhance model proficiency in generating correct responses. VERSE establishes a robust verification framework that covers various code instructions. Employing VERSE, Code LLMs engage in self-play to generate instructions and corresponding verifications. They evaluate execution results and self-consistency as verification outcomes, using them as scores to rank generated data for self-training. Experiments show that VERSE improves multiple base Code LLMs (average 7.6%) across various languages and tasks on many benchmarks, affirming its effectiveness.

ICML Conference 2024 Conference Paper

Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations

  • Jiaqi Zhai
  • Lucy Liao
  • Xing Liu
  • Yueming Wang
  • Rui Li
  • Xuan Cao
  • Leon Gao
  • Zhaojie Gong

Large-scale recommendation systems are characterized by their reliance on high cardinality, heterogeneous features and the need to handle tens of billions of user actions on a daily basis. Despite being trained on a huge volume of data with thousands of features, most Deep Learning Recommendation Models (DLRMs) in industry fail to scale with compute. Inspired by the success achieved by Transformers in language and vision domains, we revisit fundamental design choices in recommendation systems. We reformulate recommendation problems as sequential transduction tasks within a generative modeling framework (“Generative Recommenders”), and propose a new architecture, HSTU, designed for high cardinality, non-stationary streaming recommendation data. HSTU outperforms baselines over synthetic and public datasets by up to 65.8% in NDCG, and is 5.3x to 15.2x faster than FlashAttention2-based Transformers on 8192 length sequences. HSTU-based Generative Recommenders, with 1.5 trillion parameters, improve metrics in online A/B tests by 12.4% and have been deployed on multiple surfaces of a large internet platform with billions of users. More importantly, the model quality of Generative Recommenders empirically scales as a power-law of training compute across three orders of magnitude, up to GPT-3/LLaMa-2 scale, which reduces carbon footprint needed for future model developments, and further paves the way for the first foundation models in recommendations.

NeurIPS Conference 2024 Conference Paper

APDDv2: Aesthetics of Paintings and Drawings Dataset with Artist Labeled Scores and Comments

  • Xin Jin
  • Qianqian Qiao
  • Yi Lu
  • Huaye Wang
  • Heng Huang
  • Shan Gao
  • Jianfei Liu
  • Rui Li

Datasets play a pivotal role in training visual models, facilitating the development of abstract understandings of visual features through diverse image samples and multidimensional attributes. However, in the realm of aesthetic evaluation of artistic images, datasets remain relatively scarce. Existing painting datasets are often characterized by limited scoring dimensions and insufficient annotations, thereby constraining the advancement and application of automatic aesthetic evaluation methods in the domain of painting. To bridge this gap, we introduce the Aesthetics Paintings and Drawings Dataset (APDD), the first comprehensive collection of paintings encompassing 24 distinct artistic categories and 10 aesthetic attributes. Building upon the initial release of APDDv1, our ongoing research has identified opportunities for enhancement in data scale and annotation precision. Consequently, APDDv2 boasts an expanded image corpus and improved annotation quality, featuring detailed language comments to better cater to the needs of both researchers and practitioners seeking high-quality painting datasets. Furthermore, we present an updated version of the Art Assessment Network for Specific Painting Styles, denoted as ArtCLIP. Experimental validation demonstrates the superior performance of this revised model in the realm of aesthetic evaluation, surpassing its predecessor in accuracy and efficacy. The dataset and model are available at https://github.com/BestiVictory/APDDv2.git.

EAAI Journal 2024 Journal Article

Application of physical-structure-driven deep learning and compensation methods in aircraft engine health management

  • Dasheng Xiao
  • Hong Xiao
  • Rui Li
  • Zhanxue Wang

The operational well-being of aircraft-engine turbine components is paramount for engine safety. Monitoring exhaust gas temperature (EGT) serves as a key indicator of their condition. Real-time and precise EGT prediction in aircraft engines plays a pivotal role in ensuring flight safety and effective engine health management. A deep learning model based on the long short-term memory (LSTM), integrated with the physical topology of the aircraft engine, served as the basic prediction model for EGT. Based on Taylor expansion, error compensation was performed on model errors arising from sensor error and performance degradation. Three compensation models were developed: a Base model consistent with the basic prediction model, an LSTM model, and a multilayer perceptron model. Their influence on prediction accuracy was examined. Each compensation model was trained using two distinct fusion methods: a global compensation method (GCM) and a real-time compensation method (RtCM). The research also explored how different fusion methods contributed to enhancing prediction accuracy. The results showed that the basic model used in this study achieved high prediction precision. The addition of the compensation model further improved the prediction precision. The GCM effectively reduced the mean absolute relative error (MARE), while the RtCM effectively reduced the maximum absolute relative error (Emax) without increasing the prediction time. The best model evaluated on the four engine test datasets had a MARE value of 0.166%, an Emax value of 2.745%, and a mean absolute error of 1.13 °C, indicating high prediction precision.
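
The three error measures reported above can be sketched as follows, assuming the conventional definitions implied by their names (relative error taken against the measured value). The function names and sample temperatures are hypothetical and not drawn from the paper.

```python
from typing import List, Sequence


def relative_errors(y_true: Sequence[float], y_pred: Sequence[float]) -> List[float]:
    """Per-sample |prediction - truth| / |truth|; truth values must be nonzero."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return [abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)]


def mare(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute relative error, as a fraction (multiply by 100 for %)."""
    errs = relative_errors(y_true, y_pred)
    return sum(errs) / len(errs)


def emax(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Maximum absolute relative error, as a fraction."""
    return max(relative_errors(y_true, y_pred))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error in the units of the inputs."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up EGT readings and predictions, for illustration only.
egt_true = [500.0, 600.0, 800.0]
egt_pred = [505.0, 594.0, 800.0]
print(mare(egt_true, egt_pred), emax(egt_true, egt_pred), mae(egt_true, egt_pred))
```

MARE summarizes typical accuracy while Emax exposes the single worst prediction, which is why the two compensation methods can improve one without the other.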

ICLR Conference 2024 Conference Paper

Are Human-generated Demonstrations Necessary for In-context Learning?

  • Rui Li
  • Guoyin Wang 0002
  • Jiwei Li 0001

Despite the promising few-shot ability of large language models (LLMs), the standard paradigm of In-context Learning (ICL) suffers from susceptibility to the selected demonstrations and the intricacy of generating them. In this paper, we raise the fundamental question of whether human-generated demonstrations are necessary for ICL. To answer this question, we propose the self-contemplation prompting strategy (SEC), a paradigm free from human-crafted demonstrations. The key point of SEC is that, instead of using hand-crafted examples as demonstrations in ICL, SEC asks LLMs to first create demonstrations on their own, based on which the final output is generated. SEC is a flexible framework and can be adapted to both the vanilla ICL and the chain-of-thought (CoT), but with greater ease, as the manual generation of both examples and rationales can be skipped. Extensive experiments in arithmetic reasoning, commonsense reasoning, multi-task language understanding, and code generation benchmarks show that SEC, which does not require hand-crafted demonstrations, significantly outperforms the zero-shot learning strategy, and achieves comparable results to ICL with hand-crafted demonstrations. This demonstrates that, for many tasks, contemporary LLMs possess a sufficient level of competence to exclusively depend on their own capacity for decision making, removing the need for external training data.
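
The two-step flow described above (the model first writes its own demonstrations, then answers with them in context) can be sketched as below. The prompt wording, the `llm` callable, and the stand-in `fake_llm` are hypothetical; the paper's actual prompts are not given in the abstract.

```python
from typing import Callable


def sec_answer(question: str, llm: Callable[[str], str], n_demos: int = 3) -> str:
    """Two-call SEC-style sketch: self-generated demonstrations, then the answer."""
    # Step 1: ask the model to write its own demonstrations for this question.
    demo_prompt = (
        f"Write {n_demos} example questions similar to the one below, "
        "each with a worked answer.\n\n"
        f"Question: {question}"
    )
    demos = llm(demo_prompt)

    # Step 2: place the self-generated demonstrations before the real question.
    final_prompt = f"{demos}\n\nQuestion: {question}\nAnswer:"
    return llm(final_prompt)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model so the sketch runs without any API."""
    if prompt.endswith("Answer:"):
        return "4"
    return "Question: 1 + 1?\nAnswer: 2"


print(sec_answer("2 + 2?", fake_llm))
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM client library.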

YNIMG Journal 2024 Journal Article

Attentional control influence habituation through modulation of connectivity patterns within the prefrontal cortex: Insights from stereo-EEG

  • Huimin Huang
  • Rui Li
  • Xiaojun Qiao
  • Xiaoran Li
  • Ziyue Li
  • Siyi Chen
  • Yi Yao
  • Fengpeng Wang

Attentional control, guided by top-down processes, enables selective focus on pertinent information, while habituation, influenced by bottom-up factors and prior experiences, shapes cognitive responses by emphasizing stimulus relevance. These two fundamental processes collaborate to regulate cognitive behavior, with the prefrontal cortex and its subregions playing a pivotal role. Nevertheless, the intricate neural mechanisms underlying the interaction between attentional control and habituation are still a subject of ongoing exploration. To our knowledge, there is a dearth of comprehensive studies on the functional connectivity between subsystems within the prefrontal cortex during attentional control processes in both primates and humans. Utilizing stereo-electroencephalogram (SEEG) recordings during the Stroop task, we observed top-down dominance effects and corresponding connectivity patterns among the orbitofrontal cortex (OFC), the middle frontal gyrus (MFG), and the inferior frontal gyrus (IFG) during heightened attentional control. These findings highlight the involvement of OFC in habituation through top-down attention. Our study unveils unique connectivity profiles, shedding light on the neural interplay between top-down and bottom-up attentional control processes, shaping goal-directed attention.

EAAI Journal 2024 Journal Article

Automatic segmentation of curtain wall frame using a context collaboration pyramid network

  • Decheng Wu
  • Longqi Cheng
  • Rui Li
  • Pingan Yang
  • Xiaoyu Xu
  • Xiaojie Wang
  • Chul-Hee Lee

Accurate positioning of curtain wall frames is crucial for the automated installation of curtain wall modules. However, the current robot-based installation methods overly depend on visual guidance from operators, resulting in high costs and limiting construction efficiency. The development of deep learning has introduced an image segmentation approach that offers a new solution for the visual positioning of curtain wall frames. This paper proposes a context collaboration pyramid network to automatically segment curtain wall frames by incorporating context interaction and channel guided pyramid structure. The model adopts an “encoder-decoder” architecture with a feature interaction block strategically inserted between the encoder and decoder. Specifically, the encoder utilizes the pyramid pooling Transformer as a backbone to extract multi-level features from original RGB images. The decoder employs a channel guided pyramid convolution module to integrate multi-scale features and achieve finer prediction. Meanwhile, a context interaction fusion module between the features of adjacent levels was designed carefully to enhance the collaboration of the architecture. In addition, a benchmark dataset for the curtain wall frame segmentation task, consisting of 1547 images, was established. The dataset incorporates challenging scenarios, including strong lights, low contrast, and cluttered backgrounds. This method is evaluated on the collected dataset, and achieves an impressive accuracy of 97.30% and an F1-Score of 88.95%, outperforming other segmentation networks. Overall, the proposed method can extract target information accurately and efficiently and provide critical visual guidance for the robot, so as to promote the automatic installation level of the curtain wall module.

ICML Conference 2024 Conference Paper

Bayesian Adaptation of Network Depth and Width for Continual Learning

  • Jeevan Thapa
  • Rui Li

While existing dynamic architecture-based continual learning methods adapt network width by growing new branches, they overlook the critical aspect of network depth. We propose a novel non-parametric Bayesian approach to infer network depth and adapt network width while maintaining model performance across tasks. Specifically, we model the growth of network depth with a beta process and apply drop-connect regularization to network width using a conjugate Bernoulli process. Our results show that our proposed method achieves superior or comparable performance with state-of-the-art methods across various continual learning benchmarks. Moreover, our approach can be readily extended to unsupervised continual learning, showcasing competitive performance compared to existing techniques.

AAAI Conference 2024 Conference Paper

CONSIDER: Commonalities and Specialties Driven Multilingual Code Retrieval Framework

  • Rui Li
  • Liyang He
  • Qi Liu
  • Yuze Zhao
  • Zheng Zhang
  • Zhenya Huang
  • Yu Su
  • Shijin Wang

Multilingual code retrieval aims to find code snippets relevant to a user's query from a multilingual codebase, which plays a crucial role in software development and expands their application scenarios compared to classical monolingual code retrieval. Despite the performance improvements achieved by previous studies, two crucial problems are overlooked in the multilingual scenario. First, certain programming languages face data scarcity in specific domains, resulting in limited representation capabilities within those domains. Second, different programming languages can be used interchangeably within the same domain, making it challenging for multilingual models to accurately identify the intended programming language of a user's query. To address these issues, we propose the CommONalities and SpecIalties Driven Multilingual CodE Retrieval Framework (CONSIDER), which includes two modules. The first module enhances the representation of various programming languages by modeling pairwise and global commonalities among them. The second module introduces a novel contrastive learning negative sampling algorithm that leverages language confusion to automatically extract specific language features. Through our experiments, we confirm the significant benefits of our model in real-world multilingual code retrieval scenarios in various aspects. Furthermore, an evaluation demonstrates the effectiveness of our proposed CONSIDER framework in monolingual scenarios as well. Our source code is available at https://github.com/smsquirrel/consider.

IROS Conference 2024 Conference Paper

Design and Validation of Flexible Aerial Robotics for Safe Human-Robot Interaction

  • Fuhua Jia
  • Zihao Zheng
  • Cheng'ao Li
  • Junlin Xiao
  • Rui Li
  • Xiaoying Yang
  • Adam Rushworth
  • Salman Ijaz 0002

This work addresses the critical challenge of integrating drones into human-aerial robot interaction by presenting a novel Soft Flexible Aerial Robotics (SFAR) design. SFAR features an innovative low-pressure inflatable airbag structure that replaces traditional rigid frames, enhancing safety by mitigating collision risks with humans and payloads. To control this unconventional aerial platform, we present a control strategy based on a virtual link dynamics model that exploits the drone’s unique design. Our contributions include the pioneering design of an aerial robot specifically for Human-Aerial Robot Interaction (HARI), a novel control framework that balances flight performance with passive safety, and the validation of SFAR through real-world experiments, demonstrating its ability to perform at par with traditional rigid-body drones while offering enhanced safety features for seamless and safe integration into human environments.

EAAI Journal 2024 Journal Article

Enhancing OCR with line segmentation mask for container text recognition in container terminal

  • Zhichao Zhang
  • Yi Ding
  • Rui Li
  • Kaimin Chen

Optical Character Recognition (OCR) plays a pivotal role in enhancing the operational efficiency of container ports. However, challenges such as angle limitations and the complexity of container fonts in traditional OCR systems lead to tilted text and text adhesion, thereby reducing the recognition rate. Recognizing containers at a high speed is equally crucial for port operations. In this study, we address these challenges by introducing an Enhanced OCR (EOCR) system, incorporating Line Segmentation Mask (LSM)-based detection and Scanline-based recognition. LSM tackles the issue of text adhesion caused by traditional segmentation, while recognition based on scan lines accelerates efficiency. Additionally, we propose the arbitrary angle quadrilateral fitting algorithm targeting sloping quad areas in images taken at a container terminal. Experimental results on a dataset of container images from the Shanghai Port demonstrate superior performance compared to existing algorithms, achieving a recognition accuracy rate of up to 98.7%. Furthermore, an ablation study confirms that our EOCR significantly enhances recognition accuracy while ensuring real-time performance.

AAAI Conference 2024 Conference Paper

Enhancing RAW-to-sRGB with Decoupled Style Structure in Fourier Domain

  • Xuanhua He
  • Tao Hu
  • Guoli Wang
  • Zejin Wang
  • Run Wang
  • Qian Zhang
  • Keyu Yan
  • Ziyi Chen

RAW to sRGB mapping, which aims to convert RAW images from smartphones into RGB form equivalent to that of Digital Single-Lens Reflex (DSLR) cameras, has become an important area of research. However, current methods often ignore the difference between cell phone RAW images and DSLR camera RGB images, a difference that goes beyond the color matrix and extends to spatial structure due to resolution variations. Recent methods directly rebuild color mapping and spatial structure via shared deep representation, limiting optimal performance. Inspired by Image Signal Processing (ISP) pipeline, which distinguishes image restoration and enhancement, we present a novel Neural ISP framework, named FourierISP. This approach breaks the image down into style and structure within the frequency domain, allowing for independent optimization. FourierISP is comprised of three subnetworks: Phase Enhance Subnet for structural refinement, Amplitude Refine Subnet for color learning, and Color Adaptation Subnet for blending them in a smooth manner. This approach sharpens both color and structure, and extensive evaluations across varied datasets confirm that our approach realizes state-of-the-art results. Code will be available at https://github.com/alexhe101/FourierISP.

EAAI Journal 2024 Journal Article

Evolutionary computation and reinforcement learning integrated algorithm for distributed heterogeneous flowshop scheduling

  • Rui Li
  • Ling Wang
  • Wenyin Gong
  • Jingfang Chen
  • Zixiao Pan
  • Yuting Wu
  • Yang Yu

With the advancement of the global economy, there is a growing focus on distributed manufacturing. This study addresses the complex challenges posed by the distributed heterogeneous flow shop scheduling problem (DHFSP), wherein multiple machine processing speeds are taken into account. The primary objectives involve the simultaneous minimization of both makespan and total energy consumption. To tackle this intricate problem, we propose an evolutionary computation and reinforcement learning integrated algorithm (ECRLIA) approach. Initially, an optimization framework is meticulously crafted to synergistically integrate both evolutionary computation and reinforcement learning solvers. Subsequently, a multi-rule cooperation initialization is devised to expedite the pre-search process across all solvers. Following this, a competition-based cooperative evolutionary algorithm is introduced to conduct a global search, thereby providing an initial solution to the DHFSP. The interplay of competition and cooperation among individuals enhances convergence. Further, a Q-learning approach employing dual agents is designed to perform a local search, supplementing solutions that evolutionary algorithms may struggle to uncover. This learning method incorporates an auxiliary agent to evaluate the action predictions of the primary agent, ensuring more stable learning. The effectiveness of the proposed algorithm is assessed through numerical experiments, which validate the efficacy of the cooperation framework, initialization cooperation, and the enhanced Q-learning method. Furthermore, ECRLIA is benchmarked against five state-of-the-art algorithms for DHFSP, and the results affirm the significant superiority of the proposed ECRLIA in addressing DHFSP compared to other algorithms.

AAAI Conference 2024 Conference Paper

Frequency-Adaptive Pan-Sharpening with Mixture of Experts

  • Xuanhua He
  • Keyu Yan
  • Rui Li
  • Chengjun Xie
  • Jie Zhang
  • Man Zhou

Pan-sharpening involves reconstructing missing high-frequency information in multi-spectral images with low spatial resolution, using a higher-resolution panchromatic image as guidance. Despite its inherent connection with the frequency domain, existing pan-sharpening research has barely investigated solutions in the frequency domain. To this end, we propose a novel Frequency Adaptive Mixture of Experts (FAME) learning framework for pan-sharpening, which consists of three key components: the Adaptive Frequency Separation Prediction Module, the Sub-Frequency Learning Expert Module, and the Expert Mixture Module. In detail, the first leverages the discrete cosine transform to perform frequency separation by predicting the frequency mask. On the basis of the generated mask, the second, with a low-frequency MOE and a high-frequency MOE, enables effective reconstruction of low-frequency and high-frequency information. Finally, the fusion module dynamically weights high-frequency and low-frequency MOE knowledge to adapt to remote sensing images with significant content variations. Quantitative and qualitative experiments over multiple datasets demonstrate that our method performs best against other state-of-the-art methods and has strong generalization ability for real-world scenes. Code will be made publicly available at https://github.com/alexhe101/FAME-Net.

TMLR Journal 2024 Journal Article

On the Interdependence between Data Selection and Architecture Optimization in Deep Active Learning

  • Pradeep Bajracharya
  • Rui Li
  • Linwei Wang

Deep active learning (DAL) studies the optimal selection of labeled data for training deep neural networks (DNNs). While data selection in traditional active learning is mostly optimized for given features, in DNNs these features are learned and change with the learning process as well as the choice of DNN architecture. How the optimal selection of data is affected by this change is not well understood in DAL. To shed light on this question, we present the first systematic investigation of: 1) the relative performance of representative modern DAL data selection strategies as the architecture types and sizes of the underlying DNN change (Focus 1), and 2) the effect of optimizing the DNN architecture on DAL (Focus 2). The results suggest that the change in the DNN architecture significantly influences and outweighs the benefits of data selection in DAL. These results caution the community against generalizing DAL findings obtained on specific architectures, while suggesting the importance of optimizing the DNN architecture in order to maximize the effect of active data selection in DAL.

NeurIPS Conference 2024 Conference Paper

Optimal Transport-based Labor-free Text Prompt Modeling for Sketch Re-identification

  • Rui Li
  • Tingting Ren
  • Jie Wen
  • Jinxing Li

Sketch Re-identification (Sketch Re-ID), which aims to retrieve a target person from an image gallery based on a sketch query, is crucial for criminal investigation, law enforcement, and missing person searches. Existing methods aim to alleviate the modality gap by employing semantic metrics constraints or auxiliary modal guidance. However, they incur expensive labor costs and inevitably omit fine-grained modality-consistent information due to the abstraction of sketches. To address this issue, this paper proposes a novel $\textit{Optimal Transport-based Labor-free Text Prompt Modeling}$ (OLTM) network, which hierarchically extracts coarse- and fine-grained similarity representations guided by textual semantic information without any additional annotations. Specifically, multiple target attributes are flexibly obtained by a pre-trained visual question answering (VQA) model. Subsequently, a text prompt reasoning module employs a learnable prompt strategy and an optimal transport algorithm to extract discriminative global and local text representations, which serve as a bridge for hierarchical and multi-granularity modal alignment between sketch and image modalities. Additionally, instead of measuring the similarity of two samples by only computing their distance, a novel triplet assignment loss is further proposed, in which the whole data distribution also contributes to optimizing the inter/intra-class distances. Extensive experiments conducted on two public benchmarks consistently demonstrate the robustness and superiority of our OLTM over state-of-the-art methods.

IROS Conference 2024 Conference Paper

PS-Loc: Robust LiDAR Localization with Prior Structural Reference

  • Rui Li
  • Wentao Zhao
  • Tianchen Deng
  • Yanbo Wang
  • Jingchuan Wang

A prior structural reference such as a floor plan is readily accessible in indoor scenes and has the potential to improve localization quality without requiring a previously built high-precision map. This paper introduces a novel optimal transport-based framework for prior structural reference-based localization, aiming to improve the robustness of robot localization. Leveraging the spatial relations of structures, a matching method based on optimal transport theory is proposed, which improves the robustness of matching results in dynamic scenes and under rapid rotation. Additionally, this paper handles metric inaccuracies in the known structural reference by implementing a prior-guided plane adjustment-based updating strategy. This strategy combines prior and observational information to jointly optimize the structural information within a sliding window. The performance of the framework is validated through real-world experiments, demonstrating superior accuracy and robustness to disturbances from dynamic occlusion and rapid rotation compared to common state-of-the-art SLAM and localization methods.

NeurIPS Conference 2024 Conference Paper

Reflective Multi-Agent Collaboration based on Large Language Models

  • Xiaohe Bo
  • Zeyu Zhang
  • Quanyu Dai
  • Xueyang Feng
  • Lei Wang
  • Rui Li
  • Xu Chen
  • Ji-Rong Wen

Benefiting from the powerful language expression and planning capabilities of Large Language Models (LLMs), LLM-based autonomous agents have achieved promising performance in various downstream tasks. Recently, based on the development of single-agent systems, researchers propose to construct LLM-based multi-agent systems to tackle more complicated tasks. In this paper, we propose a novel framework, named COPPER, to enhance the collaborative capabilities of LLM-based agents with the self-reflection mechanism. To improve the quality of reflections, we propose to fine-tune a shared reflector, which automatically tunes the prompts of actor models using our counterfactual PPO mechanism. On the one hand, we propose counterfactual rewards to assess the contribution of a single agent’s reflection within the system, alleviating the credit assignment problem. On the other hand, we propose to train a shared reflector, which enables the reflector to generate personalized reflections according to agent roles, while reducing the computational resource requirements and improving training stability. We conduct experiments on three datasets to evaluate the performance of our model in multi-hop question answering, mathematics, and chess scenarios. Experimental results show that COPPER possesses stronger reflection capabilities and exhibits excellent generalization performance across different actor models.

NeurIPS Conference 2023 Conference Paper

AdaVAE: Bayesian Structural Adaptation for Variational Autoencoders

  • Paribesh Regmi
  • Rui Li

The neural network structures of generative models and their corresponding inference models paired in variational autoencoders (VAEs) play a critical role in the models' generative performance. However, powerful VAE network structures are hand-crafted and fixed prior to training, resulting in a one-size-fits-all approach that requires heavy computation to tune for given data. Moreover, existing VAE regularization methods largely overlook the importance of network structures and fail to prevent overfitting in deep VAE models with cascades of hidden layers. To address these issues, we propose a Bayesian inference framework that automatically adapts VAE network structures to data and prevents overfitting as they grow deeper. We model the number of hidden layers with a beta process to infer the most plausible encoding/decoding network depths warranted by data and perform layer-wise dropout regularization with a conjugate Bernoulli process. We develop a scalable estimator that performs joint inference on both VAE network structures and latent variables. Our experiments show that the inference framework effectively prevents overfitting in both shallow and deep VAE models, yielding state-of-the-art performance. We demonstrate that our framework is compatible with different types of VAE backbone networks and can be applied to various VAE variants, further improving their performance.

ICML Conference 2023 Conference Paper

Improving Hyperparameter Learning under Approximate Inference in Gaussian Process Models

  • Rui Li
  • S. T. John
  • Arno Solin

Approximate inference in Gaussian process (GP) models with non-conjugate likelihoods gets entangled with the learning of the model hyperparameters. We improve hyperparameter learning in GP models and focus on the interplay between variational inference (VI) and the learning target. While VI’s lower bound to the marginal likelihood is a suitable objective for inferring the approximate posterior, we show that a direct approximation of the marginal likelihood as in Expectation Propagation (EP) is a better learning objective for hyperparameter optimization. We design a hybrid training procedure to bring the best of both worlds: it leverages conjugate-computation VI for inference and uses an EP-like marginal likelihood approximation for hyperparameter learning. We compare VI, EP, Laplace approximation, and our proposed training procedure and empirically demonstrate the effectiveness of our proposal across a wide range of data sets.

EAAI Journal 2023 Journal Article

Problem-specific knowledge MOEA/D for energy-efficient scheduling of distributed permutation flow shop in heterogeneous factories

  • Cong Luo
  • Wenyin Gong
  • Rui Li
  • Chao Lu

With the development of the global economy and the enhancement of environmental awareness, energy-efficient permutation flow shop scheduling is receiving more attention. Nevertheless, research on distributed scheduling with heterogeneous factories is scarce. In this paper, a knowledge-driven MOEA/D (KMOEA/D) is proposed to address the energy-efficient scheduling of the distributed permutation flow shop problem in heterogeneous factories (DPFSP-HF) with the criteria of minimizing the makespan ($C_{\max}$) and total energy consumption ($TEC$). First, an efficient energy-saving strategy is proposed to reduce the $TEC$ criterion. Second, a constructive heuristic is designed to generate a high-quality solution set. Third, an ingenious genetic operator is utilized to maintain population diversity. Fourth, a knowledge-driven local search operator combined with problem-specific knowledge is constructed according to the properties of DPFSP-HF. Additionally, the Taguchi approach is used to calibrate the parameter configuration of KMOEA/D. We evaluate the effectiveness of each improvement of KMOEA/D and compare it to other well-known multi-objective optimization algorithms on different instances. The results indicate the effectiveness of each improvement of KMOEA/D, and verify that KMOEA/D is an efficient approach to address DPFSP-HF.

JBHI Journal 2022 Journal Article

A Photogrammetric Method for the Measurement of Three-Dimensional Cervical Range of Motion

  • Rui Li
  • Qi Jiang

Cervical spondylosis has gradually become a high-incidence disease in today’s society. The cervical range of motion (CROM) is widely used as the evaluation criterion of cervical status, whereas the existing methods of CROM measuring are not humanized enough. Consequently, the purpose of this study was to develop a novel photogrammetric method to assess three-dimensional CROM. Three smartphone cameras were controlled to simultaneously capture three-direction photographs of the subject wearing the specially designed device with three mark lines. The obtained photographs were uploaded to a PC and the mark lines in each photograph were extracted by utilizing both the Radon transform and the Hough transform. By calculating and combining the tilt angles of three mark lines, the CROM of the subject was indirectly determined. The performance of our method was compared with the goniometer-based method: the inter-instrument reliability was excellent for all six cervical movements with intraclass correlation coefficients $>$ 0.99; the degree of agreement between the two methods was high with Pearson’s coefficients $>$ 0.98; and the Bland-Altman plots also revealed the validity of our method. Moreover, the concept of a cervical motion curve was put forward to describe the movement track of the neck in order to reflect the cervical health status. The proposed approach is feasible, automatic and convenient for the measurement of CROM and the generated cervical motion curve can intuitively exhibit the trajectory of the neck. This technique, which can easily acquire the biomedical information of the cervical spine, has tremendous potential in the diagnosis, healthcare and wellness management of the neck.

AAAI Conference 2022 Conference Paper

Co-promotion Predictions of Financing Market and Sales Market: A Cooperative-Competitive Attention Approach

  • Lei Zhang
  • Wang Xiang
  • Chuang Zhao
  • Hongke Zhao
  • Rui Li
  • Runze Wu

Market popularity prediction, such as sales prediction and crowdfunding prediction, has always been a hot research topic. Most of these studies take the perspective of isolated markets, relying on the knowledge of a certain market to maximize the prediction performance. However, these market-specific approaches are restricted by the knowledge limitation of isolated markets and are incapable of capturing the complicated and potential relations among different markets, especially those with strong dependence such as the financing market and sales market. Fortunately, we discover potentially symbiotic relations between the financing market and the sales market, which provides us with an opportunity to co-promote the popularity predictions of both markets. Thus, to bridge the knowledge interactions between the financing market and the sales market, we propose a cross-market approach, namely CATN: Cooperative-competitive Attention Transfer Network, which could effectively transfer knowledge of financing capability from the crowdfunding market and sales prospect from the E-commerce market. Specifically, to capture the complicated relations, especially the cooperation or complement of items, and to enhance the knowledge transfer between the two heterogeneous markets, we design a novel Cooperative Attention; meanwhile, to finely compute the relations of items, especially the competition within the same market, we further design Competitive Attentions for the two markets respectively. Besides, we also distinguish aligned features and unique features to adapt the cross-market predictions. With the real-world datasets collected from Indiegogo and Amazon, we conduct extensive experiments on three types of datasets from the two markets and the results demonstrate the effectiveness and generalization of our CATN model.

ICRA Conference 2022 Conference Paper

Dilated Continuous Random Field for Semantic Segmentation

  • Xi Mo
  • Xiangyu Chen
  • Cuncong Zhong
  • Rui Li
  • Kaidong Li
  • Usman Sajid

Mean field approximation methodology has laid the foundation of modern Continuous Random Field (CRF) based solutions for the refinement of semantic segmentation. In this paper, we propose to relax the hard constraint of mean field approximation - minimizing the energy term of each node from the probabilistic graphical model - by a global optimization with the proposed dilated sparse convolution module (DSConv). In addition, adaptive global average-pooling and adaptive global max-pooling are implemented as replacements of fully connected layers. In order to integrate DSConv, we design an end-to-end, time-efficient DilatedCRF pipeline. The unary energy term is derived either from pre-softmax and post-softmax features, or the predicted affordance map using a conventional classifier, making it easier to implement DilatedCRF for varieties of classifiers. We also present superior experimental results of the proposed approach on the suction dataset compared to other CRF-based approaches.

AAAI Conference 2022 Conference Paper

Heterogeneity-Aware Twitter Bot Detection with Relational Graph Transformers

  • Shangbin Feng
  • Zhaoxuan Tan
  • Rui Li
  • Minnan Luo

Twitter bot detection has become an important and challenging task to combat misinformation and protect the integrity of the online discourse. State-of-the-art approaches generally leverage the topological structure of the Twittersphere, while they neglect the heterogeneity of relations and influence among users. In this paper, we propose a novel bot detection framework to alleviate this problem, which leverages the topological structure of user-formed heterogeneous graphs and models varying influence intensity between users. Specifically, we construct a heterogeneous information network with users as nodes and diversified relations as edges. We then propose relational graph transformers to model heterogeneous influence between users and learn node representations. Finally, we use semantic attention networks to aggregate messages across users and relations and conduct heterogeneity-aware Twitter bot detection. Extensive experiments demonstrate that our proposal outperforms state-of-the-art methods on a comprehensive Twitter bot detection benchmark. Additional studies also bear out the effectiveness of our proposed relational graph transformers, semantic attention networks and the graph-based approach in general.

JBHI Journal 2022 Journal Article

Undersampled Multi-Contrast MRI Reconstruction Based on Double-Domain Generative Adversarial Network

  • Haining Wei
  • Zhongsen Li
  • Shuai Wang
  • Rui Li

Multi-contrast magnetic resonance imaging can provide comprehensive information for clinical diagnosis. However, multi-contrast imaging suffers from long acquisition time, which makes it prohibitive for daily clinical practice. Subsampling k-space is one of the main methods to reduce scan time, but missing k-space samples inevitably lead to serious artifacts and noise. Considering the assumption that different contrast modalities share some mutual information, it may be possible to exploit this redundancy to accelerate multi-contrast imaging acquisition. Recently, generative adversarial networks have shown superior performance in image reconstruction and synthesis. Some studies based on k-space reconstruction also exhibit superior performance over conventional state-of-the-art methods. In this study, we propose a cross-domain two-stage generative adversarial network for multi-contrast image reconstruction based on a prior fully-sampled contrast and undersampled information. The new approach integrates reconstruction and synthesis: it estimates and completes the missing k-space and then refines the result in image space. It takes one fully-sampled contrast modality data and highly undersampled data from several other modalities as input, and outputs high quality images for each contrast simultaneously. The network is trained and tested on a public brain dataset from healthy subjects. Quantitative comparisons against baseline clearly indicate that the proposed method can effectively reconstruct undersampled images. Even under high acceleration, the network can still recover texture details and reduce artifacts.

IJCAI Conference 2022 Conference Paper

Unsupervised Multi-Modal Medical Image Registration via Discriminator-Free Image-to-Image Translation

  • Zekang Chen
  • Jia Wei
  • Rui Li

In clinical practice, well-aligned multi-modal images, such as Magnetic Resonance (MR) and Computed Tomography (CT), together can provide complementary information for image-guided therapies. Multi-modal image registration is essential for the accurate alignment of these multi-modal images. However, it remains a very challenging task due to complicated and unknown spatial correspondence between different modalities. In this paper, we propose a novel translation-based unsupervised deformable image registration approach to convert the multi-modal registration problem to a mono-modal one. Specifically, our approach incorporates a discriminator-free translation network to facilitate the training of the registration network and a patchwise contrastive loss to encourage the translation network to preserve object shapes. Furthermore, we propose to replace an adversarial loss, that is widely used in previous multi-modal image registration methods, with a pixel loss in order to integrate the output of translation into the target modality. This leads to an unsupervised method requiring no ground-truth deformation or pairs of aligned images for training. We evaluate four variants of our approach on the public Learn2Reg 2021 datasets. The experimental results demonstrate that the proposed architecture achieves state-of-the-art performance. Our code is available at https://github.com/heyblackC/DFMIR.

TIST Journal 2021 Journal Article

A Camera Identity-guided Distribution Consistency Method for Unsupervised Multi-target Domain Person Re-identification

  • Jiajie Tian
  • Qihao Tang
  • Rui Li
  • Zhu Teng
  • Baopeng Zhang
  • Jianping Fan

Unsupervised domain adaptation (UDA) for person re-identification (re-ID) is a challenging task due to large variations in human classes, illuminations, camera views, and so on. Currently, existing UDA methods focus on two-domain adaptation and are generally trained on one labeled source set and adapted on the other unlabeled target set. In this article, we put forward a new issue on person re-ID, namely, unsupervised multi-target domain adaptation (UMDA). It involves one labeled source set and multiple unlabeled target sets, which is more reasonable for practical real-world applications. Enabling UMDA requires learning consistency across multiple domains, which is significantly different from the UDA problem. To ensure distribution consistency and learn the discriminative embedding, we further propose the Camera Identity-guided Distribution Consistency method that performs an alignment operation for multiple domains. The camera identities are encoded into the image semantic information to facilitate the adaptation of features. To our knowledge, this is the first attempt at unsupervised multi-target domain adaptation learning. Extensive experiments are executed on Market-1501, DukeMTMC-reID, MSMT17, PersonX, and CUHK03, and our method has achieved very competitive re-ID accuracy in multi-target domains against numerous state-of-the-art methods.

NeurIPS Conference 2021 Conference Paper

A Channel Coding Benchmark for Meta-Learning

  • Rui Li
  • Ondrej Bohdal
  • Rajesh K Mishra
  • Hyeji Kim
  • Da Li
  • Nicholas Lane
  • Timothy Hospedales

Meta-learning provides a popular and effective family of methods for data-efficient learning of new tasks. However, several important issues in meta-learning have proven hard to study thus far. For example, performance degrades in real-world settings where meta-learners must learn from a wide and potentially multi-modal distribution of training tasks; and when distribution shift exists between meta-train and meta-test task distributions. These issues are typically hard to study since the shape of task distributions, and shift between them are not straightforward to measure or control in standard benchmarks. We propose the channel coding problem as a benchmark for meta-learning. Channel coding is an important practical application where task distributions naturally arise, and fast adaptation to new tasks is practically valuable. We use this benchmark to study several aspects of meta-learning, including the impact of task distribution breadth and shift on meta-learner performance, which can be controlled in the coding problem. Going forward, this benchmark provides a tool for the community to study the capabilities and limitations of meta-learning, and to drive research on practically robust and effective meta-learners.

AAAI Conference 2021 Conference Paper

A Continual Learning Framework for Uncertainty-Aware Interactive Image Segmentation

  • Ervine Zheng
  • Qi Yu
  • Rui Li
  • Pengcheng Shi
  • Anne Haake

Deep learning models have achieved state-of-the-art performance in semantic image segmentation, but the results provided by fully automatic algorithms are not always guaranteed satisfactory to users. Interactive segmentation offers a solution by accepting user annotations on selective areas of the images to refine the segmentation results. However, most existing models only focus on correcting the current image’s misclassified pixels, with no knowledge carried over to other images. In this work, we formulate interactive image segmentation as a continual learning problem and propose a framework to effectively learn from user annotations, aiming to improve the segmentation on both the current image and unseen images in future tasks while avoiding deteriorated performance on previously-seen images. It employs a probabilistic mask to control the neural network’s kernel activation and extract the most suitable features for segmenting images in each task. We also apply a task-aware embedding to automatically infer the optimal kernel activation for initial segmentation and subsequent refinement. Interactions with users are guided through multi-source uncertainty estimation so that users can focus on the most important areas to minimize the overall manual annotation effort. Experiments are performed on both medical and natural image datasets to illustrate the proposed framework’s effectiveness on basic segmentation performance, forward knowledge transfer, and backward knowledge transfer.

JBHI Journal 2021 Journal Article

Altered Time-Frequency Feature in Default Mode Network of Autism Based on Improved Hilbert-Huang Transform

  • Han Zhang
  • Rui Li
  • Xiaotong Wen
  • Qing Li
  • Xia Wu

Autism spectrum disorder (ASD) is a pervasive neurodevelopmental disorder characterized by restricted interests and repetitive behaviors. Non-invasive measurements of brain activity with functional magnetic resonance imaging (fMRI) have demonstrated that the abnormality in the default mode network (DMN) is a crucial neural basis of ASD, but the time-frequency feature of the DMN has not yet been revealed. Hilbert-Huang transform (HHT) is conducive to feature extraction of biomedical signals and has recently been suggested as an effective way to explore the time-frequency feature of the brain mechanism. In this study, the resting-state fMRI dataset of 105 subjects, including 59 ASD participants and 46 healthy control (HC) participants, was used in the time-frequency clustering analysis based on improved HHT and modified k-means clustering with label-replacement. Compared with HC, ASD selectively showed enhanced Hilbert weight frequency (HWF) in high frequency bands in crucial regions of the DMN, including the medial prefrontal cortex (MPFC), posterior cingulate cortex (PCC) and anterior cingulate cortex (ACC). Time-frequency clustering analysis revealed altered DMN organization in ASD. In the posterior DMN, the PCC and bilateral precuneus were separated for HC but clustered for ASD; in the anterior DMN, the clusters of ACC, dorsal MPFC, and ventral MPFC were relatively scattered for ASD. This study paves a promising way to uncover the alteration in the DMN and identifies a potential neuroimaging biomarker of diagnostic reference for ASD.

JBHI Journal 2021 Journal Article

Evaluating Technology-Mediated Collaborative Workflows for Telehealth

  • Christopher Bondy
  • Linlin Chen
  • Pamela Grover
  • Vicki Hanson
  • Rui Li
  • Pengcheng Shi

Goals: This paper discusses the need for a predictable method to evaluate gains and gaps of collaborative technology-mediated workflows and introduces an evaluation framework to address this need. Methods: The Collaborative Space – Analysis Framework (CS-AF), introduced in this research, is a cross-disciplinary evaluation method designed to evaluate technology-mediated collaborative workflows. The 5-step CS-AF meta-process includes: (1) current-state workflow definition, (2) current-state (baseline) workflow assessment, (3) technology-mediated workflow development and deployment, (4) technology-mediated workflow assessment, (5) analysis and conclusions. For this research, a comprehensive, empirical study of hypertension exam workflow for telehealth was conducted using the CS-AF approach. Results: The CS-AF systemized approach reveals critical cross-disciplinary evaluation data concerning gains and gaps of collaborative workflows when technology-mediated enhancements are characterized and compared with a baseline workflow for the goal of continuous workflow improvement. Conclusion: The CS-AF is an effective meta-analysis process that can be adapted for use in multiple domains.

NeurIPS Conference 2021 Conference Paper

Joint Inference for Neural Network Depth and Dropout Regularization

  • Kishan K C
  • Rui Li
  • MohammadMahdi Gilany

Dropout regularization methods prune a neural network's pre-determined backbone structure to avoid overfitting. However, a deep model still tends to be poorly calibrated with high confidence on incorrect predictions. We propose a unified Bayesian model selection method to jointly infer the most plausible network depth warranted by data, and perform dropout regularization simultaneously. In particular, to infer network depth we define a beta process over the number of hidden layers which allows it to go to infinity. Layer-wise activation probabilities induced by the beta process modulate neuron activation via binary vectors of a conjugate Bernoulli process. Experiments across domains show that by adapting network depth and dropout regularization to data, our method achieves superior performance compared to state-of-the-art methods with well-calibrated uncertainty estimates. In continual learning, our method enables neural networks to dynamically evolve their depths to accommodate incrementally available data beyond their initial structures, and alleviate catastrophic forgetting.

JBHI Journal 2021 Journal Article

Learn Fine-Grained Adaptive Loss for Multiple Anatomical Landmark Detection in Medical Images

  • Guang-Quan Zhou
  • Juzheng Miao
  • Xin Yang
  • Rui Li
  • En-Ze Huo
  • Wenlong Shi
  • Yuhao Huang
  • Jikuan Qian

Automatic and accurate detection of anatomical landmarks is an essential operation in medical image analysis with a multitude of applications. Recent deep learning methods have improved results by directly encoding the appearance of the captured anatomy with the likelihood maps (i.e., heatmaps). However, most current solutions overlook another essence of heatmap regression, the objective metric for regressing target heatmaps, and rely on hand-crafted heuristics to set the target precision, thus being usually cumbersome and task-specific. In this paper, we propose a novel learning-to-learn framework for landmark detection to optimize the neural network and the target precision simultaneously. The pivot of this work is to leverage the reinforcement learning (RL) framework to search objective metrics for regressing multiple heatmaps dynamically during the training process, thus avoiding setting problem-specific target precision. We also introduce an early-stop strategy for active termination of the RL agent's interaction that adapts the optimal precision for separate targets considering exploration-exploitation tradeoffs. This approach shows better stability in training and improved localization accuracy in inference. Extensive experimental results on two different applications of landmark localization: 1) our in-house prenatal ultrasound (US) dataset and 2) the publicly available dataset of cephalometric X-Ray landmark detection, demonstrate the effectiveness of our proposed method. Our proposed framework is general and shows the potential to improve the efficiency of anatomical landmark detection.

IJCAI Conference 2020 Conference Paper

Combinatorial Multi-Armed Bandits with Concave Rewards and Fairness Constraints

  • Huanle Xu
  • Yang Liu
  • Wing Cheong Lau
  • Rui Li

The problem of multi-armed bandit (MAB) with fairness constraint has emerged as an important research topic recently. For such problems, one common objective is to maximize the total rewards within a fixed round of pulls, while satisfying the fairness requirement of a minimum selection fraction for each individual arm in the long run. Previous works have made substantial advancements in designing efficient online selection solutions; however, they fail to achieve a sublinear regret bound when incorporating such fairness constraints. In this paper, we study a combinatorial MAB problem with concave objective and fairness constraints. In particular, we adopt a new approach that combines online convex optimization with bandit methods to design selection algorithms. Our algorithm is computationally efficient, and more importantly, manages to achieve a sublinear regret bound with probability guarantees. Finally, we evaluate the performance of our algorithm via extensive simulations and demonstrate that it outperforms the baselines substantially.

NeurIPS Conference 2020 Conference Paper

Dynamic Fusion of Eye Movement Data and Verbal Narrations in Knowledge-rich Domains

  • Ervine Zheng
  • Qi Yu
  • Rui Li
  • Pengcheng Shi
  • Anne Haake

We propose to jointly analyze experts' eye movements and verbal narrations to discover important and interpretable knowledge patterns to better understand their decision-making processes. The discovered patterns can further enhance data-driven statistical models by fusing experts' domain knowledge to support complex human-machine collaborative decision-making. Our key contribution is a novel dynamic Bayesian nonparametric model that assigns latent knowledge patterns into key phases involved in complex decision-making. Each phase is characterized by a unique distribution of word topics discovered from verbal narrations and their dynamic interactions with eye movement patterns, indicating experts' special perceptual behavior within a given decision-making stage. A new split-merge-switch sampler is developed to efficiently explore the posterior state space with an improved mixing rate. Case studies on diagnostic error prediction and disease morphology categorization help demonstrate the effectiveness of the proposed model and discovered knowledge patterns.

AAAI Conference 2019 Conference Paper

Improving Domain-Specific Classification by Collaborative Learning with Adaptation Networks

  • Si Wu
  • Jian Zhong
  • Wenming Cao
  • Rui Li
  • Zhiwen Yu
  • Hau-San Wong

For unsupervised domain adaptation, the process of learning domain-invariant representations could be dominated by the labeled source data, such that the specific characteristics of the target domain may be ignored. In order to improve the performance in inferring target labels, we propose a target-specific network which is capable of learning collaboratively with a domain adaptation network, instead of directly minimizing domain discrepancy. A clustering regularization is also utilized to improve the generalization capability of the target-specific network by forcing target data points to be close to accumulated class centers. As this network learns and specializes to the target domain, its performance in inferring target labels improves, which in turn facilitates the learning process of the adaptation network. Therefore, there is a mutually beneficial relationship between these two networks. We perform extensive experiments on multiple digit and object datasets, and the effectiveness and superiority of the proposed approach is presented and verified on multiple visual adaptation benchmarks, e.g., we improve the state-of-the-art on the task of MNIST→SVHN from 76.5% to 84.9% without specific augmentation.

NeurIPS Conference 2019 Conference Paper

Multivariate Sparse Coding of Nonstationary Covariances with Gaussian Processes

  • Rui Li

This paper studies statistical characteristics of multivariate observations with irregular changes in their covariance structures across input space. We propose a unified nonstationary modeling framework to jointly encode the observation correlations to generate a piece-wise representation with a hyper-level Gaussian process (GP) governing the overall contour of the pieces. In particular, we couple the encoding process with automatic relevance determination (ARD) to promote sparsity to account for the inherent redundancy. The hyper GP enables us to share statistical strength among the observation variables over a collection of GPs defined within the observation pieces to characterize the variables' respective local smoothness. Experiments conducted across domains show superior performances over the state-of-the-art methods.

AAAI Conference 2018 Conference Paper

Understanding Social Interpersonal Interaction via Synchronization Templates of Facial Events

  • Rui Li
  • Jared Curhan
  • Mohammed Hoque

Automatic facial expression analysis in interpersonal communication is challenging, not only because conversation partners’ facial expressions mutually influence each other, but also because no correct interpretation of facial expressions is possible without taking social context into account. In this paper, we propose a probabilistic framework to model interactional synchronization between conversation partners based on their facial expressions. Interactional synchronization manifests temporal dynamics of conversation partners’ mutual influence. In particular, the model allows us to discover a set of common and unique facial synchronization templates directly from natural interpersonal interaction without recourse to any predefined labeling schemes. The facial synchronization templates represent periodical facial event coordinations shared by multiple conversation pairs in a specific social context. We test our model on two different dyadic conversations of negotiation and job-interview. Based on the discovered facial event coordination, we are able to predict their conversation outcomes with higher accuracy than HMMs and GMMs.

IJCAI Conference 2017 Conference Paper

Modeling Physicians' Utterances to Explore Diagnostic Decision-making

  • Xuan Guo
  • Rui Li
  • Qi Yu
  • Anne Haake

Diagnostic error prevention is a long-established but specialized topic in clinical and psychological research. In this paper, we contribute to the field by exploring diagnostic decision-making via modeling physicians' utterances of medical concepts during image-based diagnoses. We conduct experiments to collect verbal narratives from dermatologists while they are examining and describing dermatology images towards diagnoses. We propose a hierarchical probabilistic framework to learn domain-specific patterns from the medical concepts in these narratives. The discovered patterns match the diagnostic units of thought identified by domain experts. These meaningful patterns uncover physicians' diagnostic decision-making processes while parsing the image content. Our evaluation shows that these patterns provide key information to classify narratives by diagnostic correctness levels.

ECAI Conference 2014 Conference Paper

Constrained Latent Dirichlet Allocation for Subgroup Discovery with Topic Rules

  • Rui Li
  • Zahra Ahmadi
  • Stefan Kramer

Subgroup discovery is the task of identifying subgroups that show the most unusual statistical (distributional) characteristics with respect to a given target variable, at the intersection of predictive and descriptive induction. Redundancy and lack of rule interpretability constitute the major challenges in subgroup discovery today. We address these two issues by constrained latent Dirichlet allocation (LDA) to identify co-occurring feature values (descriptions) for subgroup rule search, obtaining a less redundant and more diverse rule set. Latent Dirichlet Allocation, as a topic modeling approach, is able to identify diverse topics, from which the rules can be derived. The resulting rules are less redundant and can also be interpreted by the corresponding topic. Experimental results on six benchmark datasets show that the presented approach provides rule sets with better rule redundancy and diversity compared to those of four existing algorithms. One unique and interesting advantage of the proposed method is that it can categorize rules by topic and assign a probability to each feature value of a discovered rule, which can be used in the interpretation of the results.

JBHI Journal 2013 Journal Article

Improved Estimation of the Number of Independent Components for Functional Magnetic Resonance Data by a Whitening Filter

  • Mingqi Hui
  • Rui Li
  • Kewei Chen
  • Zhen Jin
  • Li Yao
  • Zhiying Long

Independent component analysis (ICA) has been widely applied to the analysis of fMRI data. Accurate estimation of the number of independent components (ICs) in fMRI data is critical to reduce over/underfitting. Various methods based on information theoretic criteria (ITC) have been used to estimate the intrinsic dimension of fMRI data. An important assumption of ITC is that the noise is purely white. However, this assumption is often violated by the existence of temporally correlated noise in fMRI data. In this study, we introduced a filtering method into the order selection to remove the autocorrelation from the colored noise by using the whitening filter proposed by Prudon and Weisskoff. Results of the simulated data show that the filtering method has strong robustness to noise and significantly improves the accuracy of order selection from data with colored noise. Moreover, the multifiltering method proposed by us was applied to real fMRI data to improve the performance of ITC. Results of the real fMRI data show that the proposed method can alleviate the overestimation due to the autocorrelation of colored noise. We further compared the stability of IC estimates of real fMRI data at order estimated by minimum description length criterion based on the filtered and unfiltered data by using the software package ICASSO. Results show that ICA yields more stable IC estimates using the reduced order by filtering.

YNIMG Journal 2011 Journal Article

Large-scale directional connections among multi resting-state neural networks in human brain: A functional MRI and Bayesian network modeling study

  • Rui Li
  • Kewei Chen
  • Adam S. Fleisher
  • Eric M. Reiman
  • Li Yao
  • Xia Wu

This study examined the large-scale connectivity among multiple resting-state networks (RSNs) in the human brain. Independent component analysis was first applied to the resting-state functional MRI (fMRI) data acquired from 12 healthy young subjects for the separation of RSNs. Four sensory (lateral and medial visual, auditory, and sensory-motor) RSNs and four cognitive (default-mode, self-referential, dorsal and ventral attention) RSNs were identified. A Gaussian Bayesian network (BN) learning approach was then used for the examination of the conditional dependencies among these RSNs and the construction of the network-to-network directional connectivity patterns. The BN based results demonstrated that sensory networks and cognitive networks were hierarchically organized. Specifically, we found that the sensory networks were highly intra-dependent and the cognitive networks were strongly intra-influenced. In addition, the results depicted dominant bottom-up connectivity from sensory networks to cognitive networks in which the self-referential and the default-mode networks might play respectively important roles in the process of resting-state information transfer and integration. The present study characterized the global connectivity relations among RSNs and delineated more characteristics of spontaneous activity dynamics.

YNIMG Journal 2010 Journal Article

The interrelationship of dopamine D2-like receptor availability in striatal and extrastriatal brain regions in healthy humans: A principal component analysis of [18F]fallypride binding

  • David H. Zald
  • Neil D. Woodward
  • Patrizia Riccardi
  • M. Sib Ansari
  • Ronald M. Baldwin
  • Ronald L. Cowan
  • Clarence E. Smith
  • Helene Hakyemez

Individual differences in dopamine D2-like receptor availability arise across all brain regions expressing D2-like receptors. However, the interrelationships in receptor availability across brain regions are poorly understood. To address this issue, we examined the relationship between D2-like binding potential (BPND) across striatal and extrastriatal regions in a sample of healthy participants. PET imaging was performed with the high affinity D2/D3 ligand [18F]fallypride in 45 participants. BPND images were submitted to voxel-wise principal component analysis to determine the pattern of associations across brain regions. Individual differences in D2-like BPND were explained by three distinguishable components. A single component explained almost all of the variance within the striatum, indicating that individual differences in receptor availability vary in a homogenous manner across the caudate, putamen, and ventral striatum. Cortical BPND was only modestly related to striatal BPND and mostly loaded on a distinct component. After controlling for the general level of cortical D2-like BPND, an inverse relationship emerged between receptor availability in the striatum and the ventral temporal and ventromedial frontal cortices, suggesting possible cross-regulation of D2-like receptors in these regions. The analysis additionally revealed evidence of: (1) a distinct component involving the midbrain and limbic areas; (2) a dissociation between BPND in the medial and lateral temporal regions; and (3) a dissociation between BPND in the medial/midline and lateral thalamus. In summary, individual differences in D2-like receptor availability reflect several distinct patterns. This conclusion has significant implications for neuropsychiatric models that posit global or regionally specific relationships between dopaminergic tone and behavior.

YNIMG Journal 2009 Journal Article

Cerebral morphology and dopamine D2/D3 receptor distribution in humans: A combined [18F]fallypride and voxel-based morphometry study

  • Neil D. Woodward
  • David H. Zald
  • Zhaohua Ding
  • Patrizia Riccardi
  • M. Sib Ansari
  • Ronald M. Baldwin
  • Ronald L. Cowan
  • Rui Li

The relationship between cerebral morphology and the expression of dopamine receptors has not been extensively studied in humans. Elucidation of such relationships may have important methodological implications for clinical studies of dopamine receptor ligand binding differences between control and patient groups. The association between cerebral morphology and dopamine receptor distribution was examined in 45 healthy subjects who completed T1-weighted structural MRI and PET scanning with the D2/D3 ligand [18F]fallypride. Optimized voxel-based morphometry was used to create grey matter volume and density images. Grey matter volume and density images were correlated with binding potential (BPND) images on a voxel-by-voxel basis using the Biological Parametric Mapping toolbox. Associations between cerebral morphology and BPND were also examined for selected regions-of-interest (ROIs) after spatial normalization. Voxel-wise analyses indicated that grey matter volume and density positively correlated with BPND throughout the midbrain, including the substantia nigra. Positive correlations were observed in medial cortical areas, including anterior cingulate and medial prefrontal cortex, and circumscribed regions of the temporal, frontal, and parietal lobes. ROI analyses revealed significant positive correlations between BPND and cerebral morphology in the caudate, thalamus, and amygdala. Few negative correlations between morphology and BPND were observed. Overall, grey matter density appeared more strongly correlated with BPND than grey matter volume. Cerebral morphology, particularly grey matter density, correlates with [18F]fallypride BPND in a regionally specific manner. Clinical studies comparing dopamine receptor availability between clinical and control groups may benefit by accounting for potential differences in cerebral morphology that exist even after spatial normalization.