Arrow Research search

Author name cluster

Xiaoyan Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

15 papers
2 author rows

Possible papers

15

EAAI Journal 2026 Journal Article

A lightweight and attention-enhanced framework for robust pavement defect detection

  • Xiaoyan Li
  • Ning Zhang
  • Yue Pan
  • Yaowen Lv
  • Xiping Xu
  • Zheng Wang

Accurately detecting pavement anomalies, a critical task within structural health monitoring (SHM), is essential for infrastructure safety and automated monitoring systems. However, existing deep learning based object detectors, including state-of-the-art You Only Look Once (YOLO) variants, often struggle with defects such as elongated and low-contrast potholes due to irregular geometry and limited spatial context awareness. In this study, we propose an Efficient and CA-enhanced You Only Look Once framework (EC-YOLO), an improved deep learning based object detection network designed to address these challenges. The proposed model builds upon the YOLOv11 architecture and introduces two major enhancements: (1) replacing the shallow backbone with EfficientNet-B0 for superior fine-grained feature extraction, and (2) integrating a Coordinate Attention (CA) module into the large-object detection head to capture long-range spatial dependencies. Extensive experiments on the Urban Digital Twins dataset demonstrate that EC-YOLO achieves state-of-the-art performance, attaining 96.5% mean Average Precision (mAP)@0.5 and 71.3% mAP@0.5:0.95. After deployment engine optimization, the model maintains real-time inference at 225.4 frames per second (FPS) on an NVIDIA Jetson Orin Nano with only 1.7 giga floating point operations (GFLOPs). Ablation studies further verify the contribution of each component. Moreover, EC-YOLO exhibits strong generalization by outperforming existing models on the Urban Digital Twins for Intelligent Road Inspection (UDTIRI) external benchmark. Overall, deployment verification on the Jetson platform confirms that EC-YOLO is a robust, lightweight, and effective solution for practical road defect inspection in resource-constrained environments.
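As a concrete reference for the mAP@0.5 figure cited above, here is a minimal, self-contained sketch of the intersection-over-union (IoU) computation that decides whether a detection counts as a true positive at the 0.5 threshold. This is generic evaluation logic, not code from the paper:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A prediction counts as a true positive at mAP@0.5 when IoU >= 0.5.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # overlap 50 / union 150 -> 0.333...
```

mAP@0.5:0.95 averages the same matching procedure over IoU thresholds from 0.5 to 0.95 in steps of 0.05, which is why it is the stricter of the two numbers.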

AAAI Conference 2026 Conference Paper

MARE: Multimodal Analogical Reasoning for Disease Evolution-Aware Radiology Report Generation

  • Qingqing Gao
  • Tengfei Liu
  • Xiaoyan Li
  • Xiaodan Zhang
  • Zhongfan Sun
  • Boyue Wang
  • Baocai Yin
  • Zhaohui Liu

Radiology report generation from longitudinal medical data is critical for assessing disease progression and automating diagnostic workflows. While recent methods incorporate longitudinal information, they primarily rely on multimodal feature fusion, with limited capacity for explicit disease evolution modeling and temporal reasoning. To address this, we propose MARE, an end-to-end framework that formulates longitudinal radiology report generation as a multimodal analogical reasoning task. Inspired by the Abduction–Mapping–Induction paradigm, MARE models latent relational structures underlying disease evolution by aligning lesion-level visual features across time and mapping them to the textual domain for temporally coherent and clinically meaningful report generation. To mitigate the spatial misalignment caused by patient positioning or imaging variation, we introduce an Adaptive Region Alignment (ARA) module for robust temporal correspondence. Additionally, we design Dual Evolution Consistency (DEC) losses to regularize analogical reasoning by enforcing temporal coherence in both visual and textual evolution paths. Extensive experiments on the Longitudinal-MIMIC dataset demonstrate that MARE significantly outperforms state-of-the-art baselines across both natural language generation and clinical effectiveness metrics, highlighting the value of structured analogical reasoning for disease evolution-aware report generation.
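The core idea behind the Adaptive Region Alignment module, matching lesion-level features across time points, can be illustrated with a toy cosine-similarity matcher. The code below is a hypothetical stand-in for intuition only, not the paper's ARA module:

```python
import numpy as np

def align_regions(prev_feats, curr_feats):
    """Greedily match each current-study region to its most similar prior-study
    region by cosine similarity. Hypothetical illustration of temporal
    correspondence, not the learned ARA module from the paper."""
    a = prev_feats / np.linalg.norm(prev_feats, axis=1, keepdims=True)
    b = curr_feats / np.linalg.norm(curr_feats, axis=1, keepdims=True)
    sim = b @ a.T                # (num_curr, num_prev) cosine similarities
    return sim.argmax(axis=1)    # best-matching prior region per current region

prev = np.array([[1.0, 0.0], [0.0, 1.0]])   # two prior-study region features
curr = np.array([[0.9, 0.1], [0.2, 0.8]])   # two current-study region features
print(align_regions(prev, curr))            # -> [0 1]
```

Once regions are in correspondence, feature differences along matched pairs can serve as an evolution signal for the report decoder; the paper's module learns this alignment rather than using a fixed similarity.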

YNIMG Journal 2026 Journal Article

Oxygen dependency of cognition: Neural mechanisms of reversible cognitive changes in Tibetan highlanders during altitudinal migration

  • Xiaoyan Li
  • Hao Li
  • Yaping Zeng
  • Dacheng Ren
  • Hailin Ma

The dynamic changes occurring in the brain to adapt to the environment are crucial for human survival. Extensive research has demonstrated that the Tibetan population, indigenous to the plateau, has evolved unique physiological adaptations to hypoxia. However, the neurocognitive basis of these adaptive strategies remains incompletely understood. This study employs a multimodal approach (behavioral testing, event-related potentials, and time-frequency analysis) to systematically examine the effects of long-term high-altitude hypoxic exposure (3680 m) on working memory function in indigenous Tibetans. The aim is to determine whether this impact stems from energy-constrained adaptive functional adjustments or irreversible neurofunctional impairment. Participants included high-altitude native Tibetans, Tibetan migrants who had resided in lowland areas for 1 or 3 years, and low-altitude Han Chinese controls. Results revealed that spatial working memory remained unaffected in native Tibetans, while verbal working memory accuracy (ACC) showed a statistically significant decline. Following relocation to the plains, verbal working memory progressively recovered with increasing duration of residence, with the 3-year group reaching control levels. Neurophysiological data further revealed compensatory increases in late positive potential (LPP) amplitude and beta-band oscillatory power among high-altitude natives, both of which exhibited linear decline with residence duration in individuals relocated to the plains. These findings indicate that high-altitude hypoxia does not cause permanent impairment of verbal working memory function. Instead, it induces selective inhibition of energy-intensive verbal processing systems under energy-constrained conditions. This inhibition is environmentally dependent and is reversibly restored once oxygen supply improves. This study confirms, at the level of cognitive neural mechanisms, that the functional changes induced by high-altitude hypoxia are fundamentally energy-optimization-driven adaptive reorganization, providing crucial empirical evidence for understanding human brain plasticity under extreme conditions.
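The beta-band oscillatory power reported above comes from time-frequency analysis of EEG. A minimal sketch of the band-power step, using a generic FFT-based estimate rather than the study's exact pipeline:

```python
import numpy as np

def band_power(signal, fs, lo=13.0, hi=30.0):
    """Mean spectral power in a frequency band (defaults to beta, 13-30 Hz),
    estimated from the FFT of a single epoch. A generic sketch, not the
    study's analysis code."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2
    mask = (freqs >= lo) & (freqs <= hi)
    return power[mask].mean()

fs = 250                                 # sampling rate in Hz (illustrative)
t = np.arange(fs) / fs                   # one second of data
beta_sig = np.sin(2 * np.pi * 20 * t)    # 20 Hz component, inside the beta band
alpha_sig = np.sin(2 * np.pi * 10 * t)   # 10 Hz component, outside the beta band
print(band_power(beta_sig, fs) > band_power(alpha_sig, fs))  # -> True
```

In practice such estimates are averaged over trials and tracked per electrode; the linear decline with lowland residence duration reported above refers to exactly this kind of per-group band-power measure.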

JBHI Journal 2025 Journal Article

Dual-Level Imbalance Mitigation for Single-FoV Colorectal Histopathology Image Classification

  • Lingling Yuan
  • Yang Chen
  • Md Rahaman
  • Hongzan Sun
  • Haoyuan Chen
  • Marcin Grzegorzek
  • Chen Li
  • Xiaoyan Li

Single-field-of-view (FoV) histopathological image classification is vital for colorectal cancer (CRC) diagnosis in mid- to low-tier hospitals lacking whole-slide imaging (WSI) scanners and storage, yet it suffers from severe class imbalance and degraded performance. To address this, we propose a dual-level imbalance mitigation (DIM) framework integrating data-level and algorithm-level approaches. Specifically: (1) a global context generative adversarial network (GCGAN) generates realistic minority-class images for augmentation to balance the dataset; (2) a frequency-aware adaptive focal loss (FAFL) applies a frequency-aware offset and adaptive modulation to better separate overlapping classes; (3) a lightweight receptive field-based convolutional neural network (LRF-CNN) is trained under DIM to leverage both augmentation and loss modulation for improved classification. Extensive experiments on the single-FoV colorectal histopathology dataset demonstrate that the DIM-equipped LRF-CNN outperforms five state-of-the-art (SOTA) models across multiple metrics. Furthermore, each DIM component enhances performance when applied individually to those SOTA models, and additional validation on six single-FoV histopathological datasets confirms the generalizability and effectiveness of the proposed DIM framework. Our code is available at https://github.com/Lingling-Yuan/DIM.
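The algorithm-level idea can be illustrated, in simplified form, by a focal loss whose class weights scale inversely with class frequency. The sketch below is an assumed simplification for intuition, not the paper's exact FAFL formulation (which adds a frequency-aware offset and adaptive modulation):

```python
import numpy as np

def freq_weighted_focal_loss(probs, labels, class_freq, gamma=2.0):
    """Focal loss with inverse-frequency class weights -- a simplified sketch
    in the spirit of FAFL, not the paper's formulation. `probs` is (N, C)
    softmax output, `labels` is (N,) true classes, `class_freq` is (C,) counts."""
    p_t = probs[np.arange(len(labels)), labels]   # probability of the true class
    alpha = (1.0 / class_freq)[labels]            # up-weight rare classes
    alpha = alpha / alpha.sum()                   # normalise the per-sample weights
    # (1 - p_t)^gamma down-weights easy, well-classified examples.
    return np.sum(alpha * (1 - p_t) ** gamma * -np.log(p_t))

probs = np.array([[0.9, 0.1], [0.3, 0.7]])
labels = np.array([0, 1])
class_freq = np.array([900, 100])   # class 1 is 9x rarer in the training set
print(freq_weighted_focal_loss(probs, labels, class_freq))
```

The focusing term and the frequency weighting attack the two sides of the imbalance problem separately: hard examples and rare classes.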

ICRA Conference 2024 Conference Paper

FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird's-Eye View and Perspective View

  • Jiawei Hou
  • Xiaoyan Li
  • Wenhao Guan
  • Gang Zhang
  • Di Feng
  • Yuheng Du
  • Xiangyang Xue 0001
  • Jian Pu

In autonomous driving, 3D occupancy prediction outputs voxel-wise status and semantic labels for a more comprehensive understanding of 3D scenes compared with traditional perception tasks, such as 3D object detection and bird’s-eye view (BEV) semantic segmentation. Recent work has extensively explored various aspects of this task, including view transformation techniques, ground-truth label generation, and elaborate network design, aiming to achieve superior performance. However, the inference speed, crucial for running on an autonomous vehicle, is often neglected. To this end, a new method, dubbed FastOcc, is proposed. By carefully analyzing the network's accuracy and latency across four parts, including the input image resolution, image backbone, view transformation, and occupancy prediction head, it is found that the occupancy prediction head holds considerable potential for accelerating the model while keeping its accuracy. Targeting this component, the time-consuming 3D convolution network is replaced with a novel residual-like architecture, where features are mainly digested by a lightweight 2D BEV convolution network and compensated by integrating the 3D voxel features interpolated from the original image features. Experiments on the Occ3D-nuScenes benchmark demonstrate that our FastOcc achieves state-of-the-art results with a fast inference speed.
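The speed win comes from digesting features in 2D BEV rather than with 3D convolutions. One common way to obtain a BEV map from a voxel grid is height-axis pooling, sketched below as an illustrative reduction, not FastOcc's exact design:

```python
import numpy as np

def voxels_to_bev(voxel_feats):
    """Collapse a (X, Y, Z, C) voxel feature grid to a (X, Y, C) BEV map by
    max-pooling along the height axis -- a common BEV reduction, shown here
    only to illustrate why 2D processing is so much cheaper than 3D."""
    return voxel_feats.max(axis=2)

grid = np.zeros((4, 4, 8, 2))   # tiny X*Y*Z grid with 2 feature channels
grid[1, 2, 5, 0] = 3.0          # a single "occupied" voxel feature
bev = voxels_to_bev(grid)
print(bev.shape, bev[1, 2, 0])  # -> (4, 4, 2) 3.0
```

A 2D convolution over the (X, Y, C) map touches Z times fewer activations than a 3D convolution over the full grid, which is the source of the latency saving; the residual branch in the paper then restores height information from interpolated voxel features.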

JBHI Journal 2023 Journal Article

A Novel Analysis of Compound Muscle Action Potential Scan: Staircase Function Fitting and StairFit Motor Unit Number Estimation

  • Maoqi Chen
  • Zhiyuan Lu
  • Ya Zong
  • Xiaoyan Li
  • Ping Zhou

Compound muscle action potential (CMAP) scan provides a detailed stimulus-response curve for examination of neuromuscular disease. The objective of this study is to develop a novel CMAP scan analysis to extract motor unit number estimation (MUNE) and other physiological or diagnostic information. A staircase function was used as the basic mathematical model of the CMAP scan. An optimal staircase function fit was estimated for each given number of motor units, and the fit with the minimum number of motor units that meets a predefined error requirement was accepted. This yields MUNE as well as the spike amplitude and activation threshold of each motor unit that contributes to the CMAP scan. The significance of the staircase function fit was confirmed using simulated CMAP scans with different motor unit numbers (20, 50, 100, and 150) and baseline noise levels (1 μV, 5 μV, and 10 μV), in terms of MUNE performance, repeatability, and test-retest reliability. For experimental data, the average MUNE of the first dorsal interosseous muscle derived from the staircase function fitting was 57.5 ± 26.9 for the tested spinal cord injury subjects, significantly lower than the 101.2 ± 16.9 derived from the control group (p < 0.001). The staircase function fitting provides an appropriate approach to CMAP scan processing, yielding MUNE and other useful parameters for examination of motor unit loss and muscle fiber reinnervation.
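The staircase model itself is simple to state: at any stimulus intensity, the response is the summed amplitude of every motor unit whose activation threshold has been crossed. Below is a forward-model sketch with hypothetical thresholds and amplitudes; the paper's contribution is fitting this model to measured scans to estimate the unit count, which is not shown here:

```python
import numpy as np

def staircase_cmap(stimulus, thresholds, amplitudes):
    """Staircase model of a CMAP scan: each motor unit contributes its spike
    amplitude once the stimulus reaches its activation threshold. Forward
    model only; MUNE comes from fitting this to recorded data."""
    active = stimulus[:, None] >= thresholds[None, :]   # (n_stimuli, n_units)
    return active @ amplitudes                          # summed active amplitudes

thresholds = np.array([10.0, 15.0, 22.0])   # mA, hypothetical unit thresholds
amplitudes = np.array([0.2, 0.5, 0.3])      # mV, hypothetical unit amplitudes
stim = np.array([5.0, 12.0, 18.0, 30.0])
print(staircase_cmap(stim, thresholds, amplitudes))  # 0.0, 0.2, 0.7, 1.0
```

Fitting then searches over the number of steps: the smallest unit count whose best-fitting staircase stays within the predefined error tolerance is reported as the MUNE.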

ICRA Conference 2022 Conference Paper

CPGNet: Cascade Point-Grid Fusion Network for Real-Time LiDAR Semantic Segmentation

  • Xiaoyan Li
  • Gang Zhang
  • Hongyu Pan
  • Zhenhua Wang

LiDAR semantic segmentation, essential for advanced autonomous driving, is required to be accurate, fast, and easily deployable on mobile platforms. Previous point-based or sparse voxel-based methods are far from real-time applications since time-consuming neighbor searching or sparse 3D convolutions are employed. Recent 2D projection-based methods, including range view and multi-view fusion, can run in real time, but suffer from lower accuracy due to information loss during the 2D projection. Besides, to improve the performance, previous methods usually adopt test time augmentation (TTA), which further slows down the inference process. To achieve a better speed-accuracy trade-off, we propose Cascade Point-Grid Fusion Network (CPGNet), which ensures both effectiveness and efficiency mainly by the following two techniques: 1) the novel Point-Grid (PG) fusion block extracts semantic features mainly on the 2D projected grid for efficiency, while summarizing both 2D and 3D features on 3D points for minimal information loss; 2) the proposed transformation consistency loss narrows the gap between single-pass model inference and TTA. The experiments on the SemanticKITTI and nuScenes benchmarks demonstrate that CPGNet without ensemble models or TTA is comparable with the state-of-the-art RPVNet, while running 4.7 times faster.
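The grid-to-point direction of a point-grid fusion step can be illustrated by fetching each 3D point's 2D grid feature from the cell it projects into. Below is a nearest-cell toy version; the actual PG block uses learned fusion of 2D and 3D features rather than a plain lookup:

```python
import numpy as np

def gather_grid_features(grid_feats, points_xy, cell_size):
    """Fetch a 2D grid feature for each point via its projected cell -- a
    simplified, nearest-cell illustration of the grid-to-point step in a
    point-grid fusion block (the paper's block learns this fusion)."""
    ix = (points_xy[:, 0] // cell_size).astype(int)   # grid row per point
    iy = (points_xy[:, 1] // cell_size).astype(int)   # grid column per point
    return grid_feats[ix, iy]

grid = np.arange(16.0).reshape(4, 4)        # toy 4x4 single-channel feature grid
pts = np.array([[0.5, 0.5], [3.5, 2.5]])    # point coordinates in metres
print(gather_grid_features(grid, pts, cell_size=1.0))  # -> [ 0. 14.]
```

Because every point lands in exactly one cell, this direction is O(N) in the number of points, which is what keeps the fusion cheap enough for real time.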

IJCAI Conference 2022 Conference Paper

PRNet: Point-Range Fusion Network for Real-Time LiDAR Semantic Segmentation

  • Xiaoyan Li
  • Gang Zhang
  • Tao Jiang
  • Xufen Cai
  • Zhenhua Wang

Accurate and real-time LiDAR semantic segmentation is necessary for advanced autonomous driving systems. To guarantee a fast inference speed, previous methods utilize highly optimized 2D convolutions to extract features on the range view (RV), which is the most compact representation of LiDAR point clouds. However, these methods often suffer from lower accuracy for two reasons: 1) the information loss during the projection from 3D points to the RV, and 2) the semantic ambiguity when 3D point labels are assigned according to the RV predictions. In this work, we introduce an end-to-end point-range fusion network (PRNet) that extracts semantic features mainly on the RV and iteratively fuses the RV features back to the 3D points for the final prediction. Besides, a novel range view projection (RVP) operation is designed to alleviate the information loss during the projection to the RV, and a point-range convolution (PRConv) is proposed to automatically mitigate the semantic ambiguity when transmitting features from the RV back to 3D points. Experiments on the SemanticKITTI and nuScenes benchmarks demonstrate that PRNet pushes range-based methods to a new state of the art and achieves a better speed-accuracy trade-off.
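The baseline range-view projection that RVP refines maps each point to image coordinates from its yaw and pitch angles. A sketch of the classic formulation, with a 64x1024 image and vertical field-of-view values assumed purely for illustration (sensor-dependent in practice):

```python
import numpy as np

def range_view_project(points, H=64, W=1024,
                       fov_up=np.radians(3.0), fov_down=np.radians(-25.0)):
    """Classic spherical projection of LiDAR points onto an H x W range image.
    Standard RV formulation for illustration; PRNet's RVP operation refines
    this step to reduce information loss."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)          # range of each point
    yaw = np.arctan2(y, x)                      # horizontal angle
    pitch = np.arcsin(z / r)                    # vertical angle
    u = (0.5 * (1.0 - yaw / np.pi) * W).astype(int) % W          # column
    v = ((1.0 - (pitch - fov_down) / (fov_up - fov_down)) * H).astype(int)
    return u, np.clip(v, 0, H - 1)              # row clipped into the image

pts = np.array([[10.0, 0.0, 0.0]])   # a point straight ahead, at sensor height
u, v = range_view_project(pts)
print(u[0], v[0])                    # -> 512 6 (centre column, upper rows)
```

The information loss mentioned in the abstract arises exactly here: when several points quantize to the same (u, v) cell, only one survives in the range image, and its prediction must later be shared by all of them.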

TIST Journal 2016 Journal Article

Video Face Editing Using Temporal-Spatial-Smooth Warping

  • Xiaoyan Li
  • Tongliang Liu
  • Jiankang Deng
  • Dacheng Tao

Editing faces in videos is a popular yet challenging task in computer vision and graphics that encompasses various applications, including facial attractiveness enhancement, makeup transfer, face replacement, and expression manipulation. Directly applying existing warping methods to video face editing has the major problem of temporal incoherence in the synthesized videos, which cannot be addressed by simply employing face tracking techniques or manual interventions, as it is difficult to eliminate the subtle temporal incoherence of the facial feature point localizations in a video sequence. In this article, we propose a temporal-spatial-smooth warping (TSSW) method to achieve high temporal coherence for video face editing. TSSW is based on two observations: (1) the control lattices are critical for generating warping surfaces and achieving temporal coherence between consecutive video frames, and (2) the temporal coherence and spatial smoothness of the control lattices can be simultaneously and effectively preserved. Based upon these observations, we impose a temporal coherence constraint on the control lattices of two consecutive frames, as well as a spatial smoothness constraint on the control lattice of the current frame. TSSW calculates the control lattice (in either the horizontal or vertical direction) by updating the control lattice (in the corresponding direction) on its preceding frame, i.e., minimizing a novel energy function that unifies a data-driven term, a smoothness term, and feature point constraints. 
The contributions of this article are twofold: (1) we develop TSSW, which is robust to the subtle temporal incoherence of facial feature point localizations and effectively preserves the temporal coherence and spatial smoothness of the control lattices for editing faces in videos, and (2) we present a new unified video face editing framework capable of improving the performance of facial attractiveness enhancement, makeup transfer, face replacement, and expression manipulation.
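The energy described above, a data-driven term plus spatial smoothness plus temporal coherence, can be miniaturized to a 1D least-squares problem over control points. The toy sketch below solves that reduced problem in closed form; it illustrates the structure of the objective under those assumptions, not the paper's 2D lattice formulation:

```python
import numpy as np

def solve_lattice(data, prev, lam_s=1.0, lam_t=1.0):
    """Minimise ||c - data||^2 + lam_s * ||D c||^2 + lam_t * ||c - prev||^2
    over 1D control points c: a data term, a smoothness term on neighbouring
    control points, and a temporal term tying c to the previous frame's
    lattice. Toy 1D analogue of the TSSW energy, illustrative only."""
    n = len(data)
    # Finite-difference operator: row i is c[i] - c[i+1].
    D = np.eye(n)[:-1] - np.eye(n, k=1)[:-1]
    A = (1 + lam_t) * np.eye(n) + lam_s * D.T @ D   # normal equations matrix
    b = data + lam_t * prev
    return np.linalg.solve(A, b)

data = np.array([0.0, 1.0, 0.0, 1.0])   # jittery per-frame target positions
prev = np.array([0.5, 0.5, 0.5, 0.5])   # previous frame's control lattice
c = solve_lattice(data, prev)
print(c)   # smoothed positions, pulled toward the previous frame
```

Raising `lam_t` trades fidelity to the current frame's feature points for temporal coherence with the preceding frame, which is precisely the tension the TSSW energy balances.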