Author name cluster

Xiaolin Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers

2 author rows

AAAI Conference 2026 Conference Paper

CLM-Access: A Specialized Foundation Model for High-Dimensional Single-Cell ATAC-Seq Analysis

Ziqiang Liu
Bowen Li
Zhenyu Xu
Yantao Li
Junwei Zhang
Chulin Sha
Xiaolin Li

Inspired by the success of large language models (LLMs) in natural language processing, cell language models (CLMs) have emerged as a promising paradigm to learn cell representations from high-dimensional single-cell data, particularly transcriptomic profiles from scRNA-seq. These foundation models have shown remarkable potential across a variety of downstream applications. However, there remains a lack of foundation models for scATAC-seq data, which measures chromatin accessibility at the single-cell level and is critical for decoding epigenetic regulation. Developing such a model is considerably more challenging due to the unique characteristics of scATAC-seq data, including the vast number of chromatin regions, lack of standardized annotations, extreme sparsity, and near-binary distributions. To address these challenges, we systematically explore various strategies and propose CLM-Access, a specialized foundation model for scATAC-seq data. CLM-Access incorporates three main innovations: (1) a unified data processing pipeline that maps 2.8 million cells onto a unified reference of over 1 million chromatin regions; (2) a specialized patching and embedding strategy to effectively manage high-dimensional inputs; and (3) a tailored masking and loss function design that preserves fine-grained regional information while enhancing training efficiency and representation quality. With comprehensive benchmarks, we show that CLM-Access significantly outperforms existing methods in key downstream tasks, including batch effect correction, cell type annotation, RNA expression prediction, and multi-modal integration. This work establishes a scalable and interpretable foundation model for single-cell epigenomic analysis and expands the application of CLMs in single-cell research.


AAAI Conference 2025 Conference Paper

QiMLP: Quantum-inspired Multilayer Perceptron with Strong Correlation Mining and Parameter Compression

Junwei Zhang
Tianheng Wang
Zeyi Zhang
Pengju Yan
Xiaolin Li

The Multilayer Perceptron (MLP) is a simple form of Neural Network (NN) and a cornerstone of research and development in deep learning. Each neuron is connected to all neurons in the previous layer and implements a non-linear mapping through activation functions. An MLP can learn complex non-linear relationships among features by stacking multiple hidden layers, but it still cannot discover the inherent strong correlations among features. The reason is that each neuron combines the outputs of all neurons in the previous layer through a simple weighted sum. Inspired by quantum theory, this paper builds a non-linear NN layer, based on multi-body quantum systems, that can mine strong correlations among features, and then constructs a multi-layer perceptron called Quantum-inspired MLP (QiMLP). It is conceivable that QiMLP will have important inspirational significance in reshaping machine learning, deep learning and large language models. We theoretically analyze the basis on which QiMLP mines strong correlations among features, and conduct experiments on multiple classic deep learning datasets. Experimental results verify that QiMLP not only learns strong correlations among features, but also reduces the number of parameters by a factor of hundreds.


AAAI Conference 2025 Conference Paper

Quantum-inspired Non-homologous Representation Constraint Mechanism for Long-tail Senses of Word Sense Disambiguation

Junwei Zhang
Xiaolin Li

Word Sense Disambiguation (WSD) aims to determine the meaning of target words according to the given context. The recognition of high-frequency senses has reached expectations, and the current research focus is mainly on low-frequency senses, namely Long-tail Senses (LTSs). One of the challenges in long-tail WSD is to obtain clear and distinguishable definition representations based on limited word sense definitions. Researchers try to mine word sense definition information from data from different sources to enhance the representations. Inspired by quantum theory, this paper provides a constraint mechanism for representations under non-homogeneous data to leverage the geometric relationship in its Hilbert space to constrain the value range of parameters, thereby alleviating the dependence on big data and improving the accuracy of representations. We theoretically analyze the feasibility of the constraint mechanism, and verify the WSD system based on this mechanism on the standard evaluation framework, constructed LTS datasets and cross-lingual datasets. Experimental results demonstrate the effectiveness of the scheme and achieve competitive performance.


JBHI Journal 2024 Journal Article

A Coarse-Fine Collaborative Learning Model for Three Vessel Segmentation in Fetal Cardiac Ultrasound Images

Shan Ling
Laifa Yan
Rongsong Mao
Jizhou Li
Haoran Xi
Fei Wang
Xiaolin Li
Min He

Congenital heart disease (CHD) is the most frequent birth defect and a leading cause of infant mortality, emphasizing the crucial need for its early diagnosis. Ultrasound is the primary imaging modality for prenatal CHD screening. As a complement to the four-chamber view, the three-vessel view (3VV) plays a vital role in detecting anomalies in the great vessels. However, the interpretation of fetal cardiac ultrasound images is subjective and relies heavily on operator experience, leading to variability in CHD detection rates, particularly in resource-constrained regions. In this study, we propose an automated method for segmenting the pulmonary artery, ascending aorta, and superior vena cava in the 3VV using a novel deep learning network named CoFi-Net. Our network incorporates a coarse-fine collaborative strategy with two parallel branches dedicated to simultaneous global localization and fine segmentation of the vessels. The coarse branch employs a partial decoder to leverage high-level semantic features, enabling global localization of objects and suppression of irrelevant structures. The fine branch utilizes attention-parameterized skip connections to enhance feature representations and sharpen boundary information. The outputs of the two branches are fused to generate accurate vessel segmentations. Extensive experiments conducted on a collected dataset demonstrate the superiority of CoFi-Net compared to state-of-the-art segmentation models for 3VV segmentation, indicating its great potential for enhancing CHD diagnostic efficiency in clinical practice. Furthermore, CoFi-Net outperforms other deep learning models in breast lesion segmentation on a public breast ultrasound dataset, despite not being specifically designed for this task, demonstrating its potential and robustness for various segmentation tasks.


IJCAI Conference 2020 Conference Paper

Exploiting Mutual Information for Substructure-aware Graph Representation Learning

Pengyang Wang
Yanjie Fu
Yuanchun Zhou
Kunpeng Liu
Xiaolin Li
Kien Hua

In this paper, we design and evaluate a new substructure-aware Graph Representation Learning (GRL) approach. GRL aims to map graph structure information into low-dimensional representations. While extensive efforts have been made to model global and/or local structure information, GRL can be improved by substructure information. Some recent studies exploit adversarial learning to incorporate substructure awareness, but are hindered by unstable convergence. This study addresses the major research question: is there a better way to integrate substructure awareness into GRL? As subsets of the graph structure, substructures of interest (i.e., subgraphs) are unique and representative for differentiating graphs, leading to a high correlation between the representation of the graph-level structure and that of the substructures. Since mutual information (MI) measures the mutual dependence between two variables, we develop an MI-induced substructure-aware GRL method. We decompose the GRL pipeline into two stages: (1) node-level, where we maximize MI between the original and learned representations, based on the intuition that the two should be highly correlated; (2) graph-level, where we preserve substructures by maximizing MI between the graph-level structure and substructure representations. Finally, we present extensive experimental results to demonstrate the improved performance of our method on real-world data.


AAAI Conference 2020 Conference Paper

Improving Question Generation with Sentence-Level Semantic Matching and Answer Position Inferring

Xiyao Ma
Qile Zhu
Yanlin Zhou
Xiaolin Li

Taking an answer and its context as input, sequence-to-sequence models have made considerable progress on question generation. However, we observe that these approaches often generate wrong question words or keywords and copy answer-irrelevant words from the input. We believe that a lack of global question semantics and poor exploitation of answer position-awareness are the key root causes. In this paper, we propose a neural question generation model with two general modules: sentence-level semantic matching and answer position inferring. Further, we enhance the initial state of the decoder by leveraging an answer-aware gated fusion mechanism. Experimental results demonstrate that our model outperforms the state-of-the-art (SOTA) models on the SQuAD and MARCO datasets. Owing to its generality, our work also improves existing models significantly.


AAAI Conference 2020 Conference Paper

Rethinking the Bottom-Up Framework for Query-Based Video Localization

Long Chen
Chujie Lu
Siliang Tang
Jun Xiao
Dong Zhang
Chilie Tan
Xiaolin Li

In this paper, we focus on the task of query-based video localization, i.e., localizing a query in a long and untrimmed video. The prevailing solutions for this problem can be grouped into two categories: i) Top-down approach: It pre-cuts the video into a set of moment candidates, then performs classification and regression for each candidate; ii) Bottom-up approach: It injects the whole query content into each video frame, then predicts the probability of each frame being a ground truth segment boundary (i.e., start or end). Both frameworks have their respective shortcomings: the top-down models suffer from heavy computation and are sensitive to heuristic rules, while the performance of bottom-up models has so far lagged behind that of their top-down counterparts. However, we argue that the performance of the bottom-up framework is severely underestimated because of current unreasonable designs, in both the backbone and the head network. To this end, we design a novel bottom-up model: Graph-FPN with Dense Predictions (GDP). For the backbone, GDP first generates a frame feature pyramid to capture multi-level semantics, then utilizes graph convolution to encode the plentiful scene relationships, which incidentally mitigates the semantic gaps in the multi-scale feature pyramid. For the head network, GDP regards all frames falling in the ground truth segment as the foreground, and each foreground frame regresses the unique distances from its location to the bi-directional boundaries. Extensive experiments on two challenging query-based video localization tasks (natural language video localization and video relocalization), involving four challenging benchmarks (TACoS, Charades-STA, ActivityNet Captions, and Activity-VRL), have shown that GDP surpasses the state-of-the-art top-down models.


AAAI Conference 2019 Conference Paper

Efficient Region Embedding with Multi-View Spatial Networks: A Perspective of Locality-Constrained Spatial Autocorrelations

Yanjie Fu
Pengyang Wang
Jiadi Du
Le Wu
Xiaolin Li

Urban regions are places where people live, work, consume, and entertain. In this study, we investigate the problem of learning an embedding space for regions. Studying the representations of regions can help us to better understand the patterns, structures, and dynamics of cities, support urban planning, and, ultimately, make our cities more livable and sustainable. While some efforts have been made to learn the embeddings of regions, existing methods can be improved by incorporating locality-constrained spatial autocorrelations into an encode-decode framework. Such an embedding strategy is capable of taking into account both intra-region structural information and inter-region spatial autocorrelations. To this end, we propose to learn the representations of regions via a new embedding strategy with awareness of locality-constrained spatial autocorrelations. Specifically, we first construct multi-view (i.e., distance and mobility connectivity) POI-POI networks to represent regions. In addition, we introduce two properties into region embedding: (i) spatial autocorrelations: a global similarity between regions; (ii) top-k locality: spatial autocorrelations locally and approximately reside on the top k most autocorrelated regions. We propose a new encoder-decoder based formulation that preserves the two properties while remaining efficient. As an application, we exploit the learned embeddings to predict the mobile check-in popularity of regions. Finally, extensive experiments with real-world urban region data demonstrate the effectiveness and efficiency of our method.


AAAI Conference 2019 Conference Paper

Generalized Batch Normalization: Towards Accelerating Deep Neural Networks

Xiaoyong Yuan
Zheng Feng
Matthew Norton
Xiaolin Li

Utilizing recently introduced concepts from statistics and quantitative risk management, we present a general variant of Batch Normalization (BN) that offers accelerated convergence of Neural Network training compared to conventional BN. In general, we show that mean and standard deviation are not always the most appropriate choice for the centering and scaling procedure within the BN transformation, particularly if ReLU follows the normalization step. We present a Generalized Batch Normalization (GBN) transformation, which can utilize a variety of alternative deviation measures for scaling and statistics for centering, choices which naturally arise from the theory of generalized deviation measures and risk theory in general. When used in conjunction with the ReLU non-linearity, the underlying risk theory suggests natural, arguably optimal choices for the deviation measure and statistic. Utilizing the suggested deviation measure and statistic, we show experimentally that training is accelerated more so than with conventional BN, often with improved error rate as well. Overall, we propose a more flexible BN transformation supported by a complementary theoretical framework that can potentially guide design choices.


TIST Journal 2018 Journal Article

A Multi-Label Multi-View Learning Framework for In-App Service Usage Analysis

Yanjie Fu
Junming Liu
Xiaolin Li
Hui Xiong

Service usage analysis, which aims to identify customers’ messaging behaviors based on encrypted App traffic flows, has become a challenging and emergent task for service providers. Prior work usually starts by segmenting a traffic sequence into single-usage subsequences, and then classifies the subsequences into different usage types. However, these approaches can suffer from inaccurate traffic segmentations and mixed-usage subsequences. To address this challenge, we exploit a multi-label multi-view learning strategy and develop an enhanced framework for in-App usage analytics. Specifically, we first devise an enhanced traffic segmentation method to reduce mixed-usage subsequences. In addition, we develop a multi-label multi-view logistic classification method, which comprises two alignments. The first alignment makes use of the classification consistency between the packet-length view and the time-delay view of traffic subsequences to improve classification accuracy. The second alignment combines the classification of single-usage subsequences and the post-classification of mixed-usage subsequences into a unified multi-label logistic classification problem. Finally, we present extensive experiments with real-world datasets to demonstrate the effectiveness of our approach. We find that the proposed multi-label multi-view framework can help overcome the pain of mixed-usage subsequences and can be generalized to latent activity analysis in sequential data, beyond in-App usage analytics.


TIST Journal 2018 Journal Article

Learning Urban Community Structures

Pengyang Wang
Yanjie Fu
Jiawei Zhang
Xiaolin Li
Dan Lin

Learning urban community structures refers to the efforts of quantifying, summarizing, and representing an urban community’s (i) static structures, e.g., Point-of-Interest (POI) buildings and corresponding geographic allocations, and (ii) dynamic structures, e.g., human mobility patterns among POIs. By learning the community structures, we can better quantitatively represent urban communities and understand their evolution in the development of cities. This can help us boost commercial activities, enhance public security, foster social interactions, and, ultimately, yield livable, sustainable, and viable environments. However, due to the complex nature of urban systems, it is traditionally challenging to learn the structures of urban communities. To address this problem, in this article, we propose a collective embedding framework to learn the community structure from multiple periodic spatial-temporal graphs of human mobility. Specifically, we first exploit a probabilistic propagation-based approach to create a set of mobility graphs from periodic human mobility records. In these mobility graphs, the static POIs are regarded as vertices, the dynamic mobility connectivities between POI pairs are regarded as edges, and the edge weights periodically evolve over time. A collective deep auto-encoder method is then developed to collaboratively learn the embeddings of POIs from multiple spatial-temporal mobility graphs. In addition, we develop an Unsupervised Graph based Weighted Aggregation method to align and aggregate the POI embeddings into the representation of the community structures. We apply the proposed embedding framework to two applications (i.e., spotting vibrant communities and predicting housing price return rates) to evaluate the performance of our proposed method. Extensive experimental results on real-world urban communities and human mobility data demonstrate the effectiveness of the proposed collective embedding framework.


IROS Conference 2005 Conference Paper

Analytical fault detection and diagnosis (FDD) for pneumatic systems in robotics and manufacturing automation

Xiaolin Li
Imin Kao

Pneumatic systems are often found on manufacturing floors in automation and robotic systems. Early and intelligent fault detection and diagnosis (FDD) of such systems can prevent device failures that cause shutdowns and the loss of precious production time and profits. In this paper, we introduce analytical FDD for pneumatic systems. The diagnosis system presented in this paper focuses on the signal-based approach, which employs multi-resolution wavelet decomposition of various sensor signals such as pressure, flow rate, etc., to determine leak configuration. A pattern recognition technique and analytical vectorized maps are developed to diagnose an unknown leakage based on the established FDD information using affine mapping. Experimental studies and analysis are presented to illustrate the FDD system.
