Author name cluster

Xiaolong Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

32 papers

1 author row

AAAI Conference 2026 Conference Paper

Exploring High-order-aware Prompt Learning for Zero-shot Anomaly Detection

Shun Wei
Jielin Jiang
Xiaolong Xu

Many methods have demonstrated promising results in zero-shot anomaly detection (ZSAD) by incorporating prompt learning (PL) to fine-tune Vision-Language Models. However, the prompt learners proposed in recent studies remain relatively simple, such as learnable textual and visual prompts. Relying solely on the current PL paradigm restricts the ability to generate more precise prompts, thereby hindering further gains in ZSAD performance. To mitigate this issue, this paper proposes a high-order-aware prompt learning framework, termed HiPL, which facilitates the detection of unseen anomalies by generating prompts fortified by hypergraphs. Specifically, HiPL models high-order correlations among patches through a dynamically constructed hypergraph structure. A hypergraph semantic convolution then captures potential collaborative information by propagating these high-order correlations along hyperedges. Meanwhile, HiPL introduces a Mixture-of-Experts prompt learner (MoEPLer), whose experts generate multiple distinct prompts based on the modeled high-order correlations. The final high-order-aware textual prompts are then formed by combining the experts' prompts according to gating weights. This enables a comprehensive understanding of potential anomalous patterns, thereby improving ZSAD performance. Large-scale experiments conducted on 12 datasets spanning natural, industrial, and medical domains demonstrate the effectiveness of the proposed HiPL.
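
The gating step described above, in which several experts each emit a prompt and a gate weights them into one final prompt, can be illustrated with a minimal Python sketch. It assumes each expert prompt is a plain list of floats and that raw gate scores are already available; the function names are hypothetical, and the sketch does not model HiPL's hypergraph convolution or how its gate scores are computed.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw gate scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def combine_expert_prompts(expert_prompts, gate_scores):
    """Fuse expert prompt vectors into one prompt using softmax gating weights.

    Hypothetical helper: expert_prompts is a list of equal-length float lists,
    and gate_scores holds one raw score per expert.
    """
    weights = softmax(gate_scores)
    fused = [0.0] * len(expert_prompts[0])
    for weight, prompt in zip(weights, expert_prompts):
        for i, value in enumerate(prompt):
            fused[i] += weight * value
    return fused

# Equal gate scores reduce to a plain average of the experts' prompts.
print(combine_expert_prompts([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))  # [0.5, 0.5]
```

A softmax gate keeps the weights positive and summing to one, so the fused prompt stays a convex combination of the experts' prompts.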

AAAI Conference 2026 Conference Paper

IdeFN: Identifying Unclicked Space False Negatives via Relaxed Partial Optimal Transport for Conversion Rate Prediction

Weiyi Zhong
Weiming Liu
Lianyong Qi
Xiaoran Zhao
Xiaolong Xu
Haolong Xiang
Yang Cao
Shichao Pei

Accurate conversion rate (CVR) prediction is critical for recommender systems to capture user conversion intent and increase platform revenue. Traditional CVR models commonly suffer from sample selection bias (SSB) and data sparsity (DS), which has led to the adoption of click-through & conversion rate (CTCVR) multi-task learning frameworks to alleviate these issues. However, existing methods implicitly mislabel some unclicked samples with genuine conversion potential as negatives, thereby exacerbating the false negative sample (FNS) problem. To address this, we propose IdeFN, a multi-task CVR framework that identifies false negatives in the unclicked space to enable CVR prediction across the entire exposure space, and that leverages CTR as an auxiliary task for shared-parameter learning. Specifically, IdeFN consists of two main components: a relaxed partial optimal transport (RPOT) module and a sample relabeling mechanism (SRM). The former estimates soft matching strengths between unclicked samples and positive samples under a relaxed partial optimal transport formulation, establishing correspondences between these samples. The latter adaptively relabels the unclicked samples according to the derived matching strengths, without relying on static or heuristic thresholds, thus enhancing the reliability of the generated pseudo-labels. Experimental results demonstrate that IdeFN effectively mitigates the FNS problem, achieving substantial improvements in CVR prediction accuracy.

AAAI Conference 2026 Conference Paper

MTP: Exploring Multimodal Urban Traffic Profiling with Modality Augmentation and Spectrum Fusion

Haolong Xiang
Peisi Wang
Xiaolong Xu
Kun Yi
Xuyun Zhang
Quan Z. Sheng
Amin Beheshti
Wei Fan

With rapid urbanization in the modern era, traffic signals from various sensors have been playing a significant role in monitoring the state of cities, providing a strong foundation for ensuring safe travel, reducing traffic congestion and optimizing urban mobility. Most existing methods for traffic time series modeling rely on the original data modality, i.e., numerical readings taken directly from the sensors in cities. However, this unimodal approach overlooks the semantic information that multimodal heterogeneous urban data carry from different perspectives, which hinders a comprehensive understanding of traffic signals and limits accurate prediction of complex traffic dynamics. To address this problem, we propose a novel Multimodal framework, MTP, for urban Traffic Profiling, which learns multimodal features from numeric, visual, and textual perspectives in the frequency domain. The three branches provide a multimodal perspective on traffic signal learning for augmentation, while the frequency learning strategies carefully refine the information for extraction. Specifically, we first perform visual augmentation of the traffic time series, transforming the original modality into periodicity images and frequency images for visual learning. We also augment the traffic time series with descriptive texts based on the specific topic, background information and item description for textual learning. To complement the numeric information, we utilize frequency multilayer perceptrons for learning on the original modality. We design a hierarchical contrastive learning scheme over the three branches to fuse the three modalities. Finally, extensive experiments on six real-world datasets demonstrate superior performance compared with state-of-the-art approaches.

AAAI Conference 2026 Conference Paper

Talon: Breaking the Synchronization Barrier in Speculative Decoding with Hybrid Model-based and Retrieve-based Drafting

Xiangxiang Gao
Weisheng Xie
Lixin
Xuwei Fang
Chen Hang
Changqun Li
Yuhan Lin
Xiaolong Xu

Large Language Models face fundamental deployment challenges due to the computational demands of auto-regressive token-by-token generation. While speculative decoding has emerged as a promising acceleration technique through its draft-then-verify framework, current implementations suffer from two critical limitations: (1) a mutual waiting problem caused by sequential dependencies between the draft generation and verification phases, and (2) constrained token acceptance rates, where retrieval-based drafting methods underperform in general domains while model-based drafting approaches show reduced efficacy in knowledge-intensive scenarios. To address these challenges, we propose Talon, a novel parallel inference architecture featuring two key innovations: (1) a novel asynchronous execution paradigm that decouples draft generation from verification, effectively eliminating synchronization bottlenecks, and (2) an adaptive hybrid drafting strategy that dynamically combines model-based and retrieval-based approaches to improve token acceptance rates across diverse domains. Extensive evaluations across standard benchmarks (MT-Bench, HumanEval, GSM8K, Alpaca, CNN/DM) demonstrate Talon's exceptional performance, achieving 4.04x–6.52x acceleration across multiple model families including Vicuna, Deepseek, and the LLaMA series. These results represent a significant advancement over existing speculative decoding methods (EAGLE 1-3, Hydra, Medusa, Lookahead, SPS, and PLD), establishing a new paradigm for speculative decoding.
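
For readers unfamiliar with the draft-then-verify framework that Talon builds on, the following Python sketch shows standard greedy verification: drafted tokens are kept up to the first position where the target model disagrees, and the target model's own token is committed at that point. This is the generic speculative decoding check, not Talon's asynchronous pipeline or hybrid drafting; token IDs are plain integers and the function name is hypothetical.

```python
def verify_draft(draft_tokens, target_tokens):
    """Greedy speculative-decoding verification (generic sketch).

    target_tokens[i] is the target model's greedy token at position i, given
    the prefix plus draft_tokens[:i]. It holds len(draft_tokens) + 1 entries,
    so a bonus token is available when every draft token is accepted.
    Returns the tokens committed in this decoding step.
    """
    if len(target_tokens) != len(draft_tokens) + 1:
        raise ValueError("need one target token per draft token, plus one")
    committed = []
    for drafted, expected in zip(draft_tokens, target_tokens):
        if drafted != expected:
            # First mismatch: commit the target model's token and stop.
            committed.append(expected)
            return committed
        committed.append(drafted)
    # Every draft token matched: the target's next token comes for free.
    committed.append(target_tokens[-1])
    return committed

print(verify_draft([5, 7, 9], [5, 7, 2, 4]))  # [5, 7, 2]
print(verify_draft([5, 7], [5, 7, 8]))        # [5, 7, 8]
```

Each verification pass commits at least one token, so the speedup depends on how many draft tokens are accepted per pass, which is the acceptance rate the abstract aims to improve.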

NeurIPS Conference 2025 Conference Paper

A Fair Federated Learning Method for Handling Client Participation Probability Inconsistencies in Heterogeneous Environments

Siyuan Wu
Yongzhe Jia
Haolong Xiang
Xiaolong Xu
Xuyun Zhang
Lianyong Qi
Wanchun Dou

Federated learning (FL) is a distributed machine learning paradigm that enables multiple clients to collaboratively train a shared model without exposing their raw data. However, existing FL research has primarily focused on optimizing learning performance under the assumption of uniform client participation, with few studies examining performance fairness under inconsistent client participation, particularly in model-heterogeneous FL environments. In view of this challenge, we propose PHP-FL, a novel model-heterogeneous FL method that explicitly addresses scenarios with varying client participation probabilities to enhance both model accuracy and performance fairness. Specifically, we introduce a Dual-End Aligned ensemble Learning (DEAL) module, in which small auxiliary models on clients are used for dual-end knowledge alignment and local ensemble learning, effectively tackling model heterogeneity without a public dataset. Furthermore, to mitigate update conflicts caused by inconsistent participation probabilities, we propose an Importance-driven Selective Parameter Update (ISPU) module, which accurately updates critical local parameters based on training progress. Finally, we implement PHP-FL on a lightweight FL platform with heterogeneous clients across three different client participation patterns. Extensive experiments under heterogeneous settings and diverse client participation patterns demonstrate that PHP-FL achieves state-of-the-art performance in both accuracy and fairness. Our code is available at: https://github.com/Siyuan01/PHP-FL-main.

EAAI Journal 2025 Journal Article

A multi-sensor deep metric learning method for unknown anomaly detection in unmanned aerial vehicles

Jinbo Zhao
Xiaolong Xu

IJCAI Conference 2025 Conference Paper

Balancing User-Item Structure and Interaction with Large Language Models and Optimal Transport for Multimedia Recommendation

Haodong Li
Lianyong Qi
Weiming Liu
Xiaolong Xu
Wanchun Dou
Yang Cao
Xuyun Zhang
Amin Beheshti

The rapid growth of multimedia content has driven the development of recommender systems. Most previous work focuses on uncovering latent relationships among items to learn better representations. However, this approach does not sufficiently account for user affinities, potentially leading to an imbalance in the structure modeling of users and items. Moreover, the sparsity and imbalance of user-item interactions further hinder effective representation learning. To address these challenges, we propose a framework called BLAST, which balances structures and interactions via large language models and optimal transport for multimodal recommendation. Specifically, we utilize large language models to summarize side information and generate user profiles. Based on these profiles, we design an intra- and inter-entity structure balancing module to capture item-item and user-user relationships, integrating these affinities into the final representations. Furthermore, we impose constraints on negative sample selection and augment the training data using false negative items and an optimal transport algorithm, thereby yielding smoother interactions. We evaluate BLAST on three real-world datasets, and the results demonstrate that our method significantly outperforms state-of-the-art baselines, validating the superiority and effectiveness of BLAST.

IJCAI Conference 2025 Conference Paper

CLLMRec: Contrastive Learning with LLMs-based View Augmentation for Sequential Recommendation

Fan Lu
Xiaolong Xu
Haolong Xiang
Lianyong Qi
Xiaokang Zhou
Fei Dai
Wanchun Dou

Sequential recommendation generates embedding representations from historical user-item interactions to recommend the next potential interaction item. Due to the complexity and variability of historical user-item interactions, extracting effective user features is quite challenging. Recent studies have employed sequential networks such as time series networks and Transformers to capture the intricate dependencies and temporal patterns in historical user-item interactions, extracting more effective user features. However, limited by the scarcity and suboptimal quality of data, these methods struggle to capture subtle differences in user sequences, which diminishes recommendation accuracy. To address the above issue, we propose a contrastive learning framework with LLMs-based view augmentation (CLLMRec), which effectively mines differences in behavioral sequences through sample generation. Specifically, CLLMRec utilizes LLMs (Large Language Models) to augment views and expand user behavior sequence representations, providing high-quality positive and negative samples. Subsequently, CLLMRec employs the augmented views for effective contrastive learning, capturing subtle differences in behavioral sequences to suppress interference from irrelevant noise. Experimental results on three public datasets demonstrate that the proposed method outperforms state-of-the-art baseline models and significantly enhances recommendation performance.
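
Contrastive learning over augmented views is commonly trained with an InfoNCE-style objective that pulls an anchor embedding toward its positive view and away from negative samples. The Python sketch below shows that generic loss with cosine similarity; it assumes embeddings are plain float lists and uses a hypothetical temperature of 0.2, and it is not claimed to be CLLMRec's exact objective or its LLM-based view generation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.2):
    """Generic InfoNCE loss: negative log-softmax of the positive pair's logit."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    peak = max(logits)
    # log-sum-exp with the max subtracted for numerical stability
    log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_denominator - logits[0]

aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
print(aligned < misaligned)  # True
```

The loss is small when the positive view is the most similar candidate and grows when a negative sample is closer to the anchor than the positive view is.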

AAAI Conference 2025 Conference Paper

DivGCL: A Graph Contrastive Learning Model for Diverse Recommendation

Wenwen Gong
Yangliao Geng
Dan Zhang
Yifan Zhu
Xiaolong Xu
Haolong Xiang
Amin Beheshti
Xuyun Zhang

Graph Contrastive Learning (GCL), as a primary paradigm of graph self-supervised learning, has spurred a fruitful line of research on the data sparsity issue by maximizing the consistency of user/item embeddings between different augmented views with random perturbations. However, diversity, a crucial factor in recommendation performance and user satisfaction, has received rather little attention. In fact, there is a challenging dilemma in balancing accuracy and diversity. To address these issues, we propose a new graph contrastive learning model (DivGCL) for diversifying recommendations. Inspired by the strengths of the determinantal point process (DPP), DivGCL adopts a DPP likelihood-based loss function to achieve an ideal trade-off between diversity and accuracy, optimizing it jointly with an advanced Gaussian noise-augmented GCL objective. Extensive experiments on four popular datasets demonstrate that DivGCL surpasses existing approaches in balancing accuracy and diversity, with an improvement of 23.47% at T@20 (short for the trade-off metric) on ML-1M.
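
The DPP likelihood that DivGCL's loss builds on can be illustrated with the standard L-ensemble formula, log P(S) = log det(L_S) - log det(L + I), where L is a positive semi-definite similarity kernel over items and L_S is its submatrix for the recommended set S. The pure-Python sketch below evaluates that formula for small kernels; it does not reproduce how DivGCL constructs its kernel or combines this term with the contrastive objective, and the function names are hypothetical.

```python
import math

def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def dpp_log_likelihood(kernel, subset):
    """log P(S) = log det(L_S) - log det(L + I) for an L-ensemble DPP."""
    n = len(kernel)
    sub = [[kernel[i][j] for j in subset] for i in subset]
    shifted = [[kernel[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    sub_det = det(sub)
    if sub_det <= 0.0:
        return float("-inf")  # singular subset: zero probability
    return math.log(sub_det) - math.log(det(shifted))

similar = [[1.0, 0.9], [0.9, 1.0]]
diverse = [[1.0, 0.1], [0.1, 1.0]]
print(dpp_log_likelihood(diverse, [0, 1]) > dpp_log_likelihood(similar, [0, 1]))  # True
```

With identical item quality (diagonal entries of 1), the dissimilar pair (off-diagonal 0.1) receives a higher likelihood than the near-duplicate pair (0.9), which is the property that lets a DPP term reward diversity.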

IJCAI Conference 2025 Conference Paper

Empowering Multimodal Road Traffic Profiling with Vision Language Models and Frequency Spectrum Fusion

Haolong Xiang
Xiaolong Xu
Guangdong Wang
Xuyun Zhang
Xiaoyong Li
Qi Zhang
Amin Beheshti
Wei Fan

With rapid urbanization in the modern era, smart traffic profiling based on multimodal data sources has been playing a significant role in ensuring safe travel, reducing traffic congestion and optimizing urban mobility. Most existing methods for road-level traffic profiling utilize single-modality data, i.e., they mainly focus on image processing with deep vision models or auxiliary analysis of textual data. However, the joint modeling and multimodal fusion of the textual and visual modalities have rarely been studied in road traffic profiling, which largely hinders the accurate prediction or classification of traffic conditions. To address this issue, we propose a novel multimodal learning and fusion framework for road traffic profiling, named TraffiCFUS. Specifically, given the traffic images, our TraffiCFUS framework first introduces Vision Language Models (VLMs) to generate text and then creates tailored prompt instructions for refining this text according to the specific scene requirements of road traffic profiling. Next, we apply the discrete Fourier transform to convert multimodal data from the spatial domain to the frequency domain and perform a cross-modal spectrum transform to filter out information irrelevant to traffic profiling. Furthermore, the processed spatial multimodal data are combined to produce a fusion loss and an interaction loss via contrastive learning. Finally, extensive experiments on four real-world datasets show superior performance compared with state-of-the-art approaches.
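
The general idea of moving data into the frequency domain with the discrete Fourier transform and discarding unwanted components can be illustrated on a single 1-D signal. The Python sketch below is a generic low-pass filter built from a naive O(n^2) DFT; the cutoff parameter `keep` is a hypothetical stand-in, and the sketch does not implement TraffiCFUS's cross-modal spectrum transform, which filters jointly across modalities.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse of dft(), including the 1/n normalization."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def low_pass(x, keep):
    """Zero every bin whose frequency index is more than `keep` away from DC."""
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        if min(k, n - k) > keep:  # bins k and n - k share the same frequency
            spectrum[k] = 0
    return [value.real for value in idft(spectrum)]

# A constant level plus an alternating (highest-frequency) component.
x = [1 + (-1) ** t for t in range(8)]
print([round(v, 6) for v in low_pass(x, keep=1)])  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

Treating bins k and n - k together keeps the filtered spectrum conjugate-symmetric, so the reconstructed signal stays real up to floating-point error.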

AAAI Conference 2025 Conference Paper

Enhancing Diffusion Model with Auxiliary Information Mining-Exploration and Efficient Sampling Mechanism for Sequential Recommendation

Te Song
Lianyong Qi
Weiming Liu
Fan Wang
Xiaolong Xu
Xuyun Zhang
Amin Beheshti
Xiaokang Zhou

Sequential recommendation aims to capture the temporal dependencies of items in a user's historical interactions and to make recommendations based on them. Previous generative methods addressed the issue of data not directly reflecting user preference uncertainty by modeling the distribution of latent item representations. Diffusion model (DM)-based methods have achieved significant success due to their high-quality generation and stable training. However, they lack satisfactory user sequence representations to guide the generation process, which impairs recommendation performance. Moreover, these methods overlook the drawback of slow inference speed, which severely limits their practical value. To obtain effective generative guidance signals and accelerate the recommendation process, we propose DAE4Rec. In this approach, a Graph Auto-Encoder (GAE) is used to obtain interpretable item node representations, revealing global transitions of items that previous methods struggled to uncover. These representations are then used to construct a generative guidance signal with lower coupling and variance for the diffusion model. Additionally, by employing a non-Markov chain derived from the forward diffusion process, DAE4Rec is the first to implement a 'skip-step' reverse process in diffusion model-based methods, and a specially designed compensator bridges the performance gap caused by 'skip-step'. Extensive experiments on three real-world datasets demonstrate that DAE4Rec outperforms other state-of-the-art generative sequential recommenders.

EAAI Journal 2025 Journal Article

Hierarchical reinforcement learning with curriculum demonstrations and goal-guided policies for sequential robotic manipulation

Zihao Sun
Bao Pang
Xianfeng Yuan
Xiaolong Xu
Yong Song
Rui Song
Yibin Li

NeurIPS Conference 2025 Conference Paper

HPSERec: A Hierarchical Partitioning and Stepwise Enhancement Framework for Long-tailed Sequential Recommendation

Xiaolong Xu
Xudong Zhao
Haolong Xiang
Xuyun Zhang
Wei Shen
Hongsheng Hu
Lianyong Qi

The long-tail problem in sequential recommender systems stems from imbalanced interaction data, resulting in suboptimal model performance for tail users and items. Recent studies have leveraged head data to enhance tail data and thereby diminish the impact of the long-tail problem. However, these methods often adopt ad-hoc strategies to distinguish between head and tail data, which fail to capture the underlying distributional characteristics and structural properties of each category. Moreover, because a substantial representational gap exists between head and tail data, head-to-tail enhancement strategies are susceptible to negative transfer, often leading to a decline in overall model performance. To address these issues, we propose a hierarchical partitioning and stepwise enhancement framework, called HPSERec, for long-tailed sequential recommendation. HPSERec partitions the item set into subsets based on a data imbalance metric, assigning an expert network to each subset to capture user-specific local features. Subsequently, we apply knowledge distillation to progressively improve long-tail interest representation, followed by a Sinkhorn optimal transport-based feedback module, which aligns user representations across expert levels through a globally optimal and softly matched mapping. Extensive experiments on three real-world datasets demonstrate that HPSERec consistently outperforms all baseline methods. The implementation code is available at https://anonymous.4open.science/r/HPSERec-2404.
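
The Sinkhorn optimal transport step mentioned above produces a soft matching between two sets of representations. The Python sketch below is the generic Sinkhorn-Knopp iteration for entropy-regularized optimal transport between two histograms `a` and `b` under a cost matrix; the regularization strength `epsilon` and the iteration count are hypothetical defaults, and how HPSERec defines its costs and marginals across expert levels is not specified here.

```python
import math

def sinkhorn(cost, a, b, epsilon=0.1, iterations=200):
    """Entropy-regularized transport plan between histograms a and b.

    Alternately rescales rows and columns of the Gibbs kernel
    exp(-cost / epsilon) so the plan's marginals approach a and b.
    Very small epsilon with large costs can underflow the kernel to zero.
    """
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / epsilon) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iterations):
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

plan = sinkhorn([[0.0, 1.0], [1.0, 0.0]], [0.5, 0.5], [0.5, 0.5])
print([[round(p, 3) for p in row] for row in plan])  # [[0.5, 0.0], [0.0, 0.5]]
```

Smaller `epsilon` values push the plan toward a hard one-to-one matching, while larger values spread mass more evenly across candidates, which corresponds to the "softly matched" mapping the abstract describes.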

IJCAI Conference 2025 Conference Paper

MEGAD: A Memory-Efficient Framework for Large-Scale Attributed Graph Anomaly Detection

Yifan Zhang
Haolong Xiang
Xiaolong Xu
Zishun Rui
Xiaoyong Li
Lianyong Qi
Fei Dai

Graph anomaly detection (GAD), with its ability to accurately identify anomalous patterns in graph data, plays a vital role in areas such as network security, social media platforms, and fraud detection. Graph autoencoder-based methods are widely used for GAD due to their efficiency and effectiveness in capturing complex patterns and learning meaningful representations. However, these methods are constrained by hardware memory, which hinders detection on large-scale graph data. In this paper, we propose a Memory-Efficient framework for large-scale attributed Graph Anomaly Detection (MEGAD). Specifically, MEGAD first generates node embeddings and then refines them through a lightweight joint optimization model, ensuring minimal memory overhead. The optimized embeddings are subsequently fed into a detector to compute anomaly scores. Extensive experiments demonstrate that our framework achieves accuracy comparable to state-of-the-art methods across multiple datasets while significantly reducing memory consumption on large-scale graphs.

TAAS Journal 2025 Journal Article

Multi-Agent Reinforcement Learning based Edge Content Caching for Connected Autonomous Vehicles in IoV

Xiaolong Xu
Linjie Gu
Muhammad Bilal
Maqbool Khan
Yiping Wen
Guoqiang Liu
Yuan Yuan

Connected Autonomous Vehicle (CAV) driving, as a data-driven intelligent driving technology within the Internet of Vehicles (IoV), presents significant challenges to the efficiency and security of real-time data management. The combination of Web3.0 and edge content caching holds promise for providing low-latency data access for CAVs' real-time applications. Web3.0 enables the reliable pre-migration of frequently requested content from content providers to edge nodes. However, identifying optimal edge node peers for joint content caching and replacement remains challenging due to the dynamic nature of traffic flow in IoV. To address these challenges, this article introduces GAMA-Cache, an innovative edge content caching method leveraging Graph Attention Networks (GAT) and Multi-Agent Reinforcement Learning (MARL). GAMA-Cache formulates the cooperative edge content caching problem as a constrained Markov decision process. It employs a MARL technique based on cooperation effectiveness to determine optimal caching decisions, with GAT augmenting the information extracted from adjacent nodes. A dedicated collaborator selection mechanism is also developed to streamline communication between agents, filtering out those with minimal correlations in the vector input to the policy network. Experimental results demonstrate that, in terms of service latency and delivery failure, GAMA-Cache outperforms other state-of-the-art MARL solutions for edge content caching in IoV.

AAAI Conference 2025 Conference Paper

NLGT: Neighborhood-based and Label-enhanced Graph Transformer Framework for Node Classification

Xiaolong Xu
Yibo Zhou
Haolong Xiang
Xiaoyong Li
Xuyun Zhang
Lianyong Qi
Wanchun Dou

Graph Neural Networks (GNNs) are widely applied to graph learning tasks such as node classification, link prediction and graph generation. Most existing GNNs adopt a message-passing mechanism to aggregate node information with that of their neighbors, which often makes node representations similar after rounds of aggregation and leads to oversmoothing. Although recent works have made improvements by combining different message aggregation methods or introducing semantic encodings as priors, these message-passing based GNNs still fail to combat oversmoothing after multiple iterations of node aggregation. Besides, the feature extraction ability of these methods is restricted by graph sparsity, which hinders the aggregation of node information. To deal with these two issues, we propose the Neighborhood-based and Label-enhanced Graph Transformer (NLGT), a novel and effective framework for graph learning. Specifically, we present a label-enhanced feature fusion mechanism that integrates shallow node features and label embeddings as enhanced features. Moreover, we design a neighborhood-based mask attention mechanism to alleviate the negative effects caused by the sparsity of the graph. In the prediction stage, we aggregate the prediction results from multiple sampled sub-graphs and apply a voting mechanism to enhance the accuracy and robustness of our framework. Finally, extensive experiments are conducted on four open benchmark datasets, which demonstrate the effectiveness and robustness of our proposed framework compared with existing state-of-the-art methods.
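
A neighborhood-based attention mask of the kind described above restricts each node to attending over a chosen set of nodes, for example its graph neighbors. The Python sketch below shows generic masked scaled dot-product attention for a single query node; NLGT's actual mask construction and its label-enhanced features are not modeled, and the function name is hypothetical.

```python
import math

def masked_attention(query, keys, values, mask):
    """Scaled dot-product attention for one query over the keys allowed by mask."""
    if not any(mask):
        raise ValueError("mask must allow at least one key")
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale if allowed else float("-inf")
        for key, allowed in zip(keys, mask)
    ]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]  # masked keys get exp(-inf) == 0.0
    total = sum(weights)
    return [
        sum(w * value[i] for w, value in zip(weights, values)) / total
        for i in range(len(values[0]))
    ]

# Only the first node is a neighbor, so its value is returned unchanged.
print(masked_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]],
                       [[10.0, 0.0], [0.0, 10.0]], [True, False]))  # [10.0, 0.0]
```

Setting masked scores to negative infinity before the softmax gives non-neighbors exactly zero weight, rather than merely a small one.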

AAAI Conference 2025 Conference Paper

PFedCS: A Personalized Federated Learning Method for Enhancing Collaboration among Similar Classifiers

Siyuan Wu
Yongzhe Jia
Bowen Liu
Haolong Xiang
Xiaolong Xu
Wanchun Dou

Personalized federated learning (PFL) has recently gained significant attention for its ability to address two shortcomings of traditional federated learning (FL): poor convergence on highly heterogeneous data and the lack of personalized solutions. Existing mainstream approaches either perform personalized aggregation based on a specific model architecture to leverage global knowledge or achieve personalization by exploiting client similarities. However, the former overlooks the discrepancies in client data distributions by indiscriminately aggregating all clients, while the latter lacks fine-grained collaboration among the classifiers relevant to local tasks. In view of this challenge, we propose a Personalized Federated learning method for Enhancing Collaboration among Similar Classifiers (PFedCS), which aims to improve each client's accuracy on local tasks. Concretely, it does so by leveraging awareness of the similarities between client classifiers. By iteratively measuring the distance between clients' classifier parameters and clustering with each client as a cluster center, the central server adaptively identifies collaborating clients with similar data distributions. In addition, a distance-constrained aggregation method is designed to generate customized collaborative classifiers that guide local training. Extensive experimental evaluations on three datasets demonstrate that our method achieves state-of-the-art performance.
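
The server-side idea of grouping clients whose classifiers are close in parameter space, with each client as its own cluster center, and building a collaborative classifier from each group can be sketched as follows. This Python example assumes classifier parameters are flattened float lists, uses Euclidean distance, and applies a fixed, hypothetical `radius` as the distance constraint followed by plain averaging; PFedCS's actual clustering and distance-constrained aggregation rules may differ.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two flattened parameter vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def collaborative_classifiers(client_params, radius):
    """For each client, average the classifiers of all clients within radius.

    Each client acts as its own cluster center and always includes itself,
    so every group is non-empty.
    """
    result = []
    for center in client_params:
        group = [p for p in client_params if euclidean(center, p) <= radius]
        result.append([sum(p[i] for p in group) / len(group) for i in range(len(center))])
    return result

params = [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0]]
print(collaborative_classifiers(params, radius=1.0))  # [[0.1, 0.0], [0.1, 0.0], [5.0, 5.0]]
```

The distant third client collaborates only with itself, so its classifier is left untouched instead of being pulled toward dissimilar clients.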

AAAI Conference 2025 Conference Paper

Representation Learning Based Predicate Invention on Knowledge Graphs

Man Zhu
Pengfei Huang
Lei Gu
Xiaolong Xu
Jingyu Han

The recognition of whether or not a predicate should be invented is an important problem in the domain of predicate invention. Despite its significance, existing research has yet to fully harness the rich data available in knowledge graphs. In this paper, we introduce a novel problem formulation, ReLPI (Representation Learning for Predicate Invention in Knowledge Graphs), marking a pioneering effort in this domain. To address the core issues of ReLPI, we devise a scoring function that informs the learning process. By optimizing embeddings towards this scoring function, we endow them with semantic meaning, crucial for capturing the nuances of predicate presence patterns. Furthermore, we present SEmPI (Semantic Embeddings for Predicate Invention), a framework that leverages predicate (relation) embeddings as a trainable medium. SEmPI uncovers latent patterns governing predicate occurrences in knowledge graphs, enabling the invention of novel predicates grounded in these discovered patterns. This approach represents a significant step forward in leveraging data-driven methods for predicate invention in knowledge graphs. We evaluate the proposed approach on FB15k and DRKG datasets, and the results demonstrate the effectiveness of SEmPI in discovering new predicates.

IJCAI Conference 2025 Conference Paper

Universal Backdoor Defense via Label Consistency in Vertical Federated Learning

Peng Chen
Haolong Xiang
Xin Du
Xiaolong Xu
Xuhao Jiang
Zhihui Lu
Jirui Yang
Qiang Duan

Backdoor attacks in vertical federated learning (VFL) are particularly concerning as they can covertly compromise VFL decision-making, posing a severe threat to critical applications of VFL. Existing defense mechanisms typically involve either label obfuscation during training or model pruning during inference. However, the inherent limitations on the defender's access to the global model and complete training data in VFL environments fundamentally constrain the effectiveness of these conventional methods. To address these limitations, we propose the Universal Backdoor Defense (UBD) framework. UBD leverages Label Consistent Clustering (LCC) to synthesize plausible latent triggers associated with the backdoor class. This synthesized information is then utilized for mitigating backdoor threats through Linear Probing (LP), guided by a constraint on Batch Normalization (BN) statistics. Positioned within a unified VFL backdoor defense paradigm, UBD offers a generalized framework for both detection and mitigation that critically does not necessitate access to the entire model or dataset. Extensive experiments across multiple datasets rigorously demonstrate the efficacy of the UBD framework, achieving state-of-the-art performance against diverse backdoor attack types in VFL, including both dirty-label and clean-label variants.

JBHI Journal 2025 Journal Article

ViResGF-Net: Gated Multi-Scale Hybrid Vision Transformer for Robust Fundus Image Multi-Label Classification

Binghan Chen
Haolong Xiang
Jiayi Wan
Muhammad Bilal
Xiaolong Xu

With the acceleration of global population aging, fundus diseases such as cataracts and glaucoma have become major causes of visual impairment. In ophthalmic diagnosis, the traditional mode of diagnosis and treatment mainly relies on doctors making pathological judgments by observing fundus images with the naked eye. However, due to differences in evaluation standards among doctors, diagnostic results for the same fundus photo can differ. In addition, most doctors specialize only in specific fundus diseases, making it difficult to accurately diagnose cases where multiple diseases coexist. To tackle these issues, this paper proposes a gated multi-scale hybrid vision Transformer model, designated ViResGF-Net, for the multi-class classification of fundus diseases. The model integrates a dual-branch structure of a Convolutional Neural Network (CNN) and a Vision Transformer (ViT). While retaining the global modeling capability of ViT, it performs local feature extraction through the CNN branch and introduces a Feature Pyramid Network (FPN) structure to further enhance the local feature extraction capability of the CNN branch. In the feature fusion stage, a Gated Fusion Unit (GFU) module is added to fuse the feature vectors of the two branches. Finally, an MLP classifier produces the prediction results from the fused feature vectors. In extensive experiments, our model achieved an accuracy of 93.56%, a precision of 92.99%, and an F1 score of 92.36%, all of which are better than those of the other models compared.
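
A gated fusion unit of the kind described above typically computes a gate from both branches' features and uses it to blend them. The Python sketch below assumes a per-dimension gate g = sigmoid(w_cnn * c + w_vit * v + b) with output g * c + (1 - g) * v; these weights and this exact form are illustrative assumptions, not the published GFU definition.

```python
import math

def sigmoid(x):
    """Logistic function; very negative inputs (below about -709) overflow math.exp."""
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(cnn_feat, vit_feat, w_cnn, w_vit, bias):
    """Blend CNN and ViT features with a per-dimension sigmoid gate (assumed form)."""
    fused = []
    for c, v, wc, wv, b in zip(cnn_feat, vit_feat, w_cnn, w_vit, bias):
        gate = sigmoid(wc * c + wv * v + b)  # gate near 1 favors the CNN branch
        fused.append(gate * c + (1.0 - gate) * v)
    return fused

# Zero weights and bias give a gate of 0.5, i.e. an even average of the branches.
print(gated_fusion([2.0, 4.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]))  # [1.0, 2.0]
```

Because the gate is learned per dimension, the unit can lean on local CNN features for some dimensions and on global ViT features for others.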

IJCAI Conference 2025 Conference Paper

Where Does This Data Come From? Enhanced Source Inference Attacks in Federated Learning

Haiyang Chen
Xiaolong Xu
Xiang Zhu
Xiaokang Zhou
Fei Dai
Yansong Gao
Xiao Chen
Shuo Wang

Federated learning (FL) enables collaborative model training without exposing raw data, offering a privacy-aware alternative to centralized learning. However, FL remains vulnerable to various privacy attacks that exploit shared model updates, including membership inference, property inference, and gradient inversion. Source inference attacks further threaten FL by identifying which client contributed a specific training sample, posing severe risks to user and institutional privacy. Existing source inference attacks mainly assume passive adversaries and overlook more realistic scenarios where the server actively manipulates the training process. In this paper, we present an enhanced source inference attack that demonstrates how a malicious server can amplify behavioral differences between clients to more accurately infer data origin. Our approach introduces active training manipulation and data augmentation to expose client-specific patterns. Experimental results across five representative FL algorithms and multiple datasets show that our method significantly outperforms prior passive attacks. These findings reveal a deeper level of privacy vulnerability in FL and call for stronger defense mechanisms under active threat models.

JBHI Journal 2024 Journal Article

6G-Enabled Anomaly Detection for Metaverse Healthcare Analytics in Internet of Things

Xiaotong Wu
Yihong Yang
Muhammad Bilal
Lianyong Qi
Xiaolong Xu

As an emerging concept, the metaverse incorporates a range of advanced technologies and offers a great opportunity to enhance healthcare experiences in clinical practice and human health. However, metaverse healthcare analytics frequently faces cybersecurity threats such as DDoS attacks, probe attacks, and port-scanning attacks. Fortunately, 6G-enabled intrusion detection can detect anomalous activities with the help of an anomaly detection algorithm for metaverse healthcare analytics. Nevertheless, unlike static data, data streams in metaverse healthcare are intrinsically infinite, correlated, and subject to distribution change. Traditional anomaly detection algorithms for static data do not consider these characteristics, which may result in low accuracy and efficiency. In this article, a Data Stream Anomaly Detection (DS_AD) approach driven by the 6G network is proposed for metaverse healthcare analytics, which incorporates a sliding window and model update into LSHiForest. DS_AD uses a change detection mechanism to optimize the model update. The core design uses hash functions to partition the data space to find anomalies. To validate the feasibility of DS_AD, multiple groups of experiments are designed and executed on the SMTP and HTTP datasets. Experimental results show that, compared with the baselines, our proposal performs favorably on data streams in terms of accuracy and efficiency.
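The streaming skeleton described above (sliding window, change detection, model update) can be illustrated with a small sketch. To keep it self-contained, a simple z-score scorer stands in for LSHiForest; that substitution, the median-based drift test, and all names and thresholds (`SlidingWindowDetector`, `drift_threshold`, and so on) are assumptions for illustration only, not the DS_AD design.

```python
from collections import deque
from statistics import mean, median, pstdev


class SlidingWindowDetector:
    """Streaming skeleton: score points, then rebuild the model on drift.

    A z-score model stands in for LSHiForest purely to keep this self-contained.
    """

    def __init__(self, window_size: int = 100, score_threshold: float = 3.0,
                 drift_threshold: float = 3.0) -> None:
        self.window = deque(maxlen=window_size)  # only the newest points are kept
        self.score_threshold = score_threshold
        self.drift_threshold = drift_threshold
        self.mu = None
        self.sigma = None
        self.rebuilds = 0

    def _rebuild(self) -> None:
        # Model update: refit the reference statistics on the current window.
        self.mu = mean(self.window)
        self.sigma = pstdev(self.window) or 1e-9
        self.rebuilds += 1

    def _drifted(self) -> bool:
        # Change detection: has the median of the newer half of the window moved
        # far from the model's reference mean? The median ignores lone outliers.
        recent = list(self.window)[len(self.window) // 2:]
        return abs(median(recent) - self.mu) / self.sigma > self.drift_threshold

    def process(self, x: float) -> bool:
        """Return True if x is flagged as anomalous."""
        if self.mu is None:  # warm-up: fill the first window, then fit
            self.window.append(x)
            if len(self.window) == self.window.maxlen:
                self._rebuild()
            return False
        is_anomaly = abs(x - self.mu) / self.sigma > self.score_threshold
        self.window.append(x)
        if self._drifted():
            self._rebuild()
        return is_anomaly
```

Rebuilding only when a change is detected, rather than after every point, is what keeps the per-point cost low on an unbounded stream.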

IJCAI Conference 2024 Conference Paper

Attention Based Document-level Relation Extraction with None Class Ranking Loss

Xiaolong Xu
Chenbin Li
Haolong Xiang
Lianyong Qi
Xuyun Zhang
Wanchun Dou

Document-level relation extraction (RE) makes it feasible to analyze the global relations between entities in a text, yielding more comprehensive and accurate semantic information. In document-level RE, the model needs to infer implicit relations between two entities that appear in different sentences. To obtain more semantic information, existing methods mainly focus on exploring entity representations. However, they ignore the correlations and indivisibility among relations, entities, and contexts. Furthermore, current methods estimate each predefined relation independently and ignore the case of "no relation", which results in poor prediction. To address these issues, we propose a document-level RE method based on attention mechanisms that accounts for the "no relation" case. Specifically, our approach leverages graph attention and multi-head attention networks to capture the correlations and indivisibility among relations, entities, and contexts, respectively. In addition, a novel multi-label loss function that promotes large margins in label confidence scores between each predefined class and the none class is employed to improve prediction performance. Extensive experiments on benchmark datasets demonstrate that our proposed method outperforms state-of-the-art baselines with higher accuracy.
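The idea of ranking each predefined relation against a "none" class can be illustrated with a hinge-style loss: positive relations should score at least a margin above the none class, and negative relations at least a margin below it. The abstract does not give the paper's exact loss, so this formulation, the `margin` parameter, and the function names are assumptions for illustration.

```python
from typing import List, Set


def none_class_margin_loss(logits: List[float], positives: Set[int],
                           none_index: int = 0, margin: float = 1.0) -> float:
    """Hinge loss ranking every predefined relation against the none class.

    Positive relations should score at least `margin` above the none class;
    negative relations should score at least `margin` below it.
    """
    none_logit = logits[none_index]
    loss = 0.0
    for r, score in enumerate(logits):
        if r == none_index:
            continue
        gap = score - none_logit if r in positives else none_logit - score
        loss += max(0.0, margin - gap)
    return loss


def predict(logits: List[float], none_index: int = 0) -> Set[int]:
    """Relations scored above the none class; an empty set means "no relation"."""
    return {r for r, s in enumerate(logits) if r != none_index and s > logits[none_index]}


print(none_class_margin_loss([0.0, 2.0, -2.0], {1}))  # 0.0
print(predict([1.0, 0.2, -0.3]))  # set()
```

Because the none class acts as a learned, per-entity-pair threshold, the same logits decide both which relations hold and whether no relation holds at all.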

IJCAI Conference 2024 Conference Paper

Counterfactual User Sequence Synthesis Augmented with Continuous Time Dynamic Preference Modeling for Sequential POI Recommendation

Lianyong Qi
Yuwen Liu
Weiming Liu
Shichao Pei
Xiaolong Xu
Xuyun Zhang
Yingjie Wang
Wanchun Dou

With the proliferation of Location-based Social Networks (LBSNs), user check-in data at Points-of-Interest (POIs) has surged, offering rich insights into user preferences. However, sequential POI recommendation systems face two pivotal challenges. The first is that modeling time in a discrete space fails to accurately capture the dynamic nature of user preferences. The second is the inherent sparsity and noise in continuous POI recommendation, which hinder the recommendation process. To address these challenges, we propose counterfactual user sequence synthesis with continuous time dynamic preference modeling (CussCtpm). CussCtpm combines a Gated Recurrent Unit (GRU) with neural Ordinary Differential Equations (ODEs) to model user preferences in a continuous-time framework. CussCtpm captures user preferences at both the POI level and the interest level, identifying deterministic and non-deterministic preference concepts. At the interest level in particular, we employ the GRU and neural ODEs to model users' dynamic preferences in continuous space, aiming to capture finer-grained shifts in user preferences over time. Furthermore, CussCtpm uses counterfactual data augmentation to generate counterfactual positive and negative user sequences. Extensive experiments on two widely used public datasets demonstrate that CussCtpm outperforms several advanced baseline models.

NeurIPS Conference 2024 Conference Paper

DapperFL: Domain Adaptive Federated Learning with Model Fusion Pruning for Edge Devices

Yongzhe Jia
Xuyun Zhang
Hongsheng Hu
Kim-Kwang Raymond Choo
Lianyong Qi
Xiaolong Xu
Amin Beheshti
Wanchun Dou

Federated learning (FL) has emerged as a prominent machine learning paradigm in edge computing environments, enabling edge devices to collaboratively optimize a global model without sharing their private data. However, existing FL frameworks suffer from efficacy deterioration due to the system heterogeneity inherent in edge computing, especially in the presence of domain shifts across local data. In this paper, we propose a heterogeneous FL framework, DapperFL, to enhance model performance across multiple domains. In DapperFL, we introduce a dedicated Model Fusion Pruning (MFP) module that produces personalized compact local models for clients to address system heterogeneity. The MFP module prunes local models with fused knowledge obtained from both the local domain and the remaining domains, ensuring robustness to domain shifts. Additionally, we design a Domain Adaptive Regularization (DAR) module to further improve the overall performance of DapperFL. The DAR module employs regularization generated by the pruned model, aiming to learn robust representations across domains. Furthermore, we introduce a specific aggregation algorithm for aggregating heterogeneous local models with tailored architectures and weights. We implement DapperFL on a real-world FL platform with heterogeneous clients. Experimental results on benchmark datasets with multiple domains demonstrate that DapperFL outperforms several state-of-the-art FL frameworks by up to 2.28%, while achieving significant model volume reductions ranging from 20% to 80%. Our code is available at: https://github.com/jyzgh/DapperFL.
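The abstract mentions an aggregation algorithm for local models with tailored (pruned) architectures but does not describe it. A common way to aggregate pruned models is a mask-aware weighted average: each parameter is averaged only over the clients that kept it, and a parameter pruned by every client keeps its previous global value. The sketch below shows that generic technique on flattened parameter vectors; it is an assumption, not DapperFL's published algorithm, and `masked_aggregate` is a hypothetical name.

```python
from typing import List


def masked_aggregate(global_params: List[float], client_params: List[List[float]],
                     client_masks: List[List[int]], client_weights: List[float]) -> List[float]:
    """Weighted average of each parameter over the clients that kept it.

    A parameter pruned by every client keeps its previous global value.
    """
    aggregated = []
    for i, previous in enumerate(global_params):
        num = 0.0
        den = 0.0
        for params, mask, weight in zip(client_params, client_masks, client_weights):
            if mask[i]:  # client kept (did not prune) parameter i
                num += weight * params[i]
                den += weight
        aggregated.append(num / den if den > 0 else previous)
    return aggregated


print(masked_aggregate([0.0, 0.0, 0.0],
                       [[1.0, 2.0, 0.0], [3.0, 0.0, 0.0]],
                       [[1, 1, 0], [1, 0, 0]],
                       [1.0, 1.0]))  # [2.0, 2.0, 0.0]
```

Averaging only over kept entries prevents pruned zeros from dragging shared parameters toward zero.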

EAAI Journal 2024 Journal Article

Efficient human activity recognition: A deep convolutional transformer-based contrastive self-supervised approach using wearable sensors

Yujie Sun
Xiaolong Xu
Xincheng Tian
Lelai Zhou
Yibin Li

TIST Journal 2024 Journal Article

Privacy-preserving Point-of-interest Recommendation based on Simplified Graph Convolutional Network for Geological Traveling

Yuwen Liu
Xiaokang Zhou
Huaizhen Kou
Yawu Zhao
Xiaolong Xu
Xuyun Zhang
Lianyong Qi

The provision of privacy-preserving recommendations for geological tourist attractions is an important research area. Historical check-in data collected from location-based social networks (LBSNs) can be used to mine users' preferences, thereby facilitating the promotion of the geological tourism industry. However, such check-ins often contain sensitive user information that poses privacy leakage risks. To address this issue, several methods have been proposed for privacy-preserving point-of-interest (POI) recommendation. These methods commonly rely on either perturbation-based or federated learning techniques to protect users' privacy. However, the former can hinder preference capture, while the latter remains vulnerable to privacy breaches during the parameter-sharing process. To overcome these challenges, we propose a novel privacy-preserving POI recommendation model that incorporates users' privacy preferences based on a simplified graph convolutional neural network. Specifically, we employ a generative model to create a subset of POIs that reflect users' preferences without revealing their private information, and then we design a simplified graph convolutional network to analyze the privacy-preserving high-order connectivity between users and POIs. The resulting model enables efficient POI recommendation under strict privacy protection, which is particularly relevant to geological tourism. Experimental results on two public datasets demonstrate the effectiveness of our proposed approach.
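A simplified graph convolutional network, in its standard form, drops the nonlinearities between layers and collapses K propagation steps into repeated multiplication by the symmetrically normalized adjacency with self-loops, S = D^{-1/2}(A + I)D^{-1/2}. The sketch below shows only that propagation on a user-POI graph, assuming users and POIs share one node index space; the paper's generative privacy step and its exact SGC variant are not reproduced, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def normalized_adjacency(n: int, edges: List[Tuple[int, int]]) -> Matrix:
    """S = D^-1/2 (A + I) D^-1/2 for an undirected graph with n nodes."""
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0  # self-loops keep each node's own features
    for u, v in edges:
        adj[u][v] = 1.0
        adj[v][u] = 1.0
    deg = [sum(row) for row in adj]
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def sgc_propagate(n: int, edges: List[Tuple[int, int]], features: Matrix, k: int = 2) -> Matrix:
    """Compute S^k X: k rounds of neighborhood averaging with no nonlinearity."""
    s = normalized_adjacency(n, edges)
    x = features
    for _ in range(k):
        x = matmul(s, x)
    return x


# Node 0 could be a user and node 1 a POI it checked in at.
print(sgc_propagate(2, [(0, 1)], [[1.0], [0.0]], k=1))  # [[0.5], [0.5]]
```

A user-POI pair could then be scored, for example, by the dot product of their propagated rows; because there are no per-layer weights or activations, S^k X can be precomputed once, which is what makes SGC cheap.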

IJCAI Conference 2024 Conference Paper

Shadow-Free Membership Inference Attacks: Recommender Systems Are More Vulnerable Than You Thought

Xiaoxiao Chi
Xuyun Zhang
Yan Wang
Lianyong Qi
Amin Beheshti
Xiaolong Xu
Kim-Kwang Raymond Choo
Shuo Wang

Recommender systems have been successfully applied in many applications. Nonetheless, recent studies demonstrate that recommender systems are vulnerable to membership inference attacks (MIAs), which leak users' membership privacy. However, existing MIAs that rely on shadow training suffer a large performance drop when the attacker lacks knowledge of the training data distribution and the model architecture of the target recommender system. To better understand the privacy risks of recommender systems, we propose shadow-free MIAs that directly leverage a user's recommendations for membership inference. Without shadow training, the proposed attack can conduct MIAs efficiently and effectively in a practical scenario where the attacker has only black-box access to the target recommender system. The proposed attack leverages the intuition that a recommender system personalizes a user's recommendations if that user's historical interactions were used to train it. Thus, an attacker can infer membership by determining whether the recommendations are more similar to the user's interactions or to generally popular items. We conduct extensive experiments on benchmark datasets across various recommender systems. Remarkably, our attack achieves far better attack accuracy than the baselines, with low false positive rates and at a much lower computational cost.
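The intuition above can be turned into a minimal decision rule: compare the centroid of the recommended items with the centroid of the user's own interactions and with the centroid of popular items, and guess "member" when the recommendations sit closer to the user's history. The sketch assumes the attacker has some vector representation for each item and uses cosine similarity of centroids; both choices, and the function names, are illustrative assumptions rather than the paper's exact procedure.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def centroid(vectors: List[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def infer_membership(recommended: List[Vector], interacted: List[Vector],
                     popular: List[Vector]) -> bool:
    """Guess "member" if recommendations look more like the user's own history
    than like the globally popular items."""
    rec = centroid(recommended)
    return cosine(rec, centroid(interacted)) > cosine(rec, centroid(popular))


print(infer_membership([[1.0, 0.0], [0.9, 0.1]], [[1.0, 0.0]], [[0.0, 1.0]]))  # True
```

No shadow model is trained: the only inputs are the target system's black-box recommendations and item vectors the attacker already has.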

AAAI Conference 2023 Conference Paper

Differentially Private Learning with Per-Sample Adaptive Clipping

Tianyu Xia
Shuheng Shen
Su Yao
Xinyi Fu
Ke Xu
Xiaolong Xu
Xing Fu

In recent years, privacy in AI has drawn attention from both researchers and the general public. As one way to implement privacy-preserving AI, differentially private learning is a framework that enables AI models to use differential privacy (DP). To achieve DP in the learning process, existing algorithms typically limit the magnitude of gradients with a constant clipping threshold, which must be carefully tuned because of its significant impact on model performance. To address this issue, the recent works NSGD and Auto-S propose using normalization instead of clipping to avoid hyperparameter tuning. However, normalization-based approaches such as NSGD and Auto-S rely on a monotonic weight function, which imposes excessive weight on samples with small gradients and introduces extra deviation into the update. In this paper, we propose a Differentially Private Per-Sample Adaptive Clipping (DP-PSAC) algorithm based on a non-monotonic adaptive weight function, which guarantees privacy without the typical hyperparameter tuning required by constant clipping while significantly reducing the deviation between the update and the true batch-averaged gradient. We provide a rigorous theoretical convergence analysis and show that, with a convergence rate of the same order, the proposed algorithm achieves a lower non-vanishing bound than NSGD/Auto-S, which is maintained over training iterations. In addition, through extensive experimental evaluation, we show that DP-PSAC outperforms or matches state-of-the-art methods on multiple mainstream vision and language tasks.
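The three approaches contrasted above differ only in the per-sample weight applied before the noisy sum. The sketch shows constant clipping, a monotonic normalization weight c / (||g|| + r), and an illustrative non-monotonic weight c / (||g|| + r / (||g|| + r)); check the paper for the exact function DP-PSAC uses. Each reweighted gradient has norm at most c, so the Gaussian noise has standard deviation sigma * c. The values of `r` and all function names are assumptions, and privacy accounting is omitted.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def l2_norm(g: Vector) -> float:
    return math.sqrt(sum(x * x for x in g))


def constant_clip(g: Vector, c: float) -> Vector:
    """Classic DP-SGD clipping: scale by min(1, c / ||g||)."""
    n = l2_norm(g)
    return [x * min(1.0, c / n) for x in g] if n > 0 else list(g)


def normalize(g: Vector, c: float, r: float = 0.01) -> Vector:
    """Normalization with the monotonic weight c / (||g|| + r)."""
    n = l2_norm(g)
    return [x * c / (n + r) for x in g]


def non_monotonic(g: Vector, c: float, r: float = 0.1) -> Vector:
    """Illustrative non-monotonic weight c / (||g|| + r / (||g|| + r))."""
    n = l2_norm(g)
    return [x * c / (n + r / (n + r)) for x in g]


def private_step(per_sample_grads: List[Vector],
                 weight_fn: Callable[[Vector, float], Vector],
                 c: float, sigma: float, rng: random.Random) -> Vector:
    """Sum reweighted per-sample gradients, add Gaussian noise, then average.

    Each reweighted gradient has norm at most c, so the sum has L2 sensitivity c
    and the noise standard deviation is sigma * c.
    """
    weighted = [weight_fn(g, c) for g in per_sample_grads]
    summed = [sum(col) for col in zip(*weighted)]
    return [(s + rng.gauss(0.0, sigma * c)) / len(per_sample_grads) for s in summed]


print(private_step([[3.0, 4.0], [0.01, 0.0]], non_monotonic, c=1.0, sigma=1.0,
                   rng=random.Random(0)))
```

With the monotonic weight, a tiny gradient such as [0.01, 0.0] is scaled up by nearly c / r; the non-monotonic weight caps that amplification, which is the deviation the abstract says DP-PSAC reduces.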

IJCAI Conference 2023 Conference Paper

OptIForest: Optimal Isolation Forest for Anomaly Detection

Haolong Xiang
Xuyun Zhang
Hongsheng Hu
Lianyong Qi
Wanchun Dou
Mark Dras
Amin Beheshti
Xiaolong Xu

Anomaly detection plays an increasingly important role in various fields for critical tasks such as intrusion detection in cybersecurity, financial risk detection, and human health monitoring. A variety of anomaly detection methods have been proposed, and the category based on the isolation forest mechanism stands out for its simplicity, effectiveness, and efficiency; e.g., iForest is often employed as a state-of-the-art detector in real deployments. While most isolation forests use a binary tree structure, the LSHiForest framework has demonstrated that a multi-fork isolation tree structure can lead to better detection performance. However, no theoretical work has answered the fundamentally and practically important question of the optimal tree structure for an isolation forest with respect to the branching factor. In this paper, we establish a theory of isolation efficiency to answer this question and determine the optimal branching factor for an isolation tree. Based on this theoretical underpinning, we design a practical optimal isolation forest, OptIForest, incorporating clustering-based learning to hash, which enables more information to be learned from data for better isolation quality. The rationale of our approach relies on a better bias-variance trade-off achieved by bias reduction in OptIForest. Extensive comparative and ablation experiments on a series of benchmark datasets demonstrate that our approach can efficiently and robustly achieve better detection performance in general than state-of-the-art methods, including deep learning-based ones.
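For reference, the binary iForest baseline mentioned above turns a point's average isolation depth E[h(x)] into a score s = 2^(-E[h(x)] / c(n)), where c(n) = 2H(n-1) - 2(n-1)/n is the average path length of an unsuccessful search in a binary search tree over n points and H(i) is approximated by ln(i) plus the Euler-Mascheroni constant. The sketch implements only this standard scoring step, not OptIForest's multi-fork trees or learned hashing.

```python
import math

EULER_GAMMA = 0.5772156649015329


def average_path_length(n: int) -> float:
    """c(n): expected path length normalizer for a binary isolation tree over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length: float, n: int) -> float:
    """s = 2^(-E[h] / c(n)): near 1 means easily isolated (anomalous), about 0.5 is typical."""
    return 2.0 ** (-mean_path_length / average_path_length(n))


print(anomaly_score(average_path_length(256), 256))  # 0.5
print(anomaly_score(2.0, 256) > 0.5)  # True
```

Points isolated after only a few splits get scores well above 0.5; the branching-factor theory above asks how the choice of fan-out changes how efficiently such isolation happens.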

IJCAI Conference 2023 Conference Paper

SAD: Semi-Supervised Anomaly Detection on Dynamic Graphs

Sheng Tian
Jihai Dong
Jintang Li
Wenlong Zhao
Xiaolong Xu
Baokun Wang
Bowen Song
Changhua Meng

Anomaly detection aims to distinguish abnormal instances that deviate significantly from the majority of benign ones. As real-world instances are naturally connected and can be represented as graphs, graph neural networks have become increasingly popular for tackling the anomaly detection problem. Despite promising results, research on anomaly detection has focused almost exclusively on static graphs, while mining anomalous patterns from dynamic graphs is rarely studied despite its significant application value. In addition, anomaly detection is typically tackled from a semi-supervised perspective due to the lack of sufficient labeled data. However, most proposed methods merely exploit labeled data, leaving a large number of unlabeled samples unexplored. In this work, we present semi-supervised anomaly detection (SAD), an end-to-end framework for anomaly detection on dynamic graphs. By combining a time-equipped memory bank with a pseudo-label contrastive learning module, SAD can fully exploit the potential of large amounts of unlabeled samples and uncover underlying anomalies in evolving graph streams. Extensive experiments on four real-world datasets demonstrate that SAD efficiently discovers anomalies in dynamic graphs and outperforms existing advanced methods even when provided with only a small amount of labeled data.

TIST Journal 2022 Journal Article

PSDF: Privacy-aware IoV Service Deployment with Federated Learning in Cloud-Edge Computing

Xiaolong Xu
Wentao Liu
Yulan Zhang
Xuyun Zhang
Wanchun Dou
Lianyong Qi
Md Zakirul Alam Bhuiyan

Through cloud-edge collaboration, cloud-edge computing allows edge servers close to end users to take over less computation-intensive service processing from the cloud, reducing communication overhead and meeting the low-latency requirements of the Internet of Vehicles (IoV). With cloud-edge computing, computing tasks in IoV can be offloaded to edge servers (ESs) instead of the cloud and processed by the services deployed on those ESs. Due to the storage and computing resource limits of ESs, how to dynamically deploy a subset of services to the edge remains an open problem. Moreover, service deployment decisions often require transmitting local service requests from ESs to the cloud, which increases the risk of privacy leakage. In this article, a method for privacy-aware IoV service deployment with federated learning in cloud-edge computing, named PSDF, is proposed. Technically, federated learning secures the distributed training of the deployment decision network on each ES through the exchange and aggregation of model weights, avoiding transmission of the original data. Meanwhile, homomorphic encryption is applied to the uploaded weights before model aggregation on the cloud. In addition, a service deployment scheme based on deep deterministic policy gradient is proposed. Finally, the performance of PSDF is evaluated through extensive experiments.
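The weight exchange and aggregation step described above can be illustrated with plain federated averaging, in which the cloud combines the edge servers' flattened model weights in proportion to their local sample counts. This sketch assumes sample-count weighting and omits the homomorphic encryption layer entirely; the abstract does not name the encryption scheme or the exact aggregation rule, and `fedavg` is a hypothetical name.

```python
from typing import List


def fedavg(client_weights: List[List[float]], client_sizes: List[int]) -> List[float]:
    """Average flattened model weights, weighting each client by its sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
```

Because this aggregation is a weighted sum, an additively homomorphic scheme (Paillier is one example) would in principle let the cloud compute it on encrypted weights, with only the final division done after decryption.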