EAAI Journal 2026 Journal Article
A self-explanatory deep learning-based soft sensor induced by a physical diffusion process and its application in an industrial process
- Xiao Wang
- Han Liu
- Xiaomei Qi
- Yong Zhang
Author name cluster
Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.
EAAI Journal 2026 Journal Article
AAAI Conference 2026 Conference Paper
Session-based recommendation (SBR) aims to predict anonymous users' next interaction based on their interaction sessions. In practical recommendation scenarios, low-exposure items constitute the majority of interactions, creating a long-tail distribution that severely compromises recommendation diversity. Existing approaches attempt to address this issue by promoting tail items but incur accuracy degradation, exhibiting a "see-saw" effect between long-tail and accuracy performance. We attribute this conflict to session-irrelevant noise within the tail item set, which existing long-tail approaches fail to identify and constrain effectively. To resolve this fundamental conflict, we propose HID (Hybrid Intent-based Dual Constraint Framework), a plug-and-play framework that transforms the conventional "see-saw" into a "win-win" relationship by introducing hybrid intent-based dual constraints. Two key innovations are incorporated in this framework: (i) Hybrid Intent Learning, where we reformulate the intent extraction strategies by employing attribute-aware spectral clustering to reconstruct the item-to-intent mapping. Furthermore, discrimination of session-irrelevant noise is achieved through the assignment of both target and noise intents to each session. (ii) Intent Constraint Loss, where we propose two novel constraint paradigms regarding diversity and accuracy to regulate the representation learning process, and unify the two optimization objectives into a single loss. Extensive experiments across multiple SBR models and datasets demonstrate that HID enhances both long-tail performance and recommendation accuracy, establishing new state-of-the-art performance in long-tail recommender systems.
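The target/noise intent assignment described above could look like the following minimal sketch. Mean pooling of item embeddings, cosine similarity, and all names here are illustrative assumptions, not the paper's exact formulation; the intent centroids would come from the attribute-aware spectral clustering step.

```python
import numpy as np

def assign_intents(session_items, intent_centroids):
    """Toy sketch: pick a target intent (centroid most similar to the
    pooled session embedding) and a noise intent (least similar one).
    session_items: (n_items, d) item embeddings of one session.
    intent_centroids: (k, d) intent prototypes from clustering."""
    session_vec = session_items.mean(axis=0)  # mean-pool the session
    sims = intent_centroids @ session_vec
    sims /= (np.linalg.norm(intent_centroids, axis=1)
             * np.linalg.norm(session_vec) + 1e-8)  # cosine similarity
    return int(np.argmax(sims)), int(np.argmin(sims))
```

In the full framework, the two returned intents would then feed the dual constraint losses, pulling the session toward its target intent and away from the noise intent.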
AAAI Conference 2026 Conference Paper
While deep generative models have significantly advanced representation learning, they may inherit or amplify biases and fairness issues by encoding sensitive attributes alongside predictive features. Enforcing strict independence in disentanglement is often unrealistic when target and sensitive factors are naturally correlated. To address this challenge, we propose CAD-VAE (Correlation-Aware Disentangled VAE), which introduces a correlated latent code to capture the information shared between the target and sensitive attributes. Given this correlated latent, our method effectively separates overlapping factors without extra domain knowledge by directly minimizing the conditional mutual information between target and sensitive codes. A relevance-driven optimization strategy refines the correlated code by efficiently capturing essential correlated features and eliminating redundancy. Extensive experiments on benchmark datasets demonstrate that CAD-VAE produces fairer representations, realistic counterfactuals, and improved fairness-aware image editing.
EAAI Journal 2026 Journal Article
AAAI Conference 2026 Conference Paper
Time-Series (TS) exhibits pronounced non-stationarity. Consequently, most forecasting methods display compromised robustness to concept drift, despite the prevalent application of instance normalization. We tackle this challenge by first analysing concept drift through a bias-variance lens and proving that a weighted ensemble reduces variance without increasing bias. These insights motivate DeepBooTS, a novel end-to-end dual-stream residual-decreasing boosting method that progressively reconstructs the intrinsic signal. In our design, each block of a deep model becomes an ensemble of learners with an auxiliary output branch forming a highway to the final prediction. The block-wise outputs correct the residuals of previous blocks, leading to a learning-driven decomposition of both inputs and targets. This method enhances versatility and interpretability while substantially improving robustness to concept drift. Extensive experiments, including those on large-scale datasets, show that the proposed method outperforms existing methods by a large margin, yielding an average performance improvement of 15.8% across various datasets and establishing a new benchmark for TS forecasting.
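The dual-stream residual-decreasing idea can be sketched as follows: each block reads the current input residual, emits a partial forecast on a highway to the final prediction, and removes its backcast from the residual. Linear blocks and the two-matrix parameterization are illustrative assumptions standing in for the paper's deep blocks.

```python
import numpy as np

def boosted_forecast(x, blocks):
    """Residual-decreasing boosting sketch.
    x: (d,) input window.
    blocks: list of (W_back, W_fore) pairs; W_back is (d, d),
    W_fore is (d, h) mapping the residual to a partial forecast."""
    residual, prediction = x, 0.0
    for W_back, W_fore in blocks:
        prediction = prediction + residual @ W_fore  # highway branch
        residual = residual - residual @ W_back      # shrink the residual
    return prediction
```

With an identity backcast, the first block absorbs the whole input, so later blocks see a zero residual and contribute nothing — the "residual-decreasing" property in miniature.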
JBHI Journal 2026 Journal Article
Noninvasive continuous blood pressure (BP) monitoring has become a critical requirement for effective health management in the general population. To address the challenge of accurate few-shot personalized BP estimation, a photoplethysmography (PPG)-based framework built on the unified multi-task time series model with a Transformer backbone is proposed. The framework comprises population-level pretraining and personalized fine-tuning with a pulse pressure segmented penalty (PPSP) loss. The PPSP couples systolic BP (SBP) and diastolic BP (DBP) outputs by penalizing pulse pressure values outside clinically accepted ranges, which enforces physiological consistency. In addition, a sampling-rate-robust low-rank adaptation (SRR-LoRA) is introduced to improve estimation accuracy when low-frequency PPG signals are employed. After rate alignment, SRR-LoRA prioritizes measurements over interpolated points, suppresses interpolation noise, and preserves cross-device generalization. Model performance was evaluated on the UCI cuffless BP estimation dataset, the University of Queensland vital signs dataset, and the CAS-BP dataset. 113,812 samples from 2,405 subjects were used for pretraining, and data from 316 subjects (each with 50 samples) were included for few-shot fine-tuning. The proposed method achieved mean absolute errors of 1.52/1.07 mmHg for SBP/DBP. These results fulfill the Association for the Advancement of Medical Instrumentation BP standard and correspond to Grade A performance according to the British Hypertension Society standard and IEEE 1708 standard, which demonstrates the framework's potential for practical personalized wearable BP monitoring.
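A pulse-pressure penalty of the kind described above could be sketched as a hinge that is zero inside an accepted band and grows linearly outside it. The band edges (20–80 mmHg) and the hinge shape are illustrative assumptions, not the paper's exact PPSP loss.

```python
def ppsp_penalty(sbp, dbp, lo=20.0, hi=80.0):
    """Toy pulse-pressure segmented penalty: pp = SBP - DBP is free
    inside the assumed clinical band [lo, hi] mmHg and penalized
    linearly outside it, coupling the two BP outputs."""
    pp = sbp - dbp
    return max(lo - pp, 0.0) + max(pp - hi, 0.0)
```

In training, a term like this would be added to the regression loss so that physiologically implausible SBP/DBP pairs are discouraged even when each value alone looks reasonable.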
EAAI Journal 2026 Journal Article
AAAI Conference 2026 Conference Paper
Conditional molecular generation, aiming to generate 2D and 3D molecules that satisfy given properties, has achieved remarkable progress, thanks to the advances in deep generative models such as graph diffusion. However, existing methods generally assume that the given conditions for training and testing are consistent, failing to handle the realistic challenge when there exist distribution shifts between training and testing conditions. Invariant learning is a mainstream paradigm for addressing distribution shifts, but fusing invariant learning principles with conditional molecular generation faces three core challenges: (1) existing invariant learning methods focus on discriminative tasks and cannot be directly adapted to molecule generative tasks; (2) how to distinguish between the invariant subgraph and variant subgraph of a molecular graph, which is treated as an integrated input; (3) how to fuse invariant subgraphs, variant subgraphs, and property conditions for effective generation. To tackle these challenges, we propose Invariant Conditional MOLecular generation (IC-MOL), a framework that combines invariant learning with graph diffusion to improve the generalization ability of conditional molecular generation under distribution shifts. Specifically, we first disentangle molecular graphs into invariant and variant subgraphs while maintaining SE(3) equivariance, an important inductive bias for molecular generation. On this basis, we further design a two-phase graph diffusion generation model. In the first phase, we generate an invariant molecular subgraph consistent with the target property. In the second phase, we propose a cross-attention mechanism to fuse variant subgraph representations and property conditions to guide the generation of complete molecules while maintaining property alignment. Extensive experiments on the benchmark dataset show that IC-MOL consistently outperforms state-of-the-art baselines across six property conditions under distribution shifts.
JBHI Journal 2026 Journal Article
The subcellular localization of messenger RNA (mRNA) is essential for the regulation of gene expression and plays a pivotal role in targeted drug development. Although several computational models have been developed to predict mRNA localization, these approaches still face challenges in sequence representation and exhibit limited performance in handling multi-localization tasks. In this paper, we propose mRSubLoc, a novel multi-label deep learning framework for predicting mRNA subcellular localization. The model integrates the RNA large language model RNAErnie with one-hot encoding and Word2Vec embeddings to construct a comprehensive representation of mRNA sequences. A text convolutional neural network (TextCNN) is employed to capture local feature patterns, while a bidirectional long short-term memory network (BiLSTM) is used to capture long-range dependencies. These features are fused using a multi-head self-attention mechanism to effectively capture localization-specific characteristics. Finally, a multi-layer perceptron (MLP) explores complex dependencies among multiple localization sites, facilitating accurate mRNA subcellular localization prediction. Experimental results on a testing set demonstrate that mRSubLoc significantly outperforms state-of-the-art methods across multiple metrics, including Aiming (0.7858), Coverage (0.6212), Accuracy (0.6161), Absolute-True (0.3070), and Absolute-False (0.1319). This study proposes a novel approach for predicting mRNA subcellular localization and provides new perspectives for advancing disease diagnosis and drug discovery in biomedical research.
AAAI Conference 2026 Conference Paper
Video object detection is a fundamental yet challenging task in computer vision. Recently, DETR-based methods have gained prominence in this domain owing to their powerful global modeling capabilities. However, these methods are still confronted with two key limitations: frame-agnostic initialization of object queries and scale-agnostic attention mechanisms, which hinder their capability to capture the appearance variations of dynamic objects and model the temporal consistency across frames. To alleviate these limitations, we propose a multiscale-aware transformer diffusion network (MSTDiff), a novel framework designed for the video object detection task, including two technical improvements over existing methods. First, we design a diffusion-driven adaptive query module, which models the object query distribution through a diffusion process conditioned on input frames, enabling an adaptive and content-aware initialization of object queries. Second, we develop a multiscale-aware transformer encoder module, which combines multi-head convolutional units with attention mechanisms to enhance multi-scale feature representations while preserving global dependence modeling. We conduct extensive experiments on the public ImageNet VID dataset, and the results demonstrate that our MSTDiff achieves 87.7% mAP with ResNet-101, outperforming most previous state-of-the-art video object detection methods.
TMLR Journal 2026 Journal Article
In computer vision, Visual Prompting (VP) and Visual Prompt Tuning (VPT) have recently emerged as lightweight and effective alternatives to full fine-tuning for adapting large-scale vision models within the "pretrain-then-finetune" paradigm. However, despite rapid progress, their conceptual boundaries remain blurred, as VP and VPT are frequently used interchangeably in current research, reflecting a lack of systematic distinction between these techniques and their respective applications. In this survey, we revisit the designs of VP and VPT from first principles, and conceptualize them within a unified framework termed Prompt-based Adaptation (PA). Within this framework, we distinguish methods based on their injection granularity: VP operates at the pixel level, while VPT injects prompts at the token level. We further categorize these methods by their generation mechanism into fixed, learnable, and generated prompts. Beyond the core methodologies, we examine PA's integrations across diverse domains, including medical imaging, 3D point clouds, and vision-language tasks, as well as its role in test-time adaptation and trustworthy AI. We also summarize current benchmarks and identify key challenges and future directions. To the best of our knowledge, this is the first comprehensive survey dedicated to PA's methodologies and applications in light of their distinct characteristics. Our survey aims to provide a clear roadmap for researchers and practitioners in all areas to understand and explore the evolving landscape of PA-related research.
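The injection-granularity distinction drawn above can be made concrete in a few lines. This is a schematic sketch, not any specific library's API: VP adds a learnable perturbation in pixel space, while VPT prepends learnable tokens to the patch-token sequence entering a transformer.

```python
import numpy as np

def apply_vp(image, pixel_prompt):
    """Pixel-level Visual Prompting: the learnable prompt lives in
    input space, so the prompted image keeps the original shape."""
    return image + pixel_prompt

def apply_vpt(patch_tokens, prompt_tokens):
    """Token-level Visual Prompt Tuning: learnable prompt tokens are
    prepended to the patch-token sequence, lengthening it."""
    return np.concatenate([prompt_tokens, patch_tokens], axis=0)
```

The contrast is visible in the shapes: VP preserves the input shape, whereas VPT changes the sequence length seen by the frozen backbone.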
JBHI Journal 2026 Journal Article
Inspired by the tremendous success of Large Language Models (LLMs), existing radiology report generation methods attempt to leverage large models to achieve better performance. They usually adopt a Transformer to extract the visual features of a given X-ray image and then feed them into the LLM for text generation. How to extract more effective information for the LLM to improve the final results remains an urgent open problem. Additionally, the use of visual Transformer models also brings high computational complexity. To address these issues, this paper proposes a novel context-guided efficient radiology report generation framework. Specifically, we introduce the Mamba as the vision backbone with linear complexity, and the performance obtained is comparable to that of the strong Transformer model. More importantly, we perform context retrieval from the training set for samples within each mini-batch during the training phase, utilizing both positively and negatively related samples to enhance feature representation and discriminative learning. Subsequently, we feed the vision tokens, context information, and prompt statements to invoke the LLM for generating high-quality medical reports. Extensive experiments on three X-ray report generation datasets (i.e., IU X-Ray, MIMIC-CXR, CheXpert Plus) fully validated the effectiveness of our proposed model. The source code is available at https://github.com/Event-AHU/Medical_Image_Analysis.
AAAI Conference 2026 Conference Paper
Existing cross-modal pedestrian detection (CMPD) employs complementary information from RGB and thermal-infrared (TIR) modalities to detect pedestrians in 24-hour surveillance systems. RGB captures rich pedestrian details under daylight, while TIR excels at night. However, TIR focuses primarily on the person's silhouette, neglecting critical texture details essential for detection. Near-infrared (NIR), in contrast, captures texture under low-light conditions, effectively alleviating the performance issues of RGB and the detail loss of TIR, thereby reducing missed detections. To this end, we construct a new Triplet RGB–NIR–TIR (TRNT) dataset, comprising 8,281 pixel-aligned image triplets, establishing a comprehensive foundation for algorithmic research. However, due to the variable nature of real-world scenarios, imaging devices may not always capture all three modalities simultaneously. This results in input data with unpredictable combinations of modal types, which challenge existing CMPD methods that fail to extract robust pedestrian information under arbitrary input combinations, leading to significant performance degradation. To address these challenges, we propose the Adaptive Uncertainty-aware Network (AUNet) for accurately discriminating modal availability and fully utilizing the available information under uncertain inputs. Specifically, we introduce Unified Modality Validation Refinement (UMVR), which includes an uncertainty-aware router to validate modal availability and a semantic refinement to ensure the reliability of information within the modality. Furthermore, we design a Modality-Aware Interaction (MAI) module to adaptively activate or deactivate its internal interaction mechanisms per UMVR output, enabling effective complementary information fusion from available modalities. AUNet enables accurate modality validation and robust inference without fixed modality pairings, facilitating the effective fusion of RGB, NIR, and TIR information across diverse inputs.
AAAI Conference 2026 Conference Paper
Lens flare is a common nighttime artifact caused by strong light sources scattering within camera lenses, leading to hazy streaks, halos, and glare that degrade visual quality. However, existing methods usually fail to effectively address nonuniform scattered flares, which severely reduces their applicability to complex real-world scenarios with diverse lighting conditions. To address this issue, we propose SLCFormer, a novel spectral-local context transformer framework for effective nighttime lens flare removal. SLCFormer integrates two key modules: the Frequency Fourier and Excitation Module (FFEM), which captures efficient global contextual representations in the frequency domain to model flare characteristics, and the Directionally-Enhanced Spatial Module (DESM) for local structural enhancement and directional features in the spatial domain for precise flare removal. Furthermore, we introduce a ZernikeVAE-based scatter flare generation pipeline to synthesize physically realistic scatter flares with spatially varying PSFs, bridging optical physics and data-driven training. Extensive experiments on the Flare7K++ dataset demonstrate that our method achieves state-of-the-art performance, outperforming existing approaches in both quantitative metrics and perceptual visual quality, and generalizing robustly to real nighttime scenes with complex flare artifacts.
AAAI Conference 2026 Conference Paper
Long Chain-of-Thought (CoT) reasoning enhances large reasoning models' performance but suffers from severe inefficiencies, as models often overthink simple problems or underthink complex ones. Current sequence-level optimizations, like length penalties, are too coarse-grained to distinguish core logic from verbose language, precluding the necessary token-level control for efficient reasoning CoT. To overcome these limitations, we introduce Time-Frequency token Advantage Clipping (TFAC), a novel training framework designed to build efficient large reasoning models via token-level interventions. Specifically, TFAC functions along two dimensions: 1) The Frequency Dimension: It discourages inefficient loops and encourages deeper exploration by dynamically reducing the advantage scores of high-entropy tokens that are repeatedly generated within a single reasoning path. 2) The Time Dimension: It reduces excessive overthinking of the system by establishing a historical baseline for the occurrence count of each critical token in previously successful trajectories, and clipping the advantages of tokens that exceed this baseline during training. Crucially, to preserve the model's exploratory capabilities on novel problems, this suppression mechanism is automatically disabled when no historical record of success is available. Experiments conducted on the Deepseek-Distill-32B and Qwen3-8B models show that TFAC outperforms leading baseline methods, improving performance by 2.3 and 3.1 percentage points, respectively, while simultaneously reducing inference costs by 35% and 28% in scenarios where correct answers are generated. These results validate the significant efficacy of TFAC in training large reasoning models that are both powerful and highly efficient.
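The time-dimension mechanism described above can be sketched as a small bookkeeping routine. The dict-based representation, the fixed scaling factor, and all names are illustrative assumptions; the key behaviors from the abstract are that tokens exceeding their historical baseline get their advantage reduced, while tokens with no success history are left untouched to preserve exploration.

```python
def tfac_clip(advantages, counts, baseline, scale=0.5):
    """Time-dimension advantage clipping sketch.
    advantages: token -> advantage score in the current rollout.
    counts: token -> occurrence count in the current reasoning path.
    baseline: token -> historical count from successful trajectories;
    tokens absent from the baseline are never suppressed."""
    clipped = {}
    for tok, adv in advantages.items():
        base = baseline.get(tok)  # None => no success history yet
        if base is not None and counts.get(tok, 0) > base:
            adv = adv * scale  # suppress over-repeated critical tokens
        clipped[tok] = adv
    return clipped
```

A real implementation would operate on per-token advantage tensors inside the RL objective, but the gating logic is the same.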
AAAI Conference 2026 Conference Paper
We introduce the Probabilistic Coin Change Problem (PCCP), a novel variant of the classical Combination Coin Change Problem (CCCP), motivated by a real-world scientific inverse task. The goal of CCCP is to enumerate all unordered combinations of coin denominations that sum to a given target. In PCCP, each coin type’s value follows a discrete probability distribution, and the aggregate value of a combination of coins is thus stochastic. Given a set of such coin types and noisy observations of total sums, the task is to infer the most likely latent coin combination. To address the combinatorial and probabilistic complexity of PCCP, we propose DeepProReasoner (Deep Combinatorial Probabilistic Reasoning with Embedded Representations), an unsupervised, end-to-end, deep-learning framework that integrates combinatorial reasoning, latent-space modeling, and differentiable probabilistic reasoning. The model is trained using a reconstruction loss between the observed empirical distribution and a decoded probability mass function (PMF), enabling efficient gradient-based search over a continuous relaxation of the combinatorial space. We evaluate DeepProReasoner on two instances of PCCP: (1) a synthetic Candy Mix problem for ablation studies, and (2) a real-world task of molecular formula inference from ultrahigh resolution mass spectrometry (MS) data. Besides the two given instances, PCCP captures a wide range of inverse settings in biology, chemistry, environmental sciences, and medicine, where latent combinatorial structures give rise to noisy aggregate observations through stochastic processes. Our results show that DeepProReasoner achieves high accuracy and robustness, outperforming state-of-the-art methods.
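For a tiny PCCP instance, MAP inference can be done by brute force: enumerate coin-count vectors, compute the exact PMF of each combination's total by convolution, and pick the combination that gives the observed sum the highest probability. The `max_count` bound and noiseless observation are illustrative simplifications; the paper's DeepProReasoner replaces this enumeration with a differentiable, gradient-based search.

```python
from itertools import product
from collections import defaultdict

def pmf_of_sum(counts, value_pmfs):
    """Exact PMF of the total value of a coin multiset, by repeated
    convolution of each coin type's discrete value distribution."""
    total = {0: 1.0}
    for n, pmf in zip(counts, value_pmfs):
        for _ in range(n):
            new = defaultdict(float)
            for s, p in total.items():
                for v, q in pmf.items():
                    new[s + v] += p * q
            total = dict(new)
    return total

def map_combination(observed_sum, value_pmfs, max_count=3):
    """Brute-force MAP inference for a tiny PCCP instance: return the
    coin-count vector maximizing the probability of the observed sum."""
    best, best_p = None, -1.0
    for counts in product(range(max_count + 1), repeat=len(value_pmfs)):
        p = pmf_of_sum(counts, value_pmfs).get(observed_sum, 0.0)
        if p > best_p:
            best, best_p = counts, p
    return best, best_p
```

The exponential growth of this enumeration in the number of coin types is exactly the combinatorial complexity that motivates a learned, continuous relaxation.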
AAAI Conference 2026 Conference Paper
Cardiac magnetic resonance (CMR) imaging is widely used to characterize cardiac morphology and function. To accelerate CMR imaging, various methods have been proposed to recover high-quality spatiotemporal CMR images from highly undersampled k-t space data. However, current CMR reconstruction techniques either fail to achieve satisfactory image quality or are restricted by the scarcity of ground truth data, leading to limited applicability in clinical scenarios. In this work, we propose MoCo-INR, a new unsupervised method that integrates implicit neural representations (INRs) with the conventional motion-compensated (MoCo) framework. Using the explicit motion modeling and the continuous prior of INRs, MoCo-INR can produce accurate cardiac motion decomposition and high-quality CMR reconstruction. Moreover, we present a new INR network architecture tailored to the CMR problem, which can greatly stabilize model optimization. Experiments on retrospective (i.e., simulated) datasets demonstrate the superiority of MoCo-INR over state-of-the-art methods, achieving fast convergence and fine-detailed reconstructions at ultra-high acceleration factors (e.g., 20x in VISTA sampling). In addition, evaluations on prospective (i.e., real-acquired) free-breathing CMR scans highlight its clinical practicality for real-time imaging. Several ablation studies also confirm the effectiveness of critical components of MoCo-INR.
AAAI Conference 2026 Conference Paper
Owing to their promising performance and better balance in terms of privacy protection, event cameras have recently been proposed for person re-identification (ReID), and event camera-based person ReID has attracted significant attention. Currently, mainstream event-based person ReID algorithms primarily focus on fusing visible light and event streams, as well as preserving privacy. Although significant progress has been made, these methods are typically trained and evaluated on small-scale or simulated event camera datasets, making it difficult to assess their real identification performance and generalization ability. To address the issue of data scarcity, this paper introduces a large-scale RGB-event based person ReID dataset, called EvReID. The dataset contains 118,988 image pairs and covers 1200 pedestrian identities, with data collected across multiple seasons, scenes, and lighting conditions. We also evaluate 15 state-of-the-art person ReID algorithms, laying a solid foundation for future research in terms of both data and benchmarking. Based on our newly constructed dataset, this paper further proposes a pedestrian attribute-guided contrastive learning framework to enhance feature learning for person re-identification, termed TriPro-ReID. This framework not only effectively explores the visual features from both RGB frames and event streams, but also fully utilizes pedestrian attributes as mid-level semantic features. Extensive experiments on the EvReID and MARS datasets fully validated the effectiveness of our proposed RGB-Event person ReID framework.
EAAI Journal 2025 Journal Article
IROS Conference 2025 Conference Paper
The social robot’s open API allows users to customize open-domain interactions. However, it remains inaccessible to those without programming experience. We introduce AutoMisty, the first LLM-powered multi-agent framework that converts natural-language commands into executable Misty robot code by decomposing high-level instructions, generating sub-task code, and integrating everything into a deployable program. Each agent employs a two-layer optimization mechanism: first, a self-reflective loop that instantly validates and automatically executes the generated code, regenerating whenever errors emerge; second, human review for refinement and final approval, ensuring alignment with user preferences and preventing error propagation. To evaluate AutoMisty’s effectiveness, we designed a benchmark task set spanning four levels of complexity and conducted experiments in a real Misty robot environment. Extensive evaluations demonstrate that AutoMisty not only consistently generates high-quality code but also enables precise code control, significantly outperforming direct reasoning with ChatGPT-4o and ChatGPT-o1. All code, optimized APIs, and experimental videos will be publicly released through the webpage: AutoMisty.
ICML Conference 2025 Conference Paper
This paper provides a comprehensive analysis of variational inference in latent variable models for survival analysis, emphasizing the distinctive challenges associated with applying variational methods to survival data. We identify a critical weakness in the existing methodology, demonstrating how a poorly designed variational distribution may hinder the objective of survival analysis tasks—modeling time-to-event distributions. We prove that the optimal variational distribution, which perfectly bounds the log-likelihood, may depend on the censoring mechanism. To address this issue, we propose censor-dependent variational inference (CDVI), tailored for latent variable models in survival analysis. More practically, we introduce CD-CVAE, a V-structure Variational Autoencoder (VAE) designed for the scalable implementation of CDVI. Further discussion extends some existing theories and training techniques to survival analysis. Extensive experiments validate our analysis and demonstrate significant improvements in the estimation of individual survival distributions.
TMLR Journal 2025 Journal Article
Several variants of Variational Autoencoders have been developed to address inherent limitations. Specifically, $\sigma$-VAE utilizes a scaled identity matrix $\sigma^2 I$ in the decoder variance, while $\beta$-VAE introduces a hyperparameter $\beta$ to reweight the negative ELBO loss. However, a unified theoretical and practical understanding of model optimality remains unclear. For example, existing learning theories on the global optimality of VAE provide limited insight into their empirical success. Previous work showed the mathematical equivalence between the variance scalar $\sigma^2$ and the hyperparameter $\beta$ in shaping the loss landscape. While $\beta$-annealing is widely used, how to implement $\sigma$-annealing is still unclear. This paper presents a comprehensive analysis of $\sigma$-CVAE, highlighting its enhanced expressiveness in parameterizing conditional densities while addressing the associated estimation challenges arising from suboptimal variational inference. In particular, we propose Calibrated Robust $\sigma$-CVAE, a doubly robust algorithm that facilitates accurate estimation of $\sigma$ while effectively preventing the posterior collapse of $\phi$. Our approach, leveraging functional neural decomposition and KL annealing techniques, provides a unified framework to understand both $\sigma$-VAE and $\beta$-VAE regarding parameter optimality and training dynamics. Experimental results on synthetic and real-world datasets demonstrate the superior performance of our method across various conditional density estimation tasks, highlighting its significance for accurate and reliable probabilistic modeling.
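The equivalence between the decoder variance scalar and the KL weight mentioned above can be checked symbolically in a few lines. This is a per-dimension sketch under a Gaussian decoder with scaled-identity variance (constants independent of sigma dropped); the identification beta = 2*sigma^2, up to the log-variance term, is the known correspondence the paragraph alludes to.

```python
import math

def neg_elbo_sigma(recon_sq_err, kl, sigma):
    """Per-dimension Gaussian negative ELBO with decoder variance
    sigma^2: squared error scaled by 1/(2 sigma^2), plus the
    log-variance normalizer, plus the KL term."""
    return recon_sq_err / (2.0 * sigma ** 2) + math.log(sigma) + kl

def neg_elbo_beta(recon_sq_err, kl, beta):
    """beta-VAE style loss: reconstruction plus beta-weighted KL."""
    return recon_sq_err + beta * kl
```

Multiplying the sigma-form (minus its log-variance constant) by 2*sigma^2 recovers the beta-form exactly, so annealing sigma reshapes the loss landscape the same way annealing beta does.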
ECAI Conference 2025 Conference Paper
Vision-Language Foundation Models (VLFMs) demonstrate promise in zero-shot learning through joint visual-textual representations. However, in histopathology image analysis, their effectiveness is limited by weak image-text alignment due to coarse-grained textual descriptions that fail to capture critical fine-grained visual details. This misalignment introduces semantic noise and imprecision in zero-shot retrieval, hindering the identification of relevant cases and degrading downstream classification. To address this, we introduce Retrieval-based De-noising Causal Language Modelling (RDCLM), a novel framework that refines noisy retrieval outputs from pathology VLFMs. RDCLM constructs a pathology-specific knowledge base of fine-grained, discriminative tumour malignancy descriptions using a large language model (LLM). Given a query histopathology image, a pathology VLFM retrieves candidate descriptions from this knowledge base. Our de-noising module, leveraging a frozen language model, integrates visual features with these retrieved texts, filtering irrelevant content and enhancing semantic alignment. This significantly improves retrieval precision (by an average of 10% across datasets) and enables more accurate zero-shot image classification. To further bolster performance and generalization, we propose two retrieval augmentation strategies: Retrieval Negatives Replacement (RNR) and Description-wise Shuffling (DS). Extensive evaluations across four histopathology cancer datasets demonstrate that RDCLM significantly outperforms state-of-the-art methods in both zero-shot image-text retrieval and malignancy classification, achieving an average improvement of 12.7% in F1-score and 9.6% in accuracy over the second-best competitor. These results highlight the importance of retrieval de-noising for advancing VLFM-based zero-shot learning in histopathology. Our code is available at: https://github.com/xw18958/RDCLM
YNIMG Journal 2025 Journal Article
Fluid-attenuated inversion recovery (FLAIR) is indispensable in MRI-based head-and-neck assessments, but its quantitative counterpart remains clinically absent due to the influence of cerebrospinal fluid (CSF) dynamics and the lengthy acquisition time spent on a series of weighting-increasing images. This work implements and validates fast fluid-attenuated T2 (FLA-T2) mapping via inversion-recovery-prepared multiple overlapping-echo detachment imaging (IR-MOLED). The clinical value is prospectively investigated with a cohort of 54 meningioma patients (mean age: 56 years ± 11 [standard deviation]; 19 men). Fluid-attenuated proton density mapping was simultaneously fulfilled and therefore intrinsically co-registered, revealing notable benefits in identifying CSF inflow. In quantifying parenchymal T2, IR-MOLED yielded a mean absolute error of 1.22 ms referring to spin-echo, and in fluid suppression, IR-MOLED exhibited a high radiographic consistency with orthodox FLAIR imaging. Using first-level histogram analysis, the meningioma investigation showed, for the first time, that: (1) in grading meningiomas, FLA-T2 mapping (AUC = 0.814) outperformed FLAIR imaging (AUC = 0.685), contrast-enhanced T1-weighted imaging (insignificant), and T2 mapping (insignificant); and (2) in typing meningiomas, FLA-T2 distinguished transitional meningiomas from meningothelial or/and fibrous meningiomas, complementing the predictive ability of T2 mapping. In conclusion, with excluded parametric contribution from free water and standardized voxel value scales, FLA-T2 mapping permits a more precise description of brain parenchyma in both structural morphology and relaxation variables than T2 mapping and is fully superior to FLAIR imaging in preoperatively predicting the histopathologic heterogeneity of meningiomas.
IJCAI Conference 2025 Conference Paper
Heterogeneous Graph Neural Networks (HGNNs) are vulnerable, highlighting the need for tailored attacks to assess their robustness and ensure security. However, existing HGNN attacks often require complex retraining of parameters to generate specific perturbations for new scenarios. Recently, foundation models have opened new horizons for the generalization of graph neural networks by capturing shared semantics across various graph distributions. This leads us to ask: Can we design a foundation attack model for HGNNs that enables generalizable perturbations across different HGNNs, and quickly adapts to new heterogeneous graphs (HGs)? Empirical findings reveal that, despite significant differences in model design and parameter space, different HGNNs surprisingly share common vulnerability patterns from a relation-aware perspective. Therefore, we explore how to design foundation HGNN attack criteria by mining shared attack units. In this paper, we propose a novel relation-wise heterogeneous graph foundation attack model, HeTa. We introduce a foundation surrogate model to align heterogeneity and identify the importance of shared relation-aware attack units. Building on this, we implement a serialized relation-by-relation attack based on the identified relational weights. In this way, the perturbation can be transferred to various target HGNNs and easily fine-tuned for new HGs. Extensive experiments demonstrate the powerful attack performance and generalizability of our method.
AAAI Conference 2025 Conference Paper
Due to its effectiveness and efficiency, graph-based multi-view clustering has recently attracted much attention. However, multi-view data are often incomplete and unpaired in real-world applications as a consequence of data loss or corruption. Although efforts have been made through a series of methods to address the problems of incomplete or unpaired multi-view data, the following issues still persist: 1) Most existing methods only focus on incomplete multi-view data or unpaired multi-view data, and exhibit weaknesses when addressing both incomplete and unpaired multi-view data simultaneously. 2) Some methods neglect the graph information of the data from different views during the learning process. To tackle these issues, we propose the Multi-view Graph Clustering framework with Cross-view Feature Fusion (MGCCFF), a novel approach for clustering incomplete and unpaired multi-view data. Specifically, MGCCFF learns soft clustering label information from complete data and utilizes this to capture category-level cross-view correspondences. It then learns latent representations enriched with cross-view information based on the established mappings. To obtain a multi-view graph structure under conditions of incomplete and unpaired data, MGCCFF innovatively integrates the concept of self-expression with the autoencoder architecture and exploits the latent relationships between labels and the graph structure, thereby enabling the generation of sparse and accurate graph structures under multi-view conditions for the final clustering task. The experiments on incomplete and unpaired multi-view datasets demonstrate that MGCCFF outperforms state-of-the-art methods.
EAAI Journal 2025 Journal Article
AAAI Conference 2025 Conference Paper
We consider a nonconvex optimization problem over the simplex, and more generally, over a product of simplices. We provide an algorithm, Langevin Multiplicative Weights Update (LMWU), for solving global optimization problems by adding noise that scales with the non-Euclidean geometry of the simplex. Non-convex optimization has been extensively studied by the machine learning community due to its applications in various scenarios such as neural network approximation and finding Nash equilibria. Despite recent progress on provable guarantees for escaping and avoiding saddle points (convergence to local minima) and on the global convergence of Langevin gradient-based methods without constraints, global optimization with constraints is less studied. We show that the LMWU algorithm provably converges to interior global minima, with a non-asymptotic convergence analysis. We verify the efficiency of the proposed algorithm on a real-world dataset from polynomial portfolio management, where optimization of a highly non-linear objective function plays a crucial role.
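As a rough illustration of the idea, a multiplicative-weights (mirror-descent) step with Langevin-style noise keeps iterates in the interior of the simplex by construction. The sketch below uses assumed step size and noise scale and does not reproduce the paper's exact LMWU update:

```python
import numpy as np

def lmwu_step(x, grad, eta=0.05, sigma=0.1, rng=None):
    """One multiplicative-weights step with Langevin-style noise.

    x    : current point in the interior of the probability simplex
    grad : gradient of the objective at x
    eta and sigma are illustrative assumptions, not the paper's choices.
    """
    rng = rng or np.random.default_rng(0)
    noise = sigma * np.sqrt(eta) * rng.standard_normal(len(x))
    # Mirror-descent form: exponentiation keeps every coordinate positive,
    # and renormalization returns the iterate to the simplex.
    logits = np.log(x) - eta * grad + noise
    w = np.exp(logits - logits.max())  # subtract max for numerical stability
    return w / w.sum()

x = np.full(4, 0.25)                   # start at the simplex center
g = np.array([1.0, 0.5, -0.5, -1.0])   # hypothetical gradient
x_new = lmwu_step(x, g)
```

Mass shifts toward coordinates with smaller gradient values, and the iterate never leaves the simplex, matching the constrained setting the paper studies.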
EAAI Journal 2025 Journal Article
EAAI Journal 2025 Journal Article
NeurIPS Conference 2025 Conference Paper
Decoding visual experiences from fMRI offers a powerful avenue to understand human perception and develop advanced brain-computer interfaces. However, current progress often prioritizes maximizing reconstruction fidelity while overlooking interpretability, an essential aspect for deriving neuroscientific insight. To address this gap, we propose MoRE-Brain, a neuro-inspired framework designed for high-fidelity, adaptable, and interpretable visual reconstruction. MoRE-Brain uniquely employs a hierarchical Mixture-of-Experts architecture where distinct experts process fMRI signals from functionally related voxel groups, mimicking specialized brain networks. The experts are first trained to encode fMRI into the frozen CLIP space. A finetuned diffusion model then synthesizes images, guided by expert outputs through a novel dual-stage routing mechanism that dynamically weighs expert contributions across the diffusion process. MoRE-Brain offers three main advancements: First, it introduces a novel Mixture-of-Experts architecture grounded in brain network principles for neuro-decoding. Second, it achieves efficient cross-subject generalization by sharing core expert networks while adapting only subject-specific routers. Third, it provides enhanced mechanistic insight, as the explicit routing reveals precisely how different modeled brain regions shape the semantic and spatial attributes of the reconstructed image. Extensive experiments validate MoRE-Brain’s high reconstruction fidelity, with bottleneck analyses further demonstrating its effective utilization of fMRI signals, distinguishing genuine neural decoding from over-reliance on generative priors. Consequently, MoRE-Brain marks a substantial advance towards more generalizable and interpretable fMRI-based visual decoding.
EAAI Journal 2025 Journal Article
ECAI Conference 2025 Conference Paper
In this paper, we introduce the Perturbed Natural Adaptive Gradient Descent (PN-AdaGrad) method, a novel optimization algorithm that combines the principles of natural gradient descent and adaptive gradient descent on Riemannian manifolds. We provide a rigorous theoretical analysis of the PN-AdaGrad method, proving its convergence to a critical point of the objective function under mild assumptions. To validate the practical effectiveness of the PN-AdaGrad method, we evaluate our algorithm on real-world datasets in the context of portfolio optimization. Portfolio optimization involves selecting the optimal allocation of assets to maximize returns while minimizing risk. Our experiments show that the PN-AdaGrad method outperforms traditional gradient descent and other state-of-the-art optimization algorithms.
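For intuition only, the sketch below combines a per-coordinate adaptive (AdaGrad-style) scaling with an explicit random perturbation; the natural-gradient and Riemannian-manifold machinery of PN-AdaGrad is not reproduced, and all step sizes are illustrative assumptions:

```python
import numpy as np

def pn_adagrad_step(x, grad, state, eta=0.1, eps=1e-8, sigma=1e-3, rng=None):
    """One adaptive-gradient step with a small random perturbation.

    state accumulates squared gradients (AdaGrad); sigma controls the
    perturbation standing in for the noise used to escape saddle points.
    """
    rng = rng or np.random.default_rng(0)
    state = state + grad ** 2                    # per-coordinate accumulator
    step = eta * grad / (np.sqrt(state) + eps)   # shrinks as history grows
    return x - step + sigma * rng.standard_normal(len(x)), state

# Hypothetical usage: minimize f(x) = ||x||^2, whose gradient is 2x.
x, state = np.array([3.0, -2.0]), np.zeros(2)
for _ in range(5000):
    x, state = pn_adagrad_step(x, 2 * x, state, sigma=0.0)
```

The adaptive denominator gives each coordinate its own effective step size, which is the shared ingredient between plain AdaGrad and the method described above.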
JMLR Journal 2025 Journal Article
In this paper, we establish tight lower bounds for Byzantine-robust distributed first-order stochastic methods in both strongly convex and non-convex stochastic optimization. We reveal that when the distributed nodes have heterogeneous data, the convergence error comprises two components: a non-vanishing Byzantine error and a vanishing optimization error. We establish the lower bounds on the Byzantine error and on the minimum number of queries to a stochastic gradient oracle for achieving an arbitrarily small optimization error. Nevertheless, we also identify significant discrepancies between our established lower bounds and the existing upper bounds. To fill this gap, we leverage the techniques of Nesterov's acceleration and variance reduction to develop novel Byzantine-robust distributed stochastic optimization methods that provably match these lower bounds, up to at most logarithmic factors, implying that our established lower bounds are tight.
AAAI Conference 2025 Conference Paper
Pedestrian Attribute Recognition (PAR) is one of the indispensable tasks in human-centered research. However, existing datasets neglect different domains (e.g., environments, times, populations, and data sources), only conducting simple random splits, and the performance of these datasets has already approached saturation. In the past five years, no large-scale dataset has been opened to the public. To address this issue, this paper proposes a new large-scale, cross-domain pedestrian attribute recognition dataset to fill the data gap, termed MSP60K. It consists of 60,122 images and 57 attribute annotations across eight scenarios. Synthetic degradation is also conducted to further narrow the gap between the dataset and real-world challenging scenarios. To establish a more rigorous benchmark, we evaluate 17 representative PAR models under both random and cross-domain split protocols on our dataset. Additionally, we propose an innovative Large Language Model (LLM) augmented PAR framework, named LLM-PAR. This framework processes pedestrian images through a Vision Transformer (ViT) backbone to extract features and introduces a multi-embedding query Transformer to learn partial-aware features for attribute classification. Significantly, we enhance this framework with LLM for ensemble learning and visual feature augmentation. Comprehensive experiments across multiple PAR benchmark datasets have thoroughly validated the efficacy of our proposed framework.
NeurIPS Conference 2025 Conference Paper
Diffusion models are increasingly deployed in real-world text-to-image services. These models, however, encode implicit assumptions about the world based on web-scraped image-caption pairs used during training. Over time, such assumptions may become outdated, incorrect, or socially biased, leading to failures where the generated images misalign with users' expectations or evolving societal norms. Identifying and fixing such failures is challenging and, thus, a valuable asset for service providers, as failures often emerge post-deployment and demand specialized expertise and resources to resolve them. In this work, we introduce SURE, the first end-to-end framework that SecUrely REpairs failures flagged by users of diffusion-based services. SURE enables the service provider to securely collaborate with an external third party specialized in model repairing (i.e., a Model Repair Institute) without compromising the confidentiality of user feedback, the service provider's proprietary model, or the Model Repair Institute's proprietary repairing knowledge. To achieve the best possible efficiency, we propose a co-design of a model editing algorithm with a customized two-party cryptographic protocol. Our experiments show that SURE is highly practical: SURE securely and effectively repairs all 32 layers of Stable Diffusion v1.4 in under 17 seconds (four orders of magnitude more efficient than a general baseline). Our results demonstrate that practical, secure model repair is attainable for large-scale, modern diffusion services.
AAAI Conference 2025 Conference Paper
To preserve user privacy in recommender systems, federated recommendation (FR) based on federated learning (FL) emerges, keeping the personal data on the local client and updating a model collaboratively. Unlike FL, FR has a unique sparse aggregation mechanism, where the embedding of each item is updated by only partial clients, instead of full clients in a dense aggregation of general FL. Recently, as an essential principle of FL, model security has received increasing attention, especially for Byzantine attacks, where malicious clients can send arbitrary updates. The problem of exploring the Byzantine robustness of FR is particularly critical since in the domains applying FR, e.g., e-commerce, malicious clients can be injected easily by registering new accounts. However, existing Byzantine works neglect the unique sparse aggregation of FR, making them unsuitable for our problem. Thus, we make the first effort to investigate Byzantine attacks on FR from the perspective of sparse aggregation, which is non-trivial: it is not clear how to define Byzantine robustness under sparse aggregations and design Byzantine attacks under limited knowledge/capability. In this paper, we reformulate the Byzantine robustness under sparse aggregation by defining the aggregation for a single item as the smallest execution unit. Then we propose a family of effective attack strategies, named Spattack, which exploit the vulnerability in sparse aggregation and are categorized along the adversary's knowledge and capability. Extensive experimental results demonstrate that Spattack can effectively prevent convergence and even break down defenses under a few malicious clients, raising alarms for securing FR systems.
NeurIPS Conference 2025 Conference Paper
Event cameras provide asynchronous, low-latency, and high-dynamic-range visual signals, making them ideal for real-time perception tasks such as object detection. However, effectively modeling the temporal dynamics of event streams remains a core challenge. Most existing methods follow frame-based detection paradigms, applying temporal modules only at high-level features, which limits early-stage temporal modeling. Transformer-based approaches introduce global attention to capture long-range dependencies, but often add unnecessary complexity and overlook fine-grained temporal cues. In this paper, we propose a CNN-RNN hybrid framework that rethinks temporal modeling for event-based object detection. Our approach is based on two key insights: (1) introducing recurrent modules at lower spatial scales to preserve detailed temporal information where events are most dense, and (2) utilizing Decoupled Deformable-enhanced Recurrent Layers specifically designed according to the inherent motion characteristics of event cameras to extract multiple spatiotemporal features, and performing independent downsampling at multiple spatiotemporal scales to enable flexible, scale-aware representation learning. These multi-scale features are then fused via a feature pyramid network to produce robust detection outputs. Experiments on the Gen1, 1 Mpx, and eTram datasets demonstrate that our approach achieves superior accuracy over recent transformer-based models, highlighting the importance of precise temporal feature extraction in early stages. This work offers a new perspective on designing architectures for event-driven vision beyond attention-centric paradigms. Code: https://github.com/BIT-Vision/SATE.
NeurIPS Conference 2025 Conference Paper
This paper presents a novel approach to addressing the long-sequence problem in high-resolution medical images for Vision Transformers (ViTs). Using smaller patches as tokens can enhance ViT performance, but quadratically increases computation and memory requirements. Therefore, the common practice for applying ViTs to high-resolution images is either to: (a) employ complex sub-quadratic attention schemes or (b) use large to medium-sized patches and rely on additional mechanisms within the model to capture the spatial hierarchy of details. We propose Symmetrical Hierarchical Forest (SHF), a lightweight approach that adaptively patches the input image to increase token information density and encode hierarchical spatial structures into the input embedding. We then apply a reverse depatching scheme to the output embeddings of the transformer encoder, eliminating the need for convolution-based decoders. Unlike previous methods that modify attention mechanisms or use a complex hierarchy of interacting models, SHF can be retrofitted to any ViT model to allow it to learn the hierarchical structure of details in high-resolution images without requiring architectural changes. Experimental results demonstrate significant gains in computational efficiency and performance: on the PAIP WSI dataset, we achieved a 3–32× speedup or a 2.95% to 7.03% increase in accuracy (measured by Dice score) at a 64K² resolution with the same computational budget, compared to state-of-the-art production models. On the 3D medical datasets BTCV and KiTS, training was 6× faster, with accuracy gains of 6.93% and 5.9%, respectively, compared to models without SHF.
AAAI Conference 2025 Conference Paper
Video object detection has made significant progress in recent years thanks to convolutional neural networks (CNNs) and vision transformers (ViTs). Typically, CNNs excel at capturing local features but struggle to model global representations. Conversely, ViTs are adept at capturing long-range global features but face challenges in representing local feature details. Off-the-shelf video object detection methods solely rely on CNNs or ViTs to conduct feature aggregation, which hampers their capability to simultaneously leverage global and local information, thereby resulting in limited detection performance. In this paper, we propose a Transformer-GraphFormer Blender Network (TGBFormer) for video object detection, with three key technical improvements to fully exploit the advantages of transformers and graph convolutional networks while compensating for their limitations. First, we develop a spatial-temporal transformer module to aggregate global contextual information, constituting global representations with long-range feature dependencies. Second, we introduce a spatial-temporal GraphFormer module that utilizes local spatial and temporal relationships to aggregate features, generating new local representations that are complementary to the transformer outputs. Third, we design a global-local feature blender module to adaptively couple transformer-based global representations and GraphFormer-based local representations. Extensive experiments demonstrate that our TGBFormer establishes new state-of-the-art results on the ImageNet VID dataset. Particularly, our TGBFormer achieves 86.5% mAP while running at around 41.0 FPS on a single Tesla A100 GPU.
AAAI Conference 2025 Conference Paper
Unsupervised visible-infrared person re-identification (US-VI-ReID) seeks to match infrared and visible images of the same individual without the use of annotations. Current methods typically derive cross-modal correspondences through a single global feature matching process for generating pseudo labels and learning modality-invariant features. However, this matching approach is hindered by both intra-modality and inter-modality discrepancies, which result in imprecise measurements. As a consequence, the clustering of individuals with a single global feature is often incomplete and unreliable, leading to suboptimal performance in cross-modal clustering tasks. To address these challenges and to extract cross-modality discriminative identity information, we propose TokenMatcher, which encompasses three key components: Diverse Tokens Matching (DTM), Diverse Tokens Neighbor Learning (DTNL), and the Homogeneous Fusion (HF) Module. DTM utilizes multiple class tokens within the visual transformer framework to capture diverse embedding representations, thereby facilitating the integration of fine-grained information essential for reliable cross-modality correspondences. DTNL enhances the intra-modality and inter-modality consistency among diverse tokens by refining neighborhood sets with insights from neighboring tokens and camera information, promoting robust neighborhood learning and fostering discriminative identity information. Additionally, the HF module consolidates clusters of the same identity while effectively separating those of different identities. Extensive experiments conducted on the publicly available SYSU-MM01 and RegDB datasets demonstrate the efficacy of the proposed method.
NeurIPS Conference 2025 Conference Paper
Graph Transformers (GTs) have emerged as a powerful paradigm for graph representation learning due to their ability to model diverse node interactions. However, existing GTs often rely on intricate architectural designs tailored to specific interactions, limiting their flexibility. To address this, we propose a unified hierarchical mask framework that reveals an underlying equivalence between model architecture and attention mask construction. This framework enables a consistent modeling paradigm by capturing diverse interactions through carefully designed attention masks. Theoretical analysis under this framework demonstrates that the probability of correct classification positively correlates with the receptive field size and label consistency, leading to a fundamental design principle: An effective attention mask should ensure both a sufficiently large receptive field and a high level of label consistency. While no single existing mask satisfies this principle across all scenarios, our analysis reveals that hierarchical masks offer complementary strengths, motivating their effective integration. Then, we introduce M$^3$Dphormer, a Mixture-of-Experts based Graph Transformer with Multi-Level Masking and Dual Attention Computation. M$^3$Dphormer incorporates three theoretically grounded hierarchical masks and employs a bi-level expert routing mechanism to adaptively integrate multi-level interaction information. To ensure scalability, we further introduce a dual attention computation scheme that dynamically switches between dense and sparse modes based on local mask sparsity. Extensive experiments across multiple benchmarks demonstrate that M$^3$Dphormer achieves state-of-the-art performance, validating the effectiveness of our unified framework and model design.
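The framework's central observation, that an architectural choice can be expressed as an attention-mask construction, can be illustrated with plain masked attention. The helper below is a generic sketch (not M$^3$Dphormer's implementation); swapping the boolean mask changes which node interactions are modeled:

```python
import numpy as np

def masked_attention(Q, K, V, mask):
    """Scaled dot-product attention restricted by a boolean mask.

    mask[i, j] = True allows node i to attend to node j; choosing the
    mask (local neighborhood, cluster-level, global) selects the
    interaction type without changing the architecture itself.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores = np.where(mask, scores, -1e9)   # blocked pairs get ~zero weight
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ V

rng = np.random.default_rng(0)
Q = rng.random((3, 4)); K = rng.random((3, 4)); V = rng.random((3, 4))
out_self = masked_attention(Q, K, V, np.eye(3, dtype=bool))     # self-only mask
out_full = masked_attention(Q, K, V, np.ones((3, 3), dtype=bool))  # global mask
```

With the identity mask each node sees only itself; denser hierarchical masks enlarge the receptive field, which is the trade-off the design principle above formalizes.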
TMLR Journal 2025 Journal Article
Coreset selection, a technique for compressing large datasets while preserving performance, is crucial for modern machine learning. This paper presents a novel method for generating high-quality Wasserstein coresets using the Sinkhorn loss, a powerful tool with computational advantages. However, existing approaches suffer from numerical instability in Sinkhorn's algorithm. We address this by proposing stable algorithms for the computation and differentiation of the Sinkhorn optimization problem, including an analytical formula for the derivative of the Sinkhorn loss and a rigorous stability analysis of our method. Extensive experiments demonstrate that our approach significantly outperforms existing methods in terms of sample selection quality, computational efficiency, and achieving a smaller Wasserstein distance.
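The instability the paper targets typically arises from forming the kernel exp(-C/ε), which underflows for small ε. A standard remedy, shown here as a generic log-domain sketch rather than the paper's specific stabilized algorithm, runs every Sinkhorn iteration through log-sum-exp:

```python
import numpy as np

def logsumexp(a, axis):
    """Stable log-sum-exp along an axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def sinkhorn_log(C, mu, nu, eps=0.1, iters=200):
    """Entropic optimal transport solved entirely in the log domain.

    Avoids forming K = exp(-C/eps), which underflows for small eps; all
    products of exponentials become sums inside a log-sum-exp.
    """
    f, g = np.zeros(len(mu)), np.zeros(len(nu))
    for _ in range(iters):
        f = -eps * logsumexp((g[None, :] - C) / eps + np.log(nu)[None, :], axis=1)
        g = -eps * logsumexp((f[:, None] - C) / eps + np.log(mu)[:, None], axis=0)
    # Transport plan; its marginals match mu and nu at convergence.
    return np.exp((f[:, None] + g[None, :] - C) / eps) * mu[:, None] * nu[None, :]

mu = np.full(3, 1 / 3)
nu = np.full(5, 0.2)
C = np.random.default_rng(1).random((3, 5))   # hypothetical cost matrix
P = sinkhorn_log(C, mu, nu)
```

No exponential of the raw cost is ever formed, which is where the vanilla scaling iterations lose precision.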
AAAI Conference 2024 Conference Paper
Recent studies reveal the connection between GNNs and the diffusion process, which has motivated the proposal of many diffusion-based GNNs. However, since these two mechanisms are closely related, one fundamental question naturally arises: Is there a general diffusion framework that can formally unify these GNNs? The answer to this question can not only deepen our understanding of the learning process of GNNs, but may also open a new door to designing a broad new class of GNNs. In this paper, we propose a general diffusion equation framework with a fidelity term, which formally establishes the relationship between the diffusion process and a broader class of GNNs. Meanwhile, with this framework, we identify one characteristic of graph diffusion networks, i.e., the current neural diffusion process only corresponds to the first-order diffusion equation. However, through an experimental investigation, we show that the labels of high-order neighbors actually exhibit the monophily property, which induces label-based similarity among high-order neighbors without requiring similarity among first-order neighbors. This discovery motivates the design of a new high-order neighbor-aware diffusion equation, from which we derive a new type of graph diffusion network (HiD-Net) based on the framework. With the high-order diffusion equation, HiD-Net is more robust against attacks and works on both homophily and heterophily graphs. We not only theoretically analyze the relation between HiD-Net and high-order random walks, but also provide a theoretical convergence guarantee. Extensive experimental results well demonstrate the effectiveness of HiD-Net over state-of-the-art graph diffusion networks.
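The first-order diffusion-with-fidelity idea can be written as a single explicit Euler step; the update below is a generic sketch with assumed coefficients, not HiD-Net's exact high-order equation:

```python
import numpy as np

def diffusion_step(X, A_hat, X0, alpha=0.1, beta=0.1):
    """One explicit Euler step of graph diffusion with a fidelity term.

    Discretizes dX/dt = alpha * (A_hat X - X) + beta * (X0 - X).
    The first term smooths node features over the edges of the normalized
    adjacency A_hat; the fidelity term anchors the solution to the input X0.
    """
    return X + alpha * (A_hat @ X - X) + beta * (X0 - X)

# Hypothetical usage: a fully connected 3-node graph with self-loops,
# row-normalized, so pure diffusion (beta = 0) averages the features.
A_hat = np.full((3, 3), 1 / 3)
X0 = np.array([[1.0], [0.0], [2.0]])
X = X0.copy()
for _ in range(200):
    X = diffusion_step(X, A_hat, X0, alpha=0.1, beta=0.0)
```

Replacing A_hat with a power such as A_hat @ A_hat is the analogous way to let high-order neighbors enter the update, in the spirit of (though not identical to) the high-order equation described above.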
NeurIPS Conference 2024 Conference Paper
Artificial intelligence has significantly impacted medical applications, particularly with the advent of Medical Large Vision Language Models (Med-LVLMs), sparking optimism for the future of automated and personalized healthcare. However, the trustworthiness of Med-LVLMs remains unverified, posing significant risks for future model deployment. In this paper, we introduce CARES and aim to comprehensively evaluate the Trustworthiness of Med-LVLMs across the medical domain. We assess the trustworthiness of Med-LVLMs across five dimensions, including trustfulness, fairness, safety, privacy, and robustness. CARES comprises about 41K question-answer pairs in both closed and open-ended formats, covering 16 medical image modalities and 27 anatomical regions. Our analysis reveals that the models consistently exhibit concerns regarding trustworthiness, often displaying factual inaccuracies and failing to maintain fairness across different demographic groups. Furthermore, they are vulnerable to attacks and demonstrate a lack of privacy awareness. We publicly release our benchmark and code at https://github.com/richard-peng-xia/CARES.
AAAI Conference 2024 Conference Paper
As a bio-inspired vision sensor, the spike camera emulates the operational principles of the fovea, a compact retinal region, by employing spike discharges to encode the accumulation of per-pixel luminance intensity. Leveraging its high temporal resolution and bio-inspired neuromorphic design, the spike camera holds significant promise for advancing computer vision applications. Saliency detection mimics the behavior of human beings and captures the most salient regions of a scene. In this paper, we investigate visual saliency in the continuous spike stream for the first time. To effectively process the binary spike stream, we propose a Recurrent Spiking Transformer (RST) framework, which is based on a full spiking neural network. Our framework enables the extraction of spatio-temporal features from the continuous spatio-temporal spike stream while maintaining low power consumption. To facilitate the training and validation of our proposed model, we build a comprehensive real-world spike-based visual saliency dataset, enriched with numerous lighting conditions. Extensive experiments demonstrate the superior performance of our Recurrent Spiking Transformer framework in comparison to other spiking neural network-based methods. Our framework exhibits a substantial margin of improvement in capturing and highlighting visual saliency in the spike stream, which not only provides a new perspective for spike-based saliency segmentation but also shows a new paradigm for full SNN-based transformer models. The code and dataset are available at https://github.com/BIT-Vision/SVS.
EAAI Journal 2024 Journal Article
IROS Conference 2024 Conference Paper
In the context of 3D scene perception tasks, the significance of 3D occupancy prediction has been progressively growing, aiming to forecast the occupancy state of voxels in a discrete 3D space. However, existing methods typically exhibit several limitations, such as restricted adaptability to non-pinhole cameras due to fixed camera parameters, heavy reliance on 3D annotations because of the inability to project 3D output back to the camera plane, and inferior real-time inference performance resulting from the conversion process from 2D to 3D features. To address these constraints, we introduce GenerOcc, a self-supervised framework for real-time 3D occupancy prediction with monocular generic cameras. We have collected the fisheye Dominant dataset to confirm the compatibility of our ray-based camera model with non-pinhole cameras. By transforming the occupancy prediction task into a depth estimation task in a self-supervised manner, we eliminate dependency on 3D annotations. Furthermore, we propose a parametric voxel probability distribution module that leverages 2D features to quickly predict 3D occupancy without 3D representations of the scene. Additionally, our GenerOcc has been extensively evaluated on the public pinhole Occ3D-nuScenes dataset and our proprietary fisheye Dominant dataset, both yielding impressive performance.
AAAI Conference 2024 Conference Paper
Graph contrastive learning (GCL), learning the node representation by contrasting two augmented graphs in a self-supervised way, has attracted considerable attention. GCL is usually believed to learn the invariant representation. However, does this understanding always hold in practice? In this paper, we first study GCL from the perspective of causality. By analyzing GCL with the structural causal model (SCM), we discover that traditional GCL may not well learn the invariant representations due to the non-causal information contained in the graph. How can we fix it and encourage current GCL to learn better invariant representations? The SCM offers two requirements and motivates us to propose a novel GCL method. Particularly, we introduce the spectral graph augmentation to simulate the intervention upon non-causal factors. Then we design the invariance objective and independence objective to better capture the causal factors. Specifically, (i) the invariance objective encourages the encoder to capture the invariant information contained in causal variables, and (ii) the independence objective aims to reduce the influence of confounders on the causal variables. Experimental results demonstrate the effectiveness of our approach on node classification tasks.
AAAI Conference 2024 Conference Paper
The main streams of human activity recognition (HAR) algorithms are developed based on RGB cameras, which usually suffer from illumination changes, fast motion, privacy concerns, and large energy consumption. Meanwhile, biologically inspired event cameras have attracted great interest due to their unique features, such as high dynamic range, dense temporal but sparse spatial resolution, low latency, and low power. As it is a newly emerging sensor, there is not yet a realistic large-scale dataset for HAR. Considering its great practical value, in this paper, we propose a large-scale benchmark dataset to bridge this gap, termed HARDVS, which contains 300 categories and more than 100K event sequences. We evaluate and report the performance of multiple popular HAR algorithms, which provide extensive baselines for future works to compare against. More importantly, we propose a novel spatial-temporal feature learning and fusion framework, termed ESTF, for event-stream-based human activity recognition. It first projects the event streams into spatial and temporal embeddings using StemNet, then encodes and fuses the dual-view representations using Transformer networks. Finally, the dual features are concatenated and fed into a classification head for activity prediction. Extensive experiments on multiple datasets fully validate the effectiveness of our model. Both the dataset and source code will be released at https://github.com/Event-AHU/HARDVS.
TMLR Journal 2024 Journal Article
Nonlinear dimensional reduction with the manifold assumption, often called manifold learning, has proven its usefulness in a wide range of high-dimensional data analysis. The significant impact of t-SNE and UMAP has catalyzed intense research interest, seeking further innovations toward visualizing not only the local but also the global structure information of the data. Moreover, there have been consistent efforts toward generalizable dimensional reduction that handles unseen data. In this paper, we first propose GLoMAP, a novel manifold learning method for dimensional reduction and high-dimensional data visualization. GLoMAP preserves locally and globally meaningful distance estimates and displays a progression from global to local formation during the course of optimization. Furthermore, we extend GLoMAP to its inductive version, iGLoMAP, which utilizes a deep neural network to map data to its lower-dimensional representation. This allows iGLoMAP to provide lower-dimensional embeddings for unseen points without needing to re-train the algorithm. iGLoMAP is also well-suited for mini-batch learning, enabling large-scale, accelerated gradient calculations. We have successfully applied both GLoMAP and iGLoMAP to the simulated and real-data settings, with competitive experiments against the state-of-the-art methods.
NeurIPS Conference 2024 Conference Paper
Image captioning was recently found to be an effective pretraining method similar to contrastive pretraining. This opens up the largely-unexplored potential of using natural language as a flexible and powerful interface for handling diverse pretraining tasks. In this paper, we demonstrate this with a novel visual pretraining paradigm, LocCa, that incorporates location-aware tasks into captioners to teach models to extract rich information from images. Specifically, LocCa employs two tasks, bounding box prediction and location-dependent captioning, conditioned on the image pixel input. Thanks to the multitask capabilities of an encoder-decoder architecture, we show that an image captioner can effortlessly handle multiple tasks during pretraining. LocCa significantly outperforms standard captioners on downstream localization tasks, achieving state-of-the-art results on RefCOCO/+/g, while maintaining comparable performance on holistic tasks. Our work paves the way for further exploration of natural language interfaces in visual pretraining.
YNICL Journal 2024 Journal Article
The long-term motor outcome of acute stroke patients may be correlated with the reorganization of the brain motor network. Abundant neuroimaging studies have contributed to understanding the pathological changes and recovery of motor networks after stroke. In this review, we summarize how current neuroimaging studies have increased understanding of reorganization and plasticity in post-stroke motor recovery. First, we discuss the changes in the motor network over time during the motor-activation and resting states, as well as the overall functional integration trend of the motor network. These studies indicate that the motor network undergoes dynamic bilateral hemispheric functional reorganization, as well as a trend toward network randomization. In the second part, we summarize the current progress in applying neuroimaging technology to the early prediction of post-stroke motor outcome. In the third part, we discuss the neuroimaging techniques commonly used in post-stroke recovery research. These methods provide direct or indirect visualization patterns for understanding the neural mechanisms of post-stroke motor recovery, opening up new avenues for studying spontaneous and treatment-induced recovery and plasticity after stroke.
NeurIPS Conference 2024 Conference Paper
Transformer models have gained significant attention due to their power in machine learning tasks. Their extensive deployment has raised concerns about the potential leakage of sensitive information during inference. However, when applied to Transformers, existing approaches based on secure two-party computation (2PC) face efficiency limitations in two respects: (1) resource-intensive matrix multiplications in linear layers, and (2) complex non-linear activation functions like $\mathsf{GELU}$ and $\mathsf{Softmax}$. This work presents a new two-party inference framework $\mathsf{Nimbus}$ for Transformer models. Specifically, we propose a new 2PC paradigm to securely compute matrix multiplications based on an outer-product insight, which achieves $2.9\times \sim 12.5\times$ performance improvements compared to the state-of-the-art (SOTA) protocol. Furthermore, through a new observation of utilizing the input distribution, we propose an approach of low-degree polynomial approximation for $\mathsf{GELU}$ and $\mathsf{Softmax}$, which improves the performance of the SOTA polynomial approximation by $2.9\times \sim 4.0\times$, where the average accuracy loss of our approach is 0.08\% compared to non-2PC inference without privacy. Compared with the SOTA two-party inference, $\mathsf{Nimbus}$ improves the end-to-end performance of $BERT_{base}$ inference by $2.7\times \sim 4.7\times$ across different network settings.
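The outer-product protocol itself is specific to Nimbus, but the additive-secret-sharing substrate that such 2PC matrix multiplications build on can be illustrated with the classic Beaver-triple trick. The sketch below is a minimal, dealer-assisted simulation in a single process; the names `share` and `beaver_matmul`, the modulus `P`, and the toy matrices are illustrative, not taken from the paper:

```python
import numpy as np

P = 1_000_003  # small prime modulus for additive secret sharing (keeps int64 safe)
rng = np.random.default_rng(0)

def share(m):
    """Split an integer matrix into two additive shares mod P."""
    r = rng.integers(0, P, size=m.shape)
    return r % P, (m - r) % P

def beaver_matmul(x0, x1, y0, y1):
    """Two-party secret-shared matrix product via a Beaver triple.
    Each party holds one share of X and Y; neither learns the other's share."""
    # A trusted dealer produces a random triple (A, B, C) with C = A @ B mod P.
    A = rng.integers(0, P, size=x0.shape)
    B = rng.integers(0, P, size=y0.shape)
    C = (A @ B) % P
    a0, a1 = share(A); b0, b1 = share(B); c0, c1 = share(C)
    # The masked differences E = X - A and F = Y - B are safe to open publicly.
    E = ((x0 - a0) + (x1 - a1)) % P
    F = ((y0 - b0) + (y1 - b1)) % P
    # Local computation of output shares; only party 0 adds the public E @ F term.
    z0 = (c0 + E @ b0 + a0 @ F + E @ F) % P
    z1 = (c1 + E @ b1 + a1 @ F) % P
    return z0, z1

X = rng.integers(0, 100, size=(2, 3))
Y = rng.integers(0, 100, size=(3, 2))
x0, x1 = share(X); y0, y1 = share(Y)
z0, z1 = beaver_matmul(x0, x1, y0, y1)
print(np.array_equal((z0 + z1) % P, (X @ Y) % P))  # True: shares recombine to X @ Y
```

Correctness follows from (A+E)(B+F) = C + EB + AF + EF with E = X−A and F = Y−B; all expensive multiplications happen on local shares.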
NeurIPS Conference 2024 Conference Paper
We study cultural and socioeconomic diversity in contrastive vision-language models (VLMs). Using a broad range of benchmark datasets and evaluation metrics, we bring to attention several important findings. First, the common filtering of training data to English image-text pairs disadvantages communities of lower socioeconomic status and negatively impacts cultural understanding. Notably, this performance gap is not captured by - and even at odds with - the currently popular evaluation metrics derived from the Western-centric ImageNet and COCO datasets. Second, pretraining with global, unfiltered data before fine-tuning on English content can improve cultural understanding without sacrificing performance on said popular benchmarks. Third, we introduce the task of geo-localization as a novel evaluation metric to assess cultural diversity in VLMs. Our work underscores the value of using diverse data to create more inclusive multimodal systems and lays the groundwork for developing VLMs that better represent global perspectives.
NeurIPS Conference 2024 Conference Paper
Face feature fusion is indispensable for robust face recognition, particularly in scenarios involving long-range, low-resolution media (unconstrained environments) where not all frames or features are equally informative. Existing methods often rely on large intermediate feature maps or face metadata, making them incompatible with legacy biometric template databases that store pre-computed features. Additionally, real-time inference and generalization to large probe sets remain challenging. To address these limitations, we introduce a linear-time O(N) proxy-based sparse expert selection and pooling approach for context-driven feature-set attention. Our approach is order-invariant on the feature set, generalizes to large sets, is compatible with legacy template stores, and uses significantly fewer parameters, making it suitable for real-time inference and edge use-cases. Through qualitative experiments, we demonstrate that ProxyFusion learns discriminative information for importance weighting of face features without relying on intermediate features. Quantitative evaluations on challenging low-resolution face verification datasets such as IARPA BTS3.1 and DroneSURF show the superiority of ProxyFusion in unconstrained long-range face recognition settings. Our code and pretrained models are available at: https://github.com/bhavinjawade/ProxyFusion
ICML Conference 2024 Conference Paper
Optimization problems with access to only zeroth-order information of the objective function on Riemannian manifolds arise in various applications, spanning from statistical learning to robot learning. While various zeroth-order algorithms have been proposed in Euclidean space, they are not inherently designed to handle the challenging constraints imposed by Riemannian manifolds. The proper adaptation of zeroth-order techniques to Riemannian manifolds remained unknown until the pioneering work of (Li et al., 2023a). However, zeroth-order algorithms are widely observed to converge slowly and be unstable in practice. To alleviate these issues, we propose a Riemannian accelerated zeroth-order algorithm with improved robustness. Regarding efficiency, our accelerated algorithm has a function query complexity of $\mathcal{O}(\epsilon^{-7/4}d)$ for finding an $\epsilon$-approximate first-order stationary point. By introducing a small perturbation, it exhibits a function query complexity of $\tilde{\mathcal{O}}(\epsilon^{-7/4}d)$ for seeking a second-order stationary point with high probability, matching the state-of-the-art result in Euclidean space. Moreover, we further establish almost sure convergence in the asymptotic sense through the Stable Manifold Theorem. Regarding robustness, our algorithm requires larger smoothing parameters on the order of $\tilde{\mathcal{O}}(\epsilon^{7/8}d^{-1/2})$, improving the existing result by a factor of $\tilde{\mathcal{O}}(\epsilon^{3/4})$.
IJCAI Conference 2024 Conference Paper
Session-based recommendation (SBR) aims to predict the next-interacted item based on anonymous users' behavior sequences. The main challenge is how to recognize user intent from limited interactions to achieve a more accurate inference of user behavior. Existing works usually regard several consecutive items in the current session as intent. However, we argue that such intent generation based on temporal transitions ignores the fact that each item also has semantically connected items in the feature space, which can be regarded as spatial intent. This limited consideration of intent fails to capture complex behavioral patterns in real-world scenarios, leading to sub-optimal solutions. To address this issue, we propose the Hierarchical Intent Perceiving Contrastive Learning Framework (HearInt) for SBR, which considers intents hierarchically from both temporal and spatial perspectives. Specifically, we first propose that a user's temporal intents are mutually exclusive while spatial intents are mutually compatible. Following these analyses, we design a Temporal Intent Decoupling module to mitigate the mutual influence of long-term and short-term intents, and a Cross-scale Contrastive Learning task to enhance the consistency of intents across different spatial scales. Experimental results on three real-world datasets show that HearInt achieves state-of-the-art performance.
AAAI Conference 2024 Conference Paper
Understanding vehicles in images is important for various applications such as intelligent transportation and self-driving systems. Existing vehicle-centric works typically pre-train models on large-scale classification datasets and then fine-tune them for specific downstream tasks. However, they neglect the specific characteristics of vehicle perception in different tasks and might thus lead to sub-optimal performance. To address this issue, we propose a novel vehicle-centric pre-training framework called VehicleMAE, which incorporates structural information, including the spatial structure from vehicle profile information and the semantic structure from informative high-level natural language descriptions, for effective masked vehicle appearance reconstruction. To be specific, we explicitly extract the sketch lines of vehicles as a form of spatial structure to guide vehicle reconstruction. The more comprehensive knowledge distilled from the CLIP big model, based on the similarity between paired/unpaired vehicle image-text samples, is further taken into consideration to help achieve a better understanding of vehicles. A large-scale dataset is built to pre-train our model, termed Autobot1M, which contains about 1M vehicle images and 12,693 text descriptions. Extensive experiments on four vehicle-based downstream tasks fully validate the effectiveness of our VehicleMAE. The source code and pre-trained models will be released at https://github.com/Event-AHU/VehicleMAE.
NeurIPS Conference 2024 Conference Paper
Graph self-supervised learning, as a powerful pre-training paradigm for Graph Neural Networks (GNNs) without labels, has received considerable attention. We have witnessed the success of graph self-supervised learning in pre-training the parameters of GNNs, leading many to assume, without question, that all the learned GNN parameters are useful. In this paper, by presenting experimental evidence and analysis, we surprisingly discover that graph self-supervised learning models are highly redundant at both the neuron and layer levels; e.g., even when randomly removing 51.6\% of parameters, the performance of graph self-supervised learning models still retains at least 96.2\%. This discovery implies that the parameters of graph self-supervised models can be largely reduced, making it more feasible to simultaneously fine-tune both graph self-supervised learning models and prediction layers. Therefore, we further design a novel graph pre-training and fine-tuning paradigm called SLImming DE-correlation Fine-tuning (SLIDE). The effectiveness of SLIDE is verified through extensive experiments on various benchmarks, and performance can even be improved with fewer model parameters in most cases. For example, compared with fully fine-tuning GraphMAE on the Amazon-Computers dataset, even when randomly reducing 40\% of parameters, we can still achieve improvements of 0.24\% and 0.27\% in Micro-F1 and Macro-F1 scores, respectively.
JMLR Journal 2024 Journal Article
In this paper, we study approximation algorithms for several classes of DR-submodular optimization problems, where DR is short for diminishing returns. Following a newly introduced algorithmic framework for zeroth-order stochastic approximation methods, we first propose algorithms {\bf CG-ZOSA} and {\bf RG-ZOSA} for smooth DR-submodular optimization, based on the coordinate-wise gradient estimator and the randomized gradient estimator, respectively. Our theoretical analysis proves that {\bf CG-ZOSA} can reach a solution whose expected objective value exceeds $(1-e^{-1}-\epsilon^{2})$OPT$-\epsilon$ after $\mathcal{O}(\epsilon^{-2})$ iterations and $\mathcal{O}(N^{2/3}d\epsilon^{-2})$ oracle calls, where $d$ represents the problem dimension. On the other hand, {\bf RG-ZOSA} improves the approximation ratio to $(1-e^{-1}-\epsilon^{2}/d)$ while maintaining the same overall oracle complexity. For non-smooth up-concave maximization problems, we propose a novel auxiliary function based on a smoothed objective function and introduce the {\bf NZOSA} algorithm. This algorithm achieves an approximation ratio of $(1-e^{-1}-\epsilon \ln \epsilon^{-1}- \epsilon^{2}\ln \epsilon^{-1})$ with $\mathcal{O}(d\epsilon^{-2})$ iterations and $\mathcal{O}(N^{2/3}d^{3/2} \epsilon^{-3})$ oracle calls. We also extend {\bf NZOSA} to handle a class of robust DR-submodular maximization problems. To validate the effectiveness of our proposed algorithms, we conduct experiments on both synthetic and real-world problems. The results demonstrate the superior performance and efficiency of our methods in solving DR-submodular optimization problems.
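The randomized gradient estimator that RG-ZOSA-style methods rely on queries the objective along random unit directions and rescales the finite difference by the dimension. A minimal sketch, assuming a simple quadratic test function; the name `rand_grad_est`, the smoothing parameter, and the sample count are illustrative (practical algorithms use only one or two samples per iteration rather than averaging):

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_grad_est(f, x, mu=1e-4, n_samples=20000):
    """Randomized zeroth-order gradient estimator:
    g_hat = (d / mu) * (f(x + mu*u) - f(x)) * u, with u uniform on the unit sphere.
    Averaging many samples approximates the true gradient."""
    d = x.size
    fx = f(x)
    g = np.zeros(d)
    for _ in range(n_samples):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)          # uniform direction on the unit sphere
        g += (d / mu) * (f(x + mu * u) - fx) * u
    return g / n_samples

f = lambda x: 0.5 * np.dot(x, x)        # gradient of f is x itself
x = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
g_hat = rand_grad_est(f, x)
print(np.max(np.abs(g_hat - x)))        # small estimation error
```

The coordinate-wise estimator replaces the random direction with the `d` standard basis vectors, trading more queries per step for lower variance.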
AAAI Conference 2023 Conference Paper
Estimating the structure of directed acyclic graphs (DAGs) of features (variables) plays a vital role in revealing the latent data generation process and providing causal insights in various applications. Although there have been many studies on structure learning with various types of data, structure learning on dynamic graphs has not yet been explored; we therefore study the problem of learning the node feature generation mechanism on such ubiquitous dynamic graph data. In a dynamic graph, we propose to simultaneously estimate contemporaneous relationships and time-lagged interaction relationships between the node features. These two kinds of relationships form a DAG, which can effectively characterize the feature generation process in a concise way. To learn such a DAG, we cast the learning problem as a continuous score-based optimization problem, which consists of a differentiable score function to measure the validity of the learned DAGs and a smooth acyclicity constraint to ensure the acyclicity of the learned DAGs. These two components are translated into an unconstrained augmented Lagrangian objective that can be minimized by mature continuous optimization techniques. The resulting algorithm, named GraphNOTEARS, outperforms baselines on simulated data across a wide range of settings that may be encountered in real-world applications. We also apply the proposed approach to two dynamic graphs constructed from the real-world Yelp dataset, demonstrating that our method can learn the connections between node features in a way that conforms with domain knowledge.
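The smooth acyclicity constraint used by NOTEARS-style methods, which GraphNOTEARS builds on, scores a weighted adjacency matrix $W$ by $h(W)=\mathrm{tr}(e^{W\circ W})-d$, which is zero exactly when $W$ encodes a DAG. A minimal sketch (the matrix values are illustrative):

```python
import numpy as np
from scipy.linalg import expm

def acyclicity(W):
    """NOTEARS-style smooth acyclicity score: h(W) = tr(exp(W * W)) - d.
    h(W) = 0 iff the weighted adjacency matrix W encodes a DAG."""
    d = W.shape[0]
    return np.trace(expm(W * W)) - d   # W * W is the elementwise (Hadamard) square

dag = np.array([[0.0, 1.5, 0.0],
                [0.0, 0.0, 2.0],
                [0.0, 0.0, 0.0]])      # strictly upper-triangular: a DAG
cyc = dag.copy()
cyc[2, 0] = 0.7                        # adds the cycle 0 -> 1 -> 2 -> 0
print(acyclicity(dag))                 # ~0: no cycles
print(acyclicity(cyc))                 # > 0: the cycle is penalized
```

Because $h$ is differentiable, it can be folded into the augmented Lagrangian objective the abstract describes and minimized with standard continuous solvers.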
YNIMG Journal 2023 Journal Article
Long-term dance training offers numerous benefits, including improvements in physical health, posture, body coordination, and mental health and well-being. Since dance is an art form of body-to-body communication, professional dancers may share feelings and thoughts on dance with their partners, owing to their shared training experiences. Considering this perspective, one may expect that professional dancers would demonstrate pronounced neural similarities when viewing dancing videos, which could be associated with their training duration. To test these hypotheses, we collected functional magnetic resonance imaging (fMRI) data while presenting ballroom dancing and neutral video clips with long durations (∼100 s each) to 41 professional ballroom dancers (19 pairs of dance partners) and 39 age- and sex-matched nondancers. Our findings revealed that dancers exhibited broader and stronger neural similarities across the whole brain when watching dancing video clips, as compared to the control group. These increased neural similarities could be interpreted in at least two distinct ways. First, neural similarities in certain brain regions within the motor control circuit (i.e., frontal cortical-basal ganglia-thalamic circuit) were significantly correlated with dance-related information (e.g., dance partners' cooperation duration), which reinforced the impact of long-term dance training on neural synchronization. Second, neural similarities in other brain regions (e.g., memory-related brain regions) were significantly correlated with subjects' impression of the viewed videos (i.e., whether they have watched before, familiarity, and liking), which may not necessarily be directly linked to long-term dance training. Altogether, our study provided solid evidence for synchronized neural mechanisms in professional dancers due to long-term dance training.
IJCAI Conference 2023 Conference Paper
Graph neural network (GNN) based recommender systems have become one of the mainstream trends due to their powerful ability to learn from user behavior data. Understanding user intents from behavior data is the key to recommender systems, which poses two basic requirements for GNN-based recommender systems. One is how to learn complex and diverse intents, especially when user behavior is usually inadequate in reality. The other is that different behaviors have different intent distributions, raising the question of how to establish their relations for a more explainable recommender system. In this paper, we present Intent-aware Recommendation via Disentangled Graph Contrastive Learning (IDCL), which simultaneously learns interpretable intents and behavior distributions over those intents. Specifically, we first model the user behavior data as a user-item-concept graph, and design a GNN-based behavior disentangling module to learn the different intents. Then we propose intent-wise contrastive learning to enhance the intent disentangling and meanwhile infer the behavior distributions. Finally, coding rate reduction regularization is introduced to make the behaviors of different intents orthogonal. Extensive experiments demonstrate the effectiveness of IDCL in terms of substantial improvement and interpretability.
NeurIPS Conference 2023 Conference Paper
Molecular representation learning lays the foundation for drug discovery. However, existing methods suffer from poor out-of-distribution (OOD) generalization, particularly when data for training and testing originate from different environments. To address this issue, we propose a new framework for learning molecular representations that exhibit invariance and robustness against distribution shifts. Specifically, we propose a strategy called ``first-encoding-then-separation'' to identify invariant molecule features in the latent space, which deviates from conventional practices. Prior to the separation step, we introduce a residual vector quantization module that mitigates over-fitting to training data distributions while preserving the expressivity of encoders. Furthermore, we design a task-agnostic self-supervised learning objective to encourage precise invariance identification, which makes our method widely applicable to a variety of tasks, such as regression and multi-label classification. Extensive experiments on 18 real-world molecular datasets demonstrate that our model achieves stronger generalization against state-of-the-art baselines in the presence of various distribution shifts. Our code is available at https://github.com/HICAI-ZJU/iMoLD.
NeurIPS Conference 2023 Conference Paper
Graph neural networks (GNNs) have become increasingly popular in modeling graph-structured data due to their ability to learn node representations by aggregating local structure information. However, it is widely acknowledged that the test graph structure may differ from the training graph structure, resulting in a structure shift. In this paper, we experimentally find that the performance of GNNs drops significantly when the structure shift happens, suggesting that the learned models may be biased towards specific structure patterns. To address this challenge, we propose the Cluster Information Transfer (\textbf{CIT}) mechanism, which can learn invariant representations for GNNs, thereby improving their generalization ability to various and unknown test graphs with structure shift. The CIT mechanism achieves this by combining different cluster information with the nodes while preserving their cluster-independent information. By generating nodes across different clusters, the mechanism significantly enhances the diversity of the nodes and helps GNNs learn the invariant representations. We provide a theoretical analysis of the CIT mechanism, showing that the impact of changing clusters during structure shift can be mitigated after transfer. Additionally, the proposed mechanism is a plug-in that can be easily used to improve existing GNNs. We comprehensively evaluate our proposed method on three typical structure shift scenarios, demonstrating its effectiveness in enhancing GNNs' performance.
IJCAI Conference 2023 Conference Paper
Graph-level contrastive learning, which aims to learn a representation for each graph by contrasting two augmented graphs, has attracted considerable attention. Previous studies usually simply assume that a graph and its augmented graph form a positive pair, and otherwise a negative pair. However, it is well known that graph structure is always complex and multi-scale, which gives rise to a fundamental question: after graph augmentation, does the previous assumption still hold in reality? Through an experimental analysis, we discover that the semantic information of an augmented graph structure may not be consistent with that of the original graph, and that whether two augmented graphs are positive or negative pairs is highly related to their multi-scale structures. Based on this finding, we propose a multi-scale subgraph contrastive learning architecture which is able to characterize fine-grained semantic information. Specifically, we generate global and local views at different scales based on subgraph sampling, and construct multiple contrastive relationships according to their semantic associations to provide richer self-supervised signals. Extensive experiments and parametric analyses on eight real-world graph classification datasets well demonstrate the effectiveness of the proposed method.
NeurIPS Conference 2023 Conference Paper
Last-iterate convergence has received extensive study in two-player zero-sum games, starting from bilinear and convex-concave settings up to settings that satisfy the MVI condition. Typical methods that exhibit last-iterate convergence in these games include extra-gradient (EG) and optimistic gradient descent ascent (OGDA). However, all the established last-iterate convergence results hold for the restrictive setting where the underlying repeated game does not change over time. Recently, a line of research has focused on regret analysis of OGDA in time-varying games, i.e., games where payoffs evolve with time; the last-iterate behavior of OGDA and EG in time-varying environments remains unclear, though. In this paper, we study the last-iterate behavior of various algorithms in two types of unconstrained, time-varying, bilinear zero-sum games: periodic and convergent perturbed games. These models expand upon the usual repeated game formulation and incorporate external environmental factors, such as seasonal effects on species competition and vanishing external noise. In periodic games, we prove that EG will converge while OGDA and the momentum method will diverge. This is quite surprising, as to the best of our knowledge, it is the first result indicating that EG and OGDA have qualitatively different last-iterate behaviors. In convergent perturbed games, we prove that all these algorithms converge as long as the game itself stabilizes at a rate faster than $1/t$.
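The qualitative gap between EG and plain gradient dynamics already shows up in the simplest static unconstrained bilinear game $f(x,y)=xy$: simultaneous gradient descent-ascent spirals outward, while extra-gradient's look-ahead step contracts toward the equilibrium. A toy sketch (static game and step size are illustrative; the paper's time-varying setting is not modeled here):

```python
import numpy as np

def gda_step(x, y, eta):
    """Simultaneous gradient descent-ascent on f(x, y) = x * y."""
    return x - eta * y, y + eta * x

def eg_step(x, y, eta):
    """Extra-gradient: take a look-ahead step, then update using the
    gradient evaluated at the look-ahead point."""
    xm, ym = x - eta * y, y + eta * x
    return x - eta * ym, y + eta * xm

x1 = y1 = x2 = y2 = 1.0
for _ in range(200):
    x1, y1 = gda_step(x1, y1, 0.1)
    x2, y2 = eg_step(x2, y2, 0.1)
print(np.hypot(x1, y1))  # grows: GDA spirals away from the equilibrium (0, 0)
print(np.hypot(x2, y2))  # shrinks: EG converges toward (0, 0)
```

In complex notation $z=x+iy$, GDA multiplies $z$ by $1+i\eta$ (modulus $>1$) each step, while EG multiplies by $1+i\eta-\eta^2$ (modulus $<1$ for small $\eta$), which explains the opposite behaviors.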
IS Journal 2023 Journal Article
The recent debut and success of ChatGPT have brought up renewed debates and desires for artificial general intelligence (AGI) amid fears and anxieties of potential disruptions to our humanity and social values, as witnessed by the call from tech celebrities for a pause in the development of ChatGPT-style AGI tools. At the IEEE IS’ AI and CPSS Department, we would like to initiate cautious, balanced, hopefully deep investigations to address various related issues on the impact and significance of intelligent science and technology to our economy and society. Let’s start with the “three Bs” and “ACP” for parallel intelligence in CPSSs: Being by artificial systems (A), Becoming through computational experiments (C), and Believing with parallel execution (P).
NeurIPS Conference 2023 Conference Paper
Graph Contrastive Learning (GCL) has emerged as a popular training approach for learning node embeddings from augmented graphs without labels. Despite the key principle that maximizing the similarity between positive node pairs while minimizing it between negative node pairs is well established, some fundamental problems are still unclear. Considering the complex graph structure, are some nodes consistently well-trained and following this principle even with different graph augmentations? Or are there some nodes more likely to be untrained across graph augmentations and violate the principle? How can we distinguish these nodes and further guide the training of GCL? To answer these questions, we first present experimental evidence showing that the training of GCL is indeed imbalanced across nodes. To address this problem, we propose the metric "node compactness", a lower bound on how well a node follows the GCL principle over the range of augmentations. We further derive the form of node compactness theoretically through bound propagation, which can be integrated into binary cross-entropy as a regularization. To this end, we propose PrOvable Training (POT) for GCL, which regularizes the training of GCL to encode node embeddings that follow the GCL principle better. Through extensive experiments on various benchmarks, POT consistently improves the existing GCL approaches, serving as a friendly plugin.
TMLR Journal 2023 Journal Article
Ensuring the safety of reinforcement learning (RL) algorithms is crucial to unlock their potential for many real-world tasks. However, vanilla RL and most safe RL approaches do not guarantee safety. In recent years, several methods have been proposed to provide hard safety guarantees for RL, which is essential for applications where unsafe actions could have disastrous consequences. Nevertheless, there is no comprehensive comparison of these provably safe RL methods. Therefore, we introduce a categorization of existing provably safe RL methods, present the conceptual foundations for both continuous and discrete action spaces, and empirically benchmark existing methods. We categorize the methods based on how they adapt the action: action replacement, action projection, and action masking. Our experiments on an inverted pendulum and a quadrotor stabilization task indicate that action replacement is the best-performing approach for these applications despite its comparatively simple realization. Furthermore, adding a reward penalty, every time the safety verification is engaged, improved training performance in our experiments. Finally, we provide practical guidance on selecting provably safe RL approaches depending on the safety specification, RL algorithm, and type of action space.
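The three action-adaptation categories in the survey can be illustrated on a one-dimensional box-shaped safe set. A minimal sketch, where the safe interval [-1, 1], the fallback action, and the candidate actions are all hypothetical (real methods pair these rules with an online safety verifier):

```python
import numpy as np

safe_low, safe_high = -1.0, 1.0   # hypothetical verified-safe action interval

def action_projection(a):
    """Project an unsafe action onto the nearest point of the safe set."""
    return float(np.clip(a, safe_low, safe_high))

def action_replacement(a, fallback=0.0):
    """Replace any unsafe action with a pre-verified safe fallback action."""
    return a if safe_low <= a <= safe_high else fallback

def action_masking(candidates):
    """Restrict the agent's choice to the safe subset of candidate actions."""
    return [a for a in candidates if safe_low <= a <= safe_high]

print(action_projection(2.5))             # 1.0
print(action_replacement(2.5))            # 0.0
print(action_masking([-2.0, 0.3, 1.7]))   # [0.3]
```

Masking naturally fits discrete action spaces, while projection and replacement extend to continuous ones, which matches the paper's finding that the best choice depends on the action space and safety specification.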
NeurIPS Conference 2023 Conference Paper
Collaborative machine learning (ML) is widely used to enable institutions to learn better models from distributed data. While collaborative approaches to learning intuitively protect user data, they remain vulnerable to either the server, the clients, or both, deviating from the protocol. Indeed, because the protocol is asymmetric, a malicious server can abuse its power to reconstruct client data points. Conversely, malicious clients can corrupt learning with malicious updates. Thus, both clients and servers require a guarantee when the other cannot be trusted to fully cooperate. In this work, we propose a peer-to-peer (P2P) learning scheme that is secure against malicious servers and robust to malicious clients. Our core contribution is a generic framework that transforms any (compatible) algorithm for robust aggregation of model updates to the setting where servers and clients can act maliciously. Finally, we demonstrate the computational efficiency of our approach even with 1-million parameter models trained by 100s of peers on standard datasets.
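One standard example of a robust aggregation rule that such a framework could build on is the coordinate-wise median, which bounds the influence of a minority of malicious updates. A minimal sketch (the paper's framework is generic over compatible rules; this particular rule and the toy numbers are illustrative, not the paper's construction):

```python
import numpy as np

def robust_aggregate(updates):
    """Coordinate-wise median of client model updates: a standard robust
    aggregation rule that tolerates a minority of malicious clients."""
    return np.median(np.stack(updates), axis=0)

rng = np.random.default_rng(1)
true_update = np.array([1.0, -2.0])
honest = [true_update + 0.01 * rng.standard_normal(2) for _ in range(8)]
malicious = [np.array([100.0, 100.0])]          # one poisoned update
agg = robust_aggregate(honest + malicious)
print(np.max(np.abs(agg - true_update)))        # stays near the honest consensus
```

Averaging instead of taking the median would let the single poisoned update drag the result by roughly 11 units per coordinate, which is exactly the failure mode robust aggregation prevents.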
IS Journal 2023 Journal Article
This article explores the concept of human–autonomous organizations (HAOs) based on decentralized autonomous organizations (DAOs) and operations as well as human, artificial, natural, and organizational intelligence and their roles in shaping smart societies in the context of Industry 5.0 and Society 5.0. It discusses the potential of AI-generated content and prompt engineering in specific goal-guided manufacture and governance. Additionally, the article introduces the concept of the HAO as a framework for integrating human intelligence to achieve fair, transparent, and accountable decision making within DAOs. The proposed HAO reduces the risk of instability and unreliability in “human-in-the-loop” copilot systems and human–machine hybrid systems, leading to more reliable, secure, and flexible systems. It provides insights into the future management of smart societies and the symbiotic relationship between human ingenuity and the suite of emerging new AI technologies.
NeurIPS Conference 2023 Conference Paper
We introduce Three Towers (3T), a flexible method to improve the contrastive learning of vision-language models by incorporating pretrained image classifiers. While contrastive models are usually trained from scratch, LiT (Zhai et al., 2022) has recently shown performance gains from using pretrained classifier embeddings. However, LiT directly replaces the image tower with the frozen embeddings, excluding any potential benefits from training the image tower contrastively. With 3T, we propose a more flexible strategy that allows the image tower to benefit from both pretrained embeddings and contrastive training. To achieve this, we introduce a third tower that contains the frozen pretrained embeddings, and we encourage alignment between this third tower and the main image-text towers. Empirically, 3T consistently improves over LiT and the CLIP-style from-scratch baseline for retrieval tasks. For classification, 3T reliably improves over the from-scratch baseline, and while it underperforms relative to LiT for JFT-pretrained models, it outperforms LiT for ImageNet-21k and Places365 pretraining.
IJCAI Conference 2022 Conference Paper
We consider nonconvex optimization problems whose constraint set is a product of simplices. A commonly used algorithm for solving this type of problem is the Multiplicative Weights Update (MWU), an algorithm widely used in game theory, machine learning, and multi-agent systems. Although it is known that MWU avoids saddle points, one question remains unaddressed: ``Is there an accelerated version of MWU that provably avoids saddle points?'' In this paper we provide a positive answer to this question. We propose an accelerated MWU based on Riemannian Accelerated Gradient Descent, and prove that Riemannian Accelerated Gradient Descent, and thus the accelerated MWU, avoids saddle points.
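The MWU step itself is a one-line exponential reweighting followed by renormalization back onto the simplex. A minimal sketch with a fixed loss vector (step size, losses, and iteration count are illustrative):

```python
import numpy as np

def mwu_step(w, loss, eta=0.5):
    """Multiplicative Weights Update: exponentially down-weight coordinates
    with high loss, then renormalize so the iterate stays on the simplex."""
    w = w * np.exp(-eta * loss)
    return w / w.sum()

w = np.full(3, 1.0 / 3.0)             # start at the simplex center
loss = np.array([1.0, 0.2, 0.8])      # coordinate 1 has the lowest loss
for _ in range(50):
    w = mwu_step(w, loss)
print(w.round(3))  # mass concentrates on the lowest-loss coordinate
```

Viewing this update as mirror descent with the entropy mirror map is what connects MWU to Riemannian gradient methods on the simplex, the perspective the accelerated variant exploits.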
NeurIPS Conference 2022 Conference Paper
Most Graph Neural Networks (GNNs) predict the labels of unseen graphs by learning the correlation between the input graphs and labels. However, through a graph classification investigation on training graphs with severe bias, we surprisingly discover that GNNs tend to exploit spurious correlations to make decisions, even when a causal correlation exists. This implies that existing GNNs trained on such biased datasets will suffer from poor generalization capability. Analyzing this problem from a causal view, we find that disentangling and decorrelating the causal and bias latent variables from the biased graphs are both crucial for debiasing. Inspired by this, we propose a general disentangled GNN framework to learn the causal substructure and bias substructure, respectively. In particular, we design a parameterized edge mask generator to explicitly split the input graph into causal and bias subgraphs. Then two GNN modules, supervised by causal- and bias-aware loss functions respectively, are trained to encode the causal and bias subgraphs into their corresponding representations. With the disentangled representations, we synthesize counterfactual unbiased training samples to further decorrelate the causal and bias variables. Moreover, to better benchmark the severe bias problem, we construct three new graph datasets, which have controllable bias degrees and are easier to visualize and explain. Experimental results demonstrate that our approach achieves superior generalization performance over existing baselines. Furthermore, owing to the learned edge mask, the proposed model has appealing interpretability and transferability.
JBHI Journal 2022 Journal Article
The functional connectivity network (FCN) has been used to achieve several remarkable advancements in the diagnosis of neuro-degenerative disorders. Therefore, it is imperative to accurately estimate biologically meaningful FCNs. Several efforts have been dedicated to this purpose by encoding biological priors. However, owing to the high complexity of the human brain, the estimation of an 'ideal' FCN remains an open problem. To the best of our knowledge, almost all existing studies lack the integration of domain expert knowledge, which limits their performance. In this study, we focused on incorporating domain expert knowledge into the FCN estimation from a modularity perspective. To achieve this, we presented a human-guided modular representation (MR) FCN estimation framework. Specifically, we designed an adversarial low-rank constraint to describe the module structure of FCNs under the guidance of domain expert knowledge (i.e., a predefined participant index). The chronic tinnitus (TIN) identification task based on the estimated FCNs was conducted to examine the proposed MR methods. Remarkably, MR significantly outperformed the baseline and state-of-the-art (SOTA) methods, achieving an accuracy of 92.11%. Moreover, post-hoc analysis revealed that the FCNs estimated by the proposed MR could highlight more biologically meaningful connections, which is beneficial for exploring the underlying mechanisms of TIN and diagnosing early TIN.
IS Journal 2022 Journal Article
The concept of metaverses has received extensive attention recently, and cyber-physical-social systems (CPSS) are its academic foundation. In almost all applications of metaverses, the sensing system is an essential part and intelligent sensing capacity must be provided. However, due to the insufficient consideration of human factors in most studies, digital twins' sensing in cyber-physical systems cannot achieve smart sensing in metaverses. For this reason, a novel framework for intelligent sensing in metaverses, MetaSensing, is proposed based on parallel intelligence in CPSS. Within the framework of MetaSensing, there are four states of sensing: physical sensing, descriptive sensing, predictive sensing, and prescriptive sensing. To protect sensors' data privacy in metaverses, DAO-based decentralized sensing is introduced as a mechanism for the operation and maintenance of smart sensing industries.
IS Journal 2022 Journal Article
A total of 12 years have passed since this Department was created in 2010 as the first academic forum dedicated to cyber-physical-social systems (CPSS), with the first CPSS research article in the field: “The Emergence of Intelligent Enterprises: From CPS to CPSS.” What has happened and changed during the past decade? A brief reflection and review are presented here with a focus on digital twins in CPS versus parallel intelligence in CPSS, and their relationship to blockchain intelligence, smart contracts, metaverses, DAO, Web3, and decentralized science. The concept of DeMetaverses is thus introduced and interpreted as a DAO-based decentralized autonomous metaverse. The characteristics, mechanism, and impact of DeMetaverses are discussed with a vision for achieving an integrated human, artificial, natural, and organizational intelligence that would transform our world into “6S” societies.
AAAI Conference 2022 Conference Paper
Despite the remarkable performance of graph neural networks (GNNs) in semi-supervised learning, they are criticized for not making full use of unlabeled data and for suffering from overfitting. Recently, graph data augmentation, used to improve both the accuracy and generalization of GNNs, has received considerable attention. However, one fundamental question is: how should the quality of graph augmentations be evaluated in principle? In this paper, we propose two metrics, Consistency and Diversity, from the aspects of augmentation correctness and generalization. Moreover, we discover that existing augmentations fall into a dilemma between these two metrics. Can we find a graph augmentation satisfying both consistency and diversity? A well-informed answer can help us understand the mechanism behind graph augmentation and improve the performance of GNNs. To tackle this challenge, we analyze two representative semi-supervised learning algorithms: label propagation (LP) and consistency regularization (CR). We find that LP utilizes the prior knowledge of graphs to improve consistency, while CR adopts variable augmentations to promote diversity. Based on this discovery, we treat neighbors as augmentations to capture the prior knowledge embodying the homophily assumption, which promises high consistency of augmentations. To further promote diversity, we randomly replace the immediate neighbors of each node with its remote neighbors. After that, a neighbor-constrained regularization is proposed to enforce the predictions of the augmented neighbors to be consistent with each other. Extensive experiments on five real-world graphs validate the superiority of our method in improving the accuracy and generalization of GNNs.
AAAI Conference 2022 Conference Paper
Due to high-speed motion blur and challenging illumination, conventional frame-based cameras have encountered an important challenge in object detection tasks. Neuromorphic cameras, which output asynchronous visual streams instead of intensity frames, taking advantage of high temporal resolution and high dynamic range, have brought a new perspective to address this challenge. In this paper, we propose a novel problem setting, retinomorphic object detection, which is the first trial that integrates foveal-like and peripheral-like visual streams. Technically, we first build a large-scale multimodal neuromorphic object detection dataset (i.e., PKU-Vidar-DVS) with over 215.5k spatio-temporally synchronized labels. Then, we design temporal aggregation representations to preserve the spatio-temporal information from asynchronous visual streams. Finally, we present a novel bio-inspired unifying framework to fuse the two sensing modalities via a dynamic interaction mechanism. Our experimental evaluation shows that our approach yields significant improvements over the state-of-the-art single-modality methods, especially in high-speed motion and low-light scenarios. We hope that our work will attract further research into this newly identified, yet crucial research direction. Our dataset is available at https://www.pkuml.org/resources/pku-vidar-dvs.html.
NeurIPS Conference 2022 Conference Paper
Graph Contrastive Learning (GCL), which learns node representations by augmenting graphs, has attracted considerable attention. Despite the proliferation of various graph augmentation strategies, some fundamental questions remain unclear: what information is essentially learned by GCL? Are there general augmentation rules behind different augmentations? If so, what are they and what insights can they bring? In this paper, we answer these questions by establishing the connection between GCL and the graph spectrum. Through an experimental investigation in the spectral domain, we first find the General grAph augMEntation (GAME) rule for GCL, i.e., the difference between the high-frequency parts of two augmented graphs should be larger than that of the low-frequency parts. This rule reveals the fundamental principle for revisiting current graph augmentations and designing new effective ones. We then theoretically prove, via a contrastive invariance theorem, that GCL is able to learn invariance information; together with our GAME rule, we uncover for the first time that the representations learned by GCL essentially encode low-frequency information, which explains why GCL works. Guided by this rule, we propose a spectral graph contrastive learning module (SpCo), a general and GCL-friendly plug-in. We combine it with different existing GCL models, and extensive experiments demonstrate that it can further improve the performance of a wide variety of GCL methods.
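The spectral view behind the GAME rule can be made concrete with a toy graph Fourier decomposition: eigenvectors of the normalized Laplacian with small eigenvalues carry the low-frequency part of a node signal, the rest carry the high-frequency part. The graph, signal, and cut-off below are arbitrary illustrations, not the SpCo module itself:

```python
import numpy as np

# A small undirected graph and its symmetrically normalized Laplacian.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(1)
L = np.eye(4) - A / np.sqrt(np.outer(d, d))   # normalized Laplacian
evals, evecs = np.linalg.eigh(L)              # eigenvalues in ascending order

x = np.array([1.0, 2.0, 3.0, 4.0])            # a signal on the nodes
coeffs = evecs.T @ x                          # graph Fourier transform
low  = evecs[:, :2] @ coeffs[:2]              # low-frequency reconstruction
high = evecs[:, 2:] @ coeffs[2:]              # high-frequency remainder
```

The two parts sum exactly back to the original signal, and the smallest eigenvalue is 0 for a connected graph, which is what makes "low-frequency" a well-defined notion here.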
AAAI Conference 2022 Conference Paper
Heterogeneous Graph Neural Networks (HGNNs) have drawn increasing attention in recent years and achieved outstanding performance in many tasks. However, despite their wide use, there is currently no understanding of their robustness to adversarial attacks. In this work, we first systematically study the robustness of HGNNs and show that they can be easily fooled by adding an adversarial edge between the target node and a large-degree node (i.e., a hub). Furthermore, we identify two key reasons for such vulnerabilities of HGNNs: one is the perturbation enlargement effect, i.e., HGNNs, failing to encode transiting probability, enlarge the effect of the adversarial hub compared with GCNs; the other is the soft attention mechanism, which assigns positive attention values to obviously unreliable neighbors. Based on these two facts, we propose a novel robust HGNN framework, RoHe, against topology adversarial attacks by equipping an attention purifier, which can prune malicious neighbors based on topology and features. Specifically, to eliminate the perturbation enlargement, we introduce the metapath-based transiting probability as the prior criterion of the purifier, restraining the confidence of malicious neighbors from the adversarial hub. The purifier then learns to mask out neighbors with low confidence, and thus can effectively alleviate the negative effect of malicious neighbors in the soft attention mechanism. Extensive experiments on different benchmark datasets for multiple HGNNs are conducted, where the considerable improvement of HGNNs under adversarial attacks demonstrates the effectiveness and generalization ability of our defense framework.
IJCAI Conference 2022 Conference Paper
Traditional recommendation usually focuses on utilizing only one target user behavior (e.g., purchase) while ignoring other auxiliary behaviors (e.g., click, add to cart). Early efforts in multi-behavior recommendation often emphasize the differences between multiple behaviors, i.e., they aim to extract useful information by distinguishing different behaviors. However, the commonality between them, which reflects users' common preference for items associated with different behaviors, is largely ignored. Meanwhile, multi-behavior recommendation still severely suffers from a limited supervision signal. In this paper, we propose a novel self-supervised graph collaborative filtering model for multi-behavior recommendation named S-MBRec. Specifically, for each behavior, we execute GCNs to learn the user and item embeddings. Then we design a supervised task, distinguishing the importance of different behaviors, to capture the differences between embeddings. Meanwhile, we propose a star-style contrastive learning task to capture the embedding commonality between target and auxiliary behaviors, so as to alleviate the sparsity of the supervision signal, reduce the redundancy among auxiliary behaviors, and extract the most critical information. Finally, we jointly optimize the above two tasks. Extensive experiments, in comparison with state-of-the-arts, demonstrate the effectiveness of S-MBRec, where the maximum improvement reaches 20%.
IS Journal 2022 Journal Article
This article discusses the impact and significance of the autonomous science movement and the role and potential uses of intelligent technology in DAO-based decentralized science (DeSci) organizations and operations. What is DeSci? How does it relate to the science of team science? What are its potential contributions to multidisciplinary, interdisciplinary, and/or transdisciplinary studies? Does it have any correspondence to the social movement organizations in traditional social sciences or the cyber movement organizations in the new digital age? In particular, issues that DeSci raises for current professional communities, such as IEEE and its societies, conferences, and publications, are addressed, and the effort toward a framework and process of DAO-based DeSci for free, fair, and responsibility-sensitive sciences is reviewed.
NeurIPS Conference 2022 Conference Paper
Recent studies show that graph convolutional networks (GCNs) often perform worse for low-degree nodes, exhibiting the so-called structural unfairness for graphs with long-tailed degree distributions prevalent in the real world. Graph contrastive learning (GCL), which marries the power of GCN and contrastive learning, has emerged as a promising self-supervised approach for learning node representations. How does GCL behave in terms of structural fairness? Surprisingly, we find that representations obtained by GCL methods are already fairer with respect to degree bias than those learned by GCN. We theoretically show that this fairness stems from the intra-community concentration and inter-community scatter properties of GCL, resulting in a much clearer community structure that drives low-degree nodes away from the community boundary. Based on our theoretical analysis, we further devise a novel graph augmentation method, called GRAph contrastive learning for DEgree bias (GRADE), which applies different strategies to low- and high-degree nodes. Extensive experiments on various benchmarks and evaluation protocols validate the effectiveness of the proposed method.
AAAI Conference 2021 Conference Paper
Federated learning (FL) is a promising approach for training on decentralized data located on local client devices while improving efficiency and privacy. However, the distribution and quantity of the training data on the clients' side may lead to significant challenges such as class imbalance and non-IID (non-independent and identically distributed) data, which could greatly impact the performance of the common model. While much effort has been devoted to helping FL models converge when encountering non-IID data, the imbalance issue has not been sufficiently addressed. In particular, as FL training is executed by exchanging gradients in an encrypted form, the training data is not completely observable to either clients or server, and previous methods for class imbalance do not perform well for FL. Therefore, it is crucial to design new methods for detecting class imbalance in FL and mitigating its impact. In this work, we propose a monitoring scheme that can infer the composition of training data for each FL round, and design a new loss function, Ratio Loss, to mitigate the impact of the imbalance. Our experiments demonstrate the importance of acknowledging class imbalance and taking measures as early as possible in FL training, and the effectiveness of our method in mitigating the impact. Our method is shown to significantly outperform previous methods, while maintaining client privacy.
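As a point of reference, once the class composition of a round is known (the paper's monitoring scheme infers it from exchanged gradients), a common mitigation is to reweight the cross-entropy by inverse class frequency. The sketch below shows that generic reweighting idea, not the paper's Ratio Loss itself:

```python
import numpy as np

def weighted_ce(logits, labels, class_counts):
    """Cross-entropy with per-class weights inversely proportional to the
    estimated class frequency. With equal counts the weights are all 1 and
    this reduces to plain cross-entropy."""
    counts = np.asarray(class_counts, dtype=float)
    w = counts.sum() / (len(counts) * counts)           # inverse-frequency weights
    z = logits - logits.max(axis=1, keepdims=True)      # numerically stable log-softmax
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -(w[labels] * logp[np.arange(len(labels)), labels]).mean()
```

Under imbalance the minority class gets a weight above 1, so mistakes on minority samples contribute more to the loss.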
NeurIPS Conference 2021 Conference Paper
Although Graph Neural Networks (GNNs) have achieved remarkable accuracy, whether the results are trustworthy is still unexplored. Previous studies suggest that many modern neural networks are over-confident in their predictions; surprisingly, however, we discover that GNNs are primarily in the opposite direction, i.e., GNNs are under-confident. Therefore, confidence calibration for GNNs is highly desired. In this paper, we propose a novel trustworthy GNN model by designing a topology-aware post-hoc calibration function. Specifically, we first verify that the confidence distribution in a graph has the homophily property, and this finding inspires us to design a calibration GNN model (CaGCN) to learn the calibration function. CaGCN is able to obtain a unique transformation from the logits of GNNs to the calibrated confidence for each node; meanwhile, this transformation preserves the order between classes, satisfying the accuracy-preserving property. Moreover, we apply the calibration GNN to a self-training framework, showing that more trustworthy pseudo labels can be obtained with the calibrated confidence, which further improves performance. Extensive experiments demonstrate the effectiveness of our proposed model in terms of both calibration and accuracy.
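CaGCN learns a node-wise, topology-aware calibration function; the simplest accuracy-preserving baseline in this family is global temperature scaling, sketched here for contrast (the logits and temperatures are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def temperature_scale(logits, T):
    """Post-hoc calibration by dividing logits by a temperature T > 0.
    T > 1 softens over-confident predictions; T < 1 sharpens under-confident
    ones (the GNN case reported in the abstract). Class order is preserved,
    so accuracy is unchanged."""
    return softmax(logits / T)
```

Because dividing by a positive scalar is monotone, the argmax class never changes; only the confidence mass shifts.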
AAAI Conference 2021 Conference Paper
Graph neural networks (GNNs) have been proven to be effective in various network-related tasks. Most existing GNNs usually exploit the low-frequency signals of node features, which gives rise to one fundamental question: is low-frequency information all we need in real-world applications? In this paper, we first present an experimental investigation assessing the roles of low-frequency and high-frequency signals, where the results clearly show that exploring the low-frequency signal only is far from sufficient for learning an effective node representation in different scenarios. How can we adaptively learn more information beyond low-frequency information in GNNs? A well-informed answer can help GNNs enhance their adaptability. We tackle this challenge and propose a novel Frequency Adaptation Graph Convolutional Network (FAGCN) with a self-gating mechanism, which can adaptively integrate different signals in the process of message passing. For a deeper understanding, we theoretically analyze the roles of low-frequency and high-frequency signals in learning node representations, which further explains why FAGCN can perform well on different types of networks. Extensive experiments on six real-world networks validate that FAGCN not only alleviates the over-smoothing problem, but also has advantages over the state-of-the-arts.
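The low/high-frequency distinction can be sketched with the two basic graph filters: a low-pass filter smooths a node toward its neighbors, a high-pass filter sharpens the differences. The single scalar gate below is an assumed simplification for illustration (FAGCN learns edge-wise gates):

```python
import numpy as np

def frequency_adaptive(A, X, alpha):
    """Combine low-pass (I + A_hat) and high-pass (I - A_hat) filtered
    signals with a gate alpha in [-1, 1]; alpha = 1 is pure low-pass,
    alpha = -1 pure high-pass."""
    d = A.sum(1)
    A_hat = A / np.sqrt(np.outer(d, d))       # symmetrically normalized adjacency
    I = np.eye(len(A))
    low = (I + A_hat) @ X                     # low-pass: averages with neighbors
    high = (I - A_hat) @ X                    # high-pass: differences from neighbors
    g = (1 + alpha) / 2                       # map alpha in [-1, 1] to a weight in [0, 1]
    return g * low + (1 - g) * high
```

On a homophilous graph a gate near 1 helps; on a heterophilous graph a gate near -1 preserves the discriminative differences between neighbors.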
IJCAI Conference 2021 Conference Paper
This paper proposes Characteristic Examples for effectively fingerprinting deep neural networks, featuring high robustness to the base model against model pruning as well as low transferability to unassociated models. This is the first work taking both robustness and transferability into consideration for generating realistic fingerprints, whereas current methods lack practical assumptions and may incur large false positive rates. To achieve a better trade-off between robustness and transferability, we propose three kinds of characteristic examples: vanilla C-examples, RC-examples, and LTRC-examples, to derive fingerprints from the original base model. To fairly characterize the trade-off between robustness and transferability, we propose the Uniqueness Score, a comprehensive metric that measures the difference between robustness and transferability, which also serves as an indicator of the false alarm problem. Extensive experiments demonstrate that the proposed characteristic examples achieve superior performance compared with existing fingerprinting methods. In particular, for VGG ImageNet models, using LTRC-examples gives a 4X higher uniqueness score than the baseline method and does not incur any false positives.
IJCAI Conference 2021 Conference Paper
Graph-level representation learning aims to learn low-dimensional representations for entire graphs, which has shown a large impact on real-world applications. Recently, limited by expensive labeled data, contrastive learning based graph-level representation learning has attracted considerable attention. However, these methods mainly focus on graph augmentation for positive samples, while the effect of negative samples is less explored. In this paper, we study the impact of negative samples on learning graph-level representations, and propose a novel curriculum contrastive learning framework for self-supervised graph-level representation, called CuCo. Specifically, we introduce four graph augmentation techniques to obtain the positive and negative samples, and utilize graph neural networks to learn their representations. Then a scoring function is proposed to sort negative samples from easy to hard, and a pacing function automatically selects the negative samples in each training procedure. Extensive experiments on fifteen real-world graph classification datasets, together with a parameter analysis, demonstrate that our proposed CuCo yields truly encouraging results in terms of classification performance and convergence.
AAAI Conference 2021 Conference Paper
Heterogeneous Graph Neural Networks (HGNNs) have drawn increasing attention in recent years and achieved outstanding performance in many tasks. The success of the existing HGNNs relies on one fundamental assumption, i.e., the original heterogeneous graph structure is reliable. However, this assumption is usually unrealistic, since the heterogeneous graph in reality is inevitably noisy or incomplete. Therefore, it is vital to learn the heterogeneous graph structure for HGNNs rather than rely only on the raw graph structure. In light of this, we make the first attempt towards learning an optimal heterogeneous graph structure for HGNNs and propose a novel framework HGSL, which jointly performs Heterogeneous Graph Structure Learning and GNN parameter learning for classification. Different from traditional homogeneous graph structure learning, considering the heterogeneity of different relations in heterogeneous graph, HGSL generates each relation subgraph separately. Specifically, in each generated relation subgraph, HGSL not only considers the feature similarity by generating feature similarity graph, but also considers the complex heterogeneous interactions in features and semantics by generating feature propagation graph and semantic graph. Then, these graphs are fused to a learned heterogeneous graph and optimized together with a GNN towards classification objective. Extensive experiments on real-world graphs demonstrate that the proposed framework significantly outperforms the state-of-the-art methods.
IJCAI Conference 2021 Conference Paper
Heterogeneous information network (HIN) embedding, which learns low-dimensional representations of multi-type nodes, has been applied widely and achieved excellent performance. However, most previous works focus on static heterogeneous networks or learning node embeddings within specific snapshots, and little attention has been paid to the whole evolution process and capturing all temporal dynamics. In order to fill the gap of obtaining multi-type node embeddings by considering all temporal dynamics during the evolution, we propose a novel temporal HIN embedding method (THINE). THINE not only uses an attention mechanism and meta-paths to preserve structures and semantics in HINs but also combines the Hawkes process to simulate the evolution of the temporal network. Our extensive evaluations on various real-world temporal HINs demonstrate that THINE achieves state-of-the-art performance in both static and dynamic tasks, including node classification, link prediction, and temporal link recommendation.
NeurIPS Conference 2021 Conference Paper
Graph Convolutional Networks (GCNs), which aim to obtain the representation of a node by aggregating its neighbors, have demonstrated great power in tackling various analytics tasks on graph (network) data. The remarkable performance of GCNs typically relies on the homophily assumption of networks, while such an assumption cannot always be satisfied, since heterophily or randomness is also widespread in the real world. This gives rise to one fundamental question: should networks with different structural properties adopt different propagation mechanisms? In this paper, we first conduct an experimental investigation. Surprisingly, we discover that there are actually segmentation rules for the propagation mechanism, i.e., 1-hop, 2-hop and $k$-nearest neighbor ($k$NN) neighbors are more suitable as neighborhoods of networks with complete homophily, complete heterophily and randomness, respectively. However, real-world networks are complex, and may present diverse structural properties, e.g., a network dominated by homophily may contain a small amount of randomness. So can we reasonably utilize these segmentation rules to design a universal propagation mechanism independent of the network structural assumption? To tackle this challenge, we develop a new universal GCN framework, namely U-GCN. It first introduces a multi-type convolution to extract information from 1-hop, 2-hop and $k$NN networks simultaneously, and then designs a discriminative aggregation to sufficiently fuse them with respect to the given learning objectives. Extensive experiments demonstrate the superiority of U-GCN over state-of-the-arts. The code and data are available at https://github.com/jindi-tju.
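The three neighborhood types the abstract names can be constructed as follows; this is only the graph-construction step (the multi-type convolution and discriminative aggregation of U-GCN are omitted, and the toy graph is an assumption for illustration):

```python
import numpy as np

def neighborhoods(A, X, k=2):
    """Build 1-hop, strict 2-hop, and feature-space kNN adjacency matrices."""
    n = len(A)
    A1 = (A > 0).astype(float)                     # 1-hop adjacency
    two_hop = (A1 @ A1 > 0).astype(float)          # reachable in exactly two steps
    A2 = two_hop * (1 - A1) * (1 - np.eye(n))      # exclude 1-hop edges and self-loops
    d2 = ((X[:, None] - X[None]) ** 2).sum(-1)     # pairwise feature distances
    np.fill_diagonal(d2, np.inf)                   # a node is not its own neighbor
    Aknn = np.zeros((n, n))
    idx = np.argsort(d2, axis=1)[:, :k]            # k nearest nodes in feature space
    np.put_along_axis(Aknn, idx, 1.0, axis=1)
    return A1, A2, Aknn
```

On a homophilous graph A1 is the informative neighborhood; on a heterophilous one A2 tends to reconnect same-class nodes; Aknn ignores the topology entirely, which helps when edges are close to random.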
AAAI Conference 2021 Conference Paper
This paper presents a new high-quality dataset for Very Important Person Localization (VIPLoc), named Unconstrained-7k. Generally, existing datasets are: 1) limited in scale; 2) built under simple and constrained conditions, where the number of disturbing non-VIPs is not large, the scene is relatively simple, and the face of the VIP is always in frontal view and salient. To tackle these problems, the proposed Unconstrained-7k dataset is featured in two aspects. First, it contains over 7,000 annotated images, making it the largest VIPLoc dataset under unconstrained conditions to date. Second, our dataset is collected freely from the Internet, covering multiple scenes, where images are captured in unconstrained conditions. VIPs in the new dataset appear in diverse settings, e.g., large view variation, varying sizes, occlusion, and complex scenes. Meanwhile, each image contains more persons (>20), making the dataset more challenging. As a minor contribution, motivated by the observation that VIPs are highly related not only to neighbors but also to iconic objects, this paper proposes Joint Social Relation and Individual Interaction Graph Neural Networks (JSRII-GNN) for VIPLoc. Experiments show that JSRII-GNN yields competitive accuracy on the NCAA (National Collegiate Athletic Association), MS (Multi-scene), and Unconstrained-7k datasets. https://github.com/xiaowang1516/VIPLoc.
AAAI Conference 2021 Conference Paper
The prosperous development of social e-commerce has spawned diverse recommendation demands, accompanied by a new recommendation paradigm: share recommendation. Significantly different from traditional binary recommendations (e.g., item recommendation and friend recommendation), share recommendation models ternary interactions among ⟨User, Item, Friend⟩, aiming to recommend the most likely friend to a user who would like to share a specific item, and is progressively becoming an indispensable service in social e-commerce. Seamlessly integrating social relations and purchase behaviours, share recommendation improves user stickiness and monetizes user influence, meanwhile encountering three unique challenges: rich heterogeneous information, complex ternary interaction, and asymmetric share action. In this paper, we first study the share recommendation problem and propose a heterogeneous graph neural network based share recommendation model, called HGSRec. Specifically, HGSRec delicately designs tripartite heterogeneous GNNs to describe the multifold characteristics of users and items, and then dynamically fuses them by capturing potential ternary dependency with a dual co-attention mechanism, followed by a transitive triplet representation to depict the asymmetry of the share action and predict whether it happens. Offline experiments demonstrate the superiority of the proposed HGSRec with significant improvements (11.7%–14.5%) over the state-of-the-arts, and online A/B testing on the Taobao platform further demonstrates the high industrial practicability and stability of HGSRec.
IJCAI Conference 2020 Conference Paper
Most existing clustering algorithms are proposed without considering selection bias in data. In many real applications, however, one cannot guarantee the data is unbiased. Selection bias might bring unexpected correlations between features, and ignoring those unexpected correlations will hurt the performance of clustering algorithms. Therefore, how to remove those unexpected correlations induced by selection bias is extremely important yet largely unexplored for clustering. In this paper, we propose a novel Decorrelation regularized K-Means algorithm (DCKM) for clustering with data selection bias. Specifically, the decorrelation regularizer aims to learn global sample weights that are capable of balancing the sample distribution, so as to remove unexpected correlations among features. Meanwhile, the learned weights are combined with k-means, which makes the reweighted k-means cluster on the inherent data distribution without the influence of unexpected correlations. Moreover, we derive the updating rules to effectively infer the parameters in DCKM. Extensive experimental results on real-world datasets demonstrate that our DCKM algorithm achieves significant performance gains, indicating the necessity of removing unexpected feature correlations induced by selection bias when clustering.
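The reweighting half of the idea can be sketched as a sample-weighted Lloyd iteration: each point carries a weight, and centroids become weighted means. This is a minimal sketch of the mechanism only; the decorrelation regularizer that actually *learns* the weights in DCKM is omitted:

```python
import numpy as np

def weighted_kmeans(X, w, k, iters=50, seed=0):
    """k-means where sample i carries weight w_i in the centroid update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]   # random initial centers
    for _ in range(iters):
        # assign each point to its nearest center
        d2 = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
        z = d2.argmin(1)
        # weighted centroid update per cluster
        for j in range(k):
            m = z == j
            if m.any():
                centers[j] = (w[m, None] * X[m]).sum(0) / w[m].sum()
    return z, centers
```

With uniform weights this reduces to ordinary k-means; non-uniform weights shift the centroids toward the samples the weighting scheme deems representative of the unbiased distribution.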
NeurIPS Conference 2020 Conference Paper
Sampling is a fundamental and arguably very important task with numerous applications in Machine Learning. One approach to sample from a high dimensional distribution $e^{-f}$ for some function $f$ is the Langevin Algorithm (LA). Recently, there has been a lot of progress in showing fast convergence of LA even in cases where $f$ is non-convex, notably \cite{VW19}, \cite{MoritaRisteski}, in which the former paper focuses on functions $f$ defined on $\mathbb{R}^n$ and the latter focuses on functions with symmetries (like matrix completion type objectives) with manifold structure. Our work generalizes the results of \cite{VW19} to the case where $f$ is defined on a manifold $M$ rather than $\mathbb{R}^n$. From a technical point of view, we show that the KL divergence decreases at a geometric rate whenever the distribution $e^{-f}$ satisfies a log-Sobolev inequality on $M$.
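The Euclidean Langevin Algorithm the abstract starts from is just gradient descent with injected Gaussian noise; the paper's contribution is extending the analysis to a manifold $M$. A minimal Euclidean sketch, sampling from a 1-D standard Gaussian ($f(x) = x^2/2$, so $\nabla f(x) = x$; the step size and chain count are illustrative):

```python
import numpy as np

def langevin_step(x, grad_f, eta, rng):
    """One step of the (unadjusted, Euclidean) Langevin Algorithm for
    sampling from exp(-f): descend the gradient, then add noise scaled
    by sqrt(2 * eta)."""
    noise = rng.standard_normal(x.shape)
    return x - eta * grad_f(x) + np.sqrt(2 * eta) * noise

# Run 5000 independent chains targeting the standard Gaussian exp(-x^2/2).
rng = np.random.default_rng(0)
x = np.zeros(5000)
for _ in range(200):
    x = langevin_step(x, lambda v: v, 0.05, rng)
```

After mixing, the empirical distribution of the chains is close to N(0, 1), up to the O(eta) discretization bias of the unadjusted scheme.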
AAAI Conference 2020 Conference Paper
We address the problem of disentangled representation learning with independent latent factors in graph convolutional networks (GCNs). Current methods usually learn node representations by describing the neighborhood as a perceptual whole in a holistic manner, ignoring the entanglement of the latent factors. However, a real-world graph is formed by the complex interaction of many latent factors (e.g., the same hobby, education, or work in a social network), and little effort has been made toward exploring disentangled representations in GCNs. In this paper, we propose a novel Independence Promoted Graph Disentangled Network (IPGDN) to learn disentangled node representations while enhancing the independence among them. In particular, we first present disentangled representation learning via a neighborhood routing mechanism, and then employ the Hilbert-Schmidt Independence Criterion (HSIC) to enforce independence between the latent representations, which is effectively integrated into a graph convolutional framework as a regularizer at the output layer. Experimental studies on real-world graphs validate our model and demonstrate that our algorithms outperform the state-of-the-arts by a wide margin in different network applications, including semi-supervised graph classification, graph clustering and graph visualization.
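The HSIC regularizer mentioned above has a simple empirical form: center two kernel Gram matrices and take the trace of their product. A minimal sketch of the biased estimator with Gaussian kernels (the bandwidth and usage here are illustrative, not IPGDN's training setup):

```python
import numpy as np

def hsic(X, Y, sigma=1.0):
    """Biased empirical HSIC between samples X and Y (rows are samples).
    Values near zero indicate (approximate) independence; larger values
    indicate dependence."""
    n = len(X)
    def gram(Z):
        sq = ((Z[:, None, :] - Z[None]) ** 2).sum(-1)   # pairwise squared distances
        return np.exp(-sq / (2 * sigma ** 2))           # Gaussian kernel matrix
    H = np.eye(n) - np.ones((n, n)) / n                 # centering matrix
    K, L = gram(X), gram(Y)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```

Used as a regularizer, minimizing hsic between two latent factors pushes their representations toward statistical independence.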
IJCAI Conference 2020 Conference Paper
Network embedding, mapping nodes in a network to a low-dimensional space, achieves powerful performance. An increasing number of works focus on static network embedding; however, little attention has been paid to temporal network embedding, especially the effect of mesoscopic dynamics as the network evolves. In light of this, we concentrate on a particular motif --- the triad --- and its temporal dynamics to study temporal network embedding. Specifically, we propose MTNE, a novel embedding model for temporal networks. MTNE not only integrates the Hawkes process to model the triad evolution process, thereby preserving motif-aware high-order proximities, but also combines an attention mechanism to better distinguish the importance of different types of triads. Experiments on various real-world temporal networks demonstrate that, compared with several state-of-the-art methods, our model achieves the best performance in both static and dynamic tasks, including node classification, link prediction, and link recommendation.
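For readers unfamiliar with the Hawkes process that MTNE builds on, a univariate conditional intensity can be sketched as follows. The exponential kernel and the parameter values are generic textbook choices, not the paper's exact parameterization:

```python
import math

def hawkes_intensity(t, base_rate, history, alpha=0.5, delta=1.0):
    # Conditional intensity of a univariate Hawkes process:
    #   lambda(t) = mu + sum_{t_i < t} alpha * exp(-delta * (t - t_i)).
    # In an MTNE-style model, past triad-closure events play the role of
    # the history and excite the rate of future closures.
    excitation = sum(alpha * math.exp(-delta * (t - ti))
                     for ti in history if ti < t)
    return base_rate + excitation

events = [0.5, 1.0, 1.2]
lam = hawkes_intensity(2.0, base_rate=0.1, history=events)
```

Each past event contributes a decaying boost to the intensity, which is what lets the model capture bursty, self-exciting evolution of motifs.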
AAAI Conference 2020 Conference Paper
The interactions of users and items in a recommender system can be naturally modeled as a user-item bipartite graph. In recent years, we have witnessed an emerging research effort in exploring the user-item graph for collaborative filtering methods. Nevertheless, user-item interactions typically arise from highly complex latent purchasing motivations, such as high cost performance or eye-catching appearance, which are indistinguishably represented by the edges. Existing approaches leave the differences between various purchasing motivations unexplored, and are thus unable to capture fine-grained user preferences. Therefore, in this paper we propose a novel Multi-Component graph convolutional Collaborative Filtering (MCCF) approach to distinguish the latent purchasing motivations underneath the observed explicit user-item interactions. Specifically, MCCF contains two elaborately designed modules, a decomposer and a combiner. The former decomposes the edges in the user-item graph to identify the latent components that may cause the purchasing relationship; the latter then recombines these latent components automatically to obtain unified embeddings for prediction. Furthermore, a sparse regularizer and a weighted random sampling strategy are utilized to alleviate overfitting and accelerate optimization. Empirical results on three real datasets and a synthetic dataset not only show significant performance gains for MCCF, but also demonstrate the necessity of considering multiple components.
IJCAI Conference 2020 Conference Paper
As heterogeneous networks have become increasingly ubiquitous, Heterogeneous Information Network (HIN) embedding, aiming to project nodes into a low-dimensional space while preserving the heterogeneous structure, has drawn increasing attention in recent years. Many of the existing HIN embedding methods adopt meta-path guided random walks to retain both the semantics and structural correlations between different types of nodes. However, the selection of meta-paths remains an open problem, which either depends on domain knowledge or is learned from label information. As a uniform blueprint of a HIN, the network schema comprehensively embraces the high-order structure and contains rich semantics. In this paper, we make the first attempt to study network schema preserving HIN embedding, and propose a novel model named NSHE. In NSHE, a network schema sampling method is first proposed to generate sub-graphs (i.e., schema instances), and then a multi-task learning objective is built to preserve the heterogeneous structure of each schema instance. Besides preserving pairwise structure information, NSHE is able to retain high-order structure (i.e., the network schema). Extensive experiments on three real-world datasets demonstrate that NSHE significantly outperforms the state-of-the-art methods.
AAAI Conference 2020 Conference Paper
We introduce a novel and efficient algorithm called stochastic approximate gradient descent (SAGD), as an alternative to stochastic gradient descent for cases where unbiased stochastic gradients cannot be trivially obtained. Traditional methods for such problems rely on general-purpose sampling techniques such as Markov chain Monte Carlo, which typically require manual parameter tuning and do not work efficiently in practice. Instead, SAGD makes use of the Langevin algorithm to construct stochastic gradients that are biased in finite steps but accurate asymptotically, enabling us to establish a theoretical convergence guarantee for SAGD. Inspired by our theoretical analysis, we also provide useful guidelines for its practical implementation. Finally, we show that SAGD performs well experimentally in popular statistical and machine learning problems such as the expectation-maximization algorithm and variational autoencoders.
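A toy version of the idea, with hypothetical names and a one-dimensional Gaussian target, might look like this: an inner Langevin chain provides a biased but asymptotically accurate estimate of an expectation appearing in the gradient. This is an illustrative reading, not the paper's algorithm verbatim:

```python
import numpy as np

def langevin_mean(theta, n_steps, eta, rng):
    # Short Langevin chain targeting N(theta, 1); the chain average is a
    # biased (finite-step) but asymptotically accurate estimate of E[z] = theta.
    z, total = theta, 0.0
    for _ in range(n_steps):
        z = z - eta * (z - theta) + np.sqrt(2.0 * eta) * rng.standard_normal()
        total += z
    return total / n_steps

# Toy objective 0.5 * (E_pi[z] - mu)^2 with pi = N(theta, 1): the exact
# gradient is (theta - mu); the SAGD-style loop replaces E_pi[z] with the
# chain estimate instead of computing the expectation in closed form.
rng = np.random.default_rng(0)
mu, theta = 3.0, 0.0
for _ in range(500):
    grad_est = langevin_mean(theta, n_steps=50, eta=0.1, rng=rng) - mu
    theta -= 0.1 * grad_est
```

Despite the finite-step bias of each inner chain, the outer iterates still settle near the minimizer `mu`.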
IJCAI Conference 2020 Conference Paper
Pedestrian detection at nighttime is a crucial and frontier problem in surveillance, but has not been well explored by the computer vision and artificial intelligence communities. Most existing methods detect pedestrians under favorable lighting conditions (e.g., daytime) and achieve promising performance. In contrast, they often fail under unstable lighting conditions (e.g., nighttime), and night is a critical time for criminal suspects to act in the field of security. The existing nighttime pedestrian detection dataset is captured by a car-mounted camera, specially designed for autonomous driving scenarios; a dataset for the nighttime surveillance scenario is still lacking, and there are vast differences between autonomous driving and surveillance, including viewpoint and illumination. In this paper, we build a novel pedestrian detection dataset from the nighttime surveillance aspect: NightSurveillance. As a benchmark dataset for pedestrian detection at nighttime, we compare the performance of state-of-the-art pedestrian detectors, and the results reveal that these methods cannot solve all the challenging problems of NightSurveillance. We believe that NightSurveillance can further advance the research of pedestrian detection, especially in the field of surveillance security at nighttime.
NeurIPS Conference 2019 Conference Paper
In a series of papers [Lee et al 2016], [Panageas and Piliouras 2017], [Lee et al 2019], it was established that some of the most commonly used first order methods, almost surely (under random initializations) and with a small enough step-size, avoid strict saddle points, as long as the objective function $f$ is $C^2$ and has Lipschitz gradient. The key observation was that first order methods can be studied from a dynamical systems perspective, in which instantiations of the Center-Stable Manifold Theorem allow for a global analysis. The results of the aforementioned papers were limited to the case where the step-size $\alpha$ is constant, i.e., does not depend on time (and is typically bounded by the inverse of the Lipschitz constant of the gradient of $f$). It remained an open question whether the results still hold when the step-size is time dependent and vanishes with time. In this paper, we resolve this question in the affirmative for gradient descent, mirror descent, manifold descent and proximal point. The main technical challenge is that the dynamical system induced by each first order method is time non-homogeneous, and the stable manifold theorem is not applicable in its classic form. By exploiting the dynamical systems structure of the aforementioned first order methods, we prove a stable manifold theorem that is applicable to time non-homogeneous dynamical systems and generalize the results in [Lee et al 2019] to time dependent step-sizes.
AAAI Conference 2019 Conference Paper
Heterogeneous information network (HIN) embedding, aiming to project a HIN into a low-dimensional space, has attracted considerable research attention. Most of the existing HIN embedding methods focus on preserving the inherent network structure and semantic correlations in Euclidean spaces. However, one fundamental question is whether Euclidean spaces are the appropriate or intrinsic isometric spaces of HINs. Recent research argues that complex networks may have an underlying hyperbolic geometry, because hyperbolic geometry naturally reflects some properties of complex networks, e.g., hierarchical and power-law structure. In this paper, we make the first effort toward HIN embedding in hyperbolic spaces. We analyze the structures of two real-world HINs and discover that some properties, e.g., the power-law distribution, also exist in HINs. Therefore, we propose a novel hyperbolic heterogeneous information network embedding model. Specifically, to capture the structural and semantic relations between nodes, we employ meta-path guided random walks to sample sequences for each node, and we exploit the distance in hyperbolic space as the proximity measurement. The hyperbolic distance satisfies the triangle inequality and well preserves transitivity in a HIN, and our model enables nodes and their neighborhoods to have small hyperbolic distances. We further derive an effective optimization strategy to update the hyperbolic embeddings iteratively. The experimental results, in comparison with the state-of-the-art, demonstrate that our proposed model not only has superior performance on network reconstruction and link prediction tasks but also shows its ability to capture hierarchical structure in HINs via visualization.
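The hyperbolic proximity measurement can be illustrated with the distance in the Poincare ball model (assuming the Poincare model for concreteness; the paper's choice of hyperbolic model may differ):

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    # Distance in the Poincare ball model of hyperbolic space:
    # d(u, v) = arccosh(1 + 2 ||u - v||^2 / ((1 - ||u||^2) (1 - ||v||^2))).
    uu = 1.0 - np.dot(u, u)
    vv = 1.0 - np.dot(v, v)
    delta = np.dot(u - v, u - v)
    return np.arccosh(1.0 + 2.0 * delta / max(uu * vv, eps))

origin = np.zeros(2)
near = np.array([0.1, 0.0])
far = np.array([0.9, 0.0])  # near the boundary: distances grow rapidly
```

Points close to the boundary of the ball are exponentially far from the origin, which is what lets hyperbolic embeddings represent hierarchies and power-law degree distributions compactly.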
IJCAI Conference 2019 Conference Paper
Despite achieving remarkable success in various domains, recent studies have uncovered the vulnerability of deep neural networks to adversarial perturbations, raising concerns about model generalizability and new threats such as prediction-evasive misclassification or stealthy reprogramming. Among different defense proposals, stochastic network defenses such as random neuron activation pruning or random perturbation of layer inputs have been shown to be promising for attack mitigation. However, one critical drawback of current defenses is that the robustness enhancement comes at the cost of noticeable performance degradation on legitimate data, e.g., a large drop in test accuracy. This paper is motivated by the pursuit of a better trade-off between adversarial robustness and test accuracy for stochastic network defenses. We propose the Defense Efficiency Score (DES), a comprehensive metric that measures the gain in unsuccessful attack attempts at the cost of a drop in test accuracy for any defense. To achieve a better DES, we propose hierarchical random switching (HRS), which protects neural networks through a novel randomization scheme. An HRS-protected model contains several blocks of randomly switching channels to prevent adversaries from exploiting fixed model structures and parameters for their malicious purposes. Extensive experiments show that HRS is superior in defending against state-of-the-art white-box and adaptive adversarial misclassification attacks. We also demonstrate the effectiveness of HRS in defending against adversarial reprogramming, making it the first defense against adversarial programs. Moreover, in most settings the average DES of HRS is at least 5X higher than that of current stochastic network defenses, validating its significantly improved robustness-accuracy trade-off.
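One plausible ratio-style reading of such a score is sketched below. The formula and all names here are hypothetical; the paper's exact definition of DES may differ:

```python
def defense_efficiency_score(defended_fail_rate, baseline_fail_rate,
                             baseline_acc, defended_acc, eps=1e-6):
    # Hypothetical reading of a defense-efficiency score: gain in
    # unsuccessful attack attempts per unit drop in clean test accuracy.
    gain = defended_fail_rate - baseline_fail_rate
    cost = max(baseline_acc - defended_acc, eps)
    return gain / cost

# A defense that blocks 60% more attacks at a 2-point accuracy drop.
score = defense_efficiency_score(0.9, 0.3, 0.95, 0.93)
```

Under this reading, a defense is efficient when it buys a large increase in failed attacks for a small sacrifice in clean accuracy.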
IJCAI Conference 2019 Conference Paper
This paper presents a framework for norm-based capacity control with respect to an $l_{p,q}$-norm in weight-normalized Residual Neural Networks (ResNets). We first formulate the representation of each residual block. For the regression problem, we analyze the Rademacher complexity of the ResNet family, and we establish a tighter generalization upper bound for weight-normalized ResNets in a more general setting. Using $l_{p,q}$-norm weight normalization with $1/p + 1/q \ge 1$, we discuss the properties of a width-independent capacity control, which relies on the depth only through a square root term. Several comparisons suggest that our result is tighter than previous work. Parallel results for Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs) are included by introducing $l_{p,q}$-norm weight normalization for DNNs and $l_{p,q}$-norm kernel normalization for CNNs. Numerical experiments also verify that ResNet structures contribute to better generalization properties.
IJCAI Conference 2018 Conference Paper
Trust prediction, aiming to predict the trust relations between users in a social network, is key to helping users discover reliable information. Many trust prediction methods are based on the low-rank assumption of a trust network. However, one typical property of trust networks is that the trust relations follow a power-law distribution, i.e., few users are trusted by many other users, while most tail users have few trustors. Due to these tail users, the fundamental low-rank assumption made by existing methods is seriously violated and becomes unrealistic. In this paper, we propose a simple yet effective method to address the problem of the violated low-rank assumption. Instead of discovering the low-rank component of the trust network alone, we simultaneously learn a sparse component of the trust network to describe the tail users. With both the learned low-rank and sparse components, the trust relations in the whole network can be better captured. Moreover, the transitive closure structure of the trust relations is also integrated into our model. We then derive an effective iterative algorithm to infer the parameters of our model, along with a proof of correctness. Extensive experimental results on real-world trust networks demonstrate the superior performance of our proposed method over the state-of-the-art.
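A minimal alternating sketch of the low-rank-plus-sparse idea uses singular value thresholding for the low-rank part and entrywise soft-thresholding for the sparse part. This is a simplified stand-in for the paper's model, which additionally incorporates the transitive closure structure (omitted here); thresholds are arbitrary:

```python
import numpy as np

def svt(M, tau):
    # Singular value thresholding: shrink singular values by tau.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(M, tau):
    # Entrywise soft-thresholding, producing the sparse component.
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def lowrank_plus_sparse(T, tau_l=1.0, tau_s=0.1, n_iter=50):
    # T ~ L + S: L models the low-rank core of the trust network,
    # S absorbs the tail users that violate the low-rank assumption.
    L = np.zeros_like(T)
    S = np.zeros_like(T)
    for _ in range(n_iter):
        L = svt(T - S, tau_l)
        S = soft(T - L, tau_s)
    return L, S

# Rank-1 "core" trust pattern plus a few tail-user spikes.
T = np.ones((10, 10))
T[7, 0] = T[8, 1] = T[9, 2] = 6.0
L, S = lowrank_plus_sparse(T)
```

By construction of the final soft-thresholding step, the entrywise residual of T - L - S is bounded by the sparse threshold, while the spikes end up in S rather than distorting L.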
IJCAI Conference 2018 Conference Paper
Nonnegative matrix factorization (NMF), a well-known technique for finding parts-based representations of nonnegative data, has been widely studied. In reality, ordinal relations often exist among data, such as data point i being more related to j than to q. Such relative order is naturally available and, more importantly, truly reflects the latent data structure. Preserving the ordinal relations enables us to find structured representations of data that are faithful to the relative order, so that the learned representations become more discriminative. However, current NMFs pay no attention to this. In this paper, we make the first attempt to incorporate ordinal relations and propose a novel ranking preserving nonnegative matrix factorization (RPNMF) approach, which enforces the learned representations to be ranked according to the relations. We derive iterative updating rules to solve RPNMF's objective function with guaranteed convergence. Experimental results on several datasets for clustering and classification demonstrate that RPNMF outperforms the state-of-the-art, not only in terms of accuracy but also in interpreting the ordinal structure of the data.
AAAI Conference 2018 Conference Paper
Network embedding has recently attracted much attention in data mining. Existing network embedding methods mainly focus on networks with pairwise relationships. In the real world, however, relationships among data points can go beyond pairwise, i.e., three or more objects may be involved in each relationship, represented by a hyperedge, thus forming hyper-networks. These hyper-networks pose great challenges to existing network embedding methods when the hyperedges are indecomposable, that is, no subset of nodes in a hyperedge can form another hyperedge. Such indecomposable hyperedges are especially common in heterogeneous networks. In this paper, we propose a novel Deep Hyper-Network Embedding (DHNE) model to embed hyper-networks with indecomposable hyperedges. More specifically, we theoretically prove that any linear similarity metric in the embedding space commonly used by existing methods cannot maintain the indecomposability property of hyper-networks, and thus propose a new deep model to realize a non-linear tuplewise similarity function while preserving both local and global proximities in the formed embedding space. We conduct extensive experiments on four different types of hyper-networks, including a GPS network, an online social network, a drug network and a semantic network. The empirical results demonstrate that our method can significantly and consistently outperform the state-of-the-art algorithms.
AAAI Conference 2018 Conference Paper
Singular Value Decomposition (SVD) is a popular approach in various network applications, such as link prediction and network parameter characterization. Incremental SVD approaches have been proposed to process newly changed nodes and edges in dynamic networks. However, incremental SVD approaches inevitably suffer from serious error accumulation due to the approximation of incremental updates. Restarting the SVD is an effective way to reset the aggregated error, but when to restart SVD for dynamic networks has not been addressed in the literature. In this paper, we propose TIMERS, Theoretically Instructed Maximum-Error-bounded Restart of SVD, a novel approach that optimally sets the restart time in order to reduce error accumulation over time. Specifically, we monitor the margin between the reconstruction loss of incremental updates and the minimum loss of the SVD model. To reduce the complexity of monitoring, we theoretically develop a lower bound on the SVD minimum loss for dynamic networks and use this bound in place of the minimum loss in monitoring. By setting a maximum tolerated error as a threshold, we trigger an SVD restart automatically when the margin exceeds this threshold. We prove that the time complexity of our method is linear with respect to the number of local dynamic changes, and that our method is general across different types of dynamic networks. We conduct extensive experiments on several synthetic and real dynamic networks. The experimental results demonstrate that our proposed method significantly outperforms existing methods, reducing the maximum error of dynamic network reconstruction by 27% to 42% for a fixed number of restarts, and reducing the number of restarts by 25% to 50% for a fixed maximum tolerated error.
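The restart rule described above can be paraphrased as a simple monitoring loop. The numbers below are illustrative; in TIMERS the lower bound on the minimum loss is derived theoretically rather than supplied by hand:

```python
def first_restart(losses, bounds, threshold):
    # Trigger a restart at the first time step where the margin between the
    # incremental-update loss and the lower bound on the optimal SVD loss
    # exceeds the maximum tolerated error; return None if never triggered.
    for t, (loss, bound) in enumerate(zip(losses, bounds)):
        if loss - bound > threshold:
            return t
    return None

losses = [0.10, 0.14, 0.19, 0.27, 0.40]  # drifting incremental-update loss
bounds = [0.10, 0.10, 0.11, 0.11, 0.12]  # lower bound on the minimum loss
t_restart = first_restart(losses, bounds, threshold=0.1)
```

Monitoring the margin against a bound, rather than recomputing the exact minimum loss at every step, is what keeps the per-update cost low.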
NeurIPS Conference 2018 Conference Paper
This paper presents a general framework for norm-based capacity control for $L_{p,q}$ weight-normalized deep neural networks. We establish an upper bound on the Rademacher complexities of this family. With an $L_{p,q}$ normalization where $q\le p^*$ and $1/p+1/p^{*}=1$, we discuss properties of a width-independent capacity control, which depends on the depth only by a square root term. We further analyze the approximation properties of $L_{p,q}$ weight-normalized deep neural networks. In particular, for an $L_{1,\infty}$ weight-normalized network, the approximation error can be controlled by the $L_1$ norm of the output layer, and the corresponding generalization error depends on the architecture only through the square root of the depth.
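For finite $p$ and $q$, the $L_{p,q}$ norm of a weight matrix can be computed as below. This assumes the convention of an $l_p$ norm over each row followed by an $l_q$ norm over the vector of row norms; whether rows or columns are grouped first depends on the layer convention:

```python
import numpy as np

def lpq_norm(W, p, q):
    # ||W||_{p,q}: l_p norm of each row (e.g., the incoming weights of a
    # unit), then the l_q norm of the resulting vector of row norms.
    row_norms = np.sum(np.abs(W) ** p, axis=1) ** (1.0 / p)
    return np.sum(row_norms ** q) ** (1.0 / q)
```

The $L_{1,\infty}$ case in the abstract corresponds to taking the maximum of the row-wise $l_1$ norms, which this finite-$q$ sketch does not cover.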
AAAI Conference 2017 Conference Paper
Network embedding, aiming to learn low-dimensional representations of nodes in networks, is of paramount importance in many real applications. One basic requirement of network embedding is to preserve the structure and inherent properties of the networks. While previous network embedding methods primarily preserve the microscopic structure, such as the first- and second-order proximities of nodes, the mesoscopic community structure, one of the most prominent features of networks, is largely ignored. In this paper, we propose a novel Modularized Nonnegative Matrix Factorization (M-NMF) model to incorporate the community structure into network embedding. We exploit the consensus relationship between the representations of nodes and the community structure, and jointly optimize an NMF-based representation learning model and a modularity-based community detection model in a unified framework, which enables the learned node representations to preserve both the microscopic and community structures. We also provide efficient updating rules to infer the parameters of our model, together with correctness and convergence guarantees. Extensive experimental results on a variety of real-world networks show the superior performance of the proposed method over the state-of-the-art.
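Modularity, the quantity the community detection term builds on, can be sketched as follows for an undirected graph. This is standard Newman modularity; M-NMF's exact coupling of this term with the NMF objective is more involved:

```python
import numpy as np

def modularity_matrix(A):
    # Newman's modularity matrix B = A - d d^T / (2m), where d is the
    # degree vector and m the number of edges.
    d = A.sum(axis=1)
    two_m = d.sum()
    return A - np.outer(d, d) / two_m

def modularity(A, communities):
    # Q = (1 / 2m) * sum of B_ij over ordered node pairs in the same community.
    B = modularity_matrix(A)
    two_m = A.sum()
    same = communities[:, None] == communities[None, :]
    return float((B * same).sum()) / two_m

# Two disconnected triangles: a perfect two-community split gives Q = 0.5.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1.0
labels = np.array([0, 0, 0, 1, 1, 1])
```

Maximizing Q rewards partitions whose within-community edge density exceeds what a random graph with the same degrees would produce.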
IJCAI Conference 2017 Conference Paper
Real data are usually complex and contain various components. For example, face images have expressions and genders. Each component mainly reflects one aspect of the data and provides information the others do not have. Therefore, exploring the semantic information of multiple components, as well as the diversity among them, is of great benefit to understanding data comprehensively and in depth. However, this cannot be achieved by current nonnegative matrix factorization (NMF)-based methods, even though NMF has shown remarkable competitiveness in learning parts-based representations of data. To overcome this limitation, we propose a novel multi-component nonnegative matrix factorization (MCNMF). Instead of seeking only one representation of the data, MCNMF learns multiple representations simultaneously, with the help of the Hilbert-Schmidt Independence Criterion (HSIC) as a diversity term. HSIC explores the diverse information among the representations, where each representation corresponds to a component. By integrating the multiple representations, a more comprehensive representation is then established. A new iterative updating optimization scheme is derived to solve the objective function of MCNMF, along with correctness and convergence guarantees. Extensive experimental results on real-world datasets show that MCNMF not only achieves more accurate performance than the state-of-the-art methods using the aggregated representation, but also interprets data from different aspects with the multiple representations, which is beyond what current NMFs can offer.
YNICL Journal 2016 Journal Article
Mild traumatic brain injury (mTBI) accounts for over one million emergency visits each year in the United States. The large-scale structural and functional network connectivity changes of mTBI are still unknown. This study was designed to determine the connectome-scale brain network connectivity changes in mTBI at both structural and functional levels. 40 mTBI patients at the acute stage and 50 healthy controls were recruited. A novel approach called Dense Individualized and Common Connectivity-based Cortical Landmarks (DICCCOLs) was applied for connectome-scale analysis of both diffusion tensor imaging and resting state functional MRI data. Among 358 networks identified on DICCCOL analysis, 41 networks were identified as structurally discrepant between patient and control groups. The involved major white matter tracts include the corpus callosum, and superior and inferior longitudinal fasciculi. Functional connectivity analysis identified 60 connectomic signatures that differentiate patients from controls with 93.75% sensitivity and 100% specificity. Analysis of functional domains showed decreased intra-network connectivity within the emotion network and among emotion-cognition interactions, and increased interactions among action-emotion and action-cognition as well as within perception networks. This work suggests that mTBI may result in changes of structural and functional connectivity on a connectome scale at the acute stage.
IJCAI Conference 2016 Conference Paper
Identification of module or community structures is important for characterizing and understanding complex systems. While designed with different objectives, i.e., stochastic models for regeneration and modularity maximization models for discrimination, both types of models look for a low-rank embedding that best represents and reconstructs the network topology. However, the mapping through such an embedding is linear, whereas real networks have various nonlinear features, making these models less effective in practice. Inspired by the strong representational power of deep neural networks, we propose a novel nonlinear reconstruction method that adopts deep neural networks for representation. We then extend the method to a semi-supervised community detection algorithm by incorporating pairwise constraints among graph nodes. Extensive experimental results on synthetic and real networks show that the new methods are effective, outperforming most state-of-the-art methods for community detection.
AAAI Conference 2016 Conference Paper
Identification of the modular or community structure of a network is key to understanding the semantics and functions of the network. While many network community detection methods have been developed that primarily explore network topologies, they provide little semantic information about the communities discovered. Although structures and semantics are closely related, little effort has been made to discover and analyze these two essential network properties together. By integrating network topology and semantic information on nodes, e.g., node attributes, we study the problems of detecting communities and inferring their semantics simultaneously. We propose a novel nonnegative matrix factorization (NMF) model with two sets of parameters, the community membership matrix and the community attribute matrix, and present efficient updating rules to evaluate the parameters with a convergence guarantee. The use of node attributes improves community detection and provides a semantic interpretation of the resultant network communities. Extensive experimental results on synthetic and real-world networks not only show the superior performance of the new method over state-of-the-art approaches, but also demonstrate its ability to semantically annotate the communities.
YNICL Journal 2015 Journal Article
Higher local carotid artery strain has previously been shown to be a characteristic of unstable carotid plaques. These plaques may be characterized by microvascular changes that predispose to intraplaque hemorrhage, increasing the likelihood of embolization. Little is known, however, about how these strain indices correspond with imaging markers of brain health and metrics of brain structure. White matter hyperintensities (WMHs), which are bright regions seen on T2-weighted brain MRI imaging, are postulated to result from cumulative ischemic vascular injury. Consequently, we hypothesized that plaques that are more prone to microvascular changes and embolization, represented by higher strain indices on ultrasound, would be associated with an increased amount of WMH lesion volume. This relationship would suggest not only emboli as a cause for the brain degenerative changes, but more importantly, a common microvascular etiology for large and small vessel contributions to this process. Subjects scheduled to undergo a carotid endarterectomy were recruited from a neurosurgery clinic. Prior to surgery, participating subjects underwent both ultrasound strain imaging and brain MRI scans as part of a larger clinical study on vascular health and cognition. A linear regression found that maximum absolute strain and peak-to-peak strain in the surgical-side carotid artery were predictive of WMH burden. Furthermore, the occurrence of microembolic signals monitored using transcranial Doppler (TCD) ultrasound examinations also correlated with increasing lesion burden. It is becoming increasingly recognized that cognitive decline is often multifactorial in nature. One contributing extra-brain factor may be changes in the microvasculature that produce unstable carotid artery plaques. In this study, we have shown that higher strain indices in carotid artery plaques are significantly associated with an increased WMH burden, a marker of vascular-mediated brain damage.
IS Journal 2013 Journal Article
Research on social media has been applied to various academic fields. During this year's Chinese National Holiday traffic congestion event, online users showed great enthusiasm on social media, such as forums, Weibo, communities, and other platforms. This article describes the construction of a dynamic evolution network, analyzes shifts in online users' attention, and studies the geographic distribution of travelers by analyzing online users' attributes.
AAAI Conference 2012 Conference Paper
As the popularity of social media increases, as evidenced by Twitter, Facebook and China's Renren, spamming activities have also picked up in number and variety. On social network sites, spammers often disguise themselves by creating fake accounts and hijacking normal users' accounts for personal gain. Unlike spammers in traditional systems such as SMS and email, spammers in social media behave like normal users and continually change their spamming strategies to fool anti-spamming systems. However, due to privacy and resource concerns, many social media websites cannot fully monitor all user content, making many previous approaches, such as topology-based and content-classification-based methods, infeasible. In this paper, we propose a Supervised Matrix Factorization method with Social Regularization (SMFSR) for spammer detection in social networks that exploits both social activities and users' social relations in an innovative and highly scalable manner. The proposed method detects spammers collectively based on users' social actions and social relations. We have empirically tested our method on data from Renren.com, one of the largest social networks in China, and demonstrated that our new method can improve detection performance significantly.