Arrow Research search

Author name cluster

Chengqing Yu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers
1 author row

Possible papers (6)

AAAI Conference 2026 Conference Paper

APT: Affine Prototype-Timestamp for Time Series Forecasting Under Distribution Shift

  • Yujie Li
  • Zezhi Shao
  • Chengqing Yu
  • Yisong Fu
  • Tao Sun
  • Yongjun Xu
  • Fei Wang

Time series forecasting under distribution shift remains challenging, as existing deep learning models often rely on local statistical normalization (e.g., mean and variance) that fails to capture global distribution shift. Methods like RevIN and its variants attempt to decouple distribution and pattern but still struggle with missing values, noisy observations, and invalid channel-wise affine transformation. To address these limitations, we propose Affine Prototype-Timestamp (APT), a lightweight and flexible plug-in module that injects global distribution features into the normalization–forecasting pipeline. By leveraging timestamp-conditioned prototype learning, APT dynamically generates affine parameters that modulate both input and output series, enabling the backbone to learn from self-supervised, distribution-aware clustered instances. APT is compatible with arbitrary forecasting backbones and normalization strategies while introducing minimal computational overhead. Extensive experiments across six benchmark datasets and multiple backbone-normalization combinations demonstrate that APT significantly improves forecasting performance under distribution shift.
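The core mechanism the abstract describes, timestamp-conditioned affine parameters applied around a forecasting backbone, can be illustrated with a minimal sketch. This is not the paper's implementation: the class name, the hour-of-day prototype keying, and the identity initialization are all illustrative assumptions.

```python
class AffinePrototypeSketch:
    """Toy sketch of timestamp-conditioned affine modulation:
    a prototype table keyed by a timestamp feature (here, hour of
    day; an assumption) emits (scale, shift) pairs that modulate
    the input series and de-modulate the backbone's output."""

    def __init__(self, n_prototypes=24):
        # Identity prototypes to start; in the paper these would be learned.
        self.prototypes = {h: (1.0, 0.0) for h in range(n_prototypes)}

    def modulate_input(self, series, hour):
        # Apply the timestamp-selected affine transform to the input.
        scale, shift = self.prototypes[hour % 24]
        return [scale * v + shift for v in series]

    def demodulate_output(self, series, hour):
        # Invert the same affine transform on the backbone's output.
        scale, shift = self.prototypes[hour % 24]
        return [(v - shift) / scale for v in series]
```

The round trip (modulate, forecast, demodulate) is what lets the backbone operate on distribution-adjusted instances while predictions land back in the original scale.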

AAAI Conference 2025 Conference Paper

Multi-Teacher Knowledge Distillation with Reinforcement Learning for Visual Recognition

  • Chuanguang Yang
  • XinQiang Yu
  • Han Yang
  • Zhulin An
  • Chengqing Yu
  • Libo Huang
  • Yongjun Xu

Multi-teacher Knowledge Distillation (KD) transfers diverse knowledge from a teacher pool to a student network. The core problem of multi-teacher KD is how to balance distillation strengths among various teachers. Most existing methods develop weighting strategies from an individual perspective of teacher performance or teacher-student gaps, lacking comprehensive information for guidance. This paper proposes Multi-Teacher Knowledge Distillation with Reinforcement Learning (MTKD-RL) to optimize multi-teacher weights. In this framework, we construct both teacher performance and teacher-student gaps as state information for an agent. The agent outputs the teacher weights and is updated by the reward returned from the student. MTKD-RL reinforces the interaction between the student and teachers using an agent in an RL-based decision mechanism, achieving better matching capability with more meaningful weights. Experimental results on visual recognition tasks, including image classification, object detection, and semantic segmentation, demonstrate that MTKD-RL achieves state-of-the-art performance compared to existing multi-teacher KD methods.
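The state-to-weight step the abstract outlines can be sketched minimally. The linear scoring of performance plus teacher-student gap, and the softmax with temperature, are stand-in assumptions, not the paper's learned RL policy.

```python
import math

def teacher_weights(teacher_scores, student_score, temperature=1.0):
    """Toy sketch: combine each teacher's performance with its gap
    to the student into a logit, then softmax into distillation
    weights. In MTKD-RL this mapping would be a learned agent
    updated by the student's reward; here it is a fixed heuristic."""
    logits = [(t + (t - student_score)) / temperature for t in teacher_scores]
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Stronger teachers (and larger teacher-student gaps) receive larger weights, while the softmax keeps the weights a valid convex combination.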

NeurIPS Conference 2025 Conference Paper

On the Integration of Spatial-Temporal Knowledge: A Lightweight Approach to Atmospheric Time Series Forecasting

  • Yisong Fu
  • Fei Wang
  • Zezhi Shao
  • Boyu Diao
  • Lin Wu
  • Zhulin An
  • Chengqing Yu
  • Yujie Li

Transformers have gained attention in atmospheric time series forecasting (ATSF) for their ability to capture global spatial-temporal correlations. However, their complex architectures lead to excessive parameter counts and extended training times, limiting their scalability to large-scale forecasting. In this paper, we revisit ATSF from a theoretical perspective of atmospheric dynamics and uncover a key insight: spatial-temporal position embedding (STPE) can inherently model spatial-temporal correlations even without attention mechanisms. Its effectiveness arises from integrating geographical coordinates and temporal features, which are intrinsically linked to atmospheric dynamics. Based on this, we propose STELLA, a Spatial-Temporal knowledge Embedded Lightweight modeL for ATSF, utilizing only STPE and an MLP architecture in place of Transformer layers. With 10k parameters and one hour of training, STELLA achieves superior performance on five datasets compared to other advanced methods. The paper emphasizes the effectiveness of spatial-temporal knowledge integration over complex architectures, providing novel insights for ATSF.
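The STPE idea, encoding geographical coordinates and temporal features into one vector for an MLP, can be sketched as below. The periodic sine/cosine encoding and the choice of periods are illustrative assumptions; the paper's exact embedding may differ.

```python
import math

def stpe_features(lat, lon, hour_of_day, day_of_year):
    """Toy sketch of a spatial-temporal position embedding:
    periodic encodings of coordinates and time concatenated into
    a single feature vector, which a small MLP could consume in
    place of attention layers. Periods here are assumptions."""
    def periodic(value, period):
        angle = 2.0 * math.pi * value / period
        return [math.sin(angle), math.cos(angle)]

    feats = []
    feats += periodic(lat, 180.0)          # latitude
    feats += periodic(lon, 360.0)          # longitude
    feats += periodic(hour_of_day, 24.0)   # diurnal cycle
    feats += periodic(day_of_year, 365.25) # annual cycle
    return feats
```

Periodic encodings respect the wrap-around nature of longitude and of daily/annual cycles, which is the kind of atmospheric-dynamics prior the abstract credits for STPE's effectiveness.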

NeurIPS Conference 2025 Conference Paper

Selective Learning for Deep Time Series Forecasting

  • Yisong Fu
  • Zezhi Shao
  • Chengqing Yu
  • Yujie Li
  • Zhulin An
  • Qi Wang
  • Yongjun Xu
  • Fei Wang

Benefiting from high capacity for capturing complex temporal patterns, deep learning (DL) has significantly advanced time series forecasting (TSF). However, deep models tend to suffer from severe overfitting due to the inherent vulnerability of time series to noise and anomalies. The prevailing DL paradigm uniformly optimizes all timesteps through the MSE loss, fitting uncertain and anomalous timesteps indiscriminately and ultimately overfitting. To address this, we propose a novel selective learning strategy for deep TSF. Specifically, selective learning screens a subset of all timesteps to calculate the MSE loss in optimization, guiding the model to focus on generalizable timesteps while disregarding non-generalizable ones. Our framework introduces a dual-mask mechanism to target timesteps: (1) an uncertainty mask leveraging residual entropy to filter uncertain timesteps, and (2) an anomaly mask employing residual lower bound estimation to exclude anomalous timesteps. Extensive experiments across eight real-world datasets demonstrate that selective learning can significantly improve the predictive performance of typical state-of-the-art deep models, including a 37.4% MSE reduction for Informer, 8.4% for TimesNet, and 6.5% for iTransformer.
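The core loss-masking step can be sketched in a few lines. How the mask itself is built (residual entropy and residual lower-bound estimation in the paper) is elided; this only shows how a mask restricts the MSE.

```python
def selective_mse(preds, targets, keep_mask):
    """Toy sketch of selective learning's optimization step: average
    squared error only over timesteps whose mask entry is True, so
    masked-out (uncertain or anomalous) timesteps contribute no
    training signal. Mask construction is elided."""
    kept = [(p - t) ** 2 for p, t, keep in zip(preds, targets, keep_mask) if keep]
    if not kept:
        return 0.0  # nothing selected: no loss contribution
    return sum(kept) / len(kept)
```

A large error at a masked timestep (e.g., an anomaly) leaves the loss untouched, which is exactly the overfitting-avoidance behavior the abstract describes.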

EAAI Journal 2024 Journal Article

Semi-supervised anomaly detection with contamination-resilience and incremental training

  • Liheng Yuan
  • Fanghua Ye
  • Heng Li
  • Chenhao Zhang
  • Cuiying Gao
  • Chengqing Yu
  • Wei Yuan
  • Xinge You

Anomaly detection plays a vital role in various realistic applications, including fraud detection, network traffic analysis, medical diagnosis, and so on. Semi-supervised anomaly detection methods have recently attracted increasing attention, owing to their low requirement for labeled anomalous samples. However, existing semi-supervised methods suffer from performance degradation when training data are contaminated with anomalies, and cannot well support incremental training required in scenarios where original training data are hard to obtain. To overcome these limitations, we propose SAE-CRIT, a lightweight semi-supervised anomaly detection method with contamination resilience and incremental training. SAE-CRIT effectively mitigates the negative impact of contaminated data through differentially weighting samples, and leverages a three-layer neural network to detect anomalies, allowing for efficient incremental training by updating only the last layer with new data. We compare SAE-CRIT with eight anomaly detection methods over four datasets. Extensive experiments demonstrate the advantages of SAE-CRIT in contamination resistance, incremental training, and training costs. More specifically, the state-of-the-art detection method GOAD achieved an F1-score of 89.3% and 90.6% on the contaminated datasets KDDCUP and KDDCUP-Rev, respectively. Under the same settings, however, SAE-CRIT exhibited an F1-score of 92.4% and 96.9%, respectively. In addition, the training time of SAE-CRIT is less than 20 s on these two datasets. The time spent by SAE-CRIT on these two datasets only accounts for 0.26% and 1.8% of the total time spent by GOAD, respectively.
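The incremental-training idea, updating only the last layer on new data while earlier layers stay frozen, can be sketched as a plain SGD update on a single linear output. This is a stand-in illustration under assumed squared loss, not SAE-CRIT's actual update rule or weighting scheme.

```python
def incremental_last_layer_update(hidden_feats, labels, w, b, lr=0.05, epochs=1):
    """Toy sketch of last-layer-only incremental training: given
    frozen hidden-layer features for newly arrived samples, run SGD
    on a single linear output with squared loss. Only (w, b) change;
    the feature extractor is untouched, keeping updates cheap."""
    for _ in range(epochs):
        for feats, y in zip(hidden_feats, labels):
            pred = sum(wi * f for wi, f in zip(w, feats)) + b
            err = pred - y
            # Gradient step on the final layer only.
            w = [wi - lr * err * f for wi, f in zip(w, feats)]
            b = b - lr * err
    return w, b
```

Because only the final layer's parameters are touched, each batch of new data costs a handful of vector operations, consistent with the sub-20-second training times the abstract reports.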

EAAI Journal 2024 Journal Article

WGformer: A Weibull-Gaussian Informer based model for wind speed prediction

  • Ziyi Shi
  • Jia Li
  • Zheyuan Jiang
  • Huang Li
  • Chengqing Yu
  • Xiwei Mi

Accurate wind speed forecasting can improve energy management efficiency and promote the use of renewable energy. However, the inherent nonlinearity and fluctuation of wind speed make prediction challenging. To address these issues, we design an efficient Informer-based model with improved calculation speed, forecasting accuracy, and generalization ability. The proposed model integrates the Weibull-Gaussian transform, the Informer, and a kernel mean square error loss, and addresses how these components are combined. The Weibull-Gaussian transform is used as the data preprocessing module, which removes non-Gaussian characteristics from the original data and thus achieves noise reduction. The Informer is used as the main predictor, which can efficiently output accurate forecasting results based on an encoder-decoder architecture and self-attention mechanism. The kernel mean square error loss function, which shows strong robustness to outliers, is used to evaluate the nonlinearity of errors in reproducing kernel Hilbert space. To evaluate the performance of the proposed model, it is compared with several widely used models and state-of-the-art models. The experimental results indicate that the proposed model weakens the effect of outliers, yields high forecasting accuracy with mean square error = 0.35, and outperforms the baselines by up to 8.5% on three datasets.
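The preprocessing step named in the abstract follows a standard pattern: push a value through a Weibull CDF (probability integral transform), then through the inverse standard-normal CDF, so Weibull-distributed data becomes approximately Gaussian. A minimal sketch, assuming known shape/scale parameters (the paper's estimation procedure is elided):

```python
import math
from statistics import NormalDist

def weibull_gaussian_transform(x, shape_k, scale_lam):
    """Toy sketch of a Weibull-to-Gaussian transform:
    u = F_Weibull(x; k, lam), then z = Phi^{-1}(u).
    Parameters are assumed known; clamping keeps the inverse
    normal CDF away from its singularities at 0 and 1."""
    u = 1.0 - math.exp(-((x / scale_lam) ** shape_k))  # Weibull CDF
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return NormalDist().inv_cdf(u)
```

By construction the Weibull median maps to 0 (the Gaussian median), and the transform is monotone, so the original ordering of wind speeds is preserved for the downstream predictor.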