Arrow Research search

Author name cluster

Chengqing Yu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers
1 author row

Possible papers (6)

AAAI Conference 2026 Conference Paper

APT: Affine Prototype-Timestamp for Time Series Forecasting Under Distribution Shift

  • Yujie Li
  • Zezhi Shao
  • Chengqing Yu
  • Yisong Fu
  • Tao Sun
  • Yongjun Xu
  • Fei Wang

Time series forecasting under distribution shift remains challenging, as existing deep learning models often rely on local statistical normalization (e.g., mean and variance) that fails to capture global distribution shift. Methods like RevIN and its variants attempt to decouple distribution and pattern but still struggle with missing values, noisy observations, and invalid channel-wise affine transformation. To address these limitations, we propose Affine Prototype-Timestamp (APT), a lightweight and flexible plug-in module that injects global distribution features into the normalization–forecasting pipeline. By leveraging timestamp-conditioned prototype learning, APT dynamically generates affine parameters that modulate both input and output series, enabling the backbone to learn from self-supervised, distribution-aware clustered instances. APT is compatible with arbitrary forecasting backbones and normalization strategies while introducing minimal computational overhead. Extensive experiments across six benchmark datasets and multiple backbone-normalization combinations demonstrate that APT significantly improves forecasting performance under distribution shift.
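The core mechanism the abstract describes, timestamp-conditioned affine parameters applied around a forecasting backbone, can be illustrated with a minimal sketch. This is not the paper's implementation: the class name, the hour-of-day prototype keying, and the identity initialization are all illustrative assumptions.

```python
class AffinePrototypeSketch:
    """Toy sketch of timestamp-conditioned affine modulation:
    a prototype table keyed by a timestamp feature (here, hour of
    day; an assumption) emits (scale, shift) pairs that modulate
    the input series and de-modulate the backbone's output."""

    def __init__(self, n_prototypes=24):
        # Identity prototypes to start; in the paper these would be learned.
        self.prototypes = {h: (1.0, 0.0) for h in range(n_prototypes)}

    def modulate_input(self, series, hour):
        # Apply the timestamp-selected affine transform to the input.
        scale, shift = self.prototypes[hour % 24]
        return [scale * v + shift for v in series]

    def demodulate_output(self, series, hour):
        # Invert the same affine transform on the backbone's output.
        scale, shift = self.prototypes[hour % 24]
        return [(v - shift) / scale for v in series]
```

The round trip (modulate, forecast, demodulate) is what lets the backbone operate on distribution-adjusted instances while predictions land back in the original scale.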

AAAI Conference 2025 Conference Paper

Multi-Teacher Knowledge Distillation with Reinforcement Learning for Visual Recognition

  • Chuanguang Yang
  • XinQiang Yu
  • Han Yang
  • Zhulin An
  • Chengqing Yu
  • Libo Huang
  • Yongjun Xu

Multi-teacher Knowledge Distillation (KD) transfers diverse knowledge from a teacher pool to a student network. The core problem of multi-teacher KD is how to balance distillation strengths among various teachers. Most existing methods develop weighting strategies from an individual perspective of teacher performance or teacher-student gaps, lacking comprehensive information for guidance. This paper proposes Multi-Teacher Knowledge Distillation with Reinforcement Learning (MTKD-RL) to optimize multi-teacher weights. In this framework, we construct both teacher performance and teacher-student gaps as state information for an agent. The agent outputs the teacher weights and is updated by the reward returned from the student. MTKD-RL reinforces the interaction between the student and teachers using an agent in an RL-based decision mechanism, achieving better matching capability with more meaningful weights. Experimental results on visual recognition tasks, including image classification, object detection, and semantic segmentation, demonstrate that MTKD-RL achieves state-of-the-art performance compared to existing multi-teacher KD methods.
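The state-to-weight step the abstract outlines can be sketched minimally. The linear scoring of performance plus teacher-student gap, and the softmax with temperature, are stand-in assumptions, not the paper's learned RL policy.

```python
import math

def teacher_weights(teacher_scores, student_score, temperature=1.0):
    """Toy sketch: combine each teacher's performance with its gap
    to the student into a logit, then softmax into distillation
    weights. In MTKD-RL this mapping would be a learned agent
    updated by the student's reward; here it is a fixed heuristic."""
    logits = [(t + (t - student_score)) / temperature for t in teacher_scores]
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Stronger teachers (and larger teacher-student gaps) receive larger weights, while the softmax keeps the weights a valid convex combination.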

NeurIPS Conference 2025 Conference Paper

On the Integration of Spatial-Temporal Knowledge: A Lightweight Approach to Atmospheric Time Series Forecasting

  • Yisong Fu
  • Fei Wang
  • Zezhi Shao
  • Boyu Diao
  • Lin Wu
  • Zhulin An
  • Chengqing Yu
  • Yujie Li

Transformers have gained attention in atmospheric time series forecasting (ATSF) for their ability to capture global spatial-temporal correlations. However, their complex architectures lead to excessive parameter counts and extended training times, limiting their scalability to large-scale forecasting. In this paper, we revisit ATSF from a theoretical perspective of atmospheric dynamics and uncover a key insight: spatial-temporal position embedding (STPE) can inherently model spatial-temporal correlations even without attention mechanisms. Its effectiveness arises from integrating geographical coordinates and temporal features, which are intrinsically linked to atmospheric dynamics. Based on this, we propose STELLA, a Spatial-Temporal knowledge Embedded Lightweight modeL for ATSF, utilizing only STPE and an MLP architecture in place of Transformer layers. With 10k parameters and one hour of training, STELLA achieves superior performance on five datasets compared to other advanced methods. The paper emphasizes the effectiveness of spatial-temporal knowledge integration over complex architectures, providing novel insights for ATSF.
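The STPE idea, encoding geographical coordinates and temporal features into one vector for an MLP, can be sketched as below. The periodic sine/cosine encoding and the choice of periods are illustrative assumptions; the paper's exact embedding may differ.

```python
import math

def stpe_features(lat, lon, hour_of_day, day_of_year):
    """Toy sketch of a spatial-temporal position embedding:
    periodic encodings of coordinates and time concatenated into
    a single feature vector, which a small MLP could consume in
    place of attention layers. Periods here are assumptions."""
    def periodic(value, period):
        angle = 2.0 * math.pi * value / period
        return [math.sin(angle), math.cos(angle)]

    feats = []
    feats += periodic(lat, 180.0)          # latitude
    feats += periodic(lon, 360.0)          # longitude
    feats += periodic(hour_of_day, 24.0)   # diurnal cycle
    feats += periodic(day_of_year, 365.25) # annual cycle
    return feats
```

Periodic encodings respect the wrap-around nature of longitude and of daily/annual cycles, which is the kind of atmospheric-dynamics prior the abstract credits for STPE's effectiveness.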

NeurIPS Conference 2025 Conference Paper

Selective Learning for Deep Time Series Forecasting

  • Yisong Fu
  • Zezhi Shao
  • Chengqing Yu
  • Yujie Li
  • Zhulin An
  • Qi Wang
  • Yongjun Xu
  • Fei Wang

Benefiting from high capacity for capturing complex temporal patterns, deep learning (DL) has significantly advanced time series forecasting (TSF). However, deep models tend to suffer from severe overfitting due to the inherent vulnerability of time series to noise and anomalies. The prevailing DL paradigm uniformly optimizes all timesteps through the MSE loss, fitting uncertain and anomalous timesteps indiscriminately and ultimately overfitting. To address this, we propose a novel selective learning strategy for deep TSF. Specifically, selective learning screens a subset of all timesteps to calculate the MSE loss in optimization, guiding the model to focus on generalizable timesteps while disregarding non-generalizable ones. Our framework introduces a dual-mask mechanism to target timesteps: (1) an uncertainty mask leveraging residual entropy to filter uncertain timesteps, and (2) an anomaly mask employing residual lower bound estimation to exclude anomalous timesteps. Extensive experiments across eight real-world datasets demonstrate that selective learning can significantly improve the predictive performance of typical state-of-the-art deep models, including a 37.4% MSE reduction for Informer, 8.4% for TimesNet, and 6.5% for iTransformer.
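The core loss-masking step can be sketched in a few lines. How the mask itself is built (residual entropy and residual lower-bound estimation in the paper) is elided; this only shows how a mask restricts the MSE.

```python
def selective_mse(preds, targets, keep_mask):
    """Toy sketch of selective learning's optimization step: average
    squared error only over timesteps whose mask entry is True, so
    masked-out (uncertain or anomalous) timesteps contribute no
    training signal. Mask construction is elided."""
    kept = [(p - t) ** 2 for p, t, keep in zip(preds, targets, keep_mask) if keep]
    if not kept:
        return 0.0  # nothing selected: no loss contribution
    return sum(kept) / len(kept)
```

A large error at a masked timestep (e.g., an anomaly) leaves the loss untouched, which is exactly the overfitting-avoidance behavior the abstract describes.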

EAAI Journal 2024 Journal Article

Semi-supervised anomaly detection with contamination-resilience and incremental training

  • Liheng Yuan
  • Fanghua Ye
  • Heng Li
  • Chenhao Zhang
  • Cuiying Gao
  • Chengqing Yu
  • Wei Yuan
  • Xinge You

Anomaly detection plays a vital role in various realistic applications, including fraud detection, network traffic analysis, medical diagnosis, and so on. Semi-supervised anomaly detection methods have recently attracted increasing attention, owing to their low requirement for labeled anomalous samples. However, existing semi-supervised methods suffer from performance degradation when training data are contaminated with anomalies, and cannot well support incremental training required in scenarios where original training data are hard to obtain. To overcome these limitations, we propose SAE-CRIT, a lightweight semi-supervised anomaly detection method with contamination resilience and incremental training. SAE-CRIT effectively mitigates the negative impact of contaminated data through differentially weighting samples, and leverages a three-layer neural network to detect anomalies, allowing for efficient incremental training by updating only the last layer with new data. We compare SAE-CRIT with eight anomaly detection methods over four datasets. Extensive experiments demonstrate the advantages of SAE-CRIT in contamination resistance, incremental training, and training costs. More specifically, the state-of-the-art detection method GOAD achieved an F1-score of 89.3% and 90.6% on the contaminated datasets KDDCUP and KDDCUP-Rev, respectively. Under the same settings, however, SAE-CRIT exhibited an F1-score of 92.4% and 96.9%, respectively. In addition, the training time of SAE-CRIT is less than 20 s on these two datasets. The time spent by SAE-CRIT on these two datasets only accounts for 0.26% and 1.8% of the total time spent by GOAD, respectively.
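The incremental-training idea, updating only the last layer on new data while earlier layers stay frozen, can be sketched as a plain SGD update on a single linear output. This is a stand-in illustration under assumed squared loss, not SAE-CRIT's actual update rule or weighting scheme.

```python
def incremental_last_layer_update(hidden_feats, labels, w, b, lr=0.05, epochs=1):
    """Toy sketch of last-layer-only incremental training: given
    frozen hidden-layer features for newly arrived samples, run SGD
    on a single linear output with squared loss. Only (w, b) change;
    the feature extractor is untouched, keeping updates cheap."""
    for _ in range(epochs):
        for feats, y in zip(hidden_feats, labels):
            pred = sum(wi * f for wi, f in zip(w, feats)) + b
            err = pred - y
            # Gradient step on the final layer only.
            w = [wi - lr * err * f for wi, f in zip(w, feats)]
            b = b - lr * err
    return w, b
```

Because only the final layer's parameters are touched, each batch of new data costs a handful of vector operations, consistent with the sub-20-second training times the abstract reports.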

EAAI Journal 2024 Journal Article

WGformer: A Weibull-Gaussian Informer based model for wind speed prediction

  • Ziyi Shi
  • Jia Li
  • Zheyuan Jiang
  • Huang Li
  • Chengqing Yu
  • Xiwei Mi

Accurate wind speed forecasting can improve energy management efficiency and promote the use of renewable energy. However, the inherent nonlinearity and fluctuation of wind speed make prediction challenging. To address these issues, we design an efficient Informer-based model with improved calculation speed, forecasting accuracy, and generalization ability. The proposed model integrates the Weibull-Gaussian transform, the Informer, and a kernel mean square error loss, and addresses how these components are combined. The Weibull-Gaussian transform is used as the data preprocessing module, which removes non-Gaussian characteristics from the original data and thus achieves noise reduction. The Informer is used as the main predictor, which can efficiently output accurate forecasting results based on an encoder-decoder architecture and self-attention mechanism. The kernel mean square error loss function, which shows strong robustness to outliers, is used to evaluate the nonlinearity of errors in reproducing kernel Hilbert space. To evaluate the performance of the proposed model, it is compared with several widely used models and state-of-the-art models. The experimental results indicate that the proposed model weakens the effect of outliers, yields high forecasting accuracy with mean square error = 0.35, and outperforms the baselines by up to 8.5% on three datasets.
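The preprocessing step named in the abstract follows a standard pattern: push a value through a Weibull CDF (probability integral transform), then through the inverse standard-normal CDF, so Weibull-distributed data becomes approximately Gaussian. A minimal sketch, assuming known shape/scale parameters (the paper's estimation procedure is elided):

```python
import math
from statistics import NormalDist

def weibull_gaussian_transform(x, shape_k, scale_lam):
    """Toy sketch of a Weibull-to-Gaussian transform:
    u = F_Weibull(x; k, lam), then z = Phi^{-1}(u).
    Parameters are assumed known; clamping keeps the inverse
    normal CDF away from its singularities at 0 and 1."""
    u = 1.0 - math.exp(-((x / scale_lam) ** shape_k))  # Weibull CDF
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return NormalDist().inv_cdf(u)
```

By construction the Weibull median maps to 0 (the Gaussian median), and the transform is monotone, so the original ordering of wind speeds is preserved for the downstream predictor.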