EAAI Journal 2026 Journal Article
Model-based speech enhancement with spectral envelope correction using stacked autoencoders
- Wenhao Lu
- Zhenya Zang
- Feng Qin
- Xia Dong
- Jie Han
- Zuozhou Pan
- Yiping Ke
Speech enhancement aims to improve the intelligibility and perceptual quality of noisy speech signals. Many deep learning-based denoising approaches have been developed in the past decade. They are typically trained to minimize an overall error between predicted and target signals under various mathematical metrics. However, perceptual quality depends more on preserving the inherent characteristics of speech than on overall signal matching, and neglecting this aspect may limit the achievable improvement. Hence, we propose a speech enhancement system that combines the Harmonic Noise Model (HNM) with Stacked Autoencoder (SAE)-based spectral envelope correction. The HNM framework reconstructs the harmonic structure, a key spectral feature that contributes to timbre and perceived loudness. Since the parameters used for HNM reconstruction are corrupted by background noise, we design spectral envelope correction modules to restore them. These modules adopt a cluster-specific training strategy. Input data with similar characteristics are first grouped to guide the neural network in learning specific feature representations. Then, within each cluster, the associated SAE builds a robust mapping between noisy and clean parameters by mitigating redundancy and random perturbations in the data. Experimental results verify the effectiveness of our scheme across various noise types and input signal-to-noise ratios.
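To make the role of the spectral envelope in harmonic reconstruction concrete, the following is a minimal sketch of the harmonic part of a generic harmonic-plus-noise synthesis for one voiced frame. It is not the authors' implementation: the function names, the toy envelope, the zero-phase assumption, the 16 kHz sampling rate, and the frame length are all assumptions made for illustration, and the noise component is omitted.

```python
import math


def synthesize_harmonic_frame(f0_hz, envelope, sample_rate=16000, num_samples=320):
    """Sum cosines at integer multiples of f0, with amplitudes read from `envelope`.

    `envelope` is a hypothetical callable mapping frequency (Hz) to a linear
    amplitude. All harmonics start at zero phase (an illustrative assumption).
    """
    nyquist = sample_rate / 2.0
    frame = [0.0] * num_samples
    k = 1
    # Keep only harmonics strictly below the Nyquist frequency.
    while k * f0_hz < nyquist:
        freq = k * f0_hz
        amplitude = envelope(freq)  # the envelope sets each harmonic's level
        step = 2.0 * math.pi * freq / sample_rate
        for n in range(num_samples):
            frame[n] += amplitude * math.cos(step * n)
        k += 1
    return frame


def toy_envelope(freq_hz):
    # Hypothetical smooth envelope: amplitude decays with frequency.
    return 1.0 / (1.0 + freq_hz / 1000.0)


frame = synthesize_harmonic_frame(200.0, toy_envelope)
```

Because every harmonic amplitude is sampled from the envelope, an envelope estimated from noisy speech would distort the reconstructed harmonics directly, which is consistent with the abstract's motivation for correcting the envelope before reconstruction.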