EAAI, 2026, Journal Article
An end-to-end wavelet-based irregular transformer with Gumbel sampling for spatiotemporal welding prediction
- Changhui Liu
- Ke Jin
- Jianzhi Sun
- Lin Deng
- Jiewu Leng
- Xin Li
- Qian Li
- Qingcheng Yang
Welding prediction plays a vital role in ensuring assembly precision and minimizing rework in thin-walled structures. Data-driven approaches have attracted increasing attention; however, most existing methods rely on oversimplified input representations, overlook irregular temporal dynamics, and focus solely on single-variable prediction, limiting their applicability in complex welding scenarios. Motivated by these challenges, this study develops an end-to-end Wavelet Irregular Transformer with Gumbel Sampling, designed to achieve accurate spatiotemporal prediction of welding-induced deformation and residual stress. From the artificial intelligence perspective, the model incorporates a large language model-based embedding initializer that compresses and contextualizes step-level simulation parameters, and an adaptive parameterized Gumbel keyframe extractor that dynamically identifies the most informative temporal segments. This design enables efficient learning over ultra-long welding sequences while maintaining high-fidelity temporal representations. From the engineering application perspective, a channel-aware wavelet encoder–decoder is developed to fuse multi-frequency and multi-channel features, improving spatial coherence and capturing coupled stress–strain interactions. Validation on a dedicated thin-plate welding dataset, supplemented by physical experiments, shows that the proposed method achieves superior accuracy, robustness, and computational efficiency compared with optimized encoder–decoder and sequence-modeling baselines. The proposed Wavelet Irregular Transformer with Gumbel Sampling achieves a deformation mean absolute error of 0.033 mm and a root mean square error of 0.045 mm on the test set, reducing the deformation mean absolute error by 92.0% and the root mean square error by 82.1% compared with the best uniform-sampling baseline, while requiring 76.7% fewer billion floating-point operations than a full-sequence Transformer.
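The abstract does not specify how the adaptive parameterized Gumbel keyframe extractor is implemented; the sketch below only illustrates the underlying Gumbel-softmax / Gumbel-top-k trick that such an extractor typically builds on. The function names, the per-frame score vector, and the temperature value are illustrative assumptions, not the paper's method.

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, seed=None):
    """Relaxed (soft) sample from a categorical over sequence frames.

    logits : hypothetical per-frame importance scores.
    tau    : temperature; lower values push the output toward one-hot.
    """
    rng = np.random.default_rng(seed)
    # Standard Gumbel(0, 1) noise via the inverse-CDF transform.
    g = -np.log(-np.log(rng.uniform(1e-9, 1.0, size=logits.shape)))
    y = (logits + g) / tau
    y = np.exp(y - y.max())          # numerically stable softmax
    return y / y.sum()

def select_keyframes(logits, k, seed=None):
    """Draw k distinct keyframe indices via the Gumbel-top-k trick:
    perturb the scores with Gumbel noise and keep the k largest."""
    rng = np.random.default_rng(seed)
    g = -np.log(-np.log(rng.uniform(1e-9, 1.0, size=logits.shape)))
    return np.argsort(logits + g)[-k:][::-1]

# Toy 10-step welding sequence with two strongly informative frames.
scores = np.array([0.1, 0.2, 3.0, 0.1, 0.2, 0.1, 2.5, 0.1, 0.2, 0.1])
probs = gumbel_softmax(scores, tau=0.5, seed=0)   # soft frame weights
idx = select_keyframes(scores, k=2, seed=0)       # hard keyframe picks
```

In a trained model the scores would be produced by a learned module and the soft weights would carry gradients back to it, which is what makes the sampling step end-to-end differentiable.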