JBHI Journal 2026 Journal Article
ECG-MTDA: A Framework for Morphology-Temporal Decoupling and LLM-Guided Text Alignment
- Huimin Zheng
- Jun Guo
- Xin Zhang
- Xiaofen Xing
- Xiangmin Xu
Effective automated electrocardiogram (ECG) interpretation hinges on disentangling waveform morphology from rhythm dynamics, a challenge for existing multimodal models that often conflate these heterogeneous attributes and introduce semantic ambiguity. We introduce ECG-MTDA, a framework that explicitly decouples these components. It learns morphology-oriented representations via a PQRST-guided masked autoencoder, while separately modeling temporal dynamics using the continuous wavelet transform. Crucially, we align the learned morphology with concise, label-conditioned textual descriptions generated by a large language model (LLM) using a contrastive objective, creating a semantically grounded embedding space. ECG-MTDA demonstrates superior performance on the PTB-XL and CPSC 2018 benchmarks (e.g., AUC 93.16 on PTB-XL Superclass), with statistically significant gains over a strong multimodal baseline. Furthermore, on a challenging in-house cohort (n=620) for short-term paroxysmal atrial fibrillation (pAF) progression prediction, the model achieves an AUC of 0.97 $\pm$ 0.02 with high sensitivity (0.80 $\pm$ 0.04) and specificity (0.98 $\pm$ 0.01). Ablation studies and qualitative analyses confirm the benefits of our decoupled design and morphology-text alignment. Our results demonstrate that this clinically inspired decoupling strategy yields more precise and robust multimodal representations for complex ECG analysis, enhancing both diagnostic classification and near-term risk stratification.
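The abstract states only that morphology embeddings are aligned with LLM-generated text "using a contrastive objective"; it does not give the exact loss. As a minimal sketch of what such an alignment could look like, the following assumes a CLIP-style symmetric InfoNCE loss over a batch of paired embeddings. The function names, the temperature value of 0.07, and the use of cosine similarity are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_contrastive_loss(ecg_emb, text_emb, temperature=0.07):
    # Hypothetical alignment loss (assumed form, not the authors' definition).
    # ecg_emb, text_emb: arrays of shape (batch, dim); row i of each is a matched pair.
    ecg = l2_normalize(ecg_emb)
    txt = l2_normalize(text_emb)
    logits = ecg @ txt.T / temperature       # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])         # matched pairs lie on the diagonal
    loss_e2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # ECG -> text
    loss_t2e = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> ECG
    return 0.5 * (loss_e2t + loss_t2e)
```

Under these assumptions, the loss is small when each morphology embedding is closest to its own description and large when pairs are mismatched. When every embedding in a batch of size N is identical, the loss equals log N, which is a convenient sanity check.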