JBHI 2026 Journal Article
- Jun Yang
- Chen Zhu
- Renbiao Wu
Non-contact heart rate detection technology leverages subtle changes in skin color to estimate heart rate, enhancing the convenience of health monitoring, particularly in situations requiring real-time, contact-free observation. However, current video-based methods face several limitations, including restricted feature extraction capability, redundant spatial information, and ineffective handling of motion artifacts. To address these problems, a novel end-to-end heart rate estimation network, the Spatial-Temporal-Channel Network (STCNet), is proposed. First, to suppress the redundant spatial information present in current video-based heart rate estimation methods, a spatial attention learning (SAL) unit is designed to highlight informative facial regions. Next, an improved temporal shift module (TSMP) with long-range temporal perception is proposed. On this basis, a temporal-channel learning (TCL) unit is designed to enable information interaction across the channels of different frames, addressing the insufficient capability of existing models in extracting periodic heartbeat features. Finally, the SAL and TCL units are combined into a feature extraction block (FEB), and the feature extraction network is constructed by stacking multiple FEBs to achieve accurate heart rate estimation. Extensive experiments on the UBFC-rPPG and PURE datasets verify the effectiveness and generalization ability of the model. Notably, in intra-dataset testing on the PURE dataset, the model achieves a 0.27 bpm reduction in mean absolute error (MAE) and a 0.19 bpm reduction in root mean square error (RMSE) compared to the state-of-the-art CIN-rPPG. Experimental results demonstrate that the proposed model outperforms other mainstream models.
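The abstract does not detail the improved TSMP, but it builds on the standard temporal shift idea: exchanging a fraction of channels between neighboring frames so that a 2D operation gains temporal context at no extra compute. A minimal NumPy sketch of that baseline shift (the function name, shift fraction, and zero-padding choice are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def temporal_shift(x, shift_frac=0.25):
    """Shift a fraction of channels along the time axis.

    x: array of shape (T, C) -- T frames, C channels per frame.
    A quarter of the channels move one frame forward in time,
    another quarter one frame backward; the rest stay in place.
    Vacated slots at the sequence boundaries are zero-filled.
    """
    T, C = x.shape
    fold = int(C * shift_frac)
    out = np.zeros_like(x)
    out[1:, :fold] = x[:-1, :fold]               # shift forward in time
    out[:-1, fold:2 * fold] = x[1:, fold:2 * fold]  # shift backward in time
    out[:, 2 * fold:] = x[:, 2 * fold:]          # untouched channels
    return out

# Toy example: 4 frames, 4 channels, so one channel shifts each way.
x = np.arange(16, dtype=float).reshape(4, 4)
y = temporal_shift(x)
```

After the shift, a per-frame layer that reads `y` mixes features from adjacent frames, which is how a 2D backbone can capture the periodic temporal structure of the rPPG signal.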