JBHI Journal 2026 Journal Article
NoTAC: A Noise-Tolerance Automatic Cleaning Framework for Bone Marrow Karyotyping Data
- Rihan Huang
- Siyuan Chen
- Yafei Li
- Chunling Zhang
- Yilan Zhang
- Changchun Yang
- Na Li
- Jingdong Hu
Deep neural networks (DNNs) have advanced chromosome classification, a critical procedure in karyotyping for disease diagnosis. However, training an effective DNN requires clean and reliable data, whereas real-world clinical chromosome data often contain label errors and outliers, which degrade DNN performance and limit clinical applicability. In this work, we propose a Noise-Tolerance Automatic Cleaning framework, named NoTAC, that detects potential label errors and outliers and thereby improves chromosome classification performance. The framework consists of two branches: KaryoCleanse for label noise detection and KaryoDrift for outlier identification. KaryoCleanse identifies potential label errors by leveraging the DNN’s self-confidence, estimating the latent label distribution, and ranking probabilities to prune mislabeled data. KaryoDrift scores out-of-distribution samples by their average K-nearest neighbor distance, enabling the identification and removal of outliers. We conducted comprehensive comparative experiments against state-of-the-art noise-handling methods on a real-world R-band bone marrow chromosome dataset. Our results demonstrate that NoTAC achieves superior performance, with an accuracy of 93.99%, which represents a 6.25% relative improvement over the baseline and outperforms the best competing method by 0.92%. Furthermore, our qualitative analysis shows that NoTAC reliably uncovers data issues in this dataset, offering insights into how these issues impair DNN prediction capabilities. These findings demonstrate NoTAC’s potential to enhance both the performance and reliability of DNNs on practical medical datasets. The proposed method has also been applied to assist clinical karyotype diagnosis.
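The two cleaning steps summarized in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `flag_label_errors` and `knn_outlier_scores` are hypothetical, the per-class threshold rule (mean self-confidence of each class) is an assumed confident-learning-style heuristic, and Euclidean distance on feature vectors is an assumed choice for the nearest-neighbor score.

```python
import numpy as np


def flag_label_errors(probs, labels):
    """Return indices of suspected label errors, most suspicious first.

    probs  : (n, k) array of predicted class probabilities (assumed to come
             from held-out predictions of some classifier).
    labels : (n,) array of given integer labels in [0, k).
    """
    n, k = probs.shape
    # Assumed heuristic: per-class threshold = mean self-confidence, i.e. the
    # average predicted probability of class c over samples labeled c.
    thresholds = np.array([
        probs[labels == c, c].mean() if np.any(labels == c) else 1.0
        for c in range(k)
    ])
    confident = probs >= thresholds              # (n, k) boolean mask
    masked = np.where(confident, probs, -np.inf)
    guess = masked.argmax(axis=1)                # most likely confident class
    suspect = confident.any(axis=1) & (guess != labels)
    # Rank suspects by how far the given label trails the guessed class.
    rows = np.arange(n)
    margin = probs[rows, labels] - probs[rows, guess]
    order = np.argsort(margin)
    return order[suspect[order]]


def knn_outlier_scores(reference, queries, k=5):
    """Score each query by its mean distance to its k nearest reference points.

    Higher scores suggest out-of-distribution samples. If the queries are the
    reference set itself, each point's zero self-distance is included.
    """
    diffs = queries[:, None, :] - reference[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)        # (n_queries, n_reference)
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)


if __name__ == "__main__":
    # Toy data: sample 2 is labeled 0 but predicted strongly as class 1.
    probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9],
                      [0.1, 0.9], [0.2, 0.8], [0.05, 0.95]])
    labels = np.array([0, 0, 0, 1, 1, 1])
    print("suspected label errors:", flag_label_errors(probs, labels))

    reference = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
    queries = np.array([[0.5, 0.5], [10.0, 10.0]])
    print("outlier scores:", knn_outlier_scores(reference, queries, k=2))
```

In a pipeline of this kind, samples flagged by the first function and samples whose score from the second exceeds a chosen cutoff would be removed before retraining; the cutoff itself is a tuning decision that the abstract does not specify.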