Arrow Research search

Author name cluster

Bin Pu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

20 papers
2 author rows

Possible papers


JBHI Journal 2026 Journal Article

CMIS: A Class-Aware Multi-Structure Instance Segmentation Model for Fetal Brain Ultrasound Images With Fuzzy Region-Based Constraints

  • Hang Wang
  • Mingxing Duan
  • Yuhuan Lu
  • Bin Pu
  • Yue Qin
  • Shuihua Wang
  • Kenli Li

Fetal anatomical structure segmentation in ultrasound images is essential for biometric measurement and disease diagnosis. However, current methods focus on a specific plane or a few structures, whereas obstetricians diagnose by considering multiple structures from different planes. In addition, existing methods struggle with segmenting fuzzy regions, which leads to performance degradation. We propose a real-time segmentation method called Class-aware Multi-structure Instance Segmentation (CMIS), designed to segment 19 key structures in 3 fetal brain planes to support brain-disease diagnosis. We extract instance information and generate class-aware attention for each class instead of dense instances to save computing resources and provide more informative details. Then we implement cross-layer and multi-scale fusion to obtain detailed prototypes. Finally, we fuse global attention with local prototypes cropped by boxes to generate masks and randomly perturb the boxes during training to enhance robustness. Moreover, we propose a new fuzzy region-based constraint loss to address the challenge of structures with varying scales and fuzzy boundaries. Extensive experiments on a fetal brain dataset demonstrate that CMIS outperforms 13 competing baselines, with an mDice of 83.41 $\pm$ 0.03% at 37 FPS. CMIS also excels in external experiments on a fetal heart ultrasound dataset, achieving an mDice of 85.73 $\pm$ 0.02%. These results demonstrate the effectiveness of CMIS in segmenting complex anatomical structures in ultrasound and its potential for real-time clinical applications. CMIS is limited to 2D normal standard planes ($\geq$ 19 weeks); thus, its generalization to abnormal cases and broader datasets remains to be investigated.
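The mask-generation step described in the abstract — fusing a class-aware global attention map with a local prototype cropped by the instance box — can be sketched as below. All shapes, the elementwise fusion rule, and the function name are illustrative assumptions, not CMIS's exact design:

```python
import numpy as np

def instance_mask(global_attn, prototypes, box, cls):
    """Sketch: fuse class-aware global attention with a box-cropped
    local prototype to produce a per-instance mask probability map.

    global_attn: (K, H, W) class-aware attention maps.
    prototypes:  (K, H, W) local prototype maps.
    box: (y0, y1, x0, x1) instance box; cls: class index.
    """
    y0, y1, x0, x1 = box
    local = np.zeros_like(prototypes[cls])
    local[y0:y1, x0:x1] = prototypes[cls, y0:y1, x0:x1]  # crop prototype by the box
    logits = global_attn[cls] * local                    # elementwise fusion (assumed)
    return 1.0 / (1.0 + np.exp(-logits))                 # sigmoid mask probability
```

During training, the box coordinates could be jittered by a few pixels before cropping, mirroring the random box perturbation the abstract mentions for robustness.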

AAAI Conference 2026 Conference Paper

MPA: Multimodal Prototype Augmentation for Few-Shot Learning

  • Liwen Wu
  • Wei Wang
  • Lei Zhao
  • Zhan Gao
  • Qika Lin
  • Shaowen Yao
  • Zuozhu Liu
  • Bin Pu

Recently, Few-shot Learning (FSL) has become a popular task that aims to recognize new classes from only a few labeled examples, and it has been widely applied in fields such as natural science, remote sensing, and medical imaging. However, most existing methods focus only on the visual modality and compute prototypes directly from raw support images, which lack comprehensive and rich multimodal information. To address these limitations, we propose a novel Multimodal Prototype Augmentation FSL framework called MPA, comprising LLM-based Multi-Variant Semantic Enhancement (LMSE), Hierarchical Multi-View Augmentation (HMA), and an Adaptive Uncertain Class Absorber (AUCA). LMSE leverages large language models to generate diverse paraphrased category descriptions, enriching the support set with additional semantic cues. HMA exploits both natural and multi-view augmentations (e.g., changes in viewing distance, camera angle, and lighting conditions) to enhance feature diversity. AUCA models uncertainty by introducing uncertain classes via interpolation and Gaussian sampling, effectively absorbing uncertain samples. Extensive experiments on four single-domain and six cross-domain FSL benchmarks demonstrate that MPA achieves superior performance compared to existing state-of-the-art methods across most settings. Notably, MPA surpasses the second-best method by 12.29% and 24.56% in the single-domain and cross-domain settings, respectively, under the 5-way 1-shot setting.
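The core idea of multimodal prototype augmentation — enriching the visual prototype of each class with embeddings of LLM-generated category descriptions — can be sketched as follows. The fusion rule, weight `alpha`, and function names are assumptions for illustration, not MPA's exact formulation:

```python
import numpy as np

def augmented_prototype(support_feats, text_feats, alpha=0.5):
    """Fuse visual and semantic prototypes for one class (sketch).

    support_feats: (n_shot, d) embeddings of support images.
    text_feats: (n_desc, d) embeddings of paraphrased class descriptions.
    alpha: weight on the visual prototype (hypothetical).
    """
    visual_proto = support_feats.mean(axis=0)
    semantic_proto = text_feats.mean(axis=0)
    proto = alpha * visual_proto + (1 - alpha) * semantic_proto
    return proto / np.linalg.norm(proto)  # unit-normalize for cosine matching

def classify(query_feat, prototypes):
    """Nearest-prototype classification by cosine similarity."""
    q = query_feat / np.linalg.norm(query_feat)
    sims = prototypes @ q
    return int(np.argmax(sims))
```

In a 5-way 1-shot episode, `augmented_prototype` would be called once per class and queries assigned to the most similar fused prototype.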

AAAI Conference 2026 Conference Paper

Organ-Aware Routing Mixture-of-Retrieval Augmented Generation for Fetal Ultrasound Reporting

  • Bin Pu
  • Siyu Wang
  • Rongbin Li
  • Xinpeng Ding
  • Lei Zhao
  • Chaoqi Chen
  • Shengli Li
  • Kenli Li

Fetal ultrasound screening is a uniquely complex diagnostic task involving the simultaneous assessment of multiple fetal organs—each with its own anatomical and clinical context—within a single examination. Automating report generation for such cases poses a significant challenge: unlike existing methods that focus on single-organ radiology tasks (e.g., chest X-rays), fetal ultrasound requires reasoning over a structured, multiple-to-multiple setting, i.e., multi-organ images corresponding to a multi-section report. In this paper, we introduce FetusR, the first large-scale dataset for multi-organ fetal ultrasound reporting, containing 15,594 real-world cases with rich organ-wise annotations. To address the intrinsic image-report alignment challenge, we propose Organ-Aware Routing Mixture-of-Retrieval Augmented Generation (ORM-RAG), inspired by the Mixture-of-Experts paradigm. Our method decomposes the complex alignment problem into multiple one-to-one sub-retrieval tasks. Specifically, ORM-RAG integrates (1) an organ-aware mixture-of-retrieval module that partitions the retrieval space into organ-specific corpora for independent retrieval, and (2) a dynamic routing mechanism that selectively aggregates high-confidence organ-specific reports while filtering uncertain ones. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art baselines across both textual similarity and clinical accuracy metrics. Our work opens a new direction for long-form, structured report generation in real-world, multi-organ medical imaging scenarios.
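The two components of the abstract — per-organ retrieval over partitioned corpora, then a confidence gate that keeps only high-confidence organ reports — can be sketched in a few lines. The data layout, similarity measure, and threshold `tau` are illustrative assumptions, not the paper's exact mechanism:

```python
import numpy as np

def route_and_retrieve(organ_queries, corpora, tau=0.5):
    """Sketch of organ-aware mixture-of-retrieval with confidence routing.

    organ_queries: {organ: query embedding (d,)}.
    corpora: {organ: [(embedding, report_text), ...]} — organ-specific corpora.
    tau: confidence gate; organs whose best match falls below it are dropped.
    """
    selected = {}
    for organ, q in organ_queries.items():
        qn = q / np.linalg.norm(q)
        best_sim, best_report = -1.0, None
        for vec, report in corpora[organ]:          # retrieve only within this organ's corpus
            sim = float(qn @ (vec / np.linalg.norm(vec)))
            if sim > best_sim:
                best_sim, best_report = sim, report
        if best_sim >= tau:                         # dynamic routing: keep high-confidence organs
            selected[organ] = best_report
    return selected
```

The selected per-organ reports would then be passed as retrieval context to the generator, one section per organ.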

AAAI Conference 2026 Conference Paper

Topology-Inspired Backward-Free Framework for Test-Time Adaptation in Medical Detection

  • Bin Pu
  • Xingguo Lv
  • Jiewen Yang
  • Kai Xu
  • Lei Zhao
  • Zuozhu Liu
  • Kenli Li

Recently, Test-Time Adaptation (TTA) has gained increasing attention in medical imaging due to its ability to improve model generalization under domain shifts without retraining. In particular, directly applying a well-trained model across various medical centers suffers significant performance degradation caused by variations in equipment, operators, imaging conditions, and the scanning skill levels of sonographers. Existing TTA methods either rely on parameter adaptation that increases computational cost or apply simple prediction fusion that ignores anatomical structure knowledge. To address these limitations, we propose a novel backward-free Topology-aware TTA framework named T^3 that integrates Structural Perception Modeling (SPM) and Box Regression Adaptation (BRA). SPM is implemented through an organ space heatmap generated via Gaussian kernel superposition. This heatmap encodes anatomical topology without requiring additional training or source data. BRA further improves localization and classification by fusing detection outputs based on the contribution of detection results to anatomically meaningful peak points in the heatmaps. Extensive experiments were conducted across six cross-domain scenarios, and the results demonstrate that our method achieves state-of-the-art cross-domain detection performance while maintaining high efficiency, offering a practical and robust solution for real-world medical diagnostic applications.
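The Gaussian-kernel-superposition heatmap at the heart of SPM is directly sketchable: place one Gaussian at each expected organ center and sum them, so peaks mark anatomically plausible locations. The bandwidth `sigma` and function name are illustrative assumptions:

```python
import numpy as np

def organ_heatmap(centers, shape, sigma=2.0):
    """Organ space heatmap via Gaussian kernel superposition (sketch).

    centers: [(y, x), ...] expected organ center positions.
    shape: (H, W) heatmap size; sigma: illustrative kernel bandwidth.
    Returns an (H, W) map whose peaks encode anatomical topology.
    """
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    h = np.zeros(shape)
    for cy, cx in centers:
        h += np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
    return h
```

A detection-fusion step could then weight each candidate box by how close it lands to a heatmap peak, in the spirit of BRA, without any backward pass.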

AAAI Conference 2026 Conference Paper

Unified Mixture-of-Experts Framework for Joint Cardiac and Vascular Ultrasound Analysis and Report Generation

  • Bin Pu
  • Jiewen Yang
  • Xingguo Lv
  • Kai Xu
  • Kenli Li

Echocardiography and vascular ultrasound are essential for comprehensive cardiovascular assessment, yet manual evaluation and report writing are labor-intensive, time-consuming, and require expertise from both cardiology and vascular surgery departments. Current automated report generation systems mainly focus on X-ray or CT, often neglecting echocardiographic modalities and critical quantitative parameters such as aortic diameter and main pulmonary artery diameter, limiting their clinical utility. Moreover, the interdependence between cardiac and peripheral vascular health necessitates cross-departmental insights, which existing methods fail to incorporate. To address these limitations, we propose Echo-Cardiac-Vascular (ECV), a vision-language framework for joint cardiac and vascular ultrasound report generation and parameter measurement. ECV introduces a Mixture-of-Experts vision encoder tailored to distinct ultrasound subtypes, a structured parameter measurement module for accurate quantification, and task-specific decoders that generate interpretable, multimodal diagnostic reports. Our framework, trained on 10K+ paired records, achieves high accuracy, improving diagnostic efficiency, consistency, and cross-disciplinary clinical applicability.

JBHI Journal 2026 Journal Article

ÆMMamba: An Efficient Medical Segmentation Model With Edge Enhancement

  • Xingbo Dong
  • Bowen Zhou
  • Chen Yin
  • Iman Yi Liao
  • Zhe Jin
  • Zhaozhao Xu
  • Bin Pu

Medical image segmentation is critical for disease diagnosis, treatment planning, and prognosis assessment, yet the complexity and diversity of medical images pose significant challenges to accurate segmentation. While Convolutional Neural Networks capture local features and Vision Transformers excel at capturing global context, both struggle with efficient long-range dependency modeling. Inspired by the efficiency of Mamba's State Space Modeling, we propose ÆMMamba, a novel multi-scale feature extraction framework built on the Mamba backbone network. ÆMMamba integrates several innovative modules: the Efficient Fusion Bridge (EFB) module, which employs a bidirectional state-space model and attention mechanisms to fuse multi-scale features; the Edge-Aware Module (EAM), which enhances low-level edge representation using Sobel-based edge extraction; and the Boundary Sensitive Decoder (BSD), which leverages inverse attention and residual convolutional layers to handle complex cross-level boundaries. ÆMMamba achieves state-of-the-art performance across 8 medical segmentation datasets. On polyp segmentation datasets (Kvasir, ClinicDB, ColonDB, EndoScene, ETIS), it records the highest mDice and mIoU scores, outperforming methods like MADGNet and Swin-UMamba, with a standout mDice of 72.22 on ETIS, the most challenging dataset in this domain. For lung and breast segmentation, ÆMMamba surpasses competitors such as H2Former and SwinUnet, achieving Dice scores of 84.24 on BUSI and 79.83 on COVID-19 Lung. On the LGG brain MRI dataset, ÆMMamba attains an mDice of 87.25 and an mIoU of 79.31, outperforming all compared methods.
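The Sobel-based edge extraction that the EAM relies on is a standard operation and can be shown concretely: convolve the image with the 3x3 Sobel kernels and take the gradient magnitude. This is a minimal NumPy sketch of the classic operator, not the EAM module itself:

```python
import numpy as np

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def sobel_edges(img):
    """Gradient-magnitude edge map via 3x3 Sobel convolution ('valid' region).

    img: (H, W) grayscale array; returns an (H-2, W-2) edge-strength map.
    """
    H, W = img.shape
    gx = np.zeros((H - 2, W - 2))
    gy = np.zeros((H - 2, W - 2))
    for i in range(H - 2):
        for j in range(W - 2):
            patch = img[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(patch * SOBEL_X)
            gy[i, j] = np.sum(patch * SOBEL_Y)
    return np.hypot(gx, gy)  # edge strength = sqrt(gx^2 + gy^2)
```

In a module like the EAM, such an edge map would typically be used as an auxiliary signal to sharpen low-level feature representations near boundaries.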

AAAI Conference 2025 Conference Paper

Anatomical Knowledge Mining and Matching for Semi-supervised Medical Multi-structure Detection

  • Bin Pu
  • Liwen Wang
  • Jiewen Yang
  • Xingbo Dong
  • Benteng Ma
  • Zhuangzhuang Chen
  • Lei Zhao
  • Shengli Li

In medical image analysis, detecting multiple structures is crucial for evaluation and diagnosis but is often limited by the lack of high-quality annotations. Semi-supervised object detection emerges as a potent methodology to enhance model performance and generalization by leveraging a vast pool of unlabeled data alongside a minimal set of labeled data. A striking observation is that both unlabeled and labeled medical images contain a priori anatomical knowledge from human screening. In this work, we introduce a novel semi-supervised approach named Semi-akmm for mining and matching anatomical knowledge in ultrasound images. We develop an Adaptive Prior Knowledge Transfer (APKT) module to mine and explore the distribution and knowledge of potential proposal boxes via a proposal proportion constraint. Furthermore, within a teacher-student learning framework, we put forward an Anatomical Structure Matching (ASM) module to facilitate co-learning of consistent topological prior knowledge between the student and teacher models. To our knowledge, this marks the inception of an efficient semi-supervised medical multi-structure detection model. Our experiments across five publicly available ultrasound datasets demonstrate that Semi-akmm sets a new benchmark in performance, with solid results that outperform existing methods.

NeurIPS Conference 2025 Conference Paper

Learning to Zoom with Anatomical Relations for Medical Structure Detection

  • Bin Pu
  • Liwen Wang
  • Xingbo Dong
  • Xingguo Lv
  • Zhe Jin

Accurate anatomical structure detection is a critical preliminary step for diagnosing diseases characterized by structural abnormalities. In clinical practice, medical experts frequently adjust the zoom level of medical images to obtain comprehensive views for diagnosis. This common interaction results in significant variations in the apparent scale of anatomical structures across different images or fields of view. However, the information embedded in these zoom-induced scale changes is often overlooked by existing detection algorithms. In addition, human organs possess a priori, fixed topological knowledge. To overcome this limitation, we propose ZR-DETR, a zoom-aware probabilistic framework tailored for medical object detection. ZR-DETR uniquely incorporates scale-sensitive zoom embeddings, anatomical relation constraints, and a Gaussian Process-based detection head. This architecture enables the framework to jointly model semantic context, enforce anatomical plausibility, and quantify detection uncertainty. Empirical validation across three diverse medical imaging benchmarks demonstrates that ZR-DETR consistently outperforms strong baselines in both single-domain and unsupervised domain adaptation scenarios.

AAAI Conference 2025 Conference Paper

Leveraging Anatomical Consistency for Multi-Object Detection in Ultrasound Images via Source-free Unsupervised Domain Adaptation

  • Bin Pu
  • Xingguo Lv
  • Jiewen Yang
  • Xingbo Dong
  • Yiqun Lin
  • Shengli Li
  • Kenli Li
  • Xiaomeng Li

Source-free unsupervised domain adaptation aims to eliminate domain shifts when source-domain data and target-domain annotations are unavailable. Multi-object detection tasks in medical image analysis are constrained by patient privacy and extremely high annotation costs; hence, source-free UDA is considered a more practical approach for eliminating the domain gap. However, research exploring this topic is scarce. In this paper, we design an Anatomy-aware Alignment Teacher-Student learning method named AATS, which uses topological consistency within a mean-teacher framework for source-free UDA in multiple medical object detection and comprises Unsupervised Structure Refinement (USR) and Graph-aware Morphology Alignment (GMA). To match the student and teacher on low-level and visual features, we propose USR, which uses an unsupervised clustering algorithm to group organs in ultrasound images. Based on USR, we obtain a graph of organ relations on the teacher branch, while on the student branch we acquire visual features to construct the graphical space and optimize the model with graph propagation. Finally, to match the student and teacher, GMA is designed to align the two branches using both topology and morphology information derived from prior medical knowledge. Four groups of adaptation experiments were conducted on available medical datasets, and the outcomes demonstrate that our approach not only achieves state-of-the-art performance but also provides substantial advantages over existing methods.

JBHI Journal 2025 Journal Article

MambaSAM: A Visual Mamba-Adapted SAM Framework for Medical Image Segmentation

  • Pengchen Liang
  • Leijun Shi
  • Bin Pu
  • Renkai Wu
  • Jianguo Chen
  • Lixin Zhou
  • Lite Xu
  • Zhuangzhuang Chen

The Segment Anything Model (SAM) has shown exceptional versatility in segmentation tasks across various natural image scenarios. However, its application to medical image segmentation poses significant challenges due to the intricate anatomical details and domain-specific characteristics inherent in medical images. To address these challenges, we propose a novel VMamba adapter framework that integrates a lightweight, trainable Visual Mamba (VMamba) branch with the pre-trained SAM ViT encoder. The VMamba adapter accurately captures multi-scale contextual correlations, integrates global and local information, and reduces ambiguities arising from local features alone. Specifically, we propose a novel cross-branch attention (CBA) mechanism to facilitate effective interaction between the SAM and VMamba branches. This mechanism enables the model to learn and adapt more efficiently to the nuances of medical images, extracting rich, complementary features that enhance its representational capacity. Beyond architectural enhancements, we streamline the segmentation workflow by eliminating the need for prompt-driven input mechanisms, resulting in an autonomous prediction model that reduces manual input requirements and improves operational efficiency. In addition, our method introduces only minimal additional trainable parameters, offering an efficient solution for medical image segmentation. Extensive evaluations on four medical image datasets demonstrate that our VMamba adapter framework achieves state-of-the-art performance. Specifically, on the ACDC dataset with limited training data, our method improves the average Dice coefficient by 0.18 and reduces the Hausdorff distance by 20.38 mm compared to AutoSAM.
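A minimal sketch of cross-branch interaction: tokens from one branch attend to the other branch's tokens via single-head dot-product attention with a residual connection. This is a generic illustration of the idea, assuming unprojected tokens of equal dimension; the paper's CBA mechanism may differ in its details:

```python
import numpy as np

def cross_branch_attention(sam_tokens, mamba_tokens):
    """Sketch: SAM-branch tokens attend to VMamba-branch tokens.

    sam_tokens: (n, d) queries; mamba_tokens: (m, d) keys/values.
    Single-head scaled dot-product attention with a residual add.
    """
    d = sam_tokens.shape[1]
    scores = sam_tokens @ mamba_tokens.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)       # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)     # softmax over the other branch
    return sam_tokens + weights @ mamba_tokens        # residual fusion of branches
```

In a full adapter, learned query/key/value projections and a symmetric VMamba-to-SAM pass would typically accompany this step.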

JBHI Journal 2025 Journal Article

TKR-FSOD: Fetal Anatomical Structure Few-Shot Detection Utilizing Topological Knowledge Reasoning

  • Xi Li
  • Ying Tan
  • Bocheng Liang
  • Bin Pu
  • Jiewen Yang
  • Lei Zhao
  • Yanqing Kong
  • Lixian Yang

Fetal multi-anatomical structure detection in ultrasound (US) images can clearly present the relationships and influences between anatomical structures, providing more comprehensive information about fetal organ structures and assisting sonographers in making more accurate diagnoses; it is widely used in structure evaluation. Recently, deep learning methods have shown superior performance in detecting various anatomical structures in ultrasound images but still have room for improvement in categories where samples are difficult to obtain, such as rare diseases. Few-shot learning has attracted considerable attention in medical image analysis due to its ability to address data scarcity. However, existing few-shot learning research in medical image analysis focuses on classification and segmentation, while object detection has been neglected. In this paper, we propose TKR-FSOD, a novel few-shot method for fetal anatomical structure detection in ultrasound images, which learns topological knowledge through a Topological Knowledge Reasoning Module to help the model reason about and detect anatomical structures. Furthermore, we propose a Discriminate Ability Enhanced Feature Learning Module that extracts abundant discriminative features to enhance the model's discriminative ability. Experimental results demonstrate that our method outperforms state-of-the-art baseline methods, exceeding the second-best method by a maximum margin of 4.8% in the 5-shot setting on split 1 of the four-chamber cardiac view.

NeurIPS Conference 2024 Conference Paper

Bidirectional Recurrence for Cardiac Motion Tracking with Gaussian Process Latent Coding

  • Jiewen Yang
  • Yiqun Lin
  • Bin Pu
  • Xiaomeng Li

Quantitative analysis of cardiac motion is crucial for assessing cardiac function. This analysis typically uses imaging modalities such as MRI and echocardiograms that capture detailed image sequences throughout the heartbeat cycle. Previous methods predominantly focused on the analysis of image pairs, lacking consideration of motion dynamics and spatial variability; consequently, they often overlook the long-term relationships and regional motion characteristics of the heart. To overcome these limitations, we introduce GPTrack, a novel unsupervised framework crafted to fully explore the temporal and spatial dynamics of cardiac motion. GPTrack enhances motion tracking by employing a sequential Gaussian Process in the latent space and encoding statistics from spatial information at each time stamp, which robustly promotes temporal consistency and spatial variability of cardiac dynamics. We also innovatively aggregate sequential information in a bidirectional recursive manner, mimicking the behavior of diffeomorphic registration to better capture consistent long-term relationships of motion across cardiac regions such as the ventricles and atria. GPTrack significantly improves the precision of motion tracking in both 3D and 4D medical images while maintaining computational efficiency. The code is available at: https://github.com/xmed-lab/GPTrack.

JBHI Journal 2024 Journal Article

FARN: Fetal Anatomy Reasoning Network for Detection With Global Context Semantic and Local Topology Relationship

  • Lei Zhao
  • Guanghua Tan
  • Qianghui Wu
  • Bin Pu
  • Hongliang Ren
  • Shengli Li
  • Kenli Li

Accurate recognition of fetal anatomical structure is a pivotal task in ultrasound (US) image analysis. Sonographers naturally apply anatomical knowledge and clinical expertise to recognizing key anatomical structures in complex US images. However, mainstream object detection approaches usually treat each structure recognition separately, overlooking anatomical correlations between different structures in fetal US planes. In this work, we propose a Fetal Anatomy Reasoning Network (FARN) that incorporates two kinds of relationship forms: a global context semantic block summarized with visual similarity and a local topology relationship block depicting structural pair constraints. Specifically, by designing the Adaptive Relation Graph Reasoning (ARGR) module, anatomical structures are treated as nodes, with two kinds of relationships between nodes modeled as edges. The flexibility of the model is enhanced by constructing the adaptive relationship graph in a data-driven way, enabling adaptation to various data samples without the need for predefined additional constraints. The feature representation is further refined by aggregating the outputs of the ARGR module. Comprehensive experimental results demonstrate that FARN achieves promising performance in detecting 37 anatomical structures across key US planes in tertiary obstetric screening. FARN effectively utilizes key relationships to improve detection performance, demonstrates robustness to small-scale, similar, and indistinct structures, and avoids some detection errors that deviate from anatomical norms. Overall, our study serves as a resource for developing efficient and concise approaches to model inter-anatomy relationships.

JBHI Journal 2024 Journal Article

HFSCCD: A Hybrid Neural Network for Fetal Standard Cardiac Cycle Detection in Ultrasound Videos

  • Bin Pu
  • Kenli Li
  • Jianguo Chen
  • Yuhuan Lu
  • Qing Zeng
  • Jiewen Yang
  • Shengli Li

In fetal cardiac ultrasound examination, standard cardiac cycle (SCC) recognition is the essential foundation for diagnosing congenital heart disease. Previous studies have mostly focused on the detection of adult cardiac cycles, which may not be applicable to the fetus. In clinical practice, localizing SCCs requires accurately recognizing end-systole (ES) and end-diastole (ED) frames and ensuring that every frame in the cycle is a standard view. Most existing methods are not based on the detection of key anatomical structures, so they may fail to reject irrelevant views and background frames, may produce results containing non-standard frames, or may not work at all in clinical practice. We propose an end-to-end hybrid neural network based on an object detector to detect SCCs efficiently from fetal ultrasound videos; it consists of 3 modules, namely Anatomical Structure Detection (ASD), Cardiac Cycle Localization (CCL), and Standard Plane Recognition (SPR). Specifically, ASD uses an object detector to identify 9 key anatomical structures, 3 cardiac motion phases, and the corresponding confidence scores from fetal ultrasound videos. On this basis, we propose a joint probability method in CCL to learn the cardiac motion cycle from the 3 cardiac motion phases. In SPR, to reduce the impact of structure-detection errors on the accuracy of standard plane recognition, we use the XGBoost algorithm to learn relational knowledge of the detected anatomical structures. We evaluate our method on test fetal ultrasound video datasets and clinical examination cases and achieve remarkable results. This study may pave the way for clinical practice.
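The cycle-localization idea — pick ED/ES frames from per-frame phase confidences and accept a cycle only if every frame in it is confidently a standard plane — can be sketched as follows. The selection and acceptance rules here are an illustrative simplification of the paper's joint-probability method:

```python
import numpy as np

def localize_cycle(ed_conf, es_conf, standard_conf, tau=0.5):
    """Locate one standard cardiac cycle from per-frame confidences (sketch).

    ed_conf / es_conf: per-frame end-diastole / end-systole scores.
    standard_conf: per-frame standard-plane scores.
    Rule (assumed): the most confident ED opens the cycle, the best later ED
    closes it, the best ES in between marks mid-cycle, and every frame in
    the span must clear the standard-plane gate tau.
    """
    ed = np.asarray(ed_conf)
    es = np.asarray(es_conf)
    start = int(np.argmax(ed))
    if start + 1 >= len(ed):
        return None                                  # no frame left to close the cycle
    end = start + 1 + int(np.argmax(ed[start + 1:]))
    mid = start + int(np.argmax(es[start:end + 1]))
    if min(standard_conf[start:end + 1]) < tau:
        return None                                  # reject cycles with non-standard frames
    return start, mid, end
```

In the full system, these confidences would come from the ASD detector's outputs rather than being given directly.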

JBHI Journal 2024 Journal Article

Real-Time Automatic M-Mode Echocardiography Measurement With Panel Attention

  • Ching-Hsun Tseng
  • Shao-Ju Chien
  • Po-Shen Wang
  • Shin-Jye Lee
  • Bin Pu
  • Xiao-Jun Zeng

Motion mode (M-mode) echocardiography is essential for measuring cardiac dimensions and ejection fraction. However, the current diagnostic process is time-consuming and suffers from variance in diagnostic accuracy. This work builds an automatic scheme using well-designed and well-trained deep learning models to address these issues. We propose RAMEM, an automatic scheme for real-time M-mode echocardiography, which makes three contributions: 1) we provide MEIS, the first dataset of M-mode echocardiograms, to enable consistent results and support the development of automatic schemes. Detecting objects accurately in echocardiograms requires a large receptive field to cover the long-range diastole-to-systole cycle, yet the limited receptive field of typical convolutional neural network (CNN) backbones and the risk of information loss in non-local block (NL) equipped CNNs both jeopardize this accuracy requirement. Therefore, we 2) propose panel attention embedded in the updated UPANets V2, a convolutional backbone network, within a real-time instance segmentation (RIS) scheme to boost large-object detection performance; and 3) introduce AMEM, an efficient algorithm for automatic M-mode echocardiography measurement, enabling automatic diagnosis. The experimental results show that RAMEM surpasses existing RIS schemes (CNNs with NL and Transformers as the backbone) on PASCAL 2012 SBD and surpasses human performance on MEIS.

JBHI Journal 2024 Journal Article

SKGC: A General Semantic-Level Knowledge Guided Classification Framework for Fetal Congenital Heart Disease

  • Yuhuan Lu
  • Guanghua Tan
  • Bin Pu
  • Hang Wang
  • Bocheng Liang
  • Kenli Li
  • Jagath C. Rajapakse

Congenital heart disease (CHD) is the most common congenital disability affecting healthy development and growth, even resulting in pregnancy termination or fetal death. Recently, deep learning techniques have made remarkable progress in assisting the diagnosis of CHD. One very popular approach is to directly classify fetal ultrasound images as abnormal or normal, which tends to focus more on global features and neglects semantic knowledge of anatomical structures. The other approach is segmentation-based diagnosis, which requires a large number of pixel-level annotation masks for training; however, detailed pixel-level segmentation annotation is costly or even unavailable. Based on the above analysis, we propose SKGC, a universal framework to identify normal or abnormal four-chamber heart (4CH) images, guided by only a few annotation masks, while improving accuracy remarkably. SKGC consists of a semantic-level knowledge extraction module (SKEM), a multi-knowledge fusion module (MFM), and a classification module (CM). SKEM is responsible for obtaining high-level semantic knowledge, serving as an abstract representation of the anatomical structures that obstetricians focus on. MFM is a lightweight but efficient module that fuses semantic-level knowledge with the original specific knowledge in ultrasound images. CM classifies the fused knowledge and can be replaced by any advanced classifier. Moreover, we design a new loss function that enhances the constraint between foreground and background predictions, improving the quality of the semantic-level knowledge. Experimental results on the collected real-world NA-4CH dataset and the public FEST dataset show that SKGC achieves impressive performance, with best accuracies of 99.68% and 95.40%, respectively. Notably, the accuracy improves from 74.68% to 88.14% using only 10 labeled masks.

JBHI Journal 2024 Journal Article

TransFSM: Fetal Anatomy Segmentation and Biometric Measurement in Ultrasound Images Using a Hybrid Transformer

  • Lei Zhao
  • Guanghua Tan
  • Bin Pu
  • Qianghui Wu
  • Hongliang Ren
  • Kenli Li

Biometric parameter measurements are powerful tools for evaluating a fetus's gestational age, growth pattern, and abnormalities in a 2D ultrasound. However, it is still challenging to measure fetal biometric parameters automatically due to the indiscriminate confusing factors, limited foreground-background contrast, variety of fetal anatomy shapes at different gestational ages, and blurry anatomical boundaries in ultrasound images. The performance of a standard CNN architecture is limited for these tasks due to the restricted receptive field. We propose a novel hybrid Transformer framework, TransFSM, to address fetal multi-anatomy segmentation and biometric measurement tasks. Unlike the vanilla Transformer based on a single-scale input, TransFSM has a deformable self-attention mechanism, so it can effectively process multi-scale information to segment fetal anatomy with irregular shapes and different sizes. We devised a boundary-aware decoder (BAD) to capture more intrinsic local details using boundary-wise prior knowledge, which compensates for the defects of the Transformer in extracting local features. In addition, a Transformer auxiliary segment head is designed to improve mask prediction by learning the semantic correspondence of the same pixel categories and feature discriminability among different pixel categories. Extensive experiments were conducted on clinical cases and benchmark datasets for anatomy segmentation and biometric measurement tasks. The experiment results indicate that our method achieves state-of-the-art performance in seven evaluation metrics compared with CNN-based, Transformer-based, and hybrid approaches. By knowledge distillation, the proposed TransFSM can create a more compact and efficient model with high deploying potential in resource-constrained scenarios. Our study serves as a unified framework for biometric estimation across multiple anatomical regions to monitor fetal growth in clinical practice.

ICML Conference 2024 Conference Paper

Unsupervised Domain Adaptation for Anatomical Structure Detection in Ultrasound Images

  • Bin Pu
  • Xingguo Lv
  • Jiewen Yang
  • Guannan He
  • Xingbo Dong
  • Yiqun Lin
  • Shengli Li 0001
  • Tan Ying

Models trained on ultrasound images from one institution typically experience a decline in effectiveness when transferred directly to other institutions. Moreover, unlike natural images, fetal ultrasound images contain dense and overlapping structures, making structure detection more challenging. To tackle this problem, we propose a new Unsupervised Domain Adaptation (UDA) method named ToMo-UDA for fetal structure detection, which consists of a Topology Knowledge Transfer (TKT) module and a Morphology Knowledge Transfer (MKT) module. TKT leverages prior knowledge of fetal anatomy as topological information, reconstructing and aligning anatomy features across source and target domains. MKT then formulates a more consistent and independent morphological representation for each substructure of an organ. To evaluate the proposed ToMo-UDA for ultrasound fetal anatomical structure detection, we introduce FUSH$^2$, a new Fetal UltraSound benchmark comprising Heart and Head images collected from two health centers, with 16 annotated regions. Our experiments show that utilizing topological and morphological anatomy information in ToMo-UDA greatly improves organ structure detection, expanding the potential for structure detection tasks in medical image analysis.

JBHI Journal 2022 Journal Article

MobileUNet-FPN: A Semantic Segmentation Model for Fetal Ultrasound Four-Chamber Segmentation in Edge Computing Environments

  • Bin Pu
  • Yuhuan Lu
  • Jianguo Chen
  • Shengli Li
  • Ningbo Zhu
  • Wei Wei
  • Kenli Li

The apical four-chamber (A4C) view in fetal echocardiography is a prenatal examination widely used for the early diagnosis of congenital heart disease (CHD). Accurate segmentation of key A4C anatomical structures is the basis for automatic measurement of growth parameters and necessary disease diagnosis. However, due to artefacts and speckle noise in ultrasound imaging, the variability of anatomical structures across gestational weeks, and the discontinuity of anatomical structure boundaries, accurately segmenting the fetal heart in the A4C view is a very challenging task. To this end, we propose MobileUNet-FPN, which combines an explicit Feature Pyramid Network (FPN), MobileNet, and UNet for the segmentation of 13 key heart structures. To our knowledge, this is the first AI-based method that can segment this many anatomical structures in the fetal A4C view. We split the MobileNet backbone network into four stages, using the features of these four stages as the encoder and upsampling operations as the decoder. We build an explicit FPN to enhance multi-scale semantic information and ultimately generate segmentation masks of the key anatomical structures. In addition, we design a multi-level edge computing system and deploy distributed edge nodes in different hospitals and city servers. We then train the MobileUNet-FPN model in parallel at each edge node to effectively reduce network communication overhead. Extensive experiments are conducted, and the results show the superior performance of the proposed model on fetal A4C and femoral-length images.
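The FPN component follows the classic top-down pattern: start from the coarsest feature map, upsample it, and add it to the next finer map. A minimal NumPy sketch of that merge (the 1x1 lateral and 3x3 smoothing convolutions of a real FPN are omitted, and all names are illustrative):

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x spatial upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_topdown(feats):
    """Top-down FPN merge over pyramid levels, finest first in the output.

    feats: list of (C, H, W) maps ordered fine-to-coarse, each level
    half the spatial size of the previous one, same channel count
    (lateral 1x1 convs omitted in this sketch).
    """
    merged = [feats[-1]]                       # start from the coarsest level
    for f in reversed(feats[:-1]):
        merged.append(f + upsample2x(merged[-1]))  # upsample and add finer map
    return merged[::-1]
```

Each merged level would then feed the decoder to produce multi-scale segmentation masks.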