AAAI 2026
DFMN: A Dual-feet Matching Network with Hybrid Transformer-based Feature Extractor for Unsupervised Deformable Medical Image Registration
Abstract
Deformable medical image registration is essential in medical image analysis. Recent transformer-based registration methods have achieved high registration accuracy. However, these methods often rely on patch embedding at the start of encoding, limiting their ability to capture fine anatomical structure and to model local semantic relationships within individual patches. We propose a novel Dual-feet Encoder (DFEnc) that asynchronously models semantic information from the moving and fixed images at multiple scales through two separate branches in three steps. At each step, features from adjacent resolution levels are processed by a Single Step Hybrid Extractor (SSHExt), which performs patch convolution to preserve local information, followed by several transformer blocks to capture global context. Dense connections enhance semantic awareness across adjacent feature resolution levels. In addition, we introduce a Feature Fusion-based Decoder (FFDec) that progressively fuses fixed- and moving-image features and generates an intermediate deformation field at each stage, enabling accurate alignment through stepwise warping and refinement. Extensive ablation studies demonstrate the effectiveness of the proposed DFEnc, SSHExt, and FFDec. Compared with the state-of-the-art AutoFuse-Trans method, our approach improves Dice by 1.14%, 1.77%, and 4.47% on the ACDC, OASIS, and Abdomen CT datasets, respectively, while maintaining relatively low computational cost. These results suggest the utility of the proposed approach for broad research and clinical applications.
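The stepwise warping described in the abstract, where each decoder stage emits an intermediate deformation field that progressively resamples the moving image, can be illustrated with a minimal NumPy sketch. The function names (`warp`, `progressive_align`) and the 2-D bilinear formulation are illustrative assumptions, not the paper's implementation; the actual method operates on multi-scale 3-D feature maps inside FFDec.

```python
import numpy as np

def warp(image, flow):
    """Warp a 2-D image (H, W) by a dense displacement field flow (2, H, W),
    where flow[0] = dx and flow[1] = dy in pixels, via bilinear interpolation.
    (Hypothetical helper: a generic spatial-transformer-style resampler.)"""
    h, w = image.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Sampling coordinates: identity grid plus displacement, clamped to bounds.
    x = np.clip(xs + flow[0], 0, w - 1)
    y = np.clip(ys + flow[1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = x - x0
    wy = y - y0
    # Bilinear blend of the four neighbouring pixels.
    top = image[y0, x0] * (1 - wx) + image[y0, x1] * wx
    bot = image[y1, x0] * (1 - wx) + image[y1, x1] * wx
    return top * (1 - wy) + bot * wy

def progressive_align(moving, stage_flows):
    """Stepwise warping: apply each stage's intermediate deformation field
    in turn, so later stages refine the alignment left by earlier ones."""
    warped = moving
    for flow in stage_flows:
        warped = warp(warped, flow)
    return warped
```

With a zero field at every stage the moving image passes through unchanged, while a constant field shifts it; composing several small stage fields this way is what allows coarse-to-fine refinement without predicting one large deformation in a single shot.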
Context
- Venue: AAAI Conference on Artificial Intelligence
- Archive span: 1980-2026