JBHI Journal 2026 Journal Article
Biomed-DPT: Dual Modality Prompt Tuning for Biomedical Vision-Language Models
- Wei Peng
- Jianchen Hu
- Kang Liu
- Meng Zhang
Prompt learning has emerged as one of the most effective paradigms for adapting pre-trained vision-language models (VLMs) to biomedical image classification tasks in few-shot scenarios. However, most existing prompt learning methods rely on a single textual prompt and often ignore the particular visual structures (e.g., complex anatomical structures and subtle pathological features) in biomedical images. In this work, we propose Biomed-DPT, a knowledge-enhanced dual-modality prompt tuning framework. For text prompts, Biomed-DPT constructs a dual prompt consisting of template-driven ensemble clinical prompts and large language model (LLM)-driven, domain-adapted expert prompts. These prompts are systematically ranked, and a neural network searches for their optimal combination. A semantic regularization loss is then applied to extract clinical knowledge while mitigating semantic discrepancies. For visual prompts, Biomed-DPT introduces zero vectors as soft prompts to leverage attention re-weighting, so that the model avoids focusing on non-diagnostic regions and recognizing non-critical pathological features. Biomed-DPT achieves an average classification accuracy of 66.28% across 11 biomedical image datasets covering 9 modalities and 10 organs, with performance reaching 79.54% on base classes and 76.91% on novel classes. Our code is available at: https://github.com/pengwei222/Biomed-DPT.
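To make the attention re-weighting idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the zero-vector soft prompts are appended as extra key tokens to a single-head scaled dot-product attention layer; the function names (`attention_weights`, `with_zero_prompts`) and the toy vectors are hypothetical and chosen only for illustration. Under that assumption, each zero key receives a score of 0, so it absorbs part of the softmax mass and shrinks the total weight assigned to the image patches, while the relative ordering among patches is unchanged.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over a list of keys."""
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    return softmax(scores)


def with_zero_prompts(keys, num_prompts):
    """Append `num_prompts` all-zero key vectors (assumed form of the soft prompts)."""
    dim = len(keys[0])
    return keys + [[0.0] * dim for _ in range(num_prompts)]


# Toy example: one query and three "patch" keys (hypothetical values).
query = [1.0, 0.0]
keys = [[4.0, 0.0], [-4.0, 0.0], [0.0, 1.0]]

base = attention_weights(query, keys)
prompted = attention_weights(query, with_zero_prompts(keys, 2))

# Total attention mass that still lands on the real patches after prompting.
patch_mass = sum(prompted[: len(keys)])

print("without prompts:", [round(w, 4) for w in base])
print("with prompts:   ", [round(w, 4) for w in prompted])
print("mass on patches:", round(patch_mass, 4))
```

In this sketch the zero prompts only rescale the patch weights; in a trained model the effect on the output would also depend on the value vectors and on how the prompts interact with the learned projections, which the abstract does not specify.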