Arrow Research search

Author name cluster

Mingze Yin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers
1 author row

Possible papers

4

JBHI 2026 Journal Article

RetinexDA: Progressive Disentanglement Domain Adaptation for Unsupervised Cross-Modality Medical Image Segmentation

  • Yixuan Wu
  • Mingze Yin
  • Zitai Kong
  • Jintai Chen
  • Jian Wu
  • Honghao Gao
  • Hongxia Xu

Deep neural networks have achieved strong performance in medical image segmentation when the training and testing data share similar appearance characteristics. However, this assumption is rarely satisfied in practical clinical scenarios, where imaging protocols, scanner vendors, and modality physics differ substantially, resulting in severe performance degradation when the model is deployed to new environments. To address this challenge, we propose RetinexDA, a novel unsupervised domain adaptation framework that explicitly decomposes a medical image into domain-invariant structural and domain-specific appearance representations. This Retinex-inspired formulation preserves essential anatomical details while mitigating modality-dependent variations. Furthermore, we introduce Disentangled Knowledge Distillation (DKD) to ensure mutual semantic alignment between the structure–appearance decomposition in pixel space and the encoded features in latent space, strengthening fine-grained segmentation capability. In addition, a Bézier-curve domain bridging strategy is developed to generate smoothly transitioned intermediate samples across domains, improving adaptation robustness under large modality discrepancies. Extensive experiments on abdominal CT and cardiac MRI segmentation tasks demonstrate that RetinexDA surpasses state-of-the-art unsupervised domain adaptation approaches, showing strong potential for scalable and reliable clinical deployment.
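The Bézier-curve bridging idea can be pictured as remapping normalized pixel intensities through a cubic Bézier curve whose inner control points are blended between source-like and target-like values. The sketch below is illustrative only and assumes normalized intensities in [0, 1]; the function names, control-point choices, and blending scheme are not taken from the paper.

```python
import numpy as np

def cubic_bezier(t, p0, p1, p2, p3):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1] (elementwise)."""
    u = 1.0 - t
    return u**3 * p0 + 3 * u**2 * t * p1 + 3 * u * t**2 * p2 + t**3 * p3

def bridge_intensities(img, p1_src, p2_src, p1_tgt, p2_tgt, alpha):
    """Remap normalized intensities through a Bezier curve whose inner
    control points are blended between source-like and target-like values.
    alpha in [0, 1] controls how far along the domain bridge the sample sits."""
    p1 = (1 - alpha) * p1_src + alpha * p1_tgt
    p2 = (1 - alpha) * p2_src + alpha * p2_tgt
    return cubic_bezier(img, 0.0, p1, p2, 1.0)

# A smooth sequence of intermediate-domain versions of one image.
img = np.random.rand(64, 64)  # stand-in for a normalized CT slice
bridged = [bridge_intensities(img, 0.3, 0.7, 0.8, 0.2, a)
           for a in np.linspace(0.0, 1.0, 5)]
```

Because the endpoints are pinned at 0 and 1 and the inner control points stay in [0, 1], every intermediate sample keeps a valid intensity range while its appearance drifts smoothly from source toward target.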

AAAI Conference 2025 Conference Paper

ProtCLIP: Function-Informed Protein Multi-Modal Learning

  • Hanjing Zhou
  • Mingze Yin
  • Wei Wu
  • Mingyang Li
  • Kun Fu
  • Jintai Chen
  • Jian Wu
  • Zheng Wang

The multi-modality pre-training paradigm that aligns protein sequences with biological descriptions has learned general protein representations and achieved promising performance in various downstream applications. However, these works have still been unable to replicate the extraordinary success of language-supervised visual foundation models, owing to the ineffective usage of aligned protein-text paired data and the lack of an effective function-informed pre-training paradigm. To address these issues, this paper curates a large-scale protein-text paired dataset called ProtAnno with a property-driven sampling strategy, and introduces a novel function-informed protein pre-training paradigm. Specifically, the sampling strategy determines the selection probability of each sample based on its confidence and property coverage, balancing data quality against data quantity in the face of large-scale noisy data. Furthermore, motivated by the significance of protein-specific functional mechanisms, the proposed paradigm explicitly models static and dynamic protein functional segments through two segment-wise pre-training objectives, injecting fine-grained information in a function-informed manner. Leveraging all these innovations, we develop ProtCLIP, a multi-modality foundation model that comprehensively represents function-aware protein embeddings. On 22 protein benchmarks spanning 5 types, including protein functionality classification, mutation effect prediction, cross-modal transformation, semantic similarity inference and protein-protein interaction prediction, ProtCLIP consistently achieves SOTA performance, with remarkable improvements of 75% on average across five cross-modal transformation benchmarks, 59.9% on GO-CC and 39.7% on GO-BP protein function prediction. These experimental results verify the extraordinary potential of ProtCLIP as a protein multi-modality foundation model.
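The property-driven sampling strategy described above can be sketched as a simple scoring rule: each protein-text pair gets a score from its annotation confidence and property coverage, which is then normalized into a sampling distribution. This is a minimal illustration under assumed inputs; the scoring form, temperature, and function names are hypothetical, not the paper's actual formula.

```python
import numpy as np

def selection_probs(confidence, coverage, tau=1.0):
    """Score each protein-text pair by its annotation confidence and the
    fraction of property fields it covers, then normalize the scores into
    sampling probabilities via a softmax with temperature tau."""
    confidence = np.asarray(confidence, dtype=float)
    coverage = np.asarray(coverage, dtype=float)
    score = confidence * coverage        # reward confident, well-annotated pairs
    logits = score / tau
    logits -= logits.max()               # numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Three hypothetical pairs: (annotation confidence, fraction of fields present).
probs = selection_probs([0.9, 0.5, 0.2], [1.0, 0.6, 0.3])
```

High-confidence, well-covered pairs are drawn more often, yet the noisy pairs retain nonzero probability, which is one way to trade off data quality against data quantity.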

AAAI Conference 2025 Conference Paper

Synergy of GFlowNet and Protein Language Model Makes a Diverse Antibody Designer

  • Mingze Yin
  • Hanjing Zhou
  • Yiheng Zhu
  • Jialu Wu
  • Wei Wu
  • Mingyang Li
  • Kun Fu
  • Zheng Wang

Antibodies defend our health by binding to antigens with high specificity and potency, relying primarily on the Complementarity-Determining Region (CDR). Yet current experimental methods for discovering new antibody CDRs are extremely time-consuming. Computational design could alleviate this burden; in particular, protein language models have proven quite beneficial in many recent studies. However, most existing models focus solely on antibody potency and struggle to encapsulate the diverse range of plausible CDR candidates, limiting their effectiveness in real-world scenarios, as binding is only one factor among the multitude of drug-forming criteria. In this paper, we introduce PG-AbD, a framework uniting Generative Flow Networks (GFlowNets) and pretrained Protein Language Models (PLMs) to generate highly potent, diverse and novel antibody candidates. We construct a Product of Experts (PoE), composed of a global-distribution-modeling PLM and a local-distribution-modeling Potts model, to serve as the reward function of the GFlowNet. A joint training paradigm is introduced in which the PoE is trained by contrastive divergence using negative samples generated by the GFlowNet, and in turn guides the GFlowNet to sample diverse antibody candidates. We evaluate PG-AbD on extensive antibody design benchmarks. It significantly outperforms existing methods in diversity (13.5% on RabDab, 31.1% on SabDab) while maintaining optimal potency and novelty. Generated antibodies are also found to form stable, regular 3D structures with their corresponding antigens, demonstrating the great potential of PG-AbD to accelerate real-world antibody discovery.
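The Product-of-Experts reward can be written compactly in log space: a product of a PLM likelihood and a Potts Boltzmann factor becomes a sum of a log-probability and a negated energy. The sketch below only shows that combination rule; the function names, the inverse-temperature parameter, and the sign convention (lower Potts energy means a better sequence) are assumptions, not the paper's exact formulation.

```python
import math

def poe_log_reward(plm_logprob, potts_energy, beta=1.0):
    """Product of Experts in log space: the PLM expert contributes a global
    sequence log-likelihood, the Potts expert a local pairwise energy whose
    Boltzmann factor exp(-E) multiplies in; beta is an inverse temperature."""
    return beta * (plm_logprob - potts_energy)  # lower energy => higher reward

def reward(plm_logprob, potts_energy, beta=1.0):
    """The actual (unnormalized) reward handed to the GFlowNet sampler."""
    return math.exp(poe_log_reward(plm_logprob, potts_energy, beta))
```

Working in log space keeps the reward numerically stable for long CDR sequences, where raw likelihoods underflow quickly.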

NeurIPS Conference 2024 Conference Paper

Bridge-IF: Learning Inverse Protein Folding with Markov Bridges

  • Yiheng Zhu
  • Jialu Wu
  • Qiuyi Li
  • Jiahuan Yan
  • Mingze Yin
  • Wei Wu
  • Mingyang Li
  • Jieping Ye

Inverse protein folding is a fundamental task in computational protein design, which aims to design protein sequences that fold into the desired backbone structures. While the development of machine learning algorithms for this task has seen significant success, the prevailing approaches, which predominantly employ a discriminative formulation, frequently encounter the error accumulation issue and often fail to capture the extensive variety of plausible sequences. To fill these gaps, we propose Bridge-IF, a generative diffusion bridge model for inverse folding, which is designed to learn the probabilistic dependency between the distributions of backbone structures and protein sequences. Specifically, we harness an expressive structure encoder to propose a discrete, informative prior derived from structures, and establish a Markov bridge to connect this prior with native sequences. During the inference stage, Bridge-IF progressively refines the prior sequence, culminating in a more plausible design. Moreover, we introduce a reparameterization perspective on Markov bridge models, from which we derive a simplified loss function that facilitates more effective training. We also modulate protein language models (PLMs) with structural conditions to precisely approximate the Markov bridge process, thereby significantly enhancing generation performance while maintaining parameter-efficient training. Extensive experiments on well-established benchmarks demonstrate that Bridge-IF predominantly surpasses existing baselines in sequence recovery and excels in the design of plausible proteins with high foldability. The code is available at https://github.com/violet-sto/Bridge-IF.
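The inference procedure described above, starting from a structure-derived prior sequence and progressively refining it, can be sketched as a simple iterative loop. Everything here is a stand-in: the toy denoiser just resamples one residue per step, whereas Bridge-IF uses a structure-conditioned PLM to parameterize the Markov bridge transitions.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def refine(prior_seq, denoise_step, num_steps=10, seed=0):
    """Markov-bridge-style inference sketch: start from the structure-derived
    prior sequence and repeatedly apply a refinement step, moving the
    sequence toward a more plausible design."""
    rng = random.Random(seed)
    seq = list(prior_seq)
    for step in range(num_steps):
        seq = denoise_step(seq, step, rng)
    return "".join(seq)

def toy_denoiser(seq, step, rng):
    """Stand-in for the PLM-parameterized bridge: resample one random
    position uniformly over the amino-acid alphabet."""
    seq = list(seq)
    i = rng.randrange(len(seq))
    seq[i] = rng.choice(AMINO_ACIDS)
    return seq

designed = refine("AAAAAAAA", toy_denoiser, num_steps=5)
```

The point of the sketch is the control flow: the prior gives a strong starting point, and each bridge step is a local edit, so errors are corrected iteratively rather than accumulating as in one-shot discriminative prediction.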