Author name cluster

Haichuan Tan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers

2 author rows

NeurIPS Conference 2025 Conference Paper

AANet: Virtual Screening under Structural Uncertainty via Alignment and Aggregation

Wenyu Zhu
Jianhui Wang
Bowen Gao
Yinjun Jia
Haichuan Tan
Ya-Qin Zhang
Wei-Ying Ma
Yanyan Lan

Virtual screening (VS) is a critical component of modern drug discovery, yet most existing methods, whether physics-based or deep learning-based, are developed around holo protein structures with known ligand-bound pockets. Consequently, their performance degrades significantly on apo or predicted structures such as those from AlphaFold2, which are more representative of real-world early-stage drug discovery, where pocket information is often missing. In this paper, we introduce an alignment-and-aggregation framework to enable accurate virtual screening under structural uncertainty. Our method comprises two core components: (1) a tri-modal contrastive learning module that aligns representations of the ligand, the holo pocket, and cavities detected from structures, thereby enhancing robustness to pocket localization error; and (2) a cross-attention-based adapter for dynamically aggregating candidate binding sites, enabling the model to learn from activity data even without precise pocket annotations. We evaluated our method on a newly curated benchmark of apo structures, where it significantly outperforms state-of-the-art methods in the blind apo setting, improving the early enrichment factor (EF1%) from 11.75 to 37.19. Notably, it also maintains strong performance on holo structures. These results demonstrate the promise of our approach in advancing first-in-class drug discovery, particularly in scenarios lacking experimentally resolved protein-ligand complexes. Our implementation is publicly available at https://github.com/Wiley-Z/AANet.
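The abstract reports its headline result as an enrichment factor at 1% (EF1%). As a rough illustration of what that metric measures, the sketch below uses the common textbook definition: the hit rate among the top-ranked fraction of a library divided by the hit rate of the whole library. The function name, the rounding of the cutoff, and the toy data are assumptions made here for illustration; the AANet benchmark may compute the cutoff or handle ties differently.

```python
import math


def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor at `fraction` (hypothetical helper, not from the AANet code).

    scores: predicted scores, higher means more likely active.
    labels: 1 for a known active, 0 for an inactive or decoy.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal in length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active is required")

    # Assumption: the cutoff is rounded up and always keeps at least one compound.
    n_top = max(1, math.ceil(fraction * n_total))

    # Rank compounds by descending score and count actives above the cutoff.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])

    # Hit rate in the top fraction relative to the hit rate of the full library.
    return (hits_top / n_top) / (n_actives / n_total)


# Toy library: 200 compounds, 10 actives, all actives ranked first.
toy_scores = [200 - i for i in range(200)]
toy_labels = [1 if i < 10 else 0 for i in range(200)]
toy_ef = enrichment_factor(toy_scores, toy_labels, fraction=0.01)
```

Under this definition, random ranking gives a value of about 1, and the maximum attainable value depends on the fraction of actives in the library.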


NeurIPS Conference 2025 Conference Paper

CIDD: Collaborative Intelligence for Structure-Based Drug Design Empowered by LLMs

Bowen Gao
Yanwen Huang
Yiqiao Liu
Wenxuan Xie
Bowei He
Haichuan Tan
Wei-Ying Ma
Ya-Qin Zhang

Structure-guided molecular generation is pivotal in early-stage drug discovery, enabling the design of compounds tailored to specific protein targets. However, despite recent advances in 3D generative modeling, particularly in improving docking scores, these methods often produce rare and intrinsically irrational molecular structures that deviate from drug-like chemical space. To quantify this issue, we propose a novel metric, the Molecule Reasonable Ratio (MRR), which measures structural rationality and reveals a critical gap between existing models and real-world approved drugs. To address this, we introduce the Collaborative Intelligence Drug Design (CIDD) framework, the first approach to unify the 3D interaction modeling capabilities of generative models with the general knowledge and reasoning power of large language models (LLMs). By leveraging LLM-based Chain-of-Thought reasoning, CIDD generates molecules that not only bind effectively to protein pockets but also exhibit strong structural drug-likeness, rationality, and synthetic accessibility. On the CrossDocked2020 benchmark, CIDD consistently improves drug-likeness metrics, including QED, SA, and MRR, across different base generative models, while maintaining competitive binding affinity. Notably, it raises the combined success rate (balancing drug-likeness and binding) from 15.72% to 34.59%, more than doubling previous results. These findings demonstrate the value of integrating knowledge reasoning with geometric generation to advance AI-driven drug design.


ICLR Conference 2025 Conference Paper

Reframing Structure-Based Drug Design Model Evaluation via Metrics Correlated to Practical Needs

Bowen Gao
Haichuan Tan
Yanwen Huang
Minsi Ren
Xiao Huang
Wei-Ying Ma
Ya-Qin Zhang
Yanyan Lan

Recent advances in structure-based drug design (SBDD) have produced surprising results, with models often generating molecules that achieve better Vina docking scores than actual ligands. However, these results are frequently overly optimistic due to the limitations of docking score accuracy and the challenges of wet-lab validation. While generated molecules may demonstrate high QED (drug-likeness) and SA (synthetic accessibility) scores, they often lack true drug-like properties or synthesizability. To address these limitations, we propose a model-level evaluation framework that emphasizes practical metrics aligned with real-world applications. Inspired by recent findings on the utility of generated molecules in ligand-based virtual screening, our framework evaluates SBDD models by their ability to produce molecules that effectively retrieve active compounds from chemical libraries via similarity-based searches. This approach provides a direct indication of therapeutic potential, bridging the gap between theoretical performance and real-world utility. Our experiments reveal that while SBDD models may excel in theoretical metrics like Vina scores, they often fall short in these practical metrics. By introducing this new evaluation strategy, we aim to enhance the relevance and impact of SBDD models for pharmaceutical research and development.
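The evaluation idea described above, using generated molecules as queries in a similarity search and checking whether known actives are retrieved, can be sketched roughly as follows. This is a minimal illustration and not the paper's protocol: it assumes fingerprints are represented as sets of on-bit indices, uses Tanimoto similarity, and scores each library compound by its best match to any generated molecule. All names and the toy data are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def retrieve(generated, library, top_k):
    """Return the ids of the top_k library compounds most similar to any generated molecule.

    generated: list of fingerprints for model-generated molecules.
    library: dict mapping compound id to fingerprint.
    """
    # Assumption: a library compound is scored by its maximum similarity to the queries.
    best = {
        name: max(tanimoto(fp, query) for query in generated)
        for name, fp in library.items()
    }
    return sorted(best, key=best.get, reverse=True)[:top_k]


def active_recall(retrieved, actives):
    """Fraction of the known actives that appear among the retrieved compounds."""
    return len(set(retrieved) & set(actives)) / len(actives)


# Toy data: one generated molecule and a three-compound library.
toy_generated = [{1, 2, 3, 4}]
toy_library = {"a": {1, 2, 3, 5}, "b": {7, 8, 9}, "c": {1, 2, 3, 4}}
toy_hits = retrieve(toy_generated, toy_library, top_k=2)
toy_recall = active_recall(toy_hits, actives={"a", "d"})
```

In practice the fingerprints would come from a cheminformatics toolkit, and recall could be replaced by an enrichment-style metric; the abstract does not specify either choice.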


NeurIPS Conference 2023 Conference Paper

DrugCLIP: Contrastive Protein-Molecule Representation Learning for Virtual Screening

Bowen Gao
Bo Qiang
Haichuan Tan
Yinjun Jia
Minsi Ren
Minsi Lu
Jingjing Liu
Wei-Ying Ma

Virtual screening, which identifies potential drugs from vast compound databases to bind with a particular protein pocket, is a critical step in AI-assisted drug discovery. Traditional docking methods are highly time-consuming and can only work with a restricted search library in real-life applications. Recent supervised learning approaches that use scoring functions for binding-affinity prediction, although promising, have not yet surpassed docking methods because they depend strongly on limited data with reliable binding-affinity labels. In this paper, we propose a novel contrastive learning framework, DrugCLIP, which reformulates virtual screening as a dense retrieval task and employs contrastive learning to align representations of binding protein pockets and molecules from a large quantity of pairwise data without explicit binding-affinity scores. We also introduce a biological-knowledge-inspired data augmentation strategy to learn better protein-molecule representations. Extensive experiments show that DrugCLIP significantly outperforms traditional docking and supervised learning methods on diverse virtual screening benchmarks with greatly reduced computation time, especially in the zero-shot setting.
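To make the "dense retrieval with contrastive alignment" idea concrete, the sketch below shows one widely used contrastive objective, a symmetric InfoNCE loss over paired pocket and molecule embeddings, followed by ranking a library by cosine similarity to a pocket. The abstract does not state the exact loss, temperature, or encoders, so everything here (function names, the temperature value, the toy vectors) is an assumption for illustration and not DrugCLIP's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(pockets, molecules, temperature=0.07):
    """Symmetric InfoNCE loss; pockets[i] and molecules[i] form a positive pair."""
    sim = [[cosine(p, m) / temperature for m in molecules] for p in pockets]

    def mean_cross_entropy(rows):
        # Cross-entropy with the diagonal entry as the target, via a stable log-sum-exp.
        total = 0.0
        for i, row in enumerate(rows):
            top = max(row)
            log_z = top + math.log(sum(math.exp(s - top) for s in row))
            total += log_z - row[i]
        return total / len(rows)

    columns = [list(col) for col in zip(*sim)]
    # Average the pocket-to-molecule and molecule-to-pocket directions.
    return 0.5 * (mean_cross_entropy(sim) + mean_cross_entropy(columns))


def rank_library(pocket, library):
    """Return library indices sorted by decreasing cosine similarity to the pocket."""
    return sorted(
        range(len(library)),
        key=lambda j: cosine(pocket, library[j]),
        reverse=True,
    )


# Toy embeddings: matched pairs should give a lower loss than swapped pairs.
toy_pockets = [[1.0, 0.0], [0.0, 1.0]]
loss_matched = info_nce(toy_pockets, [[1.0, 0.0], [0.0, 1.0]])
loss_swapped = info_nce(toy_pockets, [[0.0, 1.0], [1.0, 0.0]])
toy_ranking = rank_library([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
```

Because scoring reduces to a similarity between precomputed embeddings, library molecules can be encoded once and reused across targets, which is consistent with the reduced computation time the abstract describes.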
