JBHI Journal 2026 Journal Article
An Explainable Molecular Token Estimation Method for Knowledge-aware Drug-Drug Interaction Prediction
- Hui Yu
- Chao Song
- Jiahao Yuan
- Xinkun Li
- Xiao Zhang
- Yang Yang
- Zhe Yu
- Jian-Yu Shi
In molecular representation learning (MRL), tokens (e.g., atoms, motifs, and fingerprints) are the basic elements used to represent molecules. It is common practice to use various tokens to enhance the expressive power of Graph Neural Networks (GNNs) on molecular graphs. Although prior GNN-based methods that employ tokens achieve promising performance in drug-drug interaction (DDI) prediction, the influence of the token on the expressiveness of molecular embedding models remains underexplored. To bridge this gap, we provide an axiomatic definition of MRL from a frequency-domain perspective, which reveals that a model's performance is closely related to the number of tokens, and we derive a theoretical upper bound on likelihood-based model convergence. Building on these insights, we propose SimMotifPro, a simple yet efficient motif-based method for DDI prediction. Specifically, SimMotifPro uses a variant of the DeeperGCN encoder and builds a motif-motif knowledge graph to capture motif interconnections. A Motif Ranker module is also introduced to decouple the learned representations and differentiate the contributions of the selected motifs. Empirically, we show that SimMotifPro adheres to the properties indicated by our theoretical upper bound, and we validate the general applicability of our theory across different methods. Furthermore, our approach achieves state-of-the-art performance on various DDI prediction benchmarks. Our code and checkpoints are available at https://github.com/siriusong/sim_motif_pro.