Author name cluster

Jiahua Rao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

Advancing Protein Design via Multi-Agent Reinforcement Learning with Pareto-Based Collaborative Optimization

  • Mingming Zhu
  • Jiahua Rao
  • Xiaoyu Chen
  • Qianmu Yuan
  • Yuedong Yang

Protein design is revolutionizing biotechnology, yet existing approaches struggle to balance structural foldability with functional performance. Structure-based models excel at generating stable protein backbones but often overlook critical functional properties, while protein language models capture evolutionary and functional signals but frequently predict sequences lacking structural stability. Integrating these complementary approaches remains challenging due to their inherently conflicting objectives. We present MAProt, a multi-agent framework that synergistically combines structure-based and protein language model-based methods for protein design. Each agent specializes in a distinct aspect of the design objective: the structure-based agent (e.g., ProteinMPNN) ensures compatibility with the target backbone, while protein language model-based agents (e.g., ESM, SaProt) capture evolutionary plausibility and functional potential. To reconcile conflicts and achieve optimal trade-offs, we introduce a Pareto-based negotiation module that enables effective multi-objective coordination and consensus among agents. Extensive experiments on benchmark datasets demonstrate that MAProt achieves a remarkable improvement over state-of-the-art baselines, and generalizes robustly across a range of tasks, including thermodynamic folding stability design, functional protein design, and high-affinity antibody design. These results highlight the power of collaborative optimization for advancing rational protein engineering.
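
As a rough illustration of the Pareto-style trade-off the abstract describes (not MAProt's negotiation module, whose details the abstract does not give), the sketch below keeps only candidate sequences that no other candidate dominates across per-agent scores. The function name, candidate names, and scores are hypothetical, and higher scores are assumed to be better.

```python
from typing import Dict, List, Sequence


def pareto_front(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Return names of candidates that no other candidate dominates."""

    def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
        # a dominates b if it is at least as good on every objective
        # and strictly better on at least one (higher is better).
        return all(x >= y for x, y in zip(a, b)) and any(
            x > y for x, y in zip(a, b)
        )

    return [
        name
        for name, s in scores.items()
        if not any(
            dominates(other, s)
            for other_name, other in scores.items()
            if other_name != name
        )
    ]


# Hypothetical per-candidate scores:
# (structure-based agent score, language-model agent score).
candidates = {
    "seq_a": (0.9, 0.4),
    "seq_b": (0.6, 0.8),
    "seq_c": (0.5, 0.3),  # worse than seq_a on both objectives
    "seq_d": (0.7, 0.6),
}

front = pareto_front(candidates)
print(front)
```

Only `seq_c` is dropped here, because `seq_a` beats it on both objectives; the remaining candidates each represent a different trade-off that a downstream selection step would have to choose among.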

AAAI Conference 2026 Conference Paper

De Novo Molecular Generation from Mass Spectra via Many-Body Enhanced Diffusion

  • Xichen Sun
  • Wentao Wei
  • Jiahua Rao
  • Jiancong Xie
  • Yuedong Yang

Molecular structure generation from mass spectrometry is fundamental for understanding cellular metabolism and discovering novel compounds. Although tandem mass spectrometry (MS/MS) enables the high-throughput acquisition of fragment fingerprints, these spectra often reflect higher-order interactions involving the concerted cleavage of multiple atoms and bonds, which are crucial for resolving complex isomers and non-local fragmentation mechanisms. However, most existing methods adopt atom-centric and pairwise interaction modeling, overlooking higher-order edge interactions and lacking the capacity to systematically capture essential many-body characteristics for structure generation. To overcome these limitations, we present MBGen, a Many-Body enhanced diffusion framework for de novo molecular structure Generation from mass spectra. By integrating a many-body attention mechanism and higher-order edge modeling, MBGen comprehensively leverages the rich structural information encoded in MS/MS spectra, enabling accurate de novo generation and isomer differentiation for novel molecules. Experimental results on the NPLIB1 and MassSpecGym benchmarks demonstrate that MBGen achieves superior performance, with improvements of up to 230% over state-of-the-art methods, highlighting the scientific value and practical utility of many-body modeling for mass spectrometry-based molecular generation. Further analysis and ablation studies show that our approach effectively captures higher-order interactions and exhibits enhanced sensitivity to complex isomeric and non-local fragmentation information.

AAAI Conference 2026 Conference Paper

Informative Subgraph Extraction with Deep Reinforcement Learning for Drug-Drug Interaction Prediction

  • Jiancong Xie
  • Wentao Wei
  • Chi Zhang
  • Jiahua Rao
  • Yuedong Yang

Drug-drug interaction (DDI) prediction is pivotal for drug safety and clinical decision-making. Recently, subgraph-based methods utilizing knowledge graphs (KGs) and domain information have achieved promising results by extracting informative subgraphs for DDI prediction. However, existing subgraph extraction methods are typically coarse-grained and nonspecific, facing two key limitations. First, they are constrained by the vast and noisy nature of real-world KGs, making it challenging to identify the most informative substructures from the massive space of candidate subgraphs. Second, current methods often fail to exploit the molecular structural specificity of drugs to selectively extract relevant subgraphs, lacking effective integration of molecular structure information with knowledge graph context. To address these challenges, we propose RISE-DDI, a novel Reinforced-based Informative Subgraph Extraction framework for drug-drug interaction prediction. Specifically, RISE-DDI formulates subgraph extraction as a Markov Decision Process (MDP) and leverages a deep reinforcement learning (RL) agent to dynamically and adaptively extract the most informative and context-specific subgraphs for each drug pair. The agent is guided by a learnable structure-aware reward model that considers both the topological context from the knowledge graph and the molecular features of the drug pairs, thereby encouraging the selection of subgraphs that are both structurally relevant and biologically informative. Extensive experiments on DDI benchmark datasets demonstrate that our method outperforms state-of-the-art baselines in both transductive and inductive scenarios, achieving improvements of up to 20%. Furthermore, visualization analyses of the extracted subgraphs highlight the interpretability of our model, providing insights into the underlying mechanisms of drug interactions.

NeurIPS Conference 2025 Conference Paper

Accurately Predicting Protein Mutational Effects via a Hierarchical Many-Body Attention Network

  • Dahao Xu
  • Jiahua Rao
  • Mingming Zhu
  • Jixian Zhang
  • Wei Lu
  • Shuangjia Zheng
  • Yuedong Yang

Predicting changes in binding free energy ($\Delta\Delta G$) is essential for understanding protein-protein interactions, which are critical in drug design and protein engineering. However, existing methods often rely on pre-trained knowledge and heuristic features, limiting their ability to accurately model complex mutation effects, particularly higher-order and many-body interactions. To address these challenges, we propose H3-DDG, a Hypergraph-driven Hierarchical network to capture Higher-order many-body interactions across multiple scales. By introducing a hierarchical communication mechanism, H3-DDG effectively models both local and global mutational effects. Experimental results demonstrate state-of-the-art performance on multiple benchmarks. On the SKEMPI v2 dataset, H3-DDG achieves a Pearson correlation of 0.75, improving multi-point mutations prediction by 12.10%. On the challenging BindingGYM dataset, it outperforms Prompt-DDG and BA-DDG by 62.61% and 34.26%, respectively. Ablation and efficiency analyses demonstrate its robustness and scalability, while a case study on SARS-CoV-2 antibodies highlights its practical value in improving binding affinity for therapeutic design.
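
For readers unfamiliar with the headline metric, the following is a minimal sketch of the Pearson correlation between predicted and measured $\Delta\Delta G$ values. It is a generic textbook implementation, not the paper's evaluation code, and the numeric values are made up for illustration.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance and standard deviations, all without the 1/n factor,
    # which cancels in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Hypothetical measured vs. predicted ddG values (kcal/mol).
measured = [0.5, 1.2, -0.3, 2.0, 0.0]
predicted = [0.4, 1.0, 0.1, 1.7, 0.3]
print(round(pearson_r(measured, predicted), 3))
```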

AAAI Conference 2025 Conference Paper

Advancing Retrosynthesis with Retrieval-Augmented Graph Generation

  • Anjie Qiao
  • Zhen Wang
  • Jiahua Rao
  • Yuedong Yang
  • Zhewei Wei

Diffusion-based molecular graph generative models have achieved significant success in template-free, single-step retrosynthesis prediction. However, these models typically generate reactants from scratch, often overlooking the fact that the scaffold of a product molecule typically remains unchanged during chemical reactions. To leverage this useful observation, we introduce a retrieval-augmented molecular graph generation framework. Our framework comprises three key components: a retrieval component that identifies similar molecules for the given product, an integration component that learns valuable clues from these molecules about which part of the product should remain unchanged, and a base generative model that is prompted by these clues to generate the corresponding reactants. We explore various design choices for critical and under-explored aspects of this framework and instantiate it as the Retrieval-Augmented RetroBridge (RARB). RARB demonstrates state-of-the-art performance on standard benchmarks, achieving a 14.8% relative improvement in top-1 accuracy over its base generative model, highlighting the effectiveness of retrieval augmentation. Additionally, RARB excels in handling out-of-distribution molecules, and its advantages remain significant even with smaller models or fewer denoising steps. These strengths make RARB highly valuable for real-world retrosynthesis applications, where extrapolation to novel molecules and high-throughput prediction are essential.
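
The retrieval step can be pictured with a small similarity search. The abstract does not say which similarity measure or molecular representation RARB uses, so the sketch below assumes Tanimoto similarity over fingerprints represented as sets of on-bit indices; all names and fingerprints are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def retrieve_similar(
    query: FrozenSet[int],
    library: Dict[str, FrozenSet[int]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k library entries most similar to the query, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical fingerprints for a product molecule and a small library.
product_fp = frozenset({1, 2, 3, 4})
library = {
    "mol_1": frozenset({1, 2, 3, 4, 5}),  # shares 4 of 5 bits in the union
    "mol_2": frozenset({1, 2}),           # shares 2 of 4 bits in the union
    "mol_3": frozenset({7, 8}),           # shares nothing
}

print(retrieve_similar(product_fp, library, k=2))
```

In a real pipeline the fingerprints would come from a cheminformatics toolkit, and the retrieved molecules would be passed to the integration component described above.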

ICML Conference 2025 Conference Paper

Quadruple Attention in Many-body Systems for Accurate Molecular Property Predictions

  • Jiahua Rao
  • Dahao Xu
  • Wentao Wei
  • Yicong Chen
  • Mingjun Yang
  • Yuedong Yang

While Graph Neural Networks and Transformers have shown promise in predicting molecular properties, they struggle with directly modeling complex many-body interactions. Current methods often approximate interactions like three- and four-body terms in message passing, while attention-based models, despite enabling direct atom communication, are typically limited to triplets, making higher-order interactions computationally demanding. To address these limitations, we introduce MABNet, a geometric attention framework designed to model four-body interactions by facilitating direct communication among atomic quartets. This approach bypasses the computational bottlenecks associated with traditional triplet-based attention mechanisms, allowing for the efficient handling of higher-order interactions. MABNet achieves state-of-the-art performance on benchmarks like MD22 and SPICE. These improvements underscore its capability to accurately capture intricate many-body interactions in large molecules. By unifying rigorous many-body physics with computational efficiency, MABNet advances molecular simulations for applications in drug design and materials discovery, while its extensible framework paves the way for modeling higher-order quantum effects.

NeurIPS Conference 2025 Conference Paper

Reinforced Active Learning for Large-Scale Virtual Screening with Learnable Policy Model

  • Yicong Chen
  • Jiahua Rao
  • Jiancong Xie
  • Dahao Xu
  • Zhen Wang
  • Yuedong Yang

Virtual Screening (VS) is vital for drug discovery but struggles with low hit rates and high computational costs. While Active Learning (AL) has shown promise in improving the efficiency of VS, traditional methods rely on inflexible and handcrafted heuristics, limiting adaptability in complex chemical spaces, particularly in balancing molecular diversity and selection accuracy. To overcome these challenges, we propose GLARE, a reinforced active learning framework that reformulates VS as a Markov Decision Process (MDP). Using Group Relative Policy Optimization (GRPO), GLARE dynamically balances chemical diversity, biological relevance, and computational constraints, eliminating the need for inflexible heuristics. Experiments show GLARE outperforms state-of-the-art AL methods, with a 64.8% average improvement in Enrichment Factors (EF). Additionally, GLARE enhances the performance of VS foundation models like DrugCLIP, achieving up to an 8-fold improvement in EF$_{0.5\%}$ with as few as 15 active molecules. These results highlight the transformative potential of GLARE for adaptive and efficient drug discovery.
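
The enrichment factor reported above can be sketched as follows, using the common definition: the hit rate among the top-scored fraction of the library divided by the hit rate of the whole library. This is a generic illustration, not GLARE's evaluation code; the rounding of the cutoff and the example data are assumptions.

```python
from typing import Sequence


def enrichment_factor(
    scores: Sequence[float],
    labels: Sequence[int],
    fraction: float,
) -> float:
    """EF at `fraction`: top-fraction hit rate over the overall hit rate.

    `labels` holds 1 for active molecules and 0 for inactive ones.
    """
    n = len(scores)
    if n == 0 or n != len(labels):
        raise ValueError("scores and labels must be non-empty and equal length")
    total_actives = sum(labels)
    if total_actives == 0:
        raise ValueError("EF is undefined when the library has no actives")
    # Assumption: round the cutoff to the nearest molecule, at least one.
    n_top = max(1, int(round(n * fraction)))
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])
    return (hits_top / n_top) / (total_actives / n)


# Hypothetical screen: 10 molecules, 2 actives, one ranked in the top 20%.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(enrichment_factor(scores, labels, fraction=0.2))
```

An EF of 1 corresponds to random selection, so values well above 1 at small fractions such as 0.5% indicate that actives are concentrated near the top of the ranking.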

NeurIPS Conference 2025 Conference Paper

RiboFlow: Conditional De Novo RNA Co-Design via Synergistic Flow Matching

  • Runze Ma
  • Zhongyue Zhang
  • Zichen Wang
  • Chenqing Hua
  • Jiahua Rao
  • Zhuomin Zhou
  • Shuangjia Zheng

Ribonucleic acid (RNA) binds to molecules to achieve specific biological functions. While generative models are advancing biomolecule design, existing methods for designing RNA that target specific ligands face limitations in capturing RNA’s conformational flexibility, ensuring structural validity, and overcoming data scarcity. To address these challenges, we introduce RiboFlow, a synergistic flow matching model to co-design RNA structures and sequences based on target molecules. By integrating RNA backbone frames, torsion angles, and sequence features in a unified architecture, RiboFlow explicitly models RNA’s dynamic conformations while enforcing sequence-structure consistency to improve validity. Additionally, we curate RiboBind, a large-scale dataset of RNA-molecule interactions, to resolve the scarcity of high-quality structural data. Extensive experiments reveal that RiboFlow not only outperforms state-of-the-art RNA design methods by a large margin but also showcases controllable capabilities for achieving high binding affinity to target ligands. Our work bridges critical gaps in controllable RNA design, offering a framework for structure-aware, data-efficient generation.

IJCAI Conference 2022 Conference Paper

Communicative Subgraph Representation Learning for Multi-Relational Inductive Drug-Gene Interaction Prediction

  • Jiahua Rao
  • Shuangjia Zheng
  • Sijie Mai
  • Yuedong Yang

Illuminating the interconnections between drugs and genes is an important topic in drug development and precision medicine. Currently, computational predictions of drug-gene interactions mainly focus on the binding interactions without considering other relation types like agonist, antagonist, etc. In addition, existing methods either heavily rely on high-quality domain features or are intrinsically transductive, which limits the capacity of models to generalize to drugs/genes that lack external information or are unseen during the training process. To address these problems, we propose a novel Communicative Subgraph representation learning for Multi-relational Inductive drug-Gene interactions prediction (CoSMIG), where the predictions of drug-gene relations are made through subgraph patterns, and thus are naturally inductive for unseen drugs/genes without retraining or utilizing external domain features. Moreover, the model strengthened the relations on the drug-gene graph through a communicative message passing mechanism. To evaluate our method, we compiled two new benchmark datasets from DrugBank and DGIdb. The comprehensive experiments on the two datasets showed that our method outperformed state-of-the-art baselines in the transductive scenarios and achieved superior performance in the inductive ones. Further experimental analysis including LINCS experimental validation and literature verification also demonstrated the value of our model.

NeurIPS Conference 2022 Conference Paper

TANKBind: Trigonometry-Aware Neural NetworKs for Drug-Protein Binding Structure Prediction

  • Wei Lu
  • Qifeng Wu
  • Jixian Zhang
  • Jiahua Rao
  • Chengtao Li
  • Shuangjia Zheng

Illuminating interactions between proteins and small drug molecules is a long-standing challenge in the field of drug discovery. Despite the importance of understanding these interactions, most previous works are limited by hand-designed scoring functions and insufficient conformation sampling. The recently proposed graph neural network-based methods provide alternatives to predict protein-ligand complex conformation in a one-shot manner. However, these methods neglect the geometric constraints of the complex structure and weaken the role of local functional regions. As a result, they might produce unreasonable conformations for challenging targets and generalize poorly to novel proteins. In this paper, we propose Trigonometry-Aware Neural networKs for binding structure prediction, TANKBind, which builds a trigonometry constraint into the model as a vigorous inductive bias and explicitly attends to all possible binding sites for each protein by segmenting the whole protein into functional blocks. We construct novel contrastive losses with local region negative sampling to jointly optimize the binding interaction and affinity. Extensive experiments show substantial performance gains in comparison to state-of-the-art physics-based and deep learning-based methods on commonly used benchmark datasets for both binding structure and affinity predictions with variant settings.

IJCAI Conference 2021 Conference Paper

Learning Attributed Graph Representation with Communicative Message Passing Transformer

  • Jianwen Chen
  • Shuangjia Zheng
  • Ying Song
  • Jiahua Rao
  • Yuedong Yang

Constructing appropriate representations of molecules lies at the core of numerous tasks in fields such as materials science, chemistry, and drug design. Recent research abstracts molecules as attributed graphs and employs graph neural networks (GNNs) for molecular representation learning, which has made remarkable achievements in molecular graph modeling. Albeit powerful, current models either are based on local aggregation operations and thus miss higher-order graph properties or focus on only node information without fully using the edge information. To this end, we propose a Communicative Message Passing Transformer (CoMPT) neural network to improve the molecular graph representation by reinforcing message interactions between nodes and edges based on the Transformer architecture. Unlike previous transformer-style GNNs that treat a molecule as a fully connected graph, we introduce a message diffusion mechanism to leverage the graph connectivity inductive bias and reduce the message enrichment explosion. Extensive experiments demonstrated that the proposed model obtained superior performance (around 4% on average) against state-of-the-art baselines on seven chemical property datasets (graph-level tasks) and two chemical shift datasets (node-level tasks). Further visualization studies also indicated a better representation capacity achieved by our model.