Author name cluster

Fang Wu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
2 author rows

Possible papers

JBHI Journal 2026 Journal Article

D-Flow: Multi-modality Flow Matching for D-peptide Design

  • Fang Wu
  • Shuting Jin
  • Xiangru Tang
  • Junlin Xu
  • Mark Gerstein
  • Li Erran Li
  • James Zou

Proteins are crucial to biological processes, and therapeutic peptides are emerging as promising pharmaceutical agents. Among these, D-peptides are resistant to proteolysis, exhibit greater in vivo stability, and are easier to synthesize. Despite advances in deep learning for peptide discovery, the scarcity of natural D-protein data limits the transfer of existing generative models to the D-peptide chemical space. We propose D-Flow, a full-atom flow-based framework for de novo D-peptide design. Conditioned on receptor binding, D-Flow uses structural representations incorporating backbone frames, side-chain angles, and discrete amino acid types. A mirror-image algorithm is implemented to address the lack of training data for D-proteins by converting the chirality of L-receptors. Furthermore, we enhance D-Flow's capacity by integrating protein language models (PLMs) with structural awareness through a lightweight structural adapter that injects structural representations into PLM embeddings. This enables D-Flow to learn conformational priors in the D-peptide chemical space and to accommodate the chiral selectivity of binding sites, thereby mitigating the scarcity of D-peptide data. A two-stage training pipeline and a control toolkit enable D-Flow to transition from general protein design to targeted binder design while preserving pre-training knowledge. Results on the PepMerge benchmark show D-Flow's effectiveness. D-peptides generated by D-Flow align more closely with native sequences and structures, with sequence identity improving by 10.2% over the best baseline and the top affinity score reaching 24.31%. Overall, D-Flow shows potential for D-peptide design, facilitating the development of bioorthogonal and stable molecular tools and diagnostics. Code is available at https://github.com/smiles724/PeptideDesign.
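
The abstract above mentions converting the chirality of L-receptors with a mirror-image algorithm. The following is only a minimal geometric sketch of that idea, not the D-Flow implementation: it assumes that mirroring means reflecting all 3D coordinates through a plane, and the helper names `mirror` and `signed_volume` are hypothetical. It shows that such a reflection flips the handedness of four points, measured by the sign of their tetrahedron's signed volume.

```python
def mirror(coords):
    """Reflect 3D points through the yz-plane by negating x (assumed mirroring rule)."""
    return [(-x, y, z) for x, y, z in coords]


def signed_volume(a, b, c, d):
    """Six times the signed volume of tetrahedron (a, b, c, d).

    The sign encodes handedness: it flips under any reflection.
    """
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    # Scalar triple product u . (v x w)
    return (
        u[0] * (v[1] * w[2] - v[2] * w[1])
        - u[1] * (v[0] * w[2] - v[2] * w[0])
        + u[2] * (v[0] * w[1] - v[1] * w[0])
    )


# Toy "chiral center": a central atom and three distinct substituents.
center = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
mirrored = mirror(center)
```

Reflection preserves all interatomic distances, so the mirrored structure is geometrically valid, but the sign of the triple product changes. This is the sense in which a reflected L-structure corresponds to its D-enantiomer.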

TMLR Journal 2025 Journal Article

Dynamics-inspired Structure Hallucination for Protein-protein Interaction Modeling

  • Fang Wu
  • Stan Z. Li

Protein-protein interaction (PPI) represents a central challenge within the biology field, and accurately predicting the consequences of mutations in this context is crucial for drug design and protein engineering. Deep learning (DL) has shown promise in forecasting the effects of such mutations but is hindered by two primary constraints. First, the structures of mutant proteins are often elusive to acquire. Secondly, PPI takes place dynamically, which is rarely integrated into the DL architecture design. To address these obstacles, we present a novel framework named Refine-PPI with two key enhancements. First, we introduce a structure refinement module trained by a mask mutation modeling (MMM) task on available wild-type structures, which is then transferred to hallucinate the inaccessible mutant structures. Second, we employ a new kind of geometric network, called the probability density cloud network (PDC-Net), to capture 3D dynamic variations and encode the atomic uncertainty associated with PPI. Comprehensive experiments on SKEMPI.v2 substantiate the superiority of Refine-PPI over all existing tools for predicting free energy change. These findings underscore the effectiveness of our hallucination strategy and the PDC module in addressing the absence of mutant protein structure and modeling geometric uncertainty.

AAAI Conference 2025 Conference Paper

Generalized Implicit Neural Representations for Dynamic Molecular Surface Modeling

  • Fang Wu
  • Bozhen Hu
  • Stan Z. Li

Molecular dynamics (MD) has long been the de facto choice for simulating intricate physical systems from first principles. Recent efforts utilize the implicit neural representation (INR) to directly learn surface point clouds' signed distance function (SDF) with promising outcomes. However, INR's temporal generalization to unexplored molecular systems remains limited, which poses a significant barrier to applying INR to a broader range of real-world scenarios. This study introduces MoE-DSR, an enhanced version of dynamic surface representations (DSR) that effectively integrates the mixture-of-experts (MoE) strategy. Specifically, the router employs a novel geometric surface cloud network to extract the structural information from the initial static protein conformation as the prior knowledge. Meanwhile, experts comprising a team of equivariant implicit neural networks (E-INNs), each responsible for distinct protein families, ensure precise SDF estimation across varied protein data landscapes. We showcase the ability of MoE-DSR to model dynamic protein surface shapes using ensembles from ATLAS, the largest available protein MD simulations database. Extensive experiments validate its effectiveness in analyzing complex molecular systems across continuous space and time domains.

NeurIPS Conference 2025 Conference Paper

Joint Design of Protein Surface and Backbone Using a Diffusion Bridge Model

  • Guanlue Li
  • Xufeng Zhao
  • Fang Wu
  • Sören Laue

Protein-protein interactions (PPIs) are governed by surface complementarity and hydrophobic interactions at protein interfaces. However, designing diverse and physically realistic protein structure and surfaces that precisely complement target receptors remains a significant challenge in computational protein design. In this work, we introduce PepBridge, a novel framework for the joint design of protein surface and structure that seamlessly integrates receptor surface geometry and biochemical properties. Starting with a receptor surface represented as a 3D point cloud, PepBridge generates complete protein structures through a multi-step process. First, it employs denoising diffusion bridge models (DDBMs) to map receptor surfaces to ligand surfaces. Next, a multi-model diffusion model predicts the corresponding structure, while Shape-Frame Matching Networks ensure alignment between surface geometry and backbone architecture. This integrated approach facilitates surface complementarity, conformational stability, and chemical feasibility. Extensive validation across diverse protein design scenarios demonstrates PepBridge's efficacy in generating structurally viable proteins, representing a significant advancement in the joint design of top-down protein structure.

IJCAI Conference 2024 Conference Paper

A Semi-supervised Molecular Learning Framework for Activity Cliff Estimation

  • Fang Wu

Machine learning (ML) enables accurate and fast molecular property predictions, which is of interest in drug discovery and material design. Their success is based on the principle of similarity at its heart, assuming that similar molecules exhibit close properties. However, activity cliffs challenge this principle, and their presence leads to a sharp decline in the performance of existing ML algorithms, particularly graph-based methods. To overcome this obstacle under a low-data scenario, we propose a novel semi-supervised learning (SSL) method dubbed SemiMol, which employs predictions on numerous unannotated data as pseudo-signals for subsequent training. Specifically, we introduce an additional instructor model to evaluate the accuracy and trustworthiness of proxy labels because existing pseudo-labeling approaches require probabilistic outputs to reveal the model's confidence and fail to be applied in regression tasks. Moreover, we design a self-adaptive curriculum learning algorithm to progressively move the target model toward hard samples at a controllable pace. Extensive experiments on 30 activity cliff datasets demonstrate that SemiMol significantly enhances graph-based ML architectures and outpasses state-of-the-art pretraining and SSL baselines.

NeurIPS Conference 2024 Conference Paper

Instructor-inspired Machine Learning for Robust Molecular Property Prediction

  • Fang Wu
  • Shuting Jin
  • Siyuan Li
  • Stan Z. Li

Machine learning catalyzes a revolution in chemical and biological science. However, its efficacy is heavily dependent on the availability of labeled data, and annotating biochemical data is extremely laborious. To surmount this data sparsity challenge, we present an instructive learning algorithm named InstructMol to measure pseudo-labels' reliability and help the target model leverage large-scale unlabeled data. InstructMol does not require transferring knowledge between multiple domains, which avoids the potential gap between the pretraining and fine-tuning stages. We demonstrated the high accuracy of InstructMol on several real-world molecular datasets and out-of-distribution (OOD) benchmarks.

NeurIPS Conference 2023 Conference Paper

A Hierarchical Training Paradigm for Antibody Structure-sequence Co-design

  • Fang Wu
  • Stan Z. Li

Therapeutic antibodies are an essential and rapidly flourishing drug modality. The binding specificity between antibodies and antigens is decided by complementarity-determining regions (CDRs) at the tips of these Y-shaped proteins. In this paper, we propose a hierarchical training paradigm (HTP) for the antibody sequence-structure co-design. HTP consists of four levels of training stages, each corresponding to a specific protein modality within a particular protein domain. Through carefully crafted tasks in different stages, HTP seamlessly and effectively integrates geometric graph neural networks (GNNs) with large-scale protein language models to excavate evolutionary information from not only geometric structures but also vast antibody and non-antibody sequence databases, which determines ligand binding pose and strength. Empirical experiments show HTP sets the new state-of-the-art performance in the co-design problem as well as the fix-backbone design. Our research offers a hopeful path to unleash the potential of deep generative architectures and seeks to illuminate the way forward for the antibody sequence and structure co-design challenge.

AAAI Conference 2023 Conference Paper

DiffMD: A Geometric Diffusion Model for Molecular Dynamics Simulations

  • Fang Wu
  • Stan Z. Li

Molecular dynamics (MD) has long been the de facto choice for simulating complex atomistic systems from first principles. Recently, deep learning models have become a popular way to accelerate MD. Notwithstanding, existing models depend on intermediate variables such as the potential energy or force fields to update atomic positions, which requires additional computations to perform back-propagation. To waive this requirement, we propose a novel model called DiffMD by directly estimating the gradient of the log density of molecular conformations. DiffMD relies on a score-based denoising diffusion generative model that perturbs the molecular structure with a conditional noise depending on atomic accelerations and treats conformations at previous timeframes as the prior distribution for sampling. Another challenge of modeling such a conformation generation process is that a molecule is kinetic instead of static, which no prior works have strictly studied. To solve this challenge, we propose an equivariant geometric Transformer as the score function in the diffusion process to calculate corresponding gradients. It incorporates the directions and velocities of atomic motions via 3D spherical Fourier-Bessel representations. With multiple architectural improvements, we outperform state-of-the-art baselines on MD17 and isomers of C7O2H10 datasets. This work contributes to accelerating material and drug discovery.

AAAI Conference 2023 Conference Paper

Molformer: Motif-Based Transformer on 3D Heterogeneous Molecular Graphs

  • Fang Wu
  • Dragomir Radev
  • Stan Z. Li

Procuring expressive molecular representations underpins AI-driven molecule design and scientific discovery. The research mainly focuses on atom-level homogeneous molecular graphs, ignoring the rich information in subgraphs or motifs. However, it has been widely accepted that substructures play a dominant role in identifying and determining molecular properties. To address such issues, we formulate heterogeneous molecular graphs (HMGs) and introduce a novel architecture to exploit both molecular motifs and 3D geometry. Precisely, we extract functional groups as motifs for small molecules and employ reinforcement learning to adaptively select quaternary amino acids as motif candidates for proteins. Then HMGs are constructed with both atom-level and motif-level nodes. To better accommodate those HMGs, we introduce a variant of the Transformer named Molformer, which adopts a heterogeneous self-attention layer to distinguish the interactions between multi-level nodes. Besides, it is also coupled with a multi-scale mechanism to capture fine-grained local patterns with increasing contextual scales. An attentive farthest point sampling algorithm is also proposed to obtain the molecular representations. We validate Molformer across a broad range of domains, including quantum chemistry, physiology, and biophysics. Extensive experiments show that Molformer outperforms or achieves the comparable performance of several state-of-the-art baselines. Our work provides a promising way to utilize informative motifs from the perspective of multi-level graph construction. The code is available at https://github.com/smiles724/Molformer.
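
The abstract above proposes an attentive farthest point sampling algorithm. As background only, here is a minimal sketch of plain farthest point sampling on 3D coordinates. It is the standard geometric version, not Molformer's attentive variant, and the function name and arguments are illustrative assumptions.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Greedily pick k point indices so each new pick is as far as possible
    from everything already chosen. Assumes 1 <= k <= len(points)."""
    chosen = [start]
    # dist[i] = distance from point i to its nearest chosen point so far
    dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=dist.__getitem__)
        chosen.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return chosen


# Toy coordinates standing in for atom or motif positions.
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
picked = farthest_point_sampling(atoms, k=3)
```

The greedy rule spreads the selected points across the structure, so a small subset can stand in for the whole point set.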

TIST Journal 2010 Journal Article

Opinion formation under costly expression

  • Fang Wu
  • Bernardo A. Huberman

Opinions play an important role in trust building and the creation of consensus about issues and products, and a number of studies have focused on the design, evaluation, and utilization of online opinion systems. However, little effort has been spent on the dynamic aspects of online opinion formation. In this article, we study the dynamics of online opinion expression by analyzing the temporal evolution of very large sets of user views and determine that in the course of time, later opinions tend to show a big difference with earlier opinions, which moderates the average opinion to the less extreme. Online posters also tend to disagree with previous opinions when the cost of expression is high.

STOC Conference 2007 Conference Paper

Proportional response dynamics leads to market equilibrium

  • Fang Wu
  • Li Zhang 0001

One of the main reasons for the recent success of peer-to-peer (P2P) file sharing systems such as BitTorrent is their built-in tit-for-tat mechanism. In this paper, we model the bandwidth allocation in a P2P system as an exchange economy and study a tit-for-tat dynamics, namely the proportional response dynamics, in this economy. In a proportional response dynamics each player distributes its good to its neighbors proportional to the utility it received from them in the last period. We show that this dynamics not only converges but converges to a market equilibrium, a standard economic characterization of efficient exchanges in a competitive market. In addition, for some classes of utility functions we consider, it converges much faster than the classical tâtonnement process and any existing algorithms for computing market equilibria. As a part of our proof we study the double normalization of a matrix, an operation that linearly scales the rows of a matrix so that each row sums to a prescribed positive number, followed by a similar scaling of the columns. We show that the iterative double normalization process of any non-negative matrix always converges. This complements the previous studies in matrix scaling that have focused on the convergence condition of the process when the row and column normalizations are considered as separate steps.
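
The two operations described in the abstract above can be sketched in a few lines. This is an illustrative reading and not the paper's code: it assumes linear utilities in which the utility player i gets from player j equals the amount j sent to i, and the function names, the endowment vector `w`, and the fallback for a player who received nothing are all hypothetical choices.

```python
def proportional_response_step(x, w):
    """One round of proportional response.

    x[i][j] is the amount player i sent to player j last period;
    w[i] is player i's total good (e.g. upload bandwidth).
    """
    n = len(x)
    new_x = [[0.0] * n for _ in range(n)]
    for i in range(n):
        received = [x[j][i] for j in range(n)]  # what i got from each j
        total = sum(received)
        for j in range(n):
            if total > 0:
                # Split w[i] in proportion to what each neighbor contributed.
                new_x[i][j] = w[i] * received[j] / total
            else:
                # Assumption: a player who received nothing keeps its old split.
                new_x[i][j] = x[i][j]
    return new_x


def double_normalize(m, row_sums, col_sums):
    """Scale rows to the prescribed row sums, then columns to the prescribed
    column sums. Assumes every row and column has a positive total."""
    rows = [[v * r / sum(row) for v in row] for row, r in zip(m, row_sums)]
    col_totals = [sum(col) for col in zip(*rows)]
    return [
        [v * c / t for v, c, t in zip(row, col_sums, col_totals)]
        for row in rows
    ]


x0 = [[0.0, 0.5, 0.5], [1.0, 0.0, 1.0], [1.5, 1.5, 0.0]]
w = [1.0, 2.0, 3.0]
x1 = proportional_response_step(x0, w)
```

Calling `proportional_response_step` repeatedly gives the iterated dynamics. Each call keeps every player's outgoing total equal to its endowment whenever that player received something in the previous round.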