Arrow Research search

Author name cluster

Yutong Lu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers
2 author rows

Possible papers


AAAI Conference 2026 Conference Paper

FUSION: Dataset Pruning via Fusing Uncertainty with Structural Information for Optimal Neural Training in Crystal Property Prediction

  • Xiean Wang
  • Pin Chen
  • Liqin Tan
  • Yutong Lu
  • Qingsong Zou

The rapid expansion of materials databases offers unprecedented opportunities for accelerating materials discovery via machine learning. However, the widespread assumption that larger datasets inherently produce better models does not hold in practice. We propose FUSION (Fusing Uncertainty with Structural Information for Optimal Neural training), an offline dataset pruning strategy that synergistically combines uncertainty quantification with crystallographic structure analysis via geometric fingerprinting, framing dataset pruning as a discrete optimization problem. Through evaluation across 3 benchmark datasets, FUSION consistently outperforms baselines, including random pruning, uncertainty sampling, weighting factor pruning, diversity sampling, and active learning. It demonstrates robust transferability across 11 diverse architectures, outperforming random pruning by 1.91–13.65% across different datasets, with an average improvement of 6.36%. Moreover, our analysis suggests that different models exhibit varying robustness characteristics when faced with pruned training data, highlighting the importance of model selection tailored to dataset composition. We identify optimal pruning points where removing just 0–8% of training data improves model performance, yielding gains up to 12.67% in specific model–dataset combinations. These results establish a new paradigm for materials informatics that prioritizes data quality over quantity, offering a pathway toward more efficient and sustainable machine learning workflows in computational materials science.

IJCAI Conference 2025 Conference Paper

CFDONEval: A Comprehensive Evaluation of Operator-Learning Neural Network Models for Computational Fluid Dynamics

  • Menghan Liu
  • Jianhuan Cen
  • Ziyang Zhou
  • Haolong Fan
  • Hongji Li
  • Ping Wei
  • Guohang Peng
  • Changye He

In this paper, we introduce CFDONEval, a comprehensive evaluation of 12 operator-learning-based neural network (ON) models to simulate 7 benchmark fluid dynamics problems. These problems cover a range of 2D scenarios (Darcy flow, two-phase flow, Taylor-Green vortex, lid-driven cavity flow, tube flow, and circular cylinder flow) as well as 3D periodic hill flow. For a rigorous evaluation, we establish 22 fluid dynamics datasets for these benchmark problems, 18 of which are newly generated using traditional numerical methods, such as the finite element method. Our evaluation tackles 5 key challenges: multiscale phenomena, convection dominance, long-term predictions, multiphase flows, and unstructured meshes over complex geometries. We assess computational accuracy, efficiency, and flow field visualization, offering valuable insights into the application of ON models in fluid dynamics research. Our findings show that attention-based models perform well in handling almost all challenges; models with a U-shaped structure excel in handling multiscale problems; and the NU-FNO model demonstrates the smallest relative error in the L2 norm when processing nonuniform grid data. The related code, dataset, and appendix are publicly available at: https://github.com/Sysuzqs/CFDNNEval.
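The abstract above reports accuracy as a relative error in the L2 norm. As a minimal sketch of how such a metric is commonly computed (the function name and the flat-list input are assumptions for illustration, not taken from the CFDONEval code), consider:

```python
import math

def relative_l2_error(pred, target):
    """Return ||pred - target||_2 / ||target||_2 for two equal-length sequences.

    Hypothetical helper for illustration; a flow field is assumed to be
    flattened into a 1D sequence of floats beforehand.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    # Euclidean norm of the pointwise difference.
    diff_norm = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)))
    # Euclidean norm of the reference field, used for normalization.
    target_norm = math.sqrt(sum(t ** 2 for t in target))
    if target_norm == 0.0:
        raise ValueError("relative error is undefined for an all-zero target")
    return diff_norm / target_norm
```

Normalizing by the norm of the reference field makes errors comparable across problems whose solutions differ in magnitude, which is presumably why a relative rather than absolute error is reported.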

ICLR Conference 2025 Conference Paper

CL-MFAP: A Contrastive Learning-Based Multimodal Foundation Model for Molecular Property Prediction and Antibiotic Screening

  • Gen Zhou
  • Sugitha Janarthanan
  • Yutong Lu
  • Pingzhao Hu

Due to the rise in antimicrobial resistance, identifying novel compounds with antibiotic potential is crucial for combatting this global health issue. However, traditional drug development methods are costly and inefficient. Recognizing the pressing need for more effective solutions, researchers have turned to machine learning techniques to streamline the prediction and development of novel antibiotic compounds. While foundation models have shown promise in antibiotic discovery, current mainstream efforts still fall short of fully leveraging the potential of multimodal molecular data. Recent studies suggest that contrastive learning frameworks utilizing multimodal data exhibit excellent performance in representation learning across various domains. Building upon this, we introduce CL-MFAP, an unsupervised contrastive learning (CL)-based multimodal foundation (MF) model specifically tailored for discovering small molecules with potential antibiotic properties (AP) using three types of molecular data. This model employs 1.6 million bioactive molecules with drug-like properties from the ChEMBL dataset to jointly pretrain three encoders: (1) a transformer-based encoder with rotary position embedding for processing SMILES strings; (2) another transformer-based encoder, incorporating a novel bi-level routing attention mechanism to handle molecular graph representations; and (3) a Morgan fingerprint encoder using a multilayer perceptron, to achieve the contrastive learning purpose. The CL-MFAP outperforms baseline models in antibiotic property prediction by effectively utilizing different molecular modalities and demonstrates superior domain-specific performance when fine-tuned for antibiotic-related property prediction tasks.
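The abstract above describes jointly pretraining three encoders with contrastive learning but does not state the exact objective. The sketch below assumes an InfoNCE-style loss summed over the three modality pairs, which is one common choice; the function names, the temperature value, and the pairwise summation are illustrative assumptions rather than the CL-MFAP implementation.

```python
import math

def _normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss: row i of `anchors` should match row i of `positives`.

    All other rows of `positives` act as in-batch negatives for anchor i.
    """
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the true partner.
        total += log_denom - logits[i]
    return total / len(a)

def three_way_loss(smiles_emb, graph_emb, fp_emb, temperature=0.1):
    """Assumed combination: sum the loss over the three modality pairs."""
    pairs = [(smiles_emb, graph_emb), (smiles_emb, fp_emb), (graph_emb, fp_emb)]
    return sum(info_nce(x, y, temperature) for x, y in pairs)
```

Under this assumed objective, embeddings of the same molecule from different encoders are pulled together while embeddings of different molecules in the batch are pushed apart.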

ICLR Conference 2025 Conference Paper

ECD: A Machine Learning Benchmark for Predicting Enhanced-Precision Electronic Charge Density in Crystalline Inorganic Materials

  • Pin Chen
  • Zexin Xu
  • Qing Mo
  • Hongjin Zhong
  • Fengyang Xu
  • Yutong Lu

Supervised machine learning techniques are increasingly being adopted to speed up electronic structure predictions, serving as alternatives to first-principles methods like Density Functional Theory (DFT). Although current DFT datasets mainly emphasize chemical properties and atomic forces, the precise prediction of electronic charge density is essential for accurately determining a system's total energy and ground state properties. In this study, we introduce a novel electronic charge density dataset named ECD, which encompasses 140,646 stable crystal geometries with medium-precision Perdew–Burke–Ernzerhof (PBE) functional data. Within this dataset, a subset of 7,147 geometries includes high-precision electronic charge density data calculated using the Heyd–Scuseria–Ernzerhof (HSE) functional in DFT. By designing various benchmark tasks for crystalline materials and emphasizing training with large-scale PBE data while fine-tuning with a smaller subset of high-precision HSE data, we demonstrate the efficacy of current machine learning models in predicting electronic charge densities. The ECD dataset and baseline models are open-sourced to support community efforts in developing new methodologies and accelerating materials design and applications.

ICML Conference 2024 Conference Paper

Equivariant Diffusion for Crystal Structure Prediction

  • Peijia Lin
  • Pin Chen
  • Rui Jiao
  • Qing Mo
  • Jianhuan Cen
  • Wenbing Huang
  • Yang Liu
  • Dan Huang

In addressing the challenge of Crystal Structure Prediction (CSP), symmetry-aware deep learning models, particularly diffusion models that treat CSP as a conditional generation task, have been extensively studied. However, ensuring permutation, rotation, and periodic translation equivariance during the diffusion process remains incompletely addressed. In this work, we propose EquiCSP, a novel equivariant diffusion-based generative model. We not only address the overlooked issue of lattice permutation equivariance in existing models, but also develop a unique noising algorithm that rigorously maintains periodic translation equivariance throughout both training and inference. Our experiments indicate that EquiCSP significantly surpasses existing models in generating accurate structures and converges faster during training.

NeurIPS Conference 2024 Conference Paper

Learning Superconductivity from Ordered and Disordered Material Structures

  • Pin Chen
  • Luoxuan Peng
  • Rui Jiao
  • Qing Mo
  • Zhen Wang
  • Wenbing Huang
  • Yang Liu
  • Yutong Lu

Superconductivity is a fascinating phenomenon observed in certain materials under certain conditions. However, some critical aspects of it, such as the relationship between superconductivity and materials' chemical/structural features, still need to be understood. Recent successes of data-driven approaches in materials science strongly motivate researchers to use them to study this relationship, but a corresponding dataset is still lacking. Hence, we present a new dataset for data-driven approaches, namely SuperCon3D, which for the first time contains both 3D crystal structures and experimental superconducting transition temperatures (Tc). Based on SuperCon3D, we propose two deep learning methods for designing high Tc superconductors. The first is SODNet, a novel equivariant graph attention model for screening known structures, which differs from existing models in incorporating both ordered and disordered geometric content. The second is DiffCSP-SC, a diffusion generative model for creating new structures, which enables high Tc-targeted generation. Extensive experiments demonstrate that both our proposed dataset and models are advantageous for designing new high Tc superconducting candidates.

NeurIPS Conference 2023 Conference Paper

Crystal Structure Prediction by Joint Equivariant Diffusion

  • Rui Jiao
  • Wenbing Huang
  • Peijia Lin
  • Jiaqi Han
  • Pin Chen
  • Yutong Lu
  • Yang Liu

Crystal Structure Prediction (CSP) is crucial in various scientific disciplines. While CSP can be addressed by employing currently prevailing generative models (e.g., diffusion models), this task encounters unique challenges owing to the symmetric geometry of crystal structures: the invariance of translation, rotation, and periodicity. To incorporate the above symmetries, this paper proposes DiffCSP, a novel diffusion model to learn the structure distribution from stable crystals. To be specific, DiffCSP jointly generates the lattice and atom coordinates for each crystal by employing a periodic-E(3)-equivariant denoising model, to better model the crystal geometry. Notably, different from related equivariant generative approaches, DiffCSP leverages fractional coordinates rather than Cartesian coordinates to represent crystals, remarkably promoting the diffusion and the generation process of atom positions. Extensive experiments verify that our DiffCSP remarkably outperforms existing CSP methods, with a much lower computation cost in contrast to DFT-based methods. Moreover, the superiority of DiffCSP is still observed when it is extended for ab initio crystal generation.
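The abstract above contrasts fractional and Cartesian coordinates. As a minimal sketch of that distinction (the helper names and the row-vector lattice convention are assumptions for illustration, not the DiffCSP code), fractional coordinates express atom positions in units of the lattice vectors, so periodicity reduces to wrapping each component into [0, 1):

```python
import math

def wrap(frac):
    """Map fractional coordinates into [0, 1) using lattice periodicity."""
    return [f - math.floor(f) for f in frac]

def frac_to_cart(frac, lattice):
    """Convert fractional to Cartesian coordinates.

    Assumed convention: each row of `lattice` is one lattice vector,
    so the Cartesian position is sum_i frac[i] * lattice[i].
    """
    return [sum(frac[i] * lattice[i][k] for i in range(3)) for k in range(3)]

# Hypothetical orthorhombic cell with edge lengths 2, 3, and 4.
cell = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 4.0]]
atom = wrap([1.25, -0.25, 0.5])       # same site as [0.25, 0.75, 0.5]
position = frac_to_cart(atom, cell)   # Cartesian position inside the cell
```

Because an atom shifted by any whole lattice vector wraps back to the same fractional coordinates, a model operating on fractional coordinates can handle periodicity directly instead of having to learn it from Cartesian positions.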

AAAI Conference 2021 Conference Paper

Multi-Layer Networks for Ensemble Precipitation Forecasts Postprocessing

  • Fengyang Xu
  • Guanbin Li
  • Yunfei Du
  • Zhiguang Chen
  • Yutong Lu

Postprocessing of ensemble forecasts is commonly used to obtain a more precise estimate of future precipitation: dynamic meteorology models have limited ability to fit fine-grained atmospheric processes, and precipitation is more often driven by smaller-scale processes, yet ensemble forecasts do capture this precipitation at times. However, the pattern of these hits cannot be easily summarized. Existing objective postprocessing methods tend to extend the rain area or raise false alarms for precipitation intensity categories. In this work, we introduce a multi-layer structure to simultaneously reduce the bias in forecast ensembles output by meteorology models and merge them into a high-quality deterministic (single-valued) forecast using cross-grid information, which differs quite dramatically from previous statistical postprocessing methods. The multi-layer network is designed to model the spatial distribution of future precipitation of different intensity categories (IC-MLNet). We compare IC-MLNet with the simple average and with two other state-of-the-art postprocessing approaches for ensemble quantitative precipitation forecasts (QPFs), on both single-model and multi-model ensemble forecast datasets from TIGGE. The experimental results indicate that our model achieves superior performance over the compared baselines in both precipitation amount prediction and precipitation intensity category prediction.

IJCAI Conference 2020 Conference Paper

Communicative Representation Learning on Attributed Molecular Graphs

  • Ying Song
  • Shuangjia Zheng
  • Zhangming Niu
  • Zhang-Hua Fu
  • Yutong Lu
  • Yuedong Yang

Constructing proper representations of molecules lies at the core of numerous tasks such as molecular property prediction and drug design. Graph neural networks, especially message passing neural networks (MPNNs) and their variants, have recently made remarkable achievements in molecular graph modeling. Albeit powerful, existing MPNN methods focus one-sidedly on either atom (node) or bond (edge) information, which leads to insufficient representations of attributed molecular graphs. Herein, we propose a Communicative Message Passing Neural Network (CMPNN) to improve the molecular embedding by strengthening the message interactions between nodes and edges through a communicative kernel. In addition, the message generation process is enriched by introducing a new message booster module. Extensive experiments demonstrated that the proposed model obtained superior performance against state-of-the-art baselines on six chemical property datasets. Further visualization also showed the better representation capacity of our model.

IJCAI Conference 2020 Conference Paper

Phishing Scam Detection on Ethereum: Towards Financial Security for Blockchain Ecosystem

  • Weili Chen
  • Xiongfeng Guo
  • Zhiguang Chen
  • Zibin Zheng
  • Yutong Lu

In recent years, blockchain technology has created a new cryptocurrency world and has attracted a lot of attention. It is also rampant with various scams. For example, phishing scams have taken a lot of money and have become an important threat to users' financial security in the blockchain ecosystem. To help deal with this issue, this paper proposes a systematic approach to detect phishing accounts based on blockchain transactions and takes Ethereum as an example to verify its effectiveness. Specifically, we propose a graph-based cascade feature extraction method based on transaction records and a LightGBM-based Dual-sampling Ensemble algorithm to build the identification model. Extensive experiments show that the proposed algorithm can effectively identify phishing scams.