Arrow Research

Author name cluster

Shuiwang Ji

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

78 papers
2 author rows

Possible papers

78

TMLR Journal 2026 Journal Article

Augmenting Molecular Graphs with Geometries via Machine Learning Interatomic Potentials

  • Cong Fu
  • Yuchao Lin
  • Zachary Krueger
  • Haiyang Yu
  • Maho Nakata
  • Jianwen Xie
  • Emine Kucukbenli
  • Xiaofeng Qian

Accurate molecular property predictions require 3D geometries, which are typically obtained using expensive methods such as density functional theory (DFT). Here, we attempt to obtain molecular geometries by relying solely on machine learning interatomic potential (MLIP) models. To this end, we first curate a large-scale molecular relaxation dataset comprising 3.5 million molecules and 300 million snapshots. MLIP models are then pre-trained with supervised learning to predict energies and forces given 3D molecular structures. Once trained, we show that the pre-trained models can be used in different ways to obtain geometries either explicitly or implicitly. First, they can be used to obtain approximate low-energy 3D geometries via geometry optimization. While these geometries do not consistently reach DFT-level chemical accuracy or convergence, they can still improve downstream performance compared to non-relaxed structures. To mitigate potential biases and enhance downstream predictions, we introduce geometry fine-tuning based on the relaxed 3D geometries. Second, the pre-trained models can be directly fine-tuned for property prediction when ground truth 3D geometries are available. Our results demonstrate that MLIP models pre-trained on relaxation data can learn transferable molecular representations that improve downstream molecular property prediction and can provide approximate yet practically valuable molecular geometries that benefit property predictions. Our code is publicly available at https://github.com/divelab/AIRS/.
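
As an illustration of the geometry-optimization use described above, here is a minimal sketch of such a relaxation loop, assuming the pre-trained MLIP is wrapped as an ASE calculator; the `MLIPCalculator` class is hypothetical, and the use of ASE is an assumption rather than the paper's stated tooling:

```python
# Minimal sketch: relax a molecule with a pre-trained MLIP via geometry optimization.
from ase.build import molecule
from ase.optimize import BFGS

from my_mlip import MLIPCalculator  # hypothetical wrapper around the trained model

atoms = molecule("CH3CH2OH")   # an initial, non-relaxed 3D structure
atoms.calc = MLIPCalculator()  # energies and forces come from the MLIP

opt = BFGS(atoms, logfile=None)
opt.run(fmax=0.05)             # stop once the max force falls below 0.05 eV/Å

print(atoms.get_positions())   # approximate low-energy geometry
```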

TMLR Journal 2025 Journal Article

Counterfactual Fairness on Graphs: Augmentations, Hidden Confounders, and Identifiability

  • Hongyi Ling
  • Zhimeng Jiang
  • Na Zou
  • Shuiwang Ji

We consider augmenting graph data with counterfactual generation in order to achieve fairness on downstream tasks. While this direction has been explored previously, existing methods invariably consider oversimplified causal relationships. Moreover, they often rely on unidentifiable models to encode causal relationships, making it hard to identify the true joint distribution and thus recover counterfactual graphs. To tackle these challenges, we introduce a causal model with hidden confounders on graphs, which considers the existence of hidden confounders affecting both node features and graph structures. We use an identifiable graph VAE model to simultaneously estimate hidden confounders and learn generation functions of the causal model. By incorporating a Gaussian mixture prior distribution, we improve the identifiability of our model to recover the joint distribution of observed data and hidden confounders. Using the generated counterfactual graphs, we enforce consistency in the predictions of classifiers for different counterfactual graphs, thereby achieving graph counterfactual fairness in these classifiers. Experimental results demonstrate the effectiveness of our method in improving the counterfactual fairness of classifiers on various graph tasks. Moreover, theoretical analysis, coupled with empirical results, illustrates the capability of our method to successfully identify hidden confounders.

NeurIPS Conference 2025 Conference Paper

Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-based Decoding

  • Xiner Li
  • Yulai Zhao
  • Chenyu Wang
  • Gabriele Scalia
  • Gokcen Eraslan
  • Surag Nair
  • Tommaso Biancalani
  • Shuiwang Ji

Diffusion models excel at capturing the natural design spaces of images, molecules, DNA, RNA, and protein sequences. However, rather than merely generating designs that are natural, we often aim to optimize downstream reward functions while preserving the naturalness of these design spaces. Existing methods for achieving this goal often require differentiable proxy models (e.g., classifier guidance or DPS) or involve computationally expensive fine-tuning of diffusion models (e.g., classifier-free guidance, RL-based fine-tuning). In our work, we propose a new method to address these challenges. Our algorithm is an iterative sampling method that integrates soft value functions, which look ahead to how intermediate noisy states lead to high rewards in the future, into the standard inference procedure of pre-trained diffusion models. Notably, our approach avoids fine-tuning generative models and eliminates the need to construct differentiable models. This enables us to (1) directly utilize non-differentiable features/reward feedback, commonly used in many scientific domains, and (2) apply our method to recent discrete diffusion models in a principled way. Finally, we demonstrate the effectiveness of our algorithm across several domains, including image generation, molecule generation, and DNA/RNA sequence generation.
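
A schematic of the value-guided sampling loop, as far as the abstract describes it; `sample_prior`, `sample_step`, and `predict_x0` are hypothetical interfaces of a pre-trained diffusion model, and the resampling temperature `alpha` is an assumed knob:

```python
import torch

def soft_value_guided_sampling(model, reward, num_particles=16, num_steps=100, alpha=1.0):
    """Sketch of derivative-free, soft value-based decoding: after each standard
    denoising step, particles are importance-resampled with weights
    exp(value / alpha), where the value is a look-ahead reward estimate."""
    x = model.sample_prior(num_particles)      # fully noised particles
    for t in reversed(range(num_steps)):
        x = model.sample_step(x, t)            # one standard denoising step
        # Soft value: reward of the model's current clean-data estimate.
        # `reward` may be non-differentiable; it is only evaluated, never differentiated.
        v = reward(model.predict_x0(x, t))     # shape: (num_particles,)
        w = torch.softmax(v / alpha, dim=0)    # soft value weights
        idx = torch.multinomial(w, num_particles, replacement=True)
        x = x[idx]                             # keep promising particles
    return x
```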

ICML Conference 2025 Conference Paper

DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra

  • Montgomery Bohde
  • Mrunali Manjrekar
  • Runzhong Wang
  • Shuiwang Ji
  • Connor W. Coley

Mass spectrometry plays a fundamental role in elucidating the structures of unknown molecules and subsequent scientific discoveries. One formulation of the structure elucidation task is the conditional de novo generation of molecular structure given a mass spectrum. Toward a more accurate and efficient scientific discovery pipeline for small molecules, we present DiffMS, a formula-restricted encoder-decoder generative network that achieves state-of-the-art performance on this task. The encoder utilizes a transformer architecture and models mass spectra domain knowledge such as peak formulae and neutral losses, and the decoder is a discrete graph diffusion model restricted by the heavy-atom composition of a known chemical formula. To develop a robust decoder that bridges latent embeddings and molecular structures, we pretrain the diffusion decoder with fingerprint-structure pairs, which are available in virtually infinite quantities, compared to structure-spectrum pairs that number in the tens of thousands. Extensive experiments on established benchmarks show that DiffMS outperforms existing models on de novo molecule generation. We provide several ablations to demonstrate the effectiveness of our diffusion and pretraining approaches and show consistent performance scaling with increasing pretraining dataset size. DiffMS code is publicly available at https://github.com/coleygroup/DiffMS.

ICML Conference 2025 Conference Paper

Discovering Physics Laws of Dynamical Systems via Invariant Function Learning

  • Shurui Gui
  • Xiner Li
  • Shuiwang Ji

We consider learning underlying laws of dynamical systems governed by ordinary differential equations (ODE). A key challenge is how to discover intrinsic dynamics across multiple environments while circumventing environment-specific mechanisms. Unlike prior work, we tackle more complex environments where changes extend beyond function coefficients to entirely different function forms. For example, we demonstrate the discovery of an ideal pendulum's natural motion $\alpha^2 \sin{\theta_t}$ by observing pendulum dynamics in different environments, such as the damped environment $\alpha^2 \sin(\theta_t) - \rho \omega_t$ and powered environment $\alpha^2 \sin(\theta_t) + \rho \frac{\omega_t}{\left|\omega_t\right|}$. Here, we formulate this problem as an invariant function learning task and propose a new method, known as Disentanglement of Invariant Functions (DIF), that is grounded in causal analysis. We propose a causal graph and design an encoder-decoder hypernetwork that explicitly disentangles invariant functions from environment-specific dynamics. The discovery of invariant functions is guaranteed by our information-based principle that enforces the independence between extracted invariant functions and environments. Quantitative comparisons with meta-learning and invariant learning baselines on three ODE systems demonstrate the effectiveness and efficiency of our method. Furthermore, symbolic regression explanation results highlight the ability of our framework to uncover intrinsic laws.

ICLR Conference 2025 Conference Paper

Eliminating Position Bias of Language Models: A Mechanistic Approach

  • Ziqi Wang 0003
  • Hanlin Zhang 0002
  • Xiner Li
  • Kuan-Hao Huang
  • Chi Han
  • Shuiwang Ji
  • Sham M. Kakade
  • Hao Peng 0009

Position bias has proven to be a prevalent issue of modern language models (LMs), where the models prioritize content based on its position within the given context. This bias often leads to unexpected model failures and hurts performance, robustness, and reliability across various applications. A simple mechanistic analysis attributes the position bias to two components employed in nearly all state-of-the-art LMs: causal attention and position embedding. Based on these analyses, we propose to eliminate position bias (e.g., different retrieved documents' orders in QA affecting performance) with a training-free, zero-shot approach. Our method changes the causal attention to bidirectional attention between documents and uses model attention values to decide the relative orders of documents instead of the order provided in input prompts, thereby enabling Position-INvariant inferencE (PINE) at the document level. By eliminating position bias, models achieve better performance and reliability in downstream tasks, including LM-as-a-judge, retrieval-augmented QA, molecule generation, and math reasoning. Notably, PINE is especially useful when adapting LMs to evaluate reasoning pairs: it consistently provides performance gains of 8 to 10 percentage points, making Llama-3-70B-Instruct perform even better than GPT-4-0125-preview and GPT-4o-2024-08-06 on the RewardBench reasoning set.
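
A toy sketch of the attention-mask change this describes: attention stays causal everywhere except between distinct document segments, which may attend to each other in both directions. The attention-value-based reordering of documents is omitted, and the mask construction is one reading of the mechanism rather than the released implementation:

```python
import torch

def pine_style_mask(segment_ids, is_doc):
    """Causal attention mask, except tokens in document segments may also
    attend to *later* document segments (bidirectional between documents).
    `segment_ids` labels each token's segment; `is_doc` flags document tokens."""
    n = segment_ids.shape[0]
    causal = torch.tril(torch.ones(n, n, dtype=torch.bool))
    same_seg = segment_ids[:, None] == segment_ids[None, :]
    both_docs = is_doc[:, None] & is_doc[None, :]
    # Cross-document attention is allowed in both directions; attention within
    # a single document (and over non-document text) remains causal.
    return causal | (both_docs & ~same_seg)  # True = attention allowed

# Toy usage: a 3-token prompt followed by two 2-token documents.
seg = torch.tensor([0, 0, 0, 1, 1, 2, 2])
doc = torch.tensor([False, False, False, True, True, True, True])
print(pine_style_mask(seg, doc).int())
```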

ICLR Conference 2025 Conference Paper

Fragment and Geometry Aware Tokenization of Molecules for Structure-Based Drug Design Using Language Models

  • Cong Fu 0003
  • Xiner Li
  • Blake Olson
  • Heng Ji 0001
  • Shuiwang Ji

Structure-based drug design (SBDD) is crucial for developing specific and effective therapeutics against protein targets but remains challenging due to complex protein-ligand interactions and vast chemical space. Although language models (LMs) have excelled in natural language processing, their application in SBDD is underexplored. To bridge this gap, we introduce a method, known as Frag2Seq, to apply LMs to SBDD by generating molecules in a fragment-based manner in which fragments correspond to functional modules. We transform 3D molecules into fragment-informed sequences using $SE(3)$-equivariant molecule and fragment local frames, extracting $SE(3)$-invariant sequences that preserve geometric information of 3D fragments. Furthermore, we incorporate protein pocket embeddings obtained from a pre-trained inverse folding model into the LMs via cross-attention to capture protein-ligand interactions, enabling effective target-aware molecule generation. Benefiting from employing LMs with fragment-based generation and effective protein context encoding, our model achieves the best performance on Vina binding score and chemical properties such as QED and Lipinski, which shows our model's efficacy in generating drug-like ligands with higher binding affinity against target proteins. Moreover, our method also exhibits higher sampling efficiency than atom-based autoregressive and diffusion baselines, with up to 300× speedup. The code will be made publicly available at https://github.com/divelab/AIRS/tree/main/OpenMI/Frag2Seq.

ICML Conference 2025 Conference Paper

Geometry Informed Tokenization of Molecules for Language Model Generation

  • Xiner Li
  • Limei Wang
  • Youzhi Luo
  • Carl Edwards
  • Shurui Gui
  • Yuchao Lin
  • Heng Ji 0001
  • Shuiwang Ji

We consider molecule generation in 3D space using language models (LMs), which requires discrete tokenization of 3D molecular geometries. Although tokenization of molecular graphs exists, that for 3D geometries is largely unexplored. Here, we attempt to bridge this gap by proposing a novel method which converts molecular geometries into SE(3)-invariant 1D discrete sequences. Our method consists of canonical labeling and invariant spherical representation steps, which together maintain geometric and atomic fidelity in a format conducive to LMs. Our experiments show that, when coupled with our proposed method, various LMs excel in molecular geometry generation, especially in controlled generation tasks. Our code has been released as part of the AIRS library (https://github.com/divelab/AIRS/).
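
A rough sketch of what such a geometry-to-sequence conversion can look like; the canonical frame here is a simple PCA-based one (with sign fixing omitted), which only approximates the paper's canonical labeling and invariant spherical representation steps:

```python
import numpy as np

def tokenize_geometry(atom_types, coords, n_bins=64, r_max=10.0):
    """Sketch: convert a 3D molecular geometry into a discrete 1D sequence by
    fixing a canonical frame, expressing each atom in spherical coordinates,
    and discretizing. This is a simplification for illustration."""
    x = coords - coords.mean(axis=0)                  # translation invariance
    _, _, vt = np.linalg.svd(x, full_matrices=False)  # PCA axes as a canonical frame
    x = x @ vt.T                                      # rotate into that frame
    r = np.linalg.norm(x, axis=1)
    theta = np.arccos(np.clip(x[:, 2] / np.maximum(r, 1e-9), -1, 1))
    phi = np.arctan2(x[:, 1], x[:, 0])

    def bin_(v, lo, hi):  # map a value in [lo, hi] to one of n_bins tokens
        return int(np.clip((v - lo) / (hi - lo) * n_bins, 0, n_bins - 1))

    tokens = []
    for z, ri, ti, pi_ in zip(atom_types, r, theta, phi):
        tokens += [f"Z{z}", f"R{bin_(ri, 0, r_max)}",
                   f"T{bin_(ti, 0, np.pi)}", f"P{bin_(pi_, -np.pi, np.pi)}"]
    return tokens
```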

TMLR Journal 2025 Journal Article

Hierarchical Language Model Design For Interpretable Graph Reasoning

  • Sambhav Khurana
  • Xiner Li
  • Shurui Gui
  • Shuiwang Ji

Large language models (LLMs) are being increasingly explored for graph tasks. Despite their remarkable success in text-based tasks, LLMs' capabilities in understanding explicit graph structures remain limited, particularly with large graphs. In this work, we introduce the Hierarchical Language Model for Graph (HLM-G), which employs a two-block architecture to capture node-centric local information and interaction-centric global structure, effectively enhancing graph structure understanding abilities. The proposed scheme allows LLMs to address various graph queries with high efficacy, efficiency, and robustness, while reducing computational costs on large-scale graph tasks. Furthermore, we demonstrate the interpretability of our model using intrinsic attention weights and established explainers. Comprehensive evaluations across diverse graph reasoning and real-world tasks at the node, link, and graph levels highlight the superiority of our method, marking a significant advancement in the application of LLMs to graph understanding.

TMLR Journal 2025 Journal Article

Language Models for Controllable DNA Sequence Design

  • Xingyu Su
  • Xiner Li
  • Yuchao Lin
  • Ziqian Xie
  • Degui Zhi
  • Shuiwang Ji

We consider controllable DNA sequence design, where sequences are generated by conditioning on specific biological properties. While language models (LMs) such as GPT and BERT have achieved remarkable success in natural language generation, their application to DNA sequence generation remains largely underexplored. In this work, we introduce ATGC-Gen, an Automated Transformer Generator for Controllable Generation, which leverages cross-modal encoding to integrate diverse biological signals. ATGC-Gen is instantiated with both decoder-only and encoder-only transformer architectures, allowing flexible training and generation under either autoregressive or masked recovery objectives. We evaluate ATGC-Gen on representative tasks including promoter and enhancer sequence design, and further introduce a new dataset based on ChIP-Seq experiments for modeling protein binding specificity. Our experiments demonstrate that ATGC-Gen can generate fluent, diverse, and biologically relevant sequences aligned with the desired properties. Compared to prior methods, our model achieves notable improvements in controllability and functional relevance, highlighting the potential of language models in advancing programmable genomic design.

ICLR Conference 2025 Conference Paper

Learning to Discover Regulatory Elements for Gene Expression Prediction

  • Xingyu Su
  • Haiyang Yu 0005
  • Degui Zhi
  • Shuiwang Ji

We consider the problem of predicting gene expressions from DNA sequences. A key challenge of this task is to find the regulatory elements that control gene expressions. Here, we introduce Seq2Exp, a Sequence to Expression network explicitly designed to discover and extract regulatory elements that drive target gene expression, enhancing the accuracy of the gene expression prediction. Our approach captures the causal relationship between epigenomic signals, DNA sequences and their associated regulatory elements. Specifically, we propose to decompose the epigenomic signals and the DNA sequence conditioned on the causal active regulatory elements, and apply an information bottleneck with the Beta distribution to combine their effects while filtering out non-causal components. Our experiments demonstrate that Seq2Exp outperforms existing baselines in gene expression prediction tasks and discovers influential regions compared to commonly used statistical methods for peak detection such as MACS3. The source code is released as part of the AIRS library (https://github.com/divelab/AIRS/).

NeurIPS Conference 2025 Conference Paper

ML4CFD Competition: Results and Retrospective Analysis

  • Mouadh Yagoubi
  • David Danan
  • Milad LEYLI ABADI
  • Jocelyn Mazari
  • Jean-Patrick Brunet
  • Abbas Kabalan
  • Fabien Casenave
  • Yuxin Ma

The integration of machine learning (ML) into the physical sciences is reshaping computational paradigms, offering the potential to accelerate demanding simulations such as computational fluid dynamics (CFD). Yet, persistent challenges in accuracy, generalization, and physical consistency hinder the practical deployment of ML models in scientific domains. To address these limitations and systematically benchmark progress, we organized the ML4CFD competition, centered on surrogate modeling for aerodynamic simulations over two-dimensional airfoils. The competition attracted over 240 teams, who were provided with a curated dataset generated via OpenFOAM and evaluated through a multi-criteria framework encompassing predictive accuracy, physical fidelity, computational efficiency, and out-of-distribution generalization. This retrospective analysis reviews the competition outcomes, highlighting several approaches that outperformed baselines under our global evaluation score. Notably, the top entry exceeded the performance of the original OpenFOAM solver on aggregate metrics, illustrating the promise of ML-based surrogates to outperform traditional solvers under tailored criteria. However, this does not imply that the winning solution could replace the OpenFOAM solver or that it was overall superior, even for this specific task. Drawing from these results, we analyze the key design principles of top submissions, assess the robustness of our evaluation framework, and offer guidance for future scientific ML challenges.

ICML Conference 2025 Conference Paper

On Explaining Equivariant Graph Networks via Improved Relevance Propagation

  • Hongyi Ling
  • Haiyang Yu 0005
  • Zhimeng Jiang
  • Na Zou 0001
  • Shuiwang Ji

We consider explainability in equivariant graph neural networks for 3D geometric graphs. While many XAI methods have been developed for analyzing graph neural networks, they predominantly target 2D graph structures. The complex nature of 3D data and the sophisticated architectures of equivariant GNNs present unique challenges. Current XAI techniques either struggle to adapt to equivariant GNNs or fail to effectively handle positional data and evaluate the significance of geometric features adequately. To address these challenges, we introduce a novel method, known as EquiGX, which uses the Deep Taylor decomposition framework to extend the layer-wise relevance propagation rules tailored for spherical equivariant GNNs. Our approach decomposes prediction scores and back-propagates the relevance scores through each layer to the input space. Our decomposition rules provide a detailed explanation of each layer's contribution to the network's predictions, thereby enhancing our understanding of how geometric and positional data influence the model's outputs. Through experiments on both synthetic and real-world datasets, our method demonstrates its capability to identify critical geometric structures and outperform alternative baselines. These results indicate that our method provides significantly enhanced explanations for equivariant GNNs. Our code has been released as part of the AIRS library (https://github.com/divelab/AIRS/).

ICML Conference 2025 Conference Paper

Reward-Guided Iterative Refinement in Diffusion Models at Test-Time with Applications to Protein and DNA Design

  • Masatoshi Uehara
  • Xingyu Su
  • Yulai Zhao 0002
  • Xiner Li
  • Aviv Regev
  • Shuiwang Ji
  • Sergey Levine
  • Tommaso Biancalani

To fully leverage the capabilities of diffusion models, we are often interested in optimizing downstream reward functions during inference. While numerous algorithms for reward-guided generation have been recently proposed due to their significance, current approaches predominantly focus on single-shot generation, transitioning from fully noised to denoised states. We propose a novel framework for inference-time reward optimization with diffusion models. Our approach employs an iterative refinement process consisting of two steps in each iteration: noising and reward-guided denoising. This sequential refinement allows for the gradual correction of errors introduced during reward optimization. We provide a theoretical guarantee for our framework and demonstrate its superior empirical performance in protein and DNA design.
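
In pseudocode form, the refinement loop reads roughly as follows; `add_noise` and `reward_guided_denoise` are hypothetical stand-ins for the two steps named in the abstract:

```python
def iterative_refinement(x, model, reward, rounds=10, t_noise=0.3):
    """Sketch of the noise-then-denoise refinement loop described above:
    each round partially re-noises the current design to an intermediate
    time, then runs reward-guided denoising back to a clean sample, so
    errors from earlier reward optimization can be gradually corrected."""
    for _ in range(rounds):
        x_noisy = model.add_noise(x, t=t_noise)                       # partial noising
        x = model.reward_guided_denoise(x_noisy, t=t_noise, reward=reward)
    return x
```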

NeurIPS Conference 2025 Conference Paper

Tensor Decomposition Networks for Fast Machine Learning Interatomic Potential Computations

  • Yuchao Lin
  • Cong Fu
  • Zachary Krueger
  • Haiyang Yu
  • Maho Nakata
  • Jianwen Xie
  • Emine Kucukbenli
  • Xiaofeng Qian

SO(3)-equivariant networks are the dominant models for machine learning interatomic potentials (MLIPs). The key operation of such networks is the Clebsch-Gordan (CG) tensor product, which is computationally expensive. To accelerate the computation, we develop tensor decomposition networks (TDNs) as a class of approximately equivariant networks whose CG tensor products are replaced by low-rank tensor decompositions, such as the CANDECOMP/PARAFAC (CP) decomposition. With the CP decomposition, we prove (i) a uniform bound on the induced error of SO(3)-equivariance, and (ii) the universality of approximating any equivariant bilinear map. To further reduce the number of parameters, we propose path-weight sharing that ties all multiplicity-space weights across the O(L^3) CG paths into a single shared parameter set without compromising equivariance, where L is the maximum angular degree. The resulting layer acts as a plug-and-play replacement for tensor products in existing networks, and the computational complexity of tensor products is reduced from O(L^6) to O(L^4). We evaluate TDNs on PubChemQCR, a newly curated molecular relaxation dataset containing 105 million DFT-calculated snapshots. We also use existing datasets, including OC20 and OC22. Results show that TDNs achieve competitive performance with dramatic speedup in computations. Our code is publicly available as part of the AIRS library (https://github.com/divelab/AIRS).
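
To make the core idea concrete, here is a generic rank-R CP-decomposed bilinear layer; it illustrates the low-rank replacement of a dense bilinear (tensor-product-like) map, but omits the equivariance structure and path-weight sharing that the actual TDN layers build in:

```python
import torch
import torch.nn as nn

class CPBilinear(nn.Module):
    """Sketch of replacing a dense bilinear map out_k = sum_ij W_kij x_i y_j
    with a rank-R CP decomposition W_kij ≈ sum_r C_kr A_ri B_rj, reducing the
    cost from O(d^3) to O(R d) per example."""

    def __init__(self, d_in1, d_in2, d_out, rank):
        super().__init__()
        self.A = nn.Parameter(torch.randn(rank, d_in1) / d_in1 ** 0.5)
        self.B = nn.Parameter(torch.randn(rank, d_in2) / d_in2 ** 0.5)
        self.C = nn.Parameter(torch.randn(d_out, rank) / rank ** 0.5)

    def forward(self, x, y):
        # Project each input onto the rank-R factor space, multiply
        # elementwise, then map back to the output space.
        return ((x @ self.A.T) * (y @ self.B.T)) @ self.C.T
```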

NeurIPS Conference 2025 Conference Paper

Towards precision protein-ligand affinity prediction benchmark: A Complete and Modification-Aware DAVIS Dataset

  • Ming Hsiu Wu
  • Ziqian Xie
  • Shuiwang Ji
  • Degui Zhi

Advancements in AI for science unlock capabilities for critical drug discovery tasks such as protein-ligand binding affinity prediction. However, current models overfit to existing oversimplified datasets that do not represent naturally occurring and biologically relevant proteins with modifications. In this work, we curate a complete and modification-aware version of the widely used DAVIS dataset by incorporating 4,032 kinase–ligand pairs involving substitutions, insertions, deletions, and phosphorylation events. This enriched dataset enables benchmarking of predictive models under biologically realistic conditions. Based on this new dataset, we propose three benchmark settings (Augmented Dataset Prediction, Wild-Type to Modification Generalization, and Few-Shot Modification Generalization) designed to assess model robustness in the presence of protein modifications. Through extensive evaluation of both docking-free and docking-based methods, we find that docking-based models generalize better in zero-shot settings. In contrast, docking-free models tend to overfit to wild-type proteins and struggle with unseen modifications but show notable improvement when fine-tuned on a small set of modified examples. We anticipate that the curated dataset and benchmarks offer a valuable foundation for developing models that better generalize to protein modifications, ultimately advancing precision medicine in drug discovery. The benchmark is available at https://github.com/ZhiGroup/DAVIS-complete

ICML Conference 2024 Conference Paper

A Space Group Symmetry Informed Network for O(3) Equivariant Crystal Tensor Prediction

  • Keqiang Yan
  • Alexandra Saxton
  • Xiaofeng Qian
  • Xiaoning Qian
  • Shuiwang Ji

We consider the prediction of general tensor properties of crystalline materials, including dielectric, piezoelectric, and elastic tensors. A key challenge here is how to make the predictions satisfy the unique tensor equivariance to both O(3) and crystal space groups. To this end, we propose a General Materials Tensor Network (GMTNet), which is carefully designed to satisfy the required symmetries. To evaluate our method, we curate a dataset and establish evaluation metrics that are tailored to the intricacies of crystal tensor predictions. Experimental results show that our GMTNet not only achieves promising performance on crystal tensors of various orders but also generates predictions fully consistent with the intrinsic crystal symmetries. Our code is publicly available as part of the AIRS library (https://github.com/divelab/AIRS).

ICLR Conference 2024 Conference Paper

Active Test-Time Adaptation: Theoretical Analyses and An Algorithm

  • Shurui Gui
  • Xiner Li
  • Shuiwang Ji

Test-time adaptation (TTA) addresses distribution shifts for streaming test data in unsupervised settings. Currently, most TTA methods can only deal with minor shifts and rely heavily on heuristic and empirical studies. To advance TTA under domain shifts, we propose the novel problem setting of active test-time adaptation (ATTA), which integrates active learning within the fully TTA setting. We provide a learning theory analysis, demonstrating that incorporating limited labeled test instances enhances overall performance across test domains with a theoretical guarantee. We also present a sample entropy balancing for implementing ATTA while avoiding catastrophic forgetting (CF). We introduce a simple yet effective ATTA algorithm, known as SimATTA, using real-time sample selection techniques. Extensive experimental results confirm consistency with our theoretical analyses and show that the proposed ATTA method yields substantial performance improvements over TTA methods while maintaining efficiency, and achieves effectiveness similar to the more demanding active domain adaptation (ADA) methods. Our code is available at https://github.com/divelab/ATTA.

ICLR Conference 2024 Conference Paper

Complete and Efficient Graph Transformers for Crystal Material Property Prediction

  • Keqiang Yan
  • Cong Fu 0003
  • Xiaofeng Qian
  • Xiaoning Qian
  • Shuiwang Ji

Crystal structures are characterized by atomic bases within a primitive unit cell that repeats along a regular lattice throughout 3D space. The periodic and infinite nature of crystals poses unique challenges for geometric graph representation learning. Specifically, constructing graphs that effectively capture the complete geometric information of crystals and handle chiral crystals remains an unsolved and challenging problem. In this paper, we introduce a novel approach that utilizes the periodic patterns of unit cells to establish the lattice-based representation for each atom, enabling efficient and expressive graph representations of crystals. Furthermore, we propose ComFormer, an SE(3) transformer designed specifically for crystalline materials. ComFormer includes two variants: iComFormer, which employs invariant geometric descriptors of Euclidean distances and angles, and eComFormer, which utilizes equivariant vector representations. Experimental results demonstrate the state-of-the-art predictive accuracy of ComFormer variants on various tasks across three widely-used crystal benchmarks. Our code is publicly available as part of the AIRS library (https://github.com/divelab/AIRS).

TMLR Journal 2024 Journal Article

Empowering GNNs via Edge-Aware Weisfeiler-Leman Algorithm

  • Meng Liu
  • Haiyang Yu
  • Shuiwang Ji

Message passing graph neural networks (GNNs) are known to have their expressiveness upper-bounded by the 1-dimensional Weisfeiler-Leman (1-WL) algorithm. To achieve more powerful GNNs, existing attempts either require \emph{ad hoc} features or involve operations that incur high time and space complexities. In this work, we propose a \textit{general} and \textit{provably powerful} GNN framework that preserves the \textit{scalability} of the message passing scheme. In particular, we first propose to empower 1-WL for the graph isomorphism test by considering edges among neighbors, giving rise to NC-1-WL. The expressiveness of NC-1-WL is shown to be strictly above 1-WL and below 3-WL theoretically. Further, we propose the NC-GNN framework as a differentiable neural version of NC-1-WL. Our simple implementation of NC-GNN is provably as powerful as NC-1-WL. Experiments demonstrate that our NC-GNN performs effectively and efficiently on various benchmarks.

ICML Conference 2024 Conference Paper

Equivariance via Minimal Frame Averaging for More Symmetries and Efficiency

  • Yuchao Lin
  • Jacob Helwig
  • Shurui Gui
  • Shuiwang Ji

We consider achieving equivariance in machine learning systems via frame averaging. Current frame averaging methods involve a costly sum over large frames or rely on sampling-based approaches that only yield approximate equivariance. Here, we propose Minimal Frame Averaging (MFA), a mathematical framework for constructing provably minimal frames that are exactly equivariant. The general foundations of MFA also allow us to extend frame averaging to more groups than previously considered, including the Lorentz group for describing symmetries in space-time, and the unitary group for complex-valued domains. Results demonstrate the efficiency and effectiveness of encoding symmetries via MFA across a diverse range of tasks, including $n$-body simulation, top tagging in collider physics, and relaxed energy prediction. Our code is available at https://github.com/divelab/MFA.
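
For reference, frame averaging itself reduces to a short loop; the sketch below assumes point clouds under rotations, and the frame construction (the paper's actual contribution) is taken as given:

```python
import torch

def frame_average(f, x, frames):
    """Sketch of frame averaging for rotation equivariance on point clouds:
    f_FA(x) = (1/|F(x)|) * sum_{R in F(x)} R · f(R^{-1} · x),
    where `x` has shape (n, 3) with points as rows and `frames` is the list
    of rotation matrices produced by some frame construction for x. With a
    minimal frame, |F(x)| is small, so this sum stays cheap while the result
    is exactly equivariant."""
    outs = [f(x @ R) @ R.T for R in frames]  # rows: first R^{-1} x_i, then R y_i
    return torch.stack(outs).mean(dim=0)
```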

TMLR Journal 2024 Journal Article

Equivariant Graph Network Approximations of High-Degree Polynomials for Force Field Prediction

  • Zhao Xu
  • Haiyang Yu
  • Montgomery Bohde
  • Shuiwang Ji

Recent advancements in equivariant deep models have shown promise in accurately predicting atomic potentials and force fields in molecular dynamics simulations. Using spherical harmonics (SH) and tensor products (TP), these equivariant networks gain enhanced physical understanding, like symmetries and many-body interactions. Beyond encoding physical insights, SH and TP are also crucial to represent equivariant polynomial functions. In this work, we analyze the equivariant polynomial functions for the equivariant architecture and introduce a novel equivariant network, named PACE. The proposed PACE utilizes an edge booster and the Atomic Cluster Expansion (ACE) technique to approximate a greater number of $SE(3) \times S_n$ equivariant polynomial functions with enhanced degrees. In experiments on commonly used benchmarks, PACE demonstrates state-of-the-art performance in predicting atomic energy and force fields, with robust generalization capability across various geometric distributions under molecular dynamics (MD) across different temperature conditions. Our code is publicly available as part of the AIRS library \url{https://github.com/divelab/AIRS/}.

TMLR Journal 2024 Journal Article

Genetic InfoMax: Exploring Mutual Information Maximization in High-Dimensional Imaging Genetics Studies

  • Yaochen Xie
  • Ziqian Xie
  • Sheikh Muhammad Saiful Islam
  • Degui Zhi
  • Shuiwang Ji

Genome-wide association studies (GWAS) are used to identify relationships between genetic variations and specific traits. When applied to high-dimensional medical imaging data, a key step is to extract lower-dimensional, yet informative representations of the data as traits. Representation learning for imaging genetics is largely under-explored due to the unique challenges posed by GWAS in comparison to typical visual representation learning. In this study, we tackle this problem from the mutual information (MI) perspective by identifying key limitations of existing methods. We introduce a trans-modal learning framework Genetic InfoMax (GIM), including a regularized MI estimator and a novel genetics-informed transformer to address the specific challenges of GWAS. We evaluate GIM on human brain 3D MRI data and establish standardized evaluation protocols to compare it to existing approaches. Our results demonstrate the effectiveness of GIM and a significantly improved performance on GWAS.

ICML Conference 2024 Conference Paper

Graph Structure Extrapolation for Out-of-Distribution Generalization

  • Xiner Li
  • Shurui Gui
  • Youzhi Luo
  • Shuiwang Ji

Out-of-distribution (OOD) generalization deals with the prevalent learning scenario where test distribution shifts from training distribution. With rising application demands and inherent complexity, graph OOD problems call for specialized solutions. While data-centric methods exhibit performance enhancements on many generic machine learning tasks, there is a notable absence of data augmentation methods tailored for graph OOD generalization. In this work, we propose to achieve graph OOD generalization with the novel design of non-Euclidean-space linear extrapolation. The proposed augmentation strategy extrapolates structure spaces to generate OOD graph data. Our design tailors OOD samples for specific shifts without corrupting underlying causal mechanisms. Theoretical analysis and empirical results demonstrate the effectiveness of our method in solving target shifts, showing substantial and consistent improvements across various graph OOD tasks.

NeurIPS Conference 2024 Conference Paper

Invariant Tokenization of Crystalline Materials for Language Model Enabled Generation

  • Keqiang Yan
  • Xiner Li
  • Hongyi Ling
  • Kenna Ashen
  • Carl Edwards
  • Raymundo Arróyave
  • Marinka Zitnik
  • Heng Ji

We consider the problem of crystal materials generation using language models (LMs). A key step is to convert 3D crystal structures into 1D sequences to be processed by LMs. Prior studies used the crystallographic information file (CIF) stream, which fails to ensure SE(3) and periodic invariance and may not lead to unique sequence representations for a given crystal structure. Here, we propose a novel method, known as Mat2Seq, to tackle this challenge. Mat2Seq converts 3D crystal structures into 1D sequences and ensures that different mathematical descriptions of the same crystal are represented in a single unique sequence, thereby provably achieving SE(3) and periodic invariance. Experimental results show that, with language models, Mat2Seq achieves promising performance in crystal structure generation as compared with prior methods.

ICLR Conference 2024 Conference Paper

On the Markov Property of Neural Algorithmic Reasoning: Analyses and Methods

  • Montgomery Bohde
  • Meng Liu 0015
  • Alexandra Saxton
  • Shuiwang Ji

Neural algorithmic reasoning is an emerging research direction that endows neural networks with the ability to mimic algorithmic executions step-by-step. A common paradigm in existing designs involves the use of historical embeddings in predicting the results of future execution steps. Our observation in this work is that such historical dependence intrinsically contradicts the Markov nature of algorithmic reasoning tasks. Based on this motivation, we present our ForgetNet, which does not use historical embeddings and thus is consistent with the Markov nature of the tasks. To address challenges in training ForgetNet at early stages, we further introduce G-ForgetNet, which uses a gating mechanism to allow for the selective integration of historical embeddings. Such an enhanced capability provides valuable computational pathways during the model's early training phase. Our extensive experiments, based on the CLRS-30 algorithmic reasoning benchmark, demonstrate that both ForgetNet and G-ForgetNet achieve better generalization capability than existing methods. Furthermore, we investigate the behavior of the gating mechanism, highlighting its degree of alignment with our intuitions and its effectiveness for robust performance. Our code is publicly available at https://github.com/divelab/ForgetNet.

ICML Conference 2024 Conference Paper

Position: TrustLLM: Trustworthiness in Large Language Models

  • Yue Huang 0001
  • Lichao Sun 0001
  • Haoran Wang 0005
  • Siyuan Wu 0001
  • Qihui Zhang
  • Yuan Li
  • Chujie Gao
  • Yixin Huang

Large language models (LLMs) have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, an established benchmark, evaluation and analysis of trustworthiness for mainstream LLMs, and a discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. Our findings first show that, in general, trustworthiness and capability (i.e., functional effectiveness) are positively related. Second, our observations reveal that proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs. However, a few open-source LLMs come very close to proprietary ones, suggesting that open-source models can achieve high levels of trustworthiness without additional mechanisms like a moderator, offering valuable insights for developers in this field. Third, it is important to note that some LLMs may be overly calibrated towards exhibiting trustworthiness, to the extent that they compromise their utility by mistakenly treating benign prompts as harmful and consequently not responding. Beyond these observations, we have uncovered key insights into the multifaceted trustworthiness of LLMs. We emphasize the importance of ensuring transparency not only in the models themselves but also in the technologies that underpin trustworthiness. We advocate that establishing an AI alliance among industry, academia, and the open-source community to foster collaboration is imperative to advance the trustworthiness of LLMs.

ICLR Conference 2024 Conference Paper

SineNet: Learning Temporal Dynamics in Time-Dependent Partial Differential Equations

  • Xuan Zhang
  • Jacob Helwig
  • Yuchao Lin
  • Yaochen Xie
  • Cong Fu 0003
  • Stephan Wojtowytsch
  • Shuiwang Ji

We consider using deep neural networks to solve time-dependent partial differential equations (PDEs), where multi-scale processing is crucial for modeling complex, time-evolving dynamics. While the U-Net architecture with skip connections is commonly used by prior studies to enable multi-scale processing, our analysis shows that the need for features to evolve across layers results in temporally misaligned features in skip connections, which limits the model’s performance. To address this limitation, we propose SineNet, consisting of multiple sequentially connected U-shaped network blocks, referred to as waves. In SineNet, high-resolution features are evolved progressively through multiple stages, thereby reducing the amount of misalignment within each stage. We furthermore analyze the role of skip connections in enabling both parallel and sequential processing of multi-scale information. Our method is rigorously tested on multiple PDE datasets, including the Navier-Stokes equations and shallow water equations, showcasing the advantages of our proposed approach over conventional U-Nets with a comparable parameter budget. We further demonstrate that increasing the number of waves in SineNet while maintaining the same number of parameters leads to a monotonically improved performance. The results highlight the effectiveness of SineNet and the potential of our approach in advancing the state-of-the-art in neural PDE solver design. Our code is available as part of AIRS (https://github.com/divelab/AIRS).
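
The architectural idea reduces to stacking several shallow U-shaped blocks; a minimal sketch follows, where `make_unet` builds one wave and the residual connection between waves is an illustrative assumption rather than a detail stated in the abstract:

```python
import torch.nn as nn

class WaveStack(nn.Module):
    """Sketch of the SineNet layout described above: several small U-shaped
    blocks ("waves") applied one after another, so high-resolution features
    evolve gradually across stages rather than inside one deep U-Net."""

    def __init__(self, make_unet, num_waves=4):
        super().__init__()
        self.waves = nn.ModuleList([make_unet() for _ in range(num_waves)])

    def forward(self, u):
        for wave in self.waves:
            u = u + wave(u)  # each wave refines the current field estimate
        return u
```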

NeurIPS Conference 2023 Conference Paper

A new perspective on building efficient and expressive 3D equivariant graph neural networks

  • Weitao Du
  • Yuanqi Du
  • Limei Wang
  • Dieqiao Feng
  • Guifeng Wang
  • Shuiwang Ji
  • Carla P. Gomes
  • Zhi-Ming Ma

Geometric deep learning enables the encoding of physical symmetries in modeling 3D objects. Despite rapid progress in encoding 3D symmetries into Graph Neural Networks (GNNs), a comprehensive evaluation of the expressiveness of these network architectures through a local-to-global analysis is still lacking. In this paper, we propose a local hierarchy of 3D isomorphism to evaluate the expressive power of equivariant GNNs and investigate the process of representing global geometric information from local patches. Our work leads to two crucial modules for designing expressive and efficient geometric GNNs, namely local substructure encoding (\textbf{LSE}) and frame transition encoding (\textbf{FTE}). To demonstrate the applicability of our theory, we propose LEFTNet, which effectively implements these modules and achieves state-of-the-art performance on both scalar-valued and vector-valued molecular property prediction tasks. We further point out the future design space for 3D equivariant graph neural networks. Our codes are available at \url{https://github.com/yuanqidu/LeftNet}.

ICLR Conference 2023 Conference Paper

Automated Data Augmentations for Graph Classification

  • Youzhi Luo
  • Michael McThrow
  • Wing Yee Au 0002
  • Tao Komikado
  • Kanji Uchino
  • Koji Maruhashi
  • Shuiwang Ji

Data augmentations are effective in improving the invariance of learning machines. We argue that the core challenge of data augmentations lies in designing data transformations that preserve labels. This is relatively straightforward for images, but much more challenging for graphs. In this work, we propose GraphAug, a novel automated data augmentation method aiming at computing label-invariant augmentations for graph classification. Instead of using uniform transformations as in existing studies, GraphAug uses an automated augmentation model to avoid compromising critical label-related information of the graph, thereby producing label-invariant augmentations in most cases. To ensure label-invariance, we develop a training method based on reinforcement learning to maximize an estimated label-invariance probability. Experiments show that GraphAug outperforms previous graph augmentation methods on various graph classification tasks.

ICML Conference 2023 Conference Paper

Efficient and Equivariant Graph Networks for Predicting Quantum Hamiltonian

  • Haiyang Yu 0005
  • Zhao Xu 0005
  • Xiaofeng Qian
  • Xiaoning Qian
  • Shuiwang Ji

We consider the prediction of the Hamiltonian matrix, which finds use in quantum chemistry and condensed matter physics. Efficiency and equivariance are two important, but conflicting, factors. In this work, we propose an SE(3)-equivariant network, named QHNet, that achieves both efficiency and equivariance. Our key advance lies in the innovative design of the QHNet architecture, which not only obeys the underlying symmetries but also reduces the number of tensor products by 92%. In addition, QHNet prevents the exponential growth of the channel dimension when more atom types are involved. We perform experiments on MD17 datasets, including four molecular systems. Experimental results show that our QHNet can achieve comparable performance to state-of-the-art methods at a significantly faster speed. Besides, our QHNet consumes 50% less memory due to its streamlined architecture. Our code is publicly available as part of the AIRS library (https://github.com/divelab/AIRS).

ICML Conference 2023 Conference Paper

Efficient Approximations of Complete Interatomic Potentials for Crystal Property Prediction

  • Yuchao Lin
  • Keqiang Yan
  • Youzhi Luo
  • Yi Liu 0059
  • Xiaoning Qian
  • Shuiwang Ji

We study property prediction for crystal materials. A crystal structure consists of a minimal unit cell that is repeated infinitely in 3D space. How to accurately represent such repetitive structures in machine learning models remains unresolved. Current methods construct graphs by establishing edges only between nearby nodes, thereby failing to faithfully capture infinite repeating patterns and distant interatomic interactions. In this work, we propose several innovations to overcome these limitations. First, we propose to model physics-principled interatomic potentials directly instead of only using distances as in many existing methods. These potentials include the Coulomb potential, London dispersion potential, and Pauli repulsion potential. Second, we model the complete set of potentials among all atoms, instead of only between nearby atoms as in existing methods. This is enabled by our approximations of infinite potential summations with provable error bounds. We further develop efficient algorithms to compute the approximations. Finally, we propose to incorporate our computations of complete interatomic potentials into message passing neural networks for representation learning. We perform experiments on the JARVIS and Materials Project benchmarks for evaluation. Results show that the use of interatomic potentials and complete interatomic potentials leads to consistent performance improvements with reasonable computational costs. Our code is publicly available as part of the AIRS library (https://github.com/divelab/AIRS).

ICLR Conference 2023 Conference Paper

Gradient-Guided Importance Sampling for Learning Binary Energy-Based Models

  • Meng Liu 0015
  • Haoran Liu
  • Shuiwang Ji

Learning energy-based models (EBMs) is known to be difficult, especially on discrete data where gradient-based learning strategies cannot be applied directly. Although ratio matching is a sound method to learn discrete EBMs, it suffers from expensive computation and excessive memory requirements, thereby resulting in difficulties in learning EBMs on high-dimensional data. Motivated by these limitations, in this study, we propose ratio matching with gradient-guided importance sampling (RMwGGIS). Particularly, we use the gradient of the energy function w.r.t. the discrete data space to approximately construct the provably optimal proposal distribution, which is subsequently used by importance sampling to efficiently estimate the original ratio matching objective. We perform experiments on density modeling over synthetic discrete data, graph generation, and training Ising models to evaluate our proposed method. The experimental results demonstrate that our method can significantly alleviate the limitations of ratio matching, perform more effectively in practice, and scale to high-dimensional problems. Our implementation is available at https://github.com/divelab/RMwGGIS.
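
The gradient-guided proposal admits a compact sketch; it assumes the energy function extends differentiably to real-valued inputs so that a first-order Taylor expansion can estimate per-dimension flip costs:

```python
import torch

def gradient_guided_proposal(energy, x):
    """Sketch of a gradient-guided proposal over which bit to flip.
    For binary x in {0,1}^d, a first-order Taylor expansion estimates the
    energy change of flipping dimension i as (1 - 2 x_i) * dE/dx_i; the
    proposal then favors flips that lower the energy. Samples from this
    proposal drive the importance-sampling estimate of the ratio matching
    objective."""
    x = x.detach().requires_grad_(True)
    grad = torch.autograd.grad(energy(x).sum(), x)[0]
    delta = (1 - 2 * x.detach()) * grad       # estimated E(x with bit i flipped) - E(x)
    return torch.softmax(-delta / 2, dim=-1)  # proposal distribution over dimensions

# Usage: probs = gradient_guided_proposal(E, x); i = torch.multinomial(probs, 1)
```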

ICML Conference 2023 Conference Paper

Graph Mixup with Soft Alignments

  • Hongyi Ling
  • Zhimeng Jiang
  • Meng Liu 0015
  • Shuiwang Ji
  • Na Zou 0001

We study graph data augmentation by mixup, which has been used successfully on images. A key operation of mixup is to compute a convex combination of a pair of inputs. This operation is straightforward for grid-like data, such as images, but challenging for graph data. The key difficulty lies in the fact that different graphs typically have different numbers of nodes, and there is thus no natural node-level correspondence between graphs. In this work, we propose S-Mixup, a simple yet effective mixup method for graph classification by soft alignments. Specifically, given a pair of graphs, we explicitly obtain node-level correspondence via computing a soft assignment matrix to match the nodes between the two graphs. Based on the soft assignments, we transform the adjacency and node feature matrices of one graph, so that the transformed graph is aligned with the other graph. In this way, any pair of graphs can be mixed directly to generate an augmented graph. We conduct systematic experiments to show that S-Mixup can improve the performance and generalization of graph neural networks (GNNs) on various graph classification tasks. In addition, we show that S-Mixup can increase the robustness of GNNs against noisy labels. Our code is publicly available as part of the DIG package (https://github.com/divelab/DIG).
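
A minimal sketch of mixup via a soft alignment; the real method learns the assignment matrix, whereas here a feature-similarity softmax stands in for brevity:

```python
import torch

def soft_alignment_mixup(A1, X1, A2, X2, lam=0.5):
    """Sketch of graph mixup with a soft alignment. A soft assignment M
    (rows sum to 1) matches nodes of graph 2 to nodes of graph 1; graph 2
    is then transformed into graph 1's node space, and the adjacency and
    feature matrices are mixed convexly."""
    M = torch.softmax(X1 @ X2.T, dim=-1)  # (n1, n2) soft assignment from similarity
    A2_aligned = M @ A2 @ M.T             # graph 2's adjacency in graph 1's node space
    X2_aligned = M @ X2                   # graph 2's features in graph 1's node space
    A_mix = lam * A1 + (1 - lam) * A2_aligned
    X_mix = lam * X1 + (1 - lam) * X2_aligned
    return A_mix, X_mix
```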

ICML Conference 2023 Conference Paper

Group Equivariant Fourier Neural Operators for Partial Differential Equations

  • Jacob Helwig
  • Xuan Zhang
  • Cong Fu 0003
  • Jerry Kurtin
  • Stephan Wojtowytsch
  • Shuiwang Ji

We consider solving partial differential equations (PDEs) with Fourier neural operators (FNOs), which operate in the frequency domain. Since the laws of physics do not depend on the coordinate system used to describe them, it is desirable to encode such symmetries in the neural operator architecture for better performance and easier learning. While encoding symmetries in the physical domain using group theory has been studied extensively, how to capture symmetries in the frequency domain is under-explored. In this work, we extend group convolutions to the frequency domain and design Fourier layers that are equivariant to rotations, translations, and reflections by leveraging the equivariance property of the Fourier transform. The resulting $G$-FNO architecture generalizes well across input resolutions and performs well in settings with varying levels of symmetry. Our code is publicly available as part of the AIRS library (https://github.com/divelab/AIRS).

NeurIPS Conference 2023 Conference Paper

Joint Learning of Label and Environment Causal Independence for Graph Out-of-Distribution Generalization

  • Shurui Gui
  • Meng Liu
  • Xiner Li
  • Youzhi Luo
  • Shuiwang Ji

We tackle the problem of graph out-of-distribution (OOD) generalization. Existing graph OOD algorithms either rely on restricted assumptions or fail to exploit environment information in training data. In this work, we propose to simultaneously incorporate label and environment causal independence (LECI) to fully make use of label and environment information, thereby addressing the challenges faced by prior methods on identifying causal and invariant subgraphs. We further develop an adversarial training strategy to jointly optimize these two properties for causal subgraph discovery with theoretical guarantees. Extensive experiments and analysis show that LECI significantly outperforms prior methods on both synthetic and real-world datasets, establishing LECI as a practical and effective solution for graph OOD generalization.
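
One standard way to implement this kind of adversarial independence objective is a gradient reversal layer; the sketch below shows that generic trick, which may differ from LECI's exact training strategy:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Gradient reversal: identity on the forward pass, flipped (and scaled)
    gradient on the backward pass. Training a discriminator through this
    layer pushes the upstream encoder toward representations the
    discriminator cannot predict from, i.e., toward independence."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Usage sketch: env_logits = env_discriminator(grad_reverse(subgraph_embedding))
# Minimizing the discriminator's loss then trains the subgraph encoder to be
# environment-independent.
```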

ICLR Conference 2023 Conference Paper

Learning Fair Graph Representations via Automated Data Augmentations

  • Hongyi Ling
  • Zhimeng Jiang
  • Youzhi Luo
  • Shuiwang Ji
  • Na Zou 0001

We consider fair graph representation learning via data augmentations. While this direction has been explored previously, existing methods invariably rely on certain assumptions about the properties of fair graph data in order to design fixed strategies for data augmentations. Nevertheless, the exact properties of fair graph data may vary significantly in different scenarios. Hence, heuristically designed augmentations may not always generate fair graph data in different application scenarios. In this work, we propose a method, known as Graphair, to learn fair representations based on automated graph data augmentations. Such fairness-aware augmentations are themselves learned from data. Our Graphair is designed to automatically discover fairness-aware augmentations from input graphs in order to circumvent sensitive information while preserving other useful information. Experimental results demonstrate that our Graphair consistently outperforms many baselines on multiple node classification datasets in terms of fairness-accuracy trade-off performance. In addition, results indicate that Graphair can automatically learn to generate fair graph data without prior knowledge of fairness-relevant graph properties.

ICLR Conference 2023 Conference Paper

Learning Hierarchical Protein Representations via Complete 3D Graph Networks

  • Limei Wang
  • Haoran Liu
  • Yi Liu 0059
  • Jerry Kurtin
  • Shuiwang Ji

We consider representation learning for proteins with 3D structures. We build 3D graphs based on protein structures and develop graph networks to learn their representations. Depending on the levels of details that we wish to capture, protein representations can be computed at different levels, \emph{e.g.}, the amino acid, backbone, or all-atom levels. Importantly, there exist hierarchical relations among different levels. In this work, we propose to develop a novel hierarchical graph network, known as ProNet, to capture the relations. Our ProNet is very flexible and can be used to compute protein representations at different levels of granularity. By treating each amino acid as a node in graph modeling as well as harnessing the inherent hierarchies, our ProNet is more effective and efficient than existing methods. We also show that, given a base 3D graph network that is complete, our ProNet representations are also complete at all levels. Experimental results show that ProNet outperforms recent methods on most datasets. In addition, results indicate that different downstream tasks may require representations at different levels. Our code is publicly available as part of the DIG library (\url{https://github.com/divelab/DIG}).

NeurIPS Conference 2023 Conference Paper

QH9: A Quantum Hamiltonian Prediction Benchmark for QM9 Molecules

  • Haiyang Yu
  • Meng Liu
  • Youzhi Luo
  • Alex Strasser
  • Xiaofeng Qian
  • Xiaoning Qian
  • Shuiwang Ji

Supervised machine learning approaches have been increasingly used in accelerating electronic structure prediction as surrogates of first-principle computational methods, such as density functional theory (DFT). While numerous quantum chemistry datasets focus on chemical properties and atomic forces, the ability to achieve accurate and efficient prediction of the Hamiltonian matrix is highly desired, as it is the most important and fundamental physical quantity that determines the quantum states of physical systems and chemical properties. In this work, we generate a new Quantum Hamiltonian dataset, named QH9, to provide precise Hamiltonian matrices for 2,399 molecular dynamics trajectories and 130,831 stable molecular geometries, based on the QM9 dataset. By designing benchmark tasks with various molecules, we show that current machine learning models have the capacity to predict Hamiltonian matrices for arbitrary molecules. Both the QH9 dataset and the baseline models are provided to the community through an open-source benchmark, which can be highly valuable for developing machine learning methods and accelerating molecular and materials design for scientific and technological applications. Our benchmark is publicly available at \url{https://github.com/divelab/AIRS/tree/main/OpenDFT/QHBench}.

NeurIPS Conference 2023 Conference Paper

Towards Symmetry-Aware Generation of Periodic Materials

  • Youzhi Luo
  • Chengkai Liu
  • Shuiwang Ji

We consider the problem of generating periodic materials with deep models. While symmetry-aware molecule generation has been studied extensively, periodic materials possess different symmetries, which have not been completely captured by existing methods. In this work, we propose SyMat, a novel material generation approach that can capture physical symmetries of periodic material structures. SyMat generates atom types and lattices of materials through generating atom type sets, lattice lengths and lattice angles with a variational auto-encoder model. In addition, SyMat employs a score-based diffusion model to generate atom coordinates of materials, in which a novel symmetry-aware probabilistic model is used in the coordinate diffusion process. We show that SyMat is theoretically invariant to all symmetry transformations on materials and demonstrate that SyMat achieves promising performance on random generation and property optimization tasks. Our code is publicly available as part of the AIRS library (https://github.com/divelab/AIRS).

NeurIPS Conference 2023 Conference Paper

Video Timeline Modeling For News Story Understanding

  • Meng Liu
  • Mingda Zhang
  • Jialu Liu
  • Hanjun Dai
  • Ming-Hsuan Yang
  • Shuiwang Ji
  • Zheyun Feng
  • Boqing Gong

In this paper, we present a novel problem, namely video timeline modeling. Our objective is to create a video-associated timeline from a set of videos related to a specific topic, thereby facilitating the content and structure understanding of the story being told. This problem has significant potential in various real-world applications, for instance, news story summarization. To bootstrap research in this area, we curate a realistic benchmark dataset, YouTube-News-Timeline, consisting of over 12k timelines and 300k YouTube news videos. Additionally, we propose a set of quantitative metrics to comprehensively evaluate and compare methodologies. With such a testbed, we further develop and benchmark several deep learning approaches to tackling this problem. We anticipate that this exploratory work will pave the way for further research in video timeline modeling. The assets are available via https://github.com/google-research/google-research/tree/master/video_timeline_modeling.

ICLR Conference 2022 Conference Paper

An Autoregressive Flow Model for 3D Molecular Geometry Generation from Scratch

  • Youzhi Luo
  • Shuiwang Ji

We consider the problem of generating 3D molecular geometries from scratch. While multiple methods have been developed for generating molecular graphs, generating 3D molecular geometries from scratch is largely under-explored. In this work, we propose G-SphereNet, a novel autoregressive flow model for generating 3D molecular geometries. G-SphereNet employs a flexible sequential generation scheme by placing atoms in 3D space step-by-step. Instead of generating 3D coordinates directly, we propose to determine the 3D positions of atoms by generating distances, angles, and torsion angles, thereby ensuring both invariance and equivariance properties. In addition, we propose to use spherical message passing and an attention mechanism for conditional information extraction. Experimental results show that G-SphereNet outperforms previous methods on random molecular geometry generation and targeted molecule discovery tasks. Our code is publicly available as part of the DIG package (https://github.com/divelab/DIG).
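
To make the distance/angle/torsion parameterization concrete, here is a hedged numpy sketch of the standard internal-coordinate placement (the natural extension reference frame construction): given three previously placed reference atoms, a new atom is located from a distance, a bond angle, and a torsion angle. This is a generic construction under those assumptions, not necessarily G-SphereNet's exact scheme.

```python
import numpy as np

def place_atom(a, b, c, dist, angle, torsion):
    """Place a new atom D from reference atoms A, B, C using internal
    coordinates: |CD| = dist, angle(B, C, D) = angle, torsion(A, B, C, D) = torsion."""
    bc = c - b
    bc /= np.linalg.norm(bc)
    n = np.cross(b - a, bc)
    n /= np.linalg.norm(n)
    m = np.cross(n, bc)  # (bc, m, n) is a local orthonormal frame at C
    d_local = np.array([
        -dist * np.cos(angle),                    # component along bc
        dist * np.sin(angle) * np.cos(torsion),   # component along m
        dist * np.sin(angle) * np.sin(torsion),   # component along n
    ])
    return c + d_local[0] * bc + d_local[1] * m + d_local[2] * n

# Example: place an atom 1.5 units from C with a 109.5-degree bond angle.
a, b, c = np.array([0., 0, 0]), np.array([1., 0, 0]), np.array([1., 1, 0])
d = place_atom(a, b, c, 1.5, np.deg2rad(109.5), np.deg2rad(60.0))
```

Because the output depends only on relative geometry, translating or rotating the reference atoms moves the new atom accordingly, which is the invariance/equivariance point the abstract makes.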

NeurIPS Conference 2022 Conference Paper

ComENet: Towards Complete and Efficient Message Passing for 3D Molecular Graphs

  • Limei Wang
  • Yi Liu
  • Yuchao Lin
  • Haoran Liu
  • Shuiwang Ji

Many real-world data can be modeled as 3D graphs, but learning representations that incorporate 3D information completely and efficiently is challenging. Existing methods either use partial 3D information or suffer from excessive computational cost. To incorporate 3D information completely and efficiently, we propose a novel message passing scheme that operates within the 1-hop neighborhood. Our method guarantees full completeness of 3D information on 3D graphs by achieving global and local completeness. Notably, we propose the important rotation angles to fulfill global completeness. Additionally, we show that our method is orders of magnitude faster than prior methods. We provide rigorous proof of completeness and analysis of time complexity for our methods. As molecules are in essence quantum systems, we build the complete and efficient graph neural network (ComENet) by combining quantum-inspired basis functions and the proposed message passing scheme. Experimental results demonstrate the capability and efficiency of ComENet, especially on real-world datasets that are large in both the number and size of graphs. Our code is publicly available as part of the DIG library (https://github.com/divelab/DIG).

ICML Conference 2022 Conference Paper

Generating 3D Molecules for Target Protein Binding

  • Meng Liu 0015
  • Youzhi Luo
  • Kanji Uchino
  • Koji Maruhashi
  • Shuiwang Ji

A fundamental problem in drug discovery is to design molecules that bind to specific proteins. To tackle this problem using machine learning methods, here we propose a novel and effective framework, known as GraphBP, to generate 3D molecules that bind to given proteins by placing atoms of specific types and locations into the given binding site one by one. In particular, at each step, we first employ a 3D graph neural network to obtain geometry-aware and chemically informative representations from the intermediate contextual information. Such context includes the given binding site and the atoms placed in previous steps. Second, to preserve the desirable equivariance property, we select a local reference atom according to the designed auxiliary classifiers and then construct a local spherical coordinate system. Finally, to place a new atom, we generate its atom type and relative location w.r.t. the constructed local coordinate system via a flow model. We also consider generating the variables of interest sequentially to capture the underlying dependencies among them. Experiments demonstrate that our GraphBP is effective in generating 3D molecules with binding ability to target protein binding sites. Our implementation is available at https://github.com/divelab/GraphBP.

NeurIPS Conference 2022 Conference Paper

GOOD: A Graph Out-of-Distribution Benchmark

  • Shurui Gui
  • Xiner Li
  • Limei Wang
  • Shuiwang Ji

Out-of-distribution (OOD) learning deals with scenarios in which training and test data follow different distributions. Although general OOD problems have been intensively studied in machine learning, graph OOD is only an emerging area of research, and a systematic benchmark tailored to evaluating graph OOD methods is currently lacking. In this work, we aim at developing an OOD benchmark, known as GOOD, specifically for graphs. We explicitly make distinctions between covariate and concept shifts and design data splits that accurately reflect the different shifts. We consider both graph and node prediction tasks, as there are key differences in designing shifts for them. Overall, GOOD contains 11 datasets with 17 domain selections. When combined with covariate, concept, and no shifts, we obtain 51 different splits. We provide performance results on 10 commonly used baseline methods with 10 random runs, resulting in 510 dataset-model combinations in total. Our results show significant performance gaps between in-distribution and OOD settings, and also shed light on the different performance trends of covariate and concept shifts across methods. Our GOOD benchmark is a growing project and is expected to expand in both the quantity and variety of its resources as the area develops. The GOOD benchmark can be accessed via https://github.com/divelab/GOOD/.
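
As a small, hedged sketch of what a covariate split over a domain variable can look like (the sample layout, names, and split rule here are illustrative assumptions, not GOOD's actual pipeline): graphs whose domain value was never seen in training form the OOD test set.

```python
def covariate_split(samples, train_domains):
    """Split (graph, domain, label) samples so that test domains are unseen
    in training: the input distribution shifts while the labeling rule does not."""
    train = [s for s in samples if s[1] in train_domains]
    ood_test = [s for s in samples if s[1] not in train_domains]
    return train, ood_test

# Example with toy records of the form (graph_id, domain, label).
samples = [("g1", "small", 0), ("g2", "small", 1), ("g3", "large", 1)]
train, ood_test = covariate_split(samples, train_domains={"small"})
```

A concept shift, by contrast, would keep the same domains in both splits but change the correlation between domain and label across them.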

ICML Conference 2022 Conference Paper

GraphFM: Improving Large-Scale GNN Training via Feature Momentum

  • Haiyang Yu 0005
  • Limei Wang
  • Bokun Wang
  • Meng Liu 0015
  • Tianbao Yang
  • Shuiwang Ji

Training of graph neural networks (GNNs) for large-scale node classification is challenging. A key difficulty lies in obtaining accurate hidden node representations while avoiding the neighborhood explosion problem. Here, we propose a new technique, named feature momentum (FM), that uses a momentum step to incorporate historical embeddings when updating feature representations. We develop two specific algorithms, known as GraphFM-IB and GraphFM-OB, that consider in-batch and out-of-batch data, respectively. GraphFM-IB applies FM to in-batch sampled data, while GraphFM-OB applies FM to out-of-batch data in the 1-hop neighborhood of in-batch data. We provide a convergence analysis for GraphFM-IB and some theoretical insight for GraphFM-OB. Empirically, we observe that GraphFM-IB can effectively alleviate the neighborhood explosion problem of existing methods. In addition, GraphFM-OB achieves promising performance on multiple large-scale graph datasets.
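
The momentum step itself is a one-line update; the following numpy sketch shows the standard form (the direction of the mixing coefficient beta and the storage layout are my assumptions, not the paper's exact implementation).

```python
import numpy as np

def feature_momentum_update(hist_emb, new_emb, node_idx, beta=0.9):
    """Blend freshly computed embeddings for the nodes in the current batch
    into a table of historical embeddings: h <- (1 - beta) * h + beta * h_new."""
    hist_emb[node_idx] = (1.0 - beta) * hist_emb[node_idx] + beta * new_emb
    return hist_emb

# Example: a 5-node table of 4-dim embeddings; the batch touches nodes 1 and 3.
table = np.zeros((5, 4))
table = feature_momentum_update(table, np.ones((2, 4)), np.array([1, 3]))
```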

NeurIPS Conference 2022 Conference Paper

Periodic Graph Transformers for Crystal Material Property Prediction

  • Keqiang Yan
  • Yi Liu
  • Yuchao Lin
  • Shuiwang Ji

We consider representation learning on periodic graphs encoding crystal materials. Different from regular graphs, periodic graphs consist of a minimum unit cell repeating itself on a regular lattice in 3D space. How to effectively encode these periodic structures poses unique challenges not present in regular graph representation learning. In addition to being E(3) invariant, periodic graph representations need to be periodic invariant. That is, the learned representations should be invariant to shifts of cell boundaries as they are artificially imposed. Furthermore, the periodic repeating patterns need to be captured explicitly as lattices of different sizes and orientations may correspond to different materials. In this work, we propose a transformer architecture, known as Matformer, for periodic graph representation learning. Our Matformer is designed to be invariant to periodicity and can capture repeating patterns explicitly. In particular, Matformer encodes periodic patterns by efficient use of geometric distances between the same atoms in neighboring cells. Experimental results on multiple common benchmark datasets show that our Matformer outperforms baseline methods consistently. In addition, our results demonstrate the importance of periodic invariance and explicit repeating pattern encoding for crystal representation learning. Our code is publicly available at https://github.com/YKQ98/Matformer.
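
The "distances between the same atoms in neighboring cells" mentioned above are fully determined by the lattice: the distance from an atom to its own periodic image at integer offset (i, j, k) is the norm of that lattice translation, independent of where the atom sits inside the cell. A hedged numpy sketch (names and the offset cutoff are illustrative):

```python
import numpy as np

def self_image_distances(lattice, max_offset=1):
    """Distances from any atom to its own periodic images in neighboring cells.

    lattice: (3, 3) matrix whose rows are the three lattice vectors. The
    distance to the image at offset (i, j, k) is ||i*a1 + j*a2 + k*a3||.
    """
    offsets = [(i, j, k)
               for i in range(-max_offset, max_offset + 1)
               for j in range(-max_offset, max_offset + 1)
               for k in range(-max_offset, max_offset + 1)
               if (i, j, k) != (0, 0, 0)]
    return sorted(np.linalg.norm(np.array(o, dtype=float) @ lattice) for o in offsets)

# Example: a cubic cell of side 2 has nearest self-image distance 2.
print(self_image_distances(np.eye(3) * 2.0)[0])  # -> 2.0
```

These distances do not change when cell boundaries are redrawn, which is one way to see why such features are periodic invariant.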

ICML Conference 2022 Conference Paper

Self-Supervised Representation Learning via Latent Graph Prediction

  • Yaochen Xie
  • Zhao Xu 0005
  • Shuiwang Ji

Self-supervised learning (SSL) of graph neural networks is emerging as a promising way of leveraging unlabeled data. Currently, most methods are based on contrastive learning adapted from the image domain, which requires view generation and a sufficient number of negative samples. In contrast, existing predictive models do not require negative sampling, but lack theoretical guidance on the design of pretext training tasks. In this work, we propose LaGraph, a theoretically grounded predictive SSL framework based on latent graph prediction. The learning objectives of LaGraph are derived as self-supervised upper bounds to objectives for predicting unobserved latent graphs. In addition to its improved performance, LaGraph provides explanations for recent successes of predictive models that include invariance-based objectives. We provide theoretical analysis comparing LaGraph to related methods in different domains. Our experimental results demonstrate the superiority of LaGraph in performance and its robustness to decreasing training sample size on both graph-level and node-level tasks.

ICLR Conference 2022 Conference Paper

Spherical Message Passing for 3D Molecular Graphs

  • Yi Liu 0059
  • Limei Wang
  • Meng Liu 0015
  • Yuchao Lin
  • Xuan Zhang
  • Bora Oztekin
  • Shuiwang Ji

We consider representation learning of 3D molecular graphs in which each atom is associated with a spatial position in 3D. This is an under-explored area of research, and a principled message passing framework is currently lacking. In this work, we conduct analyses in the spherical coordinate system (SCS) for the complete identification of 3D graph structures. Based on such observations, we propose the spherical message passing (SMP) as a novel and powerful scheme for 3D molecular learning. SMP dramatically reduces training complexity, enabling it to perform efficiently on large-scale molecules. In addition, SMP is capable of distinguishing almost all molecular structures, and the uncovered cases may not exist in practice. Based on meaningful physically-based representations of 3D information, we further propose the SphereNet for 3D molecular learning. Experimental results demonstrate that the use of meaningful 3D information in SphereNet leads to significant performance improvements in prediction tasks. Our results also demonstrate the advantages of SphereNet in terms of capability, efficiency, and scalability.
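
For reference, the spherical-coordinate identification of a neighbor reduces to a distance, a polar angle, and an azimuthal angle relative to a local frame. A minimal numpy sketch, assuming the local frame is already fixed (SMP's actual frame construction from neighboring atoms is more involved):

```python
import numpy as np

def to_spherical(rel_pos):
    """Map a relative 3D position to (d, theta, phi): distance, polar angle
    from the local z-axis, and azimuthal angle in the local x-y plane."""
    d = np.linalg.norm(rel_pos)
    theta = np.arccos(np.clip(rel_pos[2] / max(d, 1e-12), -1.0, 1.0))
    phi = np.arctan2(rel_pos[1], rel_pos[0])
    return d, theta, phi

d, theta, phi = to_spherical(np.array([1.0, 1.0, 0.0]))  # d=sqrt(2), theta=pi/2, phi=pi/4
```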

NeurIPS Conference 2022 Conference Paper

Task-Agnostic Graph Explanations

  • Yaochen Xie
  • Sumeet Katariya
  • Xianfeng Tang
  • Edward Huang
  • Nikhil Rao
  • Karthik Subbian
  • Shuiwang Ji

Graph Neural Networks (GNNs) have emerged as powerful tools to encode graph-structured data. Due to their broad applications, there is an increasing need to develop tools to explain how GNNs make decisions given graph-structured data. Existing learning-based GNN explanation approaches are task-specific in training and hence suffer from crucial drawbacks. Specifically, they are incapable of producing explanations for a multitask prediction model with a single explainer. They are also unable to provide explanations in cases where the GNN is trained in a self-supervised manner, and the resulting representations are used in future downstream tasks. To address these limitations, we propose a Task-Agnostic GNN Explainer (TAGE) that is independent of downstream models and trained under self-supervision with no knowledge of downstream tasks. TAGE enables the explanation of GNN embedding models with unseen downstream tasks and allows efficient explanation of multitask models. Our extensive experiments show that TAGE can significantly speed up the explanation efficiency by using the same model to explain predictions for multiple downstream tasks while achieving explanation quality as good as or even better than current state-of-the-art GNN explanation approaches.

NeurIPS Conference 2021 Conference Paper

ConE: Cone Embeddings for Multi-Hop Reasoning over Knowledge Graphs

  • Zhanqiu Zhang
  • Jie Wang
  • Jiajun Chen
  • Shuiwang Ji
  • Feng Wu

Query embedding (QE), which aims to embed entities and first-order logical (FOL) queries in low-dimensional spaces, has shown great power in multi-hop reasoning over knowledge graphs. Recently, embedding entities and queries with geometric shapes has become a promising direction, as geometric shapes can naturally represent answer sets of queries and logical relationships among them. However, existing geometry-based models have difficulty in modeling queries with negation, which significantly limits their applicability. To address this challenge, we propose a novel query embedding model, namely Cone Embeddings (ConE), which is the first geometry-based QE model that can handle all the FOL operations, including conjunction, disjunction, and negation. Specifically, ConE represents entities and queries as Cartesian products of two-dimensional cones, where the intersection and union of cones naturally model the conjunction and disjunction operations. By further noticing that the closure of the complement of cones remains cones, we design geometric complement operators in the embedding space for the negation operations. Experiments demonstrate that ConE significantly outperforms existing state-of-the-art methods on benchmark datasets.

JMLR Journal 2021 Journal Article

DIG: A Turnkey Library for Diving into Graph Deep Learning Research

  • Meng Liu
  • Youzhi Luo
  • Limei Wang
  • Yaochen Xie
  • Hao Yuan
  • Shurui Gui
  • Haiyang Yu
  • Zhao Xu

Although there exist several libraries for deep learning on graphs, they aim at implementing basic operations for graph deep learning. In the research community, implementing and benchmarking various advanced tasks is still painful and time-consuming with existing libraries. To facilitate graph deep learning research, we introduce DIG: Dive into Graphs, a turnkey library that provides a unified testbed for higher-level, research-oriented graph deep learning tasks. Currently, we consider graph generation, self-supervised learning on graphs, explainability of graph neural networks, and deep learning on 3D graphs. For each direction, we provide unified implementations of data interfaces, common algorithms, and evaluation metrics. Altogether, DIG is an extensible, open-source, and turnkey library for researchers to develop new methods and effortlessly compare with common baselines using widely used datasets and evaluation metrics. Source code is available at https://github.com/divelab/DIG.

ICML Conference 2021 Conference Paper

GraphDF: A Discrete Flow Model for Molecular Graph Generation

  • Youzhi Luo
  • Keqiang Yan
  • Shuiwang Ji

We consider the problem of molecular graph generation using deep models. While graphs are discrete, most existing methods use continuous latent variables, resulting in inaccurate modeling of discrete graph structures. In this work, we propose GraphDF, a novel discrete latent variable model for molecular graph generation based on normalizing flow methods. GraphDF uses invertible modulo shift transforms to map discrete latent variables to graph nodes and edges. We show that the use of discrete latent variables reduces computational costs and eliminates the negative effect of dequantization. Comprehensive experimental results show that GraphDF outperforms prior methods on random generation, property optimization, and constrained optimization tasks.
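
The invertible modulo shift named above is easy to state concretely. A minimal sketch (in GraphDF the shift is produced by a conditional network over the partial graph; here it is just a constant):

```python
def modulo_shift(x, mu, k):
    """Invertible discrete transform on x in {0, ..., k-1}: z = (x + mu) mod k."""
    return (x + mu) % k

def modulo_shift_inverse(z, mu, k):
    return (z - mu) % k

# Round-trip check: the transform is exactly invertible on discrete values,
# so no dequantization of the discrete variables is needed.
assert modulo_shift_inverse(modulo_shift(3, mu=4, k=5), mu=4, k=5) == 3
```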

ICML Conference 2021 Conference Paper

On Explainability of Graph Neural Networks via Subgraph Explorations

  • Hao Yuan 0001
  • Haiyang Yu 0005
  • Jie Wang 0005
  • Kang Li 0004
  • Shuiwang Ji

We consider the problem of explaining the predictions of graph neural networks (GNNs), which are otherwise treated as black boxes. Existing methods invariably focus on explaining the importance of graph nodes or edges but ignore the substructures of graphs, which are more intuitive and human-intelligible. In this work, we propose a novel method, known as SubgraphX, to explain GNNs by identifying important subgraphs. Given a trained GNN model and an input graph, our SubgraphX explains its predictions by efficiently exploring different subgraphs with Monte Carlo tree search. To make the tree search more effective, we propose to use Shapley values as a measure of subgraph importance, which can also capture the interactions among different subgraphs. To expedite computations, we propose efficient approximation schemes to compute Shapley values for graph data. Our work represents the first attempt to explain GNNs by identifying subgraphs explicitly and directly. Experimental results show that our SubgraphX achieves significantly improved explanations while keeping computations at a reasonable level.
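
To illustrate the Shapley scoring step in isolation (SubgraphX combines it with Monte Carlo tree search and graph-specific approximations, none of which are shown here), a hedged sketch that treats the candidate subgraph as a single player and estimates its Shapley value by sampling coalitions of the remaining nodes:

```python
import random

def mc_shapley(subgraph_nodes, other_nodes, value_fn, num_samples=200, seed=0):
    """Monte Carlo Shapley value of 'subgraph_nodes' treated as one player.

    value_fn(kept_nodes) should return the model's score when only the nodes
    in 'kept_nodes' are retained (e.g., the rest masked out)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        perm = list(other_nodes)
        rng.shuffle(perm)
        # A uniformly sized random coalition of the other nodes.
        coalition = set(perm[: rng.randrange(len(perm) + 1)])
        # Marginal contribution of the whole subgraph to this coalition.
        total += value_fn(coalition | set(subgraph_nodes)) - value_fn(coalition)
    return total / num_samples
```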

NeurIPS Conference 2021 Conference Paper

Stochastic Optimization of Areas Under Precision-Recall Curves with Provable Convergence

  • Qi Qi
  • Youzhi Luo
  • Zhao Xu
  • Shuiwang Ji
  • Tianbao Yang

Areas under ROC (AUROC) and precision-recall curves (AUPRC) are common metrics for evaluating classification performance on imbalanced problems. Compared with AUROC, AUPRC is a more appropriate metric for highly imbalanced datasets. While stochastic optimization of AUROC has been studied extensively, principled stochastic optimization of AUPRC has rarely been explored. In this work, we propose a principled technical method to optimize AUPRC for deep learning. Our approach is based on maximizing the averaged precision (AP), which is an unbiased point estimator of AUPRC. We cast the objective into a sum of dependent compositional functions with inner functions dependent on random variables of the outer level. We propose efficient adaptive and non-adaptive stochastic algorithms named SOAP with provable convergence guarantees under mild conditions by leveraging recent advances in stochastic compositional optimization. Extensive experimental results on image and graph datasets demonstrate that our proposed method outperforms prior methods on imbalanced problems in terms of AUPRC. To the best of our knowledge, our work represents the first attempt to optimize AUPRC with provable convergence. SOAP has been implemented in the libAUC library at https://libauc.org/.
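
For reference, the averaged precision (AP) that SOAP maximizes has a simple exact form on a finite sample; a minimal sketch of that estimator (the smoothing and stochastic compositional machinery of SOAP itself are not shown):

```python
def average_precision(scores, labels):
    """AP over a ranked list: mean, over positives, of the precision at each
    positive's rank. Assumes at least one positive label."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(labels)
    hits, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            ap += hits / rank
    return ap / n_pos

print(average_precision([0.9, 0.8, 0.3], [1, 0, 1]))  # (1/1 + 2/3) / 2 = 0.833...
```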

AAAI Conference 2020 Conference Paper

A Multi-Scale Approach for Graph Link Prediction

  • Lei Cai
  • Shuiwang Ji

Deep models can be made scale-invariant when trained with multi-scale information. Images can easily be made multi-scale, given their grid-like structures. Extending this to generic graphs poses major challenges. For example, in link prediction tasks, inputs are represented as graphs consisting of nodes and edges. Currently, the state-of-the-art model for link prediction uses supervised heuristic learning, which learns graph structure features centered on two target nodes. It then learns graph neural networks to predict the existence of links based on graph structure features. Thus, the performance of link prediction models highly depends on graph structure features. In this work, we propose a novel node aggregation method that can transform the enclosing subgraph into different scales and preserve the relationship between the two target nodes for link prediction. A theory for analyzing the information loss during the re-scaling procedure is also provided. Graphs at different scales can provide scale-invariant information, which enables graph neural networks to learn invariant features and improve link prediction performance. Our experimental results on 14 datasets from different areas demonstrate that our proposed method outperforms the state-of-the-art methods by employing multi-scale graphs without additional parameters.

AAAI Conference 2020 Conference Paper

Adaptive Convolutional ReLUs

  • Hongyang Gao
  • Lei Cai
  • Shuiwang Ji

Rectified linear units (ReLUs) are currently the most popular activation function used in neural networks. Although ReLUs can solve the gradient vanishing problem and accelerate training convergence, they suffer from the dying ReLU problem, in which some neurons are never activated if the weights are not updated properly. In this work, we propose a novel activation function, known as the adaptive convolutional ReLU (ConvReLU), that can better mimic brain neuron activation behaviors and overcome the dying ReLU problem. With our novel parameter sharing scheme, ConvReLUs can be applied to convolution layers, allowing each input neuron to be activated by a different trainable threshold without involving a large number of extra parameters. We employ a zero initialization scheme in ConvReLU to encourage the trainable thresholds to be close to zero. Finally, we develop a partial replacement strategy that only replaces the ReLUs in the early layers of the network. This resolves the dying ReLU problem and retains sparse representations for linear classifiers. Experimental results demonstrate that our proposed ConvReLU consistently outperforms ReLU, LeakyReLU, and PReLU. In addition, the partial replacement strategy is shown to be effective not only for our ConvReLU but also for LeakyReLU and PReLU.

NeurIPS Conference 2020 Conference Paper

Noise2Same: Optimizing A Self-Supervised Bound for Image Denoising

  • Yaochen Xie
  • Zhengyang Wang
  • Shuiwang Ji

Self-supervised frameworks that learn denoising models from individual noisy images alone have shown strong capability and promising performance in various image denoising tasks. Existing self-supervised denoising frameworks are mostly built upon the same theoretical foundation, where the denoising models are required to be J-invariant. However, our analyses indicate that the current theory and the J-invariance requirement may lead to denoising models with reduced performance. In this work, we introduce Noise2Same, a novel self-supervised denoising framework. In Noise2Same, a new self-supervised loss is proposed by deriving a self-supervised upper bound of the typical supervised loss. In particular, Noise2Same requires neither J-invariance nor extra information about the noise model, and can be used in a wider range of denoising applications. We analyze our proposed Noise2Same both theoretically and experimentally. The experimental results show that our Noise2Same remarkably outperforms previous self-supervised denoising methods in terms of denoising performance and training efficiency.

AAAI Conference 2020 Conference Paper

Non-Local U-Nets for Biomedical Image Segmentation

  • Zhengyang Wang
  • Na Zou
  • Dinggang Shen
  • Shuiwang Ji

Deep learning has shown its great promise in various biomedical image segmentation tasks. Existing models are typically based on U-Net and rely on an encoder-decoder architecture with stacked local operators to aggregate long-range information gradually. However, only using the local operators limits the efficiency and effectiveness. In this work, we propose the non-local U-Nets, which are equipped with flexible global aggregation blocks, for biomedical image segmentation. These blocks can be inserted into U-Net as size-preserving processes, as well as down-sampling and up-sampling layers. We perform thorough experiments on the 3D multimodality isointense infant brain MR image segmentation task to evaluate the non-local U-Nets. Results show that our proposed models achieve top performances with fewer parameters and faster computation.

ICLR Conference 2020 Conference Paper

StructPool: Structured Graph Pooling via Conditional Random Fields

  • Hao Yuan 0001
  • Shuiwang Ji

Learning high-level representations for graphs is of great importance for graph analysis tasks. In addition to graph convolution, graph pooling is an important but less explored research area. In particular, most existing graph pooling techniques do not consider the graph structural information explicitly. We argue that such information is important and develop a novel graph pooling technique, known as StructPool, in this work. We consider graph pooling as a node clustering problem, which requires the learning of a cluster assignment matrix. We propose to formulate it as a structured prediction problem and employ conditional random fields to capture the relationships among the assignments of different nodes. We also generalize our method to incorporate graph topological information in designing the Gibbs energy function. Experimental results on multiple datasets demonstrate the effectiveness of our proposed StructPool.

IJCAI Conference 2019 Conference Paper

Dense Transformer Networks for Brain Electron Microscopy Image Segmentation

  • Jun Li
  • Yongjun Chen
  • Lei Cai
  • Ian Davidson
  • Shuiwang Ji

The key idea of current deep learning methods for dense prediction is to apply a model on a regular patch centered on each pixel to make pixel-wise predictions. These methods are limited in the sense that the patches are determined by the network architecture instead of learned from data. In this work, we propose the dense transformer networks, which can learn the shapes and sizes of patches from data. The dense transformer networks employ an encoder-decoder architecture, and a pair of dense transformer modules are inserted into each of the encoder and decoder paths. The novelty of this work is that we provide technical solutions for learning the shapes and sizes of patches from data and for efficiently restoring the spatial correspondence required for dense prediction. The proposed dense transformer modules are differentiable; thus, the entire network can be trained. We apply the proposed networks to biological image segmentation tasks and show that superior performance is achieved in comparison to baseline methods.

ICML Conference 2019 Conference Paper

Graph U-Nets

  • Hongyang Gao
  • Shuiwang Ji

We consider the problem of representation learning for graph data. Convolutional neural networks can naturally operate on images but face significant challenges in dealing with graph data. Given that images are special cases of graphs whose nodes lie on 2D lattices, graph embedding tasks have a natural correspondence with image pixel-wise prediction tasks such as segmentation. While encoder-decoder architectures like U-Nets have been successfully applied to many image pixel-wise prediction tasks, similar methods are lacking for graph data. This is due to the fact that pooling and up-sampling operations are not natural on graph data. To address these challenges, we propose novel graph pooling (gPool) and unpooling (gUnpool) operations in this work. The gPool layer adaptively selects some nodes to form a smaller graph based on their scalar projection values on a trainable projection vector. We further propose the gUnpool layer as the inverse operation of the gPool layer. The gUnpool layer restores the graph to its original structure using the position information of the nodes selected in the corresponding gPool layer. Based on our proposed gPool and gUnpool layers, we develop an encoder-decoder model on graphs, known as the graph U-Nets. Our experimental results on node classification and graph classification tasks demonstrate that our methods achieve consistently better performance than previous models.
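
The gPool selection rule described above is compact enough to sketch directly. A minimal numpy version following that description (the sigmoid gating and the normalization of the projection vector are from my reading of the method and should be checked against the paper):

```python
import numpy as np

def gpool(x, adj, p, k):
    """gPool-style top-k node selection.

    x: (n, d) node features; adj: (n, n) adjacency; p: (d,) trainable
    projection vector; k: number of nodes kept in the pooled graph."""
    y = x @ p / np.linalg.norm(p)               # scalar projection per node
    idx = np.argsort(-y)[:k]                    # keep the k highest-scoring nodes
    gate = 1.0 / (1.0 + np.exp(-y[idx]))        # sigmoid of the kept scores
    x_pooled = x[idx] * gate[:, None]           # gate features so p gets gradients
    adj_pooled = adj[np.ix_(idx, idx)]          # induced subgraph
    return x_pooled, adj_pooled, idx            # idx lets gUnpool restore positions
```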

AAAI Conference 2019 Conference Paper

Interpreting Deep Models for Text Analysis via Optimization and Regularization Methods

  • Hao Yuan
  • Yongjun Chen
  • Xia Hu
  • Shuiwang Ji

Interpreting deep neural networks is of great importance to understand and verify deep models for natural language processing (NLP) tasks. However, most existing approaches only focus on improving the performance of models but ignore their interpretability. In this work, we propose an approach to investigate the meaning of hidden neurons of the convolutional neural network (CNN) models. We first employ saliency map and optimization techniques to approximate the detected information of hidden neurons from input sentences. Then we develop regularization terms and explore words in vocabulary to interpret such detected information. Experimental results demonstrate that our approach can identify meaningful and reasonable interpretations for hidden spatial locations. Additionally, we show that our approach can describe the decision procedure of deep NLP models.

NeurIPS Conference 2018 Conference Paper

ChannelNets: Compact and Efficient Convolutional Neural Networks via Channel-Wise Convolutions

  • Hongyang Gao
  • Zhengyang Wang
  • Shuiwang Ji

Convolutional neural networks (CNNs) have shown great capability in solving various artificial intelligence tasks. However, their increasing model sizes have raised challenges in employing them in resource-limited applications. In this work, we propose to compress deep models by using channel-wise convolutions, which replace dense connections among feature maps with sparse ones in CNNs. Based on this novel operation, we build light-weight CNNs known as ChannelNets. ChannelNets use three instances of channel-wise convolutions, namely group channel-wise convolutions, depth-wise separable channel-wise convolutions, and the convolutional classification layer. Compared to prior CNNs designed for mobile devices, ChannelNets achieve a significant reduction in terms of the number of parameters and computational cost without loss in accuracy. Notably, our work represents the first attempt to compress the fully-connected classification layer, which usually accounts for about 25% of the total parameters in compact CNNs. Experimental results on the ImageNet dataset demonstrate that ChannelNets achieve consistently better performance compared to prior methods.

JBHI Journal 2015 Journal Article

A Robust Deep Model for Improved Classification of AD/MCI Patients

  • Feng Li
  • Loc Tran
  • Kim-Han Thung
  • Shuiwang Ji
  • Dinggang Shen
  • Jiang Li

Accurate classification of Alzheimer's disease (AD) and its prodromal stage, mild cognitive impairment (MCI), plays a critical role in possibly preventing the progression of memory impairment and improving quality of life for AD patients. Among many research tasks, it is of particular interest to identify noninvasive imaging biomarkers for AD diagnosis. In this paper, we present a robust deep learning system to identify different progression stages of AD patients based on MRI and PET scans. We utilized the dropout technique to improve classical deep learning by preventing weight coadaptation, a typical cause of overfitting in deep learning. In addition, we incorporated stability selection, an adaptive learning factor, and a multitask learning strategy into the deep learning framework. We applied the proposed method to the ADNI dataset and conducted experiments for AD and MCI conversion diagnosis. Experimental results showed that the dropout technique is very effective in AD diagnosis, improving the classification accuracies by 5.9% on average as compared to classical deep learning methods.

IJCAI Conference 2009 Conference Paper

  • Ying-Xin Li
  • Shuiwang Ji
  • Sudhir Kumar
  • Jieping Ye
  • Zhi-Hua Zhou

The Berkeley Drosophila Genome Project (BDGP) has produced a large number of gene expression patterns, many of which have been annotated textually with anatomical and developmental terms. These terms spatially correspond to local regions of the images; however, they are attached collectively to groups of images, such that it is unknown which term is assigned to which region of which image in the group. This poses a challenge to the development of computational methods for automating the textual description of expression patterns contained in each image. In this paper, we show that the underlying nature of this task matches well with a new machine learning framework, Multi-Instance Multi-Label learning (MIML). We propose a new MIML support vector machine to solve the problems that beset the annotation task. Empirical study shows that the proposed method outperforms the state-of-the-art Drosophila gene expression pattern annotation methods.

IJCAI Conference 2009 Conference Paper

  • Shuiwang Ji
  • Jieping Ye

Dimensionality reduction is an essential step in high-dimensional data analysis. Many dimensionality reduction algorithms have been applied successfully to multi-class and multi-label problems. They are commonly applied as a separate data preprocessing step before classification algorithms. In this paper, we study a joint learning framework in which we perform dimensionality reduction and multi-label classification simultaneously. We show that when the least squares loss is used in classification, this joint learning decouples into two separate components, i.e., dimensionality reduction followed by multi-label classification. This analysis partially justifies the current practice of a separate application of dimensionality reduction for classification problems. We extend our analysis using other loss functions, including the hinge loss and the squared hinge loss. We further extend the formulation to the more general case where the input data for different class labels may differ, overcoming the limitation of traditional dimensionality reduction algorithms. Experiments on benchmark data sets have been conducted to evaluate the proposed joint formulations.

IJCAI Conference 2009 Conference Paper

  • Liang Sun
  • Shuiwang Ji
  • Shipeng Yu
  • Jieping Ye

Canonical correlation analysis (CCA) and partial least squares (PLS) are well-known techniques for feature extraction from two sets of multidimensional variables. The fundamental difference between CCA and PLS is that CCA maximizes the correlation while PLS maximizes the covariance. Although both CCA and PLS have been applied successfully in various applications, the intrinsic relationship between them remains unclear. In this paper, we attempt to address this issue by showing the equivalence relationship between CCA and orthonormalized partial least squares (OPLS), a variant of PLS. We further extend the equivalence relationship to the case when regularization is employed for both sets of variables. In addition, we show that the CCA projection for one set of variables is independent of the regularization on the other set of variables. We have performed experimental studies using both synthetic and real data sets and our results confirm the established equivalence relationship. The presented analysis provides novel insights into the connection between these two existing algorithms as well as the effect of the regularization.
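
The contrast drawn above can be written down directly; a standard statement of the two objectives (notation mine: C_{xy} denotes the cross-covariance matrix of the two views, C_{xx} and C_{yy} the within-view covariances):

```latex
% CCA maximizes the correlation between the projected views:
\max_{w_x, w_y} \;
  \frac{w_x^\top C_{xy} w_y}
       {\sqrt{w_x^\top C_{xx} w_x}\,\sqrt{w_y^\top C_{yy} w_y}}
% PLS maximizes the covariance instead:
\max_{\|w_x\| = \|w_y\| = 1} \; w_x^\top C_{xy} w_y
```

Normalizing by the within-view covariances is what distinguishes correlation from covariance here, and regularization enters by replacing a covariance such as C_{xx} with C_{xx} + \lambda I.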

ICML Conference 2009 Conference Paper

An accelerated gradient method for trace norm minimization

  • Shuiwang Ji
  • Jieping Ye

We consider the minimization of a smooth loss function regularized by the trace norm of the matrix variable. Such formulations find applications in many machine learning tasks, including multi-task learning, matrix classification, and matrix completion. The standard semidefinite programming formulation for this problem is computationally expensive. In addition, due to the non-smooth nature of the trace norm, the optimal first-order black-box method for solving such a class of problems converges as O(1/√k), where k is the iteration counter. In this paper, we exploit the special structure of the trace norm, based on which we propose an extended gradient algorithm that converges as O(1/k). We further propose an accelerated gradient algorithm, which achieves the optimal convergence rate of O(1/k²) for smooth problems. Experiments on multi-task learning problems demonstrate the efficiency of the proposed algorithms.
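
The special structure being exploited is that the proximal step of the trace norm has a closed form: soft-thresholding of singular values. A minimal numpy sketch of one (non-accelerated) extended-gradient step under that reading; variable names are mine.

```python
import numpy as np

def trace_norm_prox(w, tau):
    """prox of tau * ||W||_*: shrink each singular value toward zero by tau."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    return u @ np.diag(np.maximum(s - tau, 0.0)) @ vt

def extended_gradient_step(w, grad, lam, lipschitz):
    """One O(1/k) step: gradient descent on the smooth loss, then the prox.
    The accelerated O(1/k^2) variant adds a Nesterov extrapolation step."""
    return trace_norm_prox(w - grad / lipschitz, lam / lipschitz)
```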

UAI Conference 2009 Conference Paper

Multi-Task Feature Learning Via Efficient ℓ2,1-Norm Minimization

  • Jun Liu 0003
  • Shuiwang Ji
  • Jieping Ye

The problem of joint feature selection across a group of related tasks has applications in many areas, including biomedical informatics and computer vision. We consider the ℓ2,1-norm regularized regression model for joint feature selection from multiple tasks, which can be derived in the probabilistic framework by assuming a suitable prior from the exponential family. One appealing feature of the ℓ2,1-norm regularization is that it encourages multiple predictors to share similar sparsity patterns. However, the resulting optimization problem is challenging to solve due to the non-smoothness of the ℓ2,1-norm regularization. In this paper, we propose to accelerate the computation by reformulating it as two equivalent smooth convex optimization problems, which are then solved via Nesterov's method, an optimal first-order black-box method for smooth convex optimization. A key building block in solving the reformulations is the Euclidean projection. We show that the Euclidean projection for the first reformulation can be computed analytically, while the Euclidean projection for the second one can be computed in linear time. Empirical evaluations on several data sets verify the efficiency of the proposed algorithms.
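
To see why the ℓ2,1-norm induces shared sparsity patterns, it helps to look at the closely related row-wise shrinkage operator (the paper itself works with Euclidean projections inside Nesterov's method; this proximal form is shown only for intuition):

```python
import numpy as np

def l21_shrink(w, tau):
    """prox of tau * ||W||_{2,1}: scale each row of W (one row per feature,
    one column per task) toward zero, zeroing rows with norm below tau.
    A zeroed row drops that feature from all tasks simultaneously."""
    norms = np.linalg.norm(w, axis=1, keepdims=True)
    return w * np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
```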

JMLR Journal 2008 Journal Article

Multi-class Discriminant Kernel Learning via Convex Programming

  • Jieping Ye
  • Shuiwang Ji
  • Jianhui Chen

Regularized kernel discriminant analysis (RKDA) performs linear discriminant analysis in the feature space via the kernel trick. Its performance depends on the selection of kernels. In this paper, we consider the problem of multiple kernel learning (MKL) for RKDA, in which the optimal kernel matrix is obtained as a linear combination of pre-specified kernel matrices. We show that the kernel learning problem in RKDA can be formulated as convex programs. First, we show that this problem can be formulated as a semidefinite program (SDP). Based on the equivalence relationship between RKDA and least square problems in the binary-class case, we propose a convex quadratically constrained quadratic programming (QCQP) formulation for kernel learning in RKDA. A semi-infinite linear programming (SILP) formulation is derived to further improve the efficiency. We extend these formulations to the multi-class case based on a key result established in this paper. That is, the multi-class RKDA kernel learning problem can be decomposed into a set of binary-class kernel learning problems which are constrained to share a common kernel. Based on this decomposition property, SDP formulations are proposed for the multi-class case. Furthermore, it leads naturally to QCQP and SILP formulations. As the performance of RKDA depends on the regularization parameter, we show that this parameter can also be optimized in a joint framework with the kernel. Extensive experiments have been conducted and analyzed, and connections to other algorithms are discussed.

NeurIPS Conference 2008 Conference Paper

Multi-label Multiple Kernel Learning

  • Shuiwang Ji
  • Liang Sun
  • Rong Jin
  • Jieping Ye

We present a multi-label multiple kernel learning (MKL) formulation, in which the data are embedded into a low-dimensional space directed by the instance-label correlations encoded into a hypergraph. We formulate the problem in the kernel-induced feature space and propose to learn the kernel matrix as a linear combination of a given collection of kernel matrices in the MKL framework. The proposed learning formulation leads to a non-smooth min-max problem, and it can be cast into a semi-infinite linear program (SILP). We further propose an approximate formulation with a guaranteed error bound which involves an unconstrained and convex optimization problem. In addition, we show that the objective function of the approximate formulation is continuously differentiable with Lipschitz gradient, and hence existing methods can be employed to compute the optimal solution efficiently. We apply the proposed formulation to the automated annotation of Drosophila gene expression pattern images, and promising results have been reported in comparison with representative algorithms.

ICML Conference 2007 Conference Paper

Discriminant kernel and regularization parameter learning via semidefinite programming

  • Jieping Ye
  • Jianhui Chen
  • Shuiwang Ji

Regularized Kernel Discriminant Analysis (RKDA) performs linear discriminant analysis in the feature space via the kernel trick. The performance of RKDA depends on the selection of kernels. In this paper, we consider the problem of learning an optimal kernel over a convex set of kernels. We show that the kernel learning problem can be formulated as a semidefinite program (SDP) in the binary-class case. We further extend the SDP formulation to the multi-class case. It is based on a key result established in this paper, that is, the multi-class kernel learning problem can be decomposed into a set of binary-class kernel learning problems. In addition, we propose an approximation scheme to reduce the computational complexity of the multi-class SDP formulation. The performance of RKDA also depends on the value of the regularization parameter. We show that this value can be learned automatically in the framework. Experimental results on benchmark data sets demonstrate the efficacy of the proposed SDP formulations.