Arrow Research search

Author name cluster

Han Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

25 papers
2 author rows

Possible papers

25

AAAI Conference 2026 Conference Paper

Authority Backdoor: A Certifiable Backdoor Mechanism for Authoring DNNs

  • Han Yang
  • Shaofeng Li
  • Tian Dong
  • Xiangyu Xu
  • Guangchi Liu
  • Zhen Ling

Deep Neural Networks (DNNs), as valuable intellectual property, face unauthorized use. Existing protections, such as digital watermarking, are largely passive; they provide only post-hoc ownership verification and cannot actively prevent the illicit use of a stolen model. This work proposes a proactive protection scheme, dubbed "Authority Backdoor," which embeds access constraints directly into the model. In particular, the scheme utilizes a backdoor learning framework to intrinsically lock a model's utility, such that it performs normally only in the presence of a specific trigger (e.g., a hardware fingerprint); in its absence, the DNN's performance degrades to uselessness. To further enhance the security of the proposed authority scheme, certifiable robustness is integrated to prevent an adaptive attacker from removing the implanted backdoor. The resulting framework establishes a secure authority mechanism for DNNs, combining access control with certifiable robustness against adversarial attacks. Extensive experiments on diverse architectures and datasets validate the effectiveness and certifiable robustness of the proposed framework.

JBHI Journal 2026 Journal Article

DRFNet: Enhancing Identity Discriminability and Feature Robustness for Cross-Session VEP-Based EEG Biometrics

  • Honggang Liu
  • Han Yang
  • Dongjun Liu
  • Xuanyu Jin
  • Yong Peng
  • Wanzeng Kong

Biometric recognition using visually evoked potentials (VEPs), a type of neural response to visual stimuli recorded via electroencephalography (EEG), has shown great promise. However, the non-stationary nature of EEG signals poses a major challenge in cross-session scenarios, where data collected on different days often leads to performance degradation. To address this, we propose the Discriminative Robust Feature Network (DRFNet) to enhance the robustness and inter-subject discriminability of identity representations across sessions. DRFNet incorporates two key components: (1) a log-power transformation that amplifies inter-individual differences by capturing non-linear energy patterns from VEP features via signal squaring and logarithmic scaling; and (2) a hierarchical normalization strategy with adaptive attention to balance discriminative identity cues with inter-session invariance by stabilizing feature distributions across multiple levels (feature map, batch, and sample). On two public multi-session SSVEP datasets (Dataset A: 30 subjects, 6 s trials; Dataset B: 54 subjects, 4 s trials), our model outperformed state-of-the-art methods, achieving identification accuracies of 92.92% and 86.30%, and equal error rates of 3.92% and 4.09%, respectively. Further analysis demonstrates that filter bank processing and a reduced set of parietal-occipital electrodes can provide more discriminative features while offering a practical path toward system lightweighting.

NeurIPS Conference 2025 Conference Paper

$\text{S}^2$Q-VDiT: Accurate Quantized Video Diffusion Transformer with Salient Data and Sparse Token Distillation

  • Weilun Feng
  • Haotong Qin
  • Chuanguang Yang
  • Xiangqi Li
  • Han Yang
  • Yuqi Li
  • Zhulin An
  • Libo Huang

Diffusion transformers have emerged as the mainstream paradigm for video generation models. However, the use of up to billions of parameters incurs significant computational costs. Quantization offers a promising solution by reducing memory usage and accelerating inference. Nonetheless, we observe that the joint modeling of spatial and temporal information in video diffusion models (V-DMs) leads to extremely long token sequences, which introduces high calibration variance and learning challenges. To address these issues, we propose **$S^2$Q-VDiT**, a post-training quantization framework for V-DMs that leverages **S**alient data and **S**parse token distillation. During the calibration phase, we identify that quantization performance is highly sensitive to the choice of calibration data. To mitigate this, we introduce *Hessian-aware Salient Data Selection*, which constructs high-quality calibration datasets by considering both diffusion and quantization characteristics unique to V-DMs. To tackle the learning challenges, we further analyze the sparse attention patterns inherent in V-DMs. Based on this observation, we propose *Attention-guided Sparse Token Distillation*, which exploits token-wise attention distributions to emphasize tokens that are more influential to the model's output. Under W4A6 quantization, $S^2$Q-VDiT achieves lossless performance while delivering $3.9\times$ model compression and $1.3\times$ inference acceleration. Code will be available at https://github.com/wlfeng0509/s2q-vdit.

JBHI Journal 2025 Journal Article

A Phase-Enhanced Neural Network With Dual-Path Transformer for Single-Channel Chest Sound Separation

  • Yuqi Wang
  • Han Yang
  • Zhixing Gao
  • Zhiwei Dai
  • Kang Yu
  • Tingting Song
  • Hao Yang
  • Yunfeng Wang

Auscultation of the chest is a fundamental diagnostic tool for cardiovascular and pulmonary diseases. However, the two main chest sound parts, heart sound (HS) and lung sound (LS), are often mixed, limiting diagnostic accuracy. This paper presents a novel Phase-Enhanced Neural Network (PENN) for HS and LS separation. To address the under-utilization of phase information, PENN integrates a feedforward connection that feeds the input spectrum into the Restorer, enabling phase recovery based on the local inference feature of phase. A time-frequency Dual-Path Transformer (DPT) is employed to expand the network's receptive field and enhance performance. To interpret the effectiveness of PENN, two new metrics, mSI-SDRi and pSI-SDRi, are proposed to separately evaluate the contributions of magnitude and phase. Experiments show that PENN achieves pSI-SDRi improvements of 1.44 dB for HS and 2.25 dB for LS under an LS cutoff frequency ($f_{c\text{lung}}$) of 60 Hz. Extensive experimental results demonstrate the effectiveness and robustness of PENN, offering a promising solution to improve the accuracy of auscultation.

ICRA Conference 2025 Conference Paper

Continuous Convolution for Automated Measurement of Sperm Flagella

  • Yufei Jin
  • Han Yang
  • Wenyuan Chen
  • Xinrui Wang
  • Yu Sun 0001
  • Zhuoran Zhang 0001

Quantifying sperm flagellar beating behavior (e.g., beating amplitude, frequency, and wavelength) plays a crucial role in biological research, clinical diagnostics, and the design of sperm-inspired microrobots. However, existing computational methods struggle to accurately and efficiently analyze the highly dynamic, complex, and fine structures of sperm flagella, especially when portions of the flagellum become invisible due to three-dimensional out-of-focus beating. This paper proposes an automated high-throughput tool for quantitative analysis of sperm flagellar beating. The core innovation is continuous convolution (CConv), which adaptively captures the irregular, time-varying patterns of sperm flagella while ensuring continuity in segmentation outputs, even in the presence of locally invisible regions caused by out-of-focus motion. CConv can be integrated into various neural network architectures as a plug-and-play module. Extensive experiments demonstrate that integrating CConv consistently improves the accuracy and continuity of flagella segmentation across different networks. Furthermore, utilizing a curvature-based approach, we quantified key flagellar beating parameters, including length, amplitude, frequency, and wavelength. Applying the high-throughput tool on 1200 sperm revealed that sperm from fertile donors had significantly higher flagellar beating frequency than sperm from infertile patients. The proposed automated tool unlocks high-throughput, quantitative analysis of sperm flagellar beating, showing the potential for applications in reproductive biology and engineering research. The codes and datasets will be released at https://github.com/Goldfish-Yu/CConv.

NeurIPS Conference 2025 Conference Paper

E2Former: An Efficient and Equivariant Transformer with Linear-Scaling Tensor Products

  • Yunyang Li
  • Lin Huang
  • Zhihao Ding
  • Xinran Wei
  • Chu Wang
  • Han Yang
  • Zun Wang
  • Chang Liu

Equivariant Graph Neural Networks (EGNNs) have demonstrated significant success in modeling microscale systems, including those in chemistry, biology and materials science. However, EGNNs face substantial computational challenges due to the high cost of constructing edge features via spherical tensor products, making them almost impractical for large-scale systems. To address this limitation, we introduce E2Former, an equivariant and efficient transformer architecture that incorporates a Wigner $6j$ convolution (Wigner $6j$ Conv). By shifting the computational burden from edges to nodes, Wigner $6j$ Conv reduces the complexity from $O(|\mathcal{E}|)$ to $O(|\mathcal{V}|)$ while preserving both the model's expressive power and rotational equivariance. We show that this approach achieves a 7x–30x speedup compared to conventional $\mathrm{SO}(3)$ convolutions. Furthermore, our empirical results demonstrate that the derived E2Former mitigates the computational challenges of existing approaches without compromising the ability to capture detailed geometric information. This development could suggest a promising direction for scalable molecular modeling.

ICML Conference 2025 Conference Paper

Efficient and Scalable Density Functional Theory Hamiltonian Prediction through Adaptive Sparsity

  • Erpai Luo
  • Xinran Wei
  • Lin Huang
  • Yunyang Li
  • Han Yang
  • Zaishuo Xia
  • Zun Wang
  • Chang Liu 0030

Hamiltonian matrix prediction is pivotal in computational chemistry, serving as the foundation for determining a wide range of molecular properties. While SE(3) equivariant graph neural networks have achieved remarkable success in this domain, their substantial computational cost—driven by high-order tensor product (TP) operations—restricts their scalability to large molecular systems with extensive basis sets. To address this challenge, we introduce SPHNet, an efficient and scalable equivariant network that incorporates adaptive SParsity into Hamiltonian prediction. SPHNet employs two innovative sparse gates to selectively constrain non-critical interaction combinations, significantly reducing tensor product computations while maintaining accuracy. To optimize the sparse representation, we develop a Three-phase Sparsity Scheduler, ensuring stable convergence and achieving high performance at sparsity rates of up to 70%. Extensive evaluations on the QH9 and PubChemQH datasets demonstrate that SPHNet achieves state-of-the-art accuracy while providing up to a 7x speedup over existing models. Beyond Hamiltonian prediction, the proposed sparsification techniques also hold significant potential for improving the efficiency and scalability of other SE(3) equivariant networks, further broadening their applicability and impact.

ICLR Conference 2025 Conference Paper

Enhancing the Scalability and Applicability of Kohn-Sham Hamiltonians for Molecular Systems

  • Yunyang Li
  • Zaishuo Xia
  • Lin Huang
  • Xinran Wei
  • Samuel Harshe
  • Han Yang
  • Erpai Luo
  • Zun Wang

Density Functional Theory (DFT) is a pivotal method within quantum chemistry and materials science, with its core involving the construction and solution of the Kohn-Sham Hamiltonian. Despite its importance, the application of DFT is frequently limited by the substantial computational resources required to construct the Kohn-Sham Hamiltonian. In response to these limitations, current research has employed deep-learning models to efficiently predict molecular and solid Hamiltonians, with roto-translational symmetries encoded in their neural networks. However, the scalability of prior models may be problematic when applied to large molecules, resulting in non-physical predictions of ground-state properties. In this study, we generate a substantially larger training set (PubChemQH) than used previously and use it to create a scalable model for DFT calculations with physical accuracy. For our model, we introduce a loss function derived from physical principles, which we call Wavefunction Alignment Loss (WALoss). WALoss involves performing a basis change on the predicted Hamiltonian to align it with the observed one; thus, the resulting differences can serve as a surrogate for orbital energy differences, allowing models to make better predictions for molecular orbitals and total energies than previously possible. WALoss also substantially accelerates self-consistent-field (SCF) DFT calculations. Here, we show it achieves a reduction in total energy prediction error by a factor of 1347 and an 18% speed-up in SCF calculations. These substantial improvements set new benchmarks for achieving accurate and applicable predictions in larger molecular systems.

AAAI Conference 2025 Conference Paper

HSRDiff: A Hierarchical Self-Regulation Diffusion Model for Stochastic Semantic Segmentation

  • Han Yang
  • Chuanguang Yang
  • Zhulin An
  • Libo Huang
  • Yongjun Xu

In safety-critical domains such as medical diagnostics and autonomous driving, single-image evidence is sometimes insufficient to reflect the inherent ambiguity of vision problems. Therefore, multiple plausible assumptions that match the image semantics may be needed to reflect the actual distribution of targets and support downstream tasks. However, balancing and improving the diversity and consistency of segmentation predictions under high-dimensional output spaces and potential multimodal distributions is still challenging. This paper presents Hierarchical Self-Regulation Diffusion (HSRDiff), a unified framework that models the joint probability distribution over entire labels. Our model self-regulates the balance between the two modes of predicting the label and noise in a novel "differentiation to unification" pipeline and dynamically fits the optimal path to model the aleatoric uncertainty rooted in observations. In addition, we preserve the high-fidelity reconstruction of the delicate structure in images by leveraging hierarchical multi-scale condition priors. We validate HSRDiff in three different semantic scenarios. Experimental results show that HSRDiff outperforms comparison methods by a considerable performance gap.

AAAI Conference 2025 Conference Paper

Multi-Teacher Knowledge Distillation with Reinforcement Learning for Visual Recognition

  • Chuanguang Yang
  • XinQiang Yu
  • Han Yang
  • Zhulin An
  • Chengqing Yu
  • Libo Huang
  • Yongjun Xu

Multi-teacher Knowledge Distillation (KD) transfers diverse knowledge from a teacher pool to a student network. The core problem of multi-teacher KD is how to balance distillation strengths among various teachers. Most existing methods often develop weighting strategies from an individual perspective of teacher performance or teacher-student gaps, lacking comprehensive information for guidance. This paper proposes Multi-Teacher Knowledge Distillation with Reinforcement Learning (MTKD-RL) to optimize multi-teacher weights. In this framework, we construct both teacher performance and teacher-student gaps as state information to an agent. The agent outputs the teacher weight and can be updated by the return reward from the student. MTKD-RL reinforces the interaction between the student and teacher using an agent in an RL-based decision mechanism, achieving better matching capability with more meaningful weights. Experimental results on visual recognition tasks, including image classification, object detection, and semantic segmentation tasks, demonstrate that MTKD-RL achieves state-of-the-art performance compared to the existing multi-teacher KD works.

ICLR Conference 2025 Conference Paper

Prototype Antithesis for Biological Few-Shot Class-Incremental Learning

  • Binghao Liu
  • Han Yang
  • Fang Wan
  • Fei Gu

Deep learning has become essential in the biological species recognition task. However, a significant challenge is the ability to continuously learn new or mutated species with limited annotated samples. Since species within the same family typically share similar traits, distinguishing between new and existing (old) species during incremental learning often faces the issue of species confusion. This can result in "catastrophic forgetting" of old species and poor learning of new ones. To address this issue, we propose a Prototype Antithesis (PA) method, which leverages the hierarchical structures in biological taxa to reduce confusion between new and old species. PA operates in two steps: Residual Prototype Learning (RPL) and Residual Prototype Mixing (RPM). RPL enables the model to learn unique prototypes for each species alongside residual prototypes representing shared traits within families. RPM generates synthetic samples by blending features of new species with residual prototypes of old species, encouraging the model to focus on species-unique traits and minimize species confusion. By integrating RPL and RPM, the proposed PA method mitigates "catastrophic forgetting" while improving generalization to new species. Extensive experiments on CUB200, PlantVillage, and Tree-of-Life datasets demonstrate that PA significantly reduces inter-species confusion and achieves state-of-the-art performance, highlighting its potential for deep learning in biological data analysis.

AAAI Conference 2025 Conference Paper

UniMuMo: Unified Text, Music, and Motion Generation

  • Han Yang
  • Kun Su
  • Yutong Zhang
  • Jiaben Chen
  • Kaizhi Qian
  • Gaowen Liu
  • Chuang Gan

We introduce UniMuMo, a unified multimodal model capable of taking arbitrary text, music, and motion data as input conditions to generate outputs across all three modalities. To address the lack of time-synchronized data, we align unpaired music and motion data based on rhythmic patterns to leverage existing large-scale music-only and motion-only datasets. By converting music, motion, and text into token-based representation, our model bridges these modalities through a unified encoder-decoder transformer architecture. To support multiple generation tasks within a single framework, we introduce several architectural improvements. We propose encoding motion with a music codebook, mapping motion into the same feature space as music. We introduce a music-motion parallel generation scheme that unifies all music and motion generation tasks into a single transformer decoder architecture with a single training task of music-motion joint generation. Moreover, the model is designed by fine-tuning existing pre-trained single-modality models, significantly reducing computational demands. Extensive experiments demonstrate that UniMuMo achieves competitive results on all unidirectional generation benchmarks across music, motion, and text modalities.

ICLR Conference 2024 Conference Paper

GNNCert: Deterministic Certification of Graph Neural Networks against Adversarial Perturbations

  • Zaishuo Xia
  • Han Yang
  • Binghui Wang
  • Jinyuan Jia 0001

Graph classification, which aims to predict a label for a graph, has many real-world applications such as malware detection, fraud detection, and healthcare. However, many studies show an attacker could carefully perturb the structure and/or node features in a graph such that a graph classifier misclassifies the perturbed graph. Such vulnerability impedes the deployment of graph classification in security/safety-critical applications. Existing empirical defenses lack formal robustness guarantees and could be broken by adaptive or unknown attacks. Existing provable defenses have the following limitations: 1) they achieve sub-optimal robustness guarantees for graph structure perturbation, 2) they cannot provide robustness guarantees for arbitrary node feature perturbations, 3) their robustness guarantees are probabilistic, meaning they could be incorrect with a non-zero probability, and 4) they incur large computation costs. We aim to address those limitations in this work. We propose GNNCert, a certified defense against both graph structure and node feature perturbations for graph classification. Our GNNCert provably predicts the same label for a graph when the number of perturbed edges and the number of nodes with perturbed features are bounded. Our results on 8 benchmark datasets show that GNNCert outperforms three state-of-the-art methods.

ICLR Conference 2024 Conference Paper

Long-Short-Range Message-Passing: A Physics-Informed Framework to Capture Non-Local Interaction for Scalable Molecular Dynamics Simulation

  • Yunyang Li
  • Yusong Wang
  • Lin Huang
  • Han Yang
  • Xinran Wei
  • Jia Zhang 0004
  • Tong Wang 0014
  • Zun Wang

Computational simulation of chemical and biological systems using *ab initio* molecular dynamics has been a challenge over decades. Researchers have attempted to address the problem with machine learning and fragmentation-based methods. However, the two approaches fail to give a satisfactory description of long-range and many-body interactions, respectively. Inspired by fragmentation-based methods, we propose the Long-Short-Range Message-Passing (LSR-MP) framework as a generalization of existing equivariant graph neural networks (EGNNs) with the intent to incorporate long-range interactions efficiently and effectively. We apply the LSR-MP framework to the recently proposed ViSNet and demonstrate state-of-the-art results, with up to a 40% MAE reduction for molecules in the MD22 and Chignolin datasets. Consistent improvements to various EGNNs will also be discussed to illustrate the general applicability and robustness of our LSR-MP framework. The code for our experiments and trained model weights can be found at https://github.com/liyy2/LSR-MP.

ICLR Conference 2024 Conference Paper

Training-free Multi-objective Diffusion Model for 3D Molecule Generation

  • Xu Han
  • Caihua Shan
  • Yifei Shen 0004
  • Can Xu
  • Han Yang
  • Xiang Li 0067
  • Dongsheng Li 0002

Searching for novel and diverse molecular candidates is a critical undertaking in drug and material discovery. Existing approaches have successfully adapted the diffusion model, the most effective generative model in image generation, to create 1D SMILES strings, 2D chemical graphs, or 3D molecular conformers. However, these methods are not efficient and flexible enough to generate 3D molecules with multiple desired properties, as they require additional training for the models for each new property or even a new combination of existing properties. Moreover, some properties may potentially conflict, making it impossible to find a molecule that satisfies all of them simultaneously. To address these challenges, we present a training-free conditional 3D molecular generation algorithm based on off-the-shelf unconditional diffusion models and property prediction models. The key techniques include modeling the loss of property prediction models as energy functions, considering the property relation between multiple conditions as a probabilistic graph, and developing a stable posterior estimation for computing the conditional score function. We conducted experiments on both single-objective and multi-objective 3D molecule generation, focusing on quantum properties, and compared our approach with the trained or fine-tuned diffusion models. Our proposed model achieves superior performance in generating molecules that meet the conditions, without any additional training cost.

ICRA Conference 2024 Conference Paper

Weakly-Supervised Depth Completion during Robotic Micromanipulation from a Monocular Microscopic Image

  • Han Yang
  • Yufei Jin
  • Guanqiao Shan
  • Yibin Wang
  • Yongbin Zheng
  • Jiangfan Yu
  • Yu Sun 0001
  • Zhuoran Zhang 0001

Obtaining three-dimensional information, especially the z-axis depth information, is crucial for robotic micromanipulation. Due to the unavailability of depth sensors such as lidars in micromanipulation setups, traditional depth acquisition methods such as depth from focus or depth from defocus directly infer depth from microscopic images and suffer from poor resolution. Alternatively, micromanipulation tasks obtain accurate depth information by detecting the contact between an end-effector and an object (e.g., a cell). Despite its high accuracy, only sparse depth data can be obtained due to its low efficiency. This paper aims to address the challenge of acquiring dense depth information during robotic cell micromanipulation. A weakly-supervised depth completion network is proposed to take cell images and sparse depth data obtained by contact detection as input to generate a dense depth map. A two-stage data augmentation method is proposed to augment the sparse depth data, and the depth map is optimized by a network refinement method. The experimental results show that the MAE value of the depth prediction error is less than 0.3 µm, which proves the accuracy and effectiveness of the method. This deep learning network pipeline can be seamlessly integrated with robotic micromanipulation tasks to provide accurate depth information.

TMLR Journal 2023 Journal Article

Calibrating and Improving Graph Contrastive Learning

  • MA KAILI
  • Garry YANG
  • Han Yang
  • Yongqiang Chen
  • James Cheng

Graph contrastive learning algorithms have demonstrated remarkable success in various applications such as node classification, link prediction, and graph clustering. However, in unsupervised graph contrastive learning, some contrastive pairs may contradict the truths in downstream tasks, and thus decreasing the losses on these pairs undesirably harms performance on the downstream tasks. To assess the discrepancy between the prediction and the ground-truth in the downstream tasks for these contrastive pairs, we adapt expected calibration error (ECE) to graph contrastive learning. The analysis of ECE motivates us to propose a novel regularization method, Contrast-Reg, to ensure that decreasing the contrastive loss leads to better performance in the downstream tasks. As a plug-in regularizer, Contrast-Reg effectively improves the performance of existing graph contrastive learning algorithms. We provide both theoretical and empirical results to demonstrate the effectiveness of Contrast-Reg in enhancing the generalizability of the Graph Neural Network (GNN) model and improving the performance of graph contrastive algorithms with different similarity definitions and encoder backbones across various downstream tasks.

NeurIPS Conference 2022 Conference Paper

Exact Shape Correspondence via 2D graph convolution

  • Barakeel Fanseu Kamhoua
  • Lin Zhang
  • Yongqiang Chen
  • Han Yang
  • MA KAILI
  • Bo Han
  • Bo Li
  • James Cheng

For exact 3D shape correspondence (matching or alignment), i.e., the task of matching each point on a shape to its exact corresponding point on the other shape (or, more specifically, matching at geodesic error 0), most existing methods do not perform well due to two main problems. First, on nearly-isometric shapes (i.e., low noise levels), most existing methods use the eigenvectors (eigenfunctions) of the Laplace-Beltrami Operator (LBO) or other shape descriptors to update an initialized correspondence which is not exact, leading to an accumulation of update errors. Thus, though the final correspondence may generally be smooth, it is generally inexact. Second, on non-isometric shapes (noisy shapes), existing methods are generally not robust to noise as they usually assume near-isometry. In addition, existing methods that attempt to address the non-isometric shape problem (e.g., GRAMPA) are generally computationally expensive and do not generalise to nearly-isometric shapes. To address these two problems, we propose a 2D graph convolution-based framework called 2D-GEM. 2D-GEM is robust to noise on non-isometric shapes and, with a few additional constraints, it also addresses the errors in the update on nearly-isometric shapes. We demonstrate the effectiveness of 2D-GEM by achieving a high accuracy of 90.5% at geodesic error 0 on the non-isometric benchmark of SHREC16, i.e., TOPKIDS (while being much faster than GRAMPA), and on nearly-isometric benchmarks by achieving a high accuracy of 92.5% on TOSCA and 84.9% on SCAPE at geodesic error 0.

NeurIPS Conference 2022 Conference Paper

Learning Causally Invariant Representations for Out-of-Distribution Generalization on Graphs

  • Yongqiang Chen
  • Yonggang Zhang
  • Yatao Bian
  • Han Yang
  • MA KAILI
  • Binghui Xie
  • Tongliang Liu
  • Bo Han

Despite recent success in using the invariance principle for out-of-distribution (OOD) generalization on Euclidean data (e.g., images), studies on graph data are still limited. Different from images, the complex nature of graphs poses unique challenges to adopting the invariance principle. In particular, distribution shifts on graphs can appear in a variety of forms such as attributes and structures, making it difficult to identify the invariance. Moreover, domain or environment partitions, which are often required by OOD methods on Euclidean data, could be highly expensive to obtain for graphs. To bridge this gap, we propose a new framework, called Causality Inspired Invariant Graph LeArning (CIGA), to capture the invariance of graphs for guaranteed OOD generalization under various distribution shifts. Specifically, we characterize potential distribution shifts on graphs with causal models, concluding that OOD generalization on graphs is achievable when models focus only on subgraphs containing the most information about the causes of labels. Accordingly, we propose an information-theoretic objective to extract the desired subgraphs that maximally preserve the invariant intra-class information. Learning with these subgraphs is immune to distribution shifts. Extensive experiments on 16 synthetic or real-world datasets, including a challenging setting, DrugOOD, from AI-aided drug discovery, validate the superior OOD performance of CIGA.

AAAI Conference 2021 Conference Paper

Rethinking Graph Regularization for Graph Neural Networks

  • Han Yang
  • Kaili Ma
  • James Cheng

The graph Laplacian regularization term is usually used in semi-supervised representation learning to provide graph structure information for a model f(X). However, with the recent popularity of graph neural networks (GNNs), directly encoding graph structure A into a model, i.e., f(A, X), has become the more common approach. We show that graph Laplacian regularization brings little-to-no benefit to existing GNNs, and propose a simple but non-trivial variant of graph Laplacian regularization, called Propagation-regularization (P-reg), to boost the performance of existing GNN models. We provide formal analyses to show that P-reg not only infuses extra information (that is not captured by the traditional graph Laplacian regularization) into GNNs, but also has a capacity equivalent to an infinite-depth graph convolutional network. We demonstrate that P-reg can effectively boost the performance of existing GNN models on both node-level and graph-level tasks across many different datasets.