AAAI 2026 Conference Paper
Demystifying GNN-to-MLP Knowledge Transfer: Theoretical Grounding and Dual-Stream Distillation Method
- Zhiyuan Yu
- Mingkai Lin
- Wenzhong Li
- Zhangyue Yin
- Shijian Xiao
- Sanglu Lu
Graph Neural Networks (GNNs) have shown remarkable effectiveness across a wide range of applications, but their computational complexity poses significant scalability challenges. To mitigate this, GNN-to-MLP Knowledge Distillation (KD) methods transfer relational inductive biases from GNNs to MLPs, equipping MLPs with graph-aware capabilities that rival or even surpass those of their teacher GNNs. However, a theoretical foundation for understanding GNN-to-MLP KD is still missing. In this paper, we provide a theoretical analysis, from the perspective of training dynamics, of how knowledge distillation unlocks the potential of MLPs on graph tasks. We demonstrate that label alignment in KD fundamentally reshapes the Neural Tangent Kernel (NTK) matrix of student MLPs, enabling them to learn the teacher model's implicit graph bias. We further investigate finer-grained distillation paradigms and reveal that conventional layer-wise output alignment fails to effectively align deep-layer graph propagation outcomes. To address this, we propose the Dual-Stream Aligned MLP (DA-MLP), which incorporates complementary graph filters in a dual-stream architecture, simultaneously enlarging the feature space used for representation alignment and preserving graph signals across different frequency bands. Comprehensive experiments on seven benchmark datasets validate that DA-MLP can be seamlessly integrated into existing knowledge distillation frameworks, improving performance in both transductive and inductive settings.
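As a point of reference for the NTK-based claim above, the standard lazy-training dynamics of a student trained against teacher soft labels can be written as follows. This is textbook NTK gradient flow under a squared loss, with illustrative notation assumed here (Θ for the student MLP's empirical NTK, f_t for its outputs at training time t, Z^GNN for the teacher GNN's soft labels); it only fixes notation and does not reproduce the paper's argument for how label alignment reshapes the effective kernel.

```latex
% Standard NTK gradient-flow dynamics for a student MLP fitted to
% teacher soft labels Z^{GNN} under squared loss (illustrative notation).
\begin{align}
  \frac{\mathrm{d} f_t(X)}{\mathrm{d} t}
    &= -\eta\, \Theta(X, X)\,\bigl(f_t(X) - Z^{\mathrm{GNN}}\bigr), \\
  f_t(X)
    &= Z^{\mathrm{GNN}} + e^{-\eta\, \Theta(X, X)\, t}\,\bigl(f_0(X) - Z^{\mathrm{GNN}}\bigr).
\end{align}
```

In this simplified view, the teacher's graph-smoothed targets, rather than the raw labels, determine which directions of the student's NTK spectrum are fitted first.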
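To make the dual-stream idea concrete, below is a minimal PyTorch sketch of one plausible reading of the abstract: a two-stream MLP student whose hidden states are aligned against low-pass and high-pass filtered views of the teacher's representations, with the concatenated streams doubling the feature dimensionality used for alignment. All names (DualStreamStudent, complementary_filters), the specific filter pair (Â and I − Â), and the loss weighting are assumptions for illustration, not the authors' released DA-MLP implementation.

```python
# Illustrative sketch of a dual-stream GNN-to-MLP student; names and
# design choices are assumptions, not the authors' DA-MLP code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def complementary_filters(adj: torch.Tensor):
    """Build a low-pass and a high-pass graph filter from a dense adjacency."""
    n = adj.size(0)
    a_hat = adj + torch.eye(n)                                   # add self-loops
    d_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
    a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]   # symmetric normalization
    low_pass = a_norm                                            # smooths features over neighbors
    high_pass = torch.eye(n) - a_norm                            # keeps locally varying signal
    return low_pass, high_pass


class DualStreamStudent(nn.Module):
    """MLP student with two parallel streams; their hidden states are aligned
    against low-pass / high-pass filtered teacher representations."""

    def __init__(self, in_dim: int, hid_dim: int, out_dim: int):
        super().__init__()
        self.stream_low = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        self.stream_high = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        self.classifier = nn.Linear(2 * hid_dim, out_dim)        # doubled hidden width

    def forward(self, x):
        h_low, h_high = self.stream_low(x), self.stream_high(x)
        logits = self.classifier(torch.cat([h_low, h_high], dim=-1))
        return logits, h_low, h_high


def distillation_loss(logits, h_low, h_high, teacher_logits, teacher_hidden,
                      low_pass, high_pass, alpha: float = 0.5):
    """Soft-label KD plus alignment of each stream to a filtered teacher view.

    Assumes a full-graph batch and that teacher_hidden has the same width as
    the student streams; otherwise a projection layer would be needed.
    """
    kd = F.kl_div(F.log_softmax(logits, dim=-1),
                  F.softmax(teacher_logits, dim=-1), reduction="batchmean")
    align = (F.mse_loss(h_low, low_pass @ teacher_hidden) +
             F.mse_loss(h_high, high_pass @ teacher_hidden))
    return kd + alpha * align
```

In practice the teacher tensors would be detached and alpha tuned per dataset; the point of the sketch is only that the two streams see complementary frequency bands of the teacher's graph signal while their concatenation enlarges the representation space used for alignment.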