AAAI 2026 Conference Paper
Demystifying GNN-to-MLP Knowledge Transfer: Theoretical Grounding and Dual-Stream Distillation Method
- Zhiyuan Yu
- Mingkai Lin
- Wenzhong Li
- Zhangyue Yin
- Shijian Xiao
- Sanglu Lu
Graph Neural Networks (GNNs) have shown remarkable effectiveness across a wide range of applications, but their computational complexity poses significant scalability challenges. To mitigate this, GNN-to-MLP Knowledge Distillation (KD) methods transfer relational inductive biases from GNNs to MLPs, equipping MLPs with graph-aware capabilities that rival or even surpass those of their teacher GNNs. However, a theoretical foundation for understanding GNN-to-MLP KD is still missing. In this paper, we provide a theoretical analysis, from the perspective of training dynamics, of how knowledge distillation unlocks the potential of MLPs on graph tasks. We demonstrate that label alignment in KD fundamentally reshapes the Neural Tangent Kernel (NTK) matrix of student MLPs, enabling them to learn the teacher model's implicit graph bias. We further investigate finer-grained distillation paradigms and reveal that conventional layer-wise output alignment fails to effectively align deep-layer graph propagation outcomes. To address this, we propose the Dual-Stream Aligned MLP (DA-MLP), which incorporates complementary graph filters in a dual-stream architecture, simultaneously enlarging the feature space used for representation alignment and preserving graph signals across different frequency bands. Comprehensive experiments on seven benchmark datasets validate that DA-MLP can be seamlessly integrated into existing knowledge distillation frameworks, improving performance in both transductive and inductive settings.
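As a point of reference for the NTK-based claim above, the standard lazy-training dynamics of a student trained against teacher soft labels can be written as follows. This is textbook NTK gradient flow under a squared loss, with illustrative notation assumed here (Θ for the student MLP's empirical NTK, f_t for its outputs at training time t, Z^GNN for the teacher GNN's soft labels); it only fixes notation and does not reproduce the paper's argument for how label alignment reshapes the effective kernel.

```latex
% Standard NTK gradient-flow dynamics for a student MLP fitted to
% teacher soft labels Z^{GNN} under squared loss (illustrative notation).
\begin{align}
  \frac{\mathrm{d} f_t(X)}{\mathrm{d} t}
    &= -\eta\, \Theta(X, X)\,\bigl(f_t(X) - Z^{\mathrm{GNN}}\bigr), \\
  f_t(X)
    &= Z^{\mathrm{GNN}} + e^{-\eta\, \Theta(X, X)\, t}\,\bigl(f_0(X) - Z^{\mathrm{GNN}}\bigr).
\end{align}
```

In this simplified view, the teacher's graph-smoothed targets, rather than the raw labels, determine which directions of the student's NTK spectrum are fitted first.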
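To make the dual-stream idea concrete, below is a minimal PyTorch sketch of one plausible reading of the abstract: a two-stream MLP student whose hidden states are aligned against low-pass and high-pass filtered views of the teacher's representations, with the concatenated streams doubling the feature dimensionality used for alignment. All names (DualStreamStudent, complementary_filters), the specific filter pair (Â and I − Â), and the loss weighting are assumptions for illustration, not the authors' released DA-MLP implementation.

```python
# Illustrative sketch of a dual-stream GNN-to-MLP student; names and
# design choices are assumptions, not the authors' DA-MLP code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def complementary_filters(adj: torch.Tensor):
    """Build a low-pass and a high-pass graph filter from a dense adjacency."""
    n = adj.size(0)
    a_hat = adj + torch.eye(n)                                   # add self-loops
    d_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
    a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]   # symmetric normalization
    low_pass = a_norm                                            # smooths features over neighbors
    high_pass = torch.eye(n) - a_norm                            # keeps locally varying signal
    return low_pass, high_pass


class DualStreamStudent(nn.Module):
    """MLP student with two parallel streams; their hidden states are aligned
    against low-pass / high-pass filtered teacher representations."""

    def __init__(self, in_dim: int, hid_dim: int, out_dim: int):
        super().__init__()
        self.stream_low = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        self.stream_high = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        self.classifier = nn.Linear(2 * hid_dim, out_dim)        # doubled hidden width

    def forward(self, x):
        h_low, h_high = self.stream_low(x), self.stream_high(x)
        logits = self.classifier(torch.cat([h_low, h_high], dim=-1))
        return logits, h_low, h_high


def distillation_loss(logits, h_low, h_high, teacher_logits, teacher_hidden,
                      low_pass, high_pass, alpha: float = 0.5):
    """Soft-label KD plus alignment of each stream to a filtered teacher view.

    Assumes a full-graph batch and that teacher_hidden has the same width as
    the student streams; otherwise a projection layer would be needed.
    """
    kd = F.kl_div(F.log_softmax(logits, dim=-1),
                  F.softmax(teacher_logits, dim=-1), reduction="batchmean")
    align = (F.mse_loss(h_low, low_pass @ teacher_hidden) +
             F.mse_loss(h_high, high_pass @ teacher_hidden))
    return kd + alpha * align
```

In practice the teacher tensors would be detached and alpha tuned per dataset; the point of the sketch is only that the two streams see complementary frequency bands of the teacher's graph signal while their concatenation enlarges the representation space used for alignment.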