AAAI Conference 2026 Conference Paper
FedCD: Towards Consolidated Distillation for Heterogeneous Federated Learning
- Yichen Li
- Hang Su
- Huifa Li
- Haolin Yang
- Xinlin Zhuang
- Haochen Xue
- Haozhao Wang
- Imran Razzak
Knowledge Distillation (KD) serves as an effective approach to addressing heterogeneity issues in Federated Learning (FL), leveraging additional datasets to align local and global models better. There are two primary distillation paradigms: feature-based distillation, which utilizes intermediate-layer features of the network, and logit-based distillation, which employs the final layer's logit outputs. However, existing studies often select distillation methods based on intuitive and empirical evidence when facing different heterogeneous settings, neglecting the intrinsic relationship between distillation paradigms and heterogeneity. This oversight may result in suboptimal federated knowledge distillation performance under heterogeneous conditions. In this paper, we propose the Consolidated Distillation for Heterogeneous Federated Learning - FedCD that balances knowledge representations from both feature-based and logit-based distillation to enhance performance. Specifically, to address the misalignment between knowledge conveyed by features and logits, we aggregate features from different layers via cross-layer attention to preserve semantic knowledge, followed by distribution modeling using Gaussian Mixture Models. This process strengthens knowledge distillation by constraining the transformation of different network layers' features under a consolidated distribution, thereby mitigating impacts from both data and model heterogeneity. Extensive experiments demonstrate that FedCD outperforms state-of-the-art methods by over 10.72% and validate the effectiveness of our approach.