
AAAI 2025

Adaptive Dual Guidance Knowledge Distillation

Conference Paper · AAAI Technical Track on Machine Learning III · Artificial Intelligence

Abstract

Knowledge distillation (KD) aims to improve the performance of lightweight student networks under the guidance of pre-trained teachers. However, the large capacity gap between teacher and student limits the distillation gains. Previous methods addressing this problem have two weaknesses. First, most of them degrade the performance of the pre-trained teacher, preventing the student from reaching comparable performance. Second, they fail to dynamically adjust the transferred knowledge to match the representation ability of the student, which limits their effectiveness in bridging the capacity gap. In this paper, we propose Adaptive Dual Guidance Knowledge Distillation (ADG-KD), which retains the guidance of the pre-trained teacher and uses the teacher's bidirectional optimization route to guide the student, alleviating the capacity gap problem. Specifically, ADG-KD introduces an initialized teacher, which has a structure identical to the pre-trained teacher and is optimized through bidirectional supervision from both the pre-trained teacher and the student. In this way, we construct the teacher's bidirectional optimization route, which provides the student with an easy-to-hard, compatible knowledge sequence. ADG-KD trains the student under the two proposed guidance approaches and automatically determines their importance weights, making the transferred knowledge more compatible with the student's representation ability. Extensive experiments on CIFAR-100, ImageNet, and MS-COCO demonstrate the effectiveness of our method.
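The abstract does not spell out the training objective, but the described setup (a frozen pre-trained teacher, an auxiliary "initialized" teacher under bidirectional supervision, and learnable importance weights over the two guidance signals) suggests a loss structure along the following lines. This is a minimal PyTorch-style sketch for illustration only: the function and class names, the temperature T, and the softmax parameterization of the importance weights are assumptions, not details from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, T=4.0):
    """Temperature-scaled KL-divergence distillation loss (standard KD)."""
    log_p = F.log_softmax(student_logits / T, dim=1)
    q = F.softmax(teacher_logits / T, dim=1)
    return F.kl_div(log_p, q, reduction="batchmean") * (T * T)


class DualGuidanceLoss(nn.Module):
    """Hypothetical student objective: guidance from the frozen pre-trained
    teacher and the auxiliary (initialized) teacher, with learnable weights."""

    def __init__(self, T=4.0):
        super().__init__()
        self.T = T
        # Two unconstrained logits, softmax-normalized into importance weights,
        # letting the two guidance terms be balanced automatically in training.
        self.w = nn.Parameter(torch.zeros(2))

    def forward(self, student_logits, pretrained_logits, auxiliary_logits, targets):
        alpha = torch.softmax(self.w, dim=0)
        loss_ce = F.cross_entropy(student_logits, targets)
        loss_pre = kd_loss(student_logits, pretrained_logits.detach(), self.T)
        loss_aux = kd_loss(student_logits, auxiliary_logits.detach(), self.T)
        return loss_ce + alpha[0] * loss_pre + alpha[1] * loss_aux


def auxiliary_teacher_loss(aux_logits, pretrained_logits, student_logits, T=4.0):
    """Sketch of the bidirectional supervision: the auxiliary teacher is pulled
    toward the frozen pre-trained teacher and toward the current student, so
    its optimization route can serve as an easy-to-hard knowledge sequence."""
    return (kd_loss(aux_logits, pretrained_logits.detach(), T)
            + kd_loss(aux_logits, student_logits.detach(), T))
```

In this reading, the auxiliary teacher's checkpoints interpolate between student-like and teacher-like behavior, and the softmax weights let the student lean on whichever guidance signal is currently more compatible with its capacity; the actual weighting mechanism in ADG-KD may differ.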

Keywords

No keywords are indexed for this paper.

Context

Venue
AAAI Conference on Artificial Intelligence
Archive span
1980-2026
Indexed papers
28718
Paper id
161352343186045543