
AAAI 2025

Adaptive Dual Guidance Knowledge Distillation

Conference Paper · AAAI Technical Track on Machine Learning III · Artificial Intelligence

Abstract

Knowledge distillation (KD) aims to improve the performance of lightweight student networks under the guidance of pre-trained teachers. However, the large capacity gap between teacher and student limits the distillation gains. Previous methods addressing this problem have two weaknesses. First, most of them degrade the performance of the pre-trained teacher, preventing the student from reaching comparable performance. Second, they fail to dynamically adjust the transferred knowledge to match the representation ability of the student, which limits their effectiveness in bridging the capacity gap. In this paper, we propose Adaptive Dual Guidance Knowledge Distillation (ADG-KD), which retains the guidance of the pre-trained teacher and uses the teacher's bidirectional optimization route to guide the student, alleviating the capacity gap problem. Specifically, ADG-KD introduces an initialized teacher, which has a structure identical to the pre-trained teacher and is optimized through bidirectional supervision from both the pre-trained teacher and the student. In this way, we construct the teacher's bidirectional optimization route, which provides the student with an easy-to-hard, compatible knowledge sequence. ADG-KD trains the student under the two proposed guidance approaches and automatically determines their importance weights, making the transferred knowledge more compatible with the student's representation ability. Extensive experiments on CIFAR-100, ImageNet, and MS-COCO demonstrate the effectiveness of our method.
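The abstract does not spell out the training objective, but the described setup (a frozen pre-trained teacher, an auxiliary "initialized" teacher under bidirectional supervision, and learnable importance weights over the two guidance signals) suggests a loss structure along the following lines. This is a minimal PyTorch-style sketch for illustration only: the function and class names, the temperature T, and the softmax parameterization of the importance weights are assumptions, not details from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, T=4.0):
    """Temperature-scaled KL-divergence distillation loss (standard KD)."""
    log_p = F.log_softmax(student_logits / T, dim=1)
    q = F.softmax(teacher_logits / T, dim=1)
    return F.kl_div(log_p, q, reduction="batchmean") * (T * T)


class DualGuidanceLoss(nn.Module):
    """Hypothetical student objective: guidance from the frozen pre-trained
    teacher and the auxiliary (initialized) teacher, with learnable weights."""

    def __init__(self, T=4.0):
        super().__init__()
        self.T = T
        # Two unconstrained logits, softmax-normalized into importance weights,
        # letting the two guidance terms be balanced automatically in training.
        self.w = nn.Parameter(torch.zeros(2))

    def forward(self, student_logits, pretrained_logits, auxiliary_logits, targets):
        alpha = torch.softmax(self.w, dim=0)
        loss_ce = F.cross_entropy(student_logits, targets)
        loss_pre = kd_loss(student_logits, pretrained_logits.detach(), self.T)
        loss_aux = kd_loss(student_logits, auxiliary_logits.detach(), self.T)
        return loss_ce + alpha[0] * loss_pre + alpha[1] * loss_aux


def auxiliary_teacher_loss(aux_logits, pretrained_logits, student_logits, T=4.0):
    """Sketch of the bidirectional supervision: the auxiliary teacher is pulled
    toward the frozen pre-trained teacher and toward the current student, so
    its optimization route can serve as an easy-to-hard knowledge sequence."""
    return (kd_loss(aux_logits, pretrained_logits.detach(), T)
            + kd_loss(aux_logits, student_logits.detach(), T))
```

In this reading, the auxiliary teacher's checkpoints interpolate between student-like and teacher-like behavior, and the softmax weights let the student lean on whichever guidance signal is currently more compatible with its capacity; the actual weighting mechanism in ADG-KD may differ.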

Keywords

No keywords are indexed for this paper.

Context

Venue
AAAI Conference on Artificial Intelligence
Archive span
1980-2026
Indexed papers
28718
Paper id
161352343186045543