Does Knowledge Distillation Really Work?

Samuel Stanton, Pavel Izmailov, Polina Kirichenko, Alexander A. Alemi, Andrew Gordon Wilson

Artificial Intelligence And Data Science · Non-peer-reviewed Preprint · Published 2021-06-10

Abstract

Knowledge distillation is a popular technique for training a small student network to emulate a larger teacher model, such as an ensemble of networks. We show that while knowledge distillation can improve student generalization, it does not typically work as it is commonly understood: there often remains a surprisingly large discrepancy between the predictive distributions of the teacher and the student, even in cases when the student has the capacity to perfectly match the teacher. We identify difficulties in optimization as a key reason why the student is unable to match the teacher. We also show how the details of the dataset used for distillation play a role in how closely the student matches the teacher, and that more closely matching the teacher paradoxically does not always lead to better student generalization.
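To make the "discrepancy between the predictive distributions" concrete, here is a minimal sketch in plain Python. It is not taken from the paper. It assumes the common textbook formulation in which the student is trained to minimize the KL divergence from the teacher's temperature-softened softmax output to its own. It also assumes top-1 agreement as one simple way to measure how closely the student matches the teacher. The function names and the `temperature` parameter are illustrative choices.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature (assumed parameter)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions over the same classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Hypothetical per-example loss: KL from teacher to student predictions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return kl_divergence(p_teacher, p_student)

def top1_agreement(teacher_batch, student_batch):
    """Fraction of examples where teacher and student predict the same class."""
    def argmax(xs):
        return max(range(len(xs)), key=xs.__getitem__)
    matches = sum(argmax(t) == argmax(s) for t, s in zip(teacher_batch, student_batch))
    return matches / len(teacher_batch)

if __name__ == "__main__":
    teacher = [[2.0, 0.5, -1.0], [0.1, 3.0, 0.2]]
    student = [[1.5, 0.7, -0.5], [2.0, 1.0, 0.0]]
    for t, s in zip(teacher, student):
        print("KL(teacher || student):", distillation_loss(t, s, temperature=2.0))
    print("top-1 agreement:", top1_agreement(teacher, student))
```

A student that matched the teacher perfectly would drive the KL term to zero and the agreement to 1.0 on the distillation data. The abstract's claim is that this typically does not happen in practice, even when the student has enough capacity.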

Keywords

Artificial Intelligence & Data Science



