Paper Title

Solvable Model for Inheriting the Regularization through Knowledge Distillation

Paper Authors

Luca Saglietti, Lenka Zdeborová

Paper Abstract

In recent years, the empirical success of transfer learning with neural networks has stimulated an increasing interest in obtaining a theoretical understanding of its core properties. Knowledge distillation, where a smaller neural network is trained using the outputs of a larger neural network, is a particularly interesting case of transfer learning. In the present work, we introduce a statistical physics framework that allows an analytic characterization of the properties of knowledge distillation (KD) in shallow neural networks. Focusing the analysis on a solvable model that exhibits a non-trivial generalization gap, we investigate the effectiveness of KD. We are able to show that, through KD, the regularization properties of the larger teacher model can be inherited by the smaller student, and that the resulting generalization performance is closely linked to and limited by the optimality of the teacher. Finally, we analyze the double descent phenomenology that can arise in the considered KD setting.
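For readers unfamiliar with the setup the abstract refers to, the following is a minimal, purely illustrative PyTorch sketch of knowledge distillation with shallow (one-hidden-layer) networks: a larger teacher is first fit on hard labels, and a smaller student is then trained on the teacher's softened outputs. It is not the paper's solvable model or its analysis; the network widths, synthetic data, temperature `T`, and mixing weight `alpha` are assumed values chosen for illustration.

```python
# Illustrative knowledge-distillation sketch (not the paper's solvable model).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

def shallow_net(in_dim, hidden, out_dim):
    # One-hidden-layer (shallow) network.
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))

in_dim, n_classes = 50, 2
teacher = shallow_net(in_dim, hidden=200, out_dim=n_classes)   # larger teacher
student = shallow_net(in_dim, hidden=10,  out_dim=n_classes)   # smaller student

# Synthetic data (hypothetical; the paper studies a specific solvable data model).
X = torch.randn(500, in_dim)
y = (X[:, 0] > 0).long()

# 1) Standard supervised training of the teacher on hard labels.
opt_t = torch.optim.Adam(teacher.parameters(), lr=1e-2)
for _ in range(200):
    opt_t.zero_grad()
    F.cross_entropy(teacher(X), y).backward()
    opt_t.step()

# 2) Distillation: the student matches the teacher's softened outputs,
#    mixed with the ordinary hard-label loss.
T, alpha = 2.0, 0.7          # temperature and mixing weight (assumed values)
opt_s = torch.optim.Adam(student.parameters(), lr=1e-2)
with torch.no_grad():
    soft_targets = F.softmax(teacher(X) / T, dim=1)
for _ in range(200):
    opt_s.zero_grad()
    logits = student(X)
    kd_loss = F.kl_div(F.log_softmax(logits / T, dim=1), soft_targets,
                       reduction="batchmean") * (T ** 2)
    ce_loss = F.cross_entropy(logits, y)
    (alpha * kd_loss + (1 - alpha) * ce_loss).backward()
    opt_s.step()

print("student train accuracy:",
      (student(X).argmax(dim=1) == y).float().mean().item())
```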
