Paper Title

Extracurricular Learning: Knowledge Transfer Beyond Empirical Distribution

Paper Authors

Hadi Pouransari, Mojan Javaheripi, Vinay Sharma, Oncel Tuzel

Paper Abstract

Knowledge distillation has been used to transfer knowledge learned by a sophisticated model (teacher) to a simpler model (student). This technique is widely used to compress model complexity. However, in most applications the compressed student model suffers from an accuracy gap with its teacher. We propose extracurricular learning, a novel knowledge distillation method, that bridges this gap by (1) modeling student and teacher output distributions; (2) sampling examples from an approximation to the underlying data distribution; and (3) matching student and teacher output distributions over this extended set including uncertain samples. We conduct rigorous evaluations on regression and classification tasks and show that compared to the standard knowledge distillation, extracurricular learning reduces the gap by 46% to 68%. This leads to major accuracy improvements compared to the empirical risk minimization-based training for various recent neural network architectures: 16% regression error reduction on the MPIIGaze dataset, +3.4% to +9.1% improvement in top-1 classification accuracy on the CIFAR100 dataset, and +2.9% top-1 improvement on the ImageNet dataset.
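The abstract describes extracurricular learning in three steps: modeling the output distributions, sampling from an approximation to the data distribution, and matching student to teacher over the extended set. As a rough illustration only, the sketch below wires those steps together for a classification teacher/student pair in PyTorch; the function names, the Gaussian input perturbation used as a stand-in for sampling near the data distribution, and the temperature value are assumptions made for this sketch, not the authors' implementation.

```python
# Hypothetical sketch of the three steps outlined in the abstract (not the
# authors' code). Assumes `student` and `teacher` are classification nn.Modules
# and `x` is a batch of inputs.
import torch
import torch.nn.functional as F

def sample_near_data(x, noise_std=0.1):
    # Step (2): draw extra examples from a crude approximation to the
    # underlying data distribution; here, Gaussian perturbations of real inputs.
    return x + noise_std * torch.randn_like(x)

def extracurricular_distillation_loss(student, teacher, x, temperature=4.0):
    # Extend the batch with sampled (possibly uncertain) examples.
    x_ext = torch.cat([x, sample_near_data(x)], dim=0)

    # Step (1): model both output distributions (softened softmax over logits).
    with torch.no_grad():
        p_teacher = F.softmax(teacher(x_ext) / temperature, dim=-1)
    log_p_student = F.log_softmax(student(x_ext) / temperature, dim=-1)

    # Step (3): match the student distribution to the teacher's over the
    # extended set (KL divergence, rescaled by the usual temperature factor).
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
```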
