Paper Title

Teaching What You Should Teach: A Data-Based Distillation Method

Paper Authors

Shitong Shao, Huanran Chen, Zhen Huang, Linrui Gong, Shuai Wang, Xinxiao Wu

Paper Abstract

In real teaching scenarios, an excellent teacher always teaches what he (or she) is good at but the student is not. This gives the student the best assistance in making up for his (or her) weaknesses and becoming good overall. Enlightened by this, we introduce the "Teaching what you Should Teach" strategy into a knowledge distillation framework, and propose a data-based distillation method named "TST" that searches for desirable augmented samples to assist in distilling more efficiently and rationally. Specifically, we design a neural network-based data augmentation module with a prior bias, which learns augmentation magnitudes and probabilities to generate suitable data samples, thereby helping to find what matches the teacher's strengths but the student's weaknesses. By alternately training the data augmentation module and the generalized distillation paradigm, a student model with excellent generalization ability is obtained. To verify the effectiveness of our method, we conduct extensive comparative experiments on object recognition, detection, and segmentation tasks. The results on the CIFAR-10, ImageNet-1K, MS-COCO, and Cityscapes datasets demonstrate that our method achieves state-of-the-art performance on almost all teacher-student pairs. Furthermore, we conduct visualization studies to explore what magnitudes and probabilities are needed for the distillation process.
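The abstract only names the ingredients: a differentiable augmentation module that learns magnitudes and probabilities, and an alternating schedule pairing it with a distillation objective. The minimal PyTorch sketch below shows one way those pieces could fit together; the single brightness-style operation, the discrepancy objective (teacher cross-entropy minus teacher-student KL), and all names (`LearnableAugment`, `train_step`, etc.) are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableAugment(nn.Module):
    """Hypothetical augmentation module: one differentiable operation with a
    learnable magnitude and a learnable application probability."""
    def __init__(self):
        super().__init__()
        self.magnitude = nn.Parameter(torch.tensor(0.1))   # perturbation strength
        self.prob_logit = nn.Parameter(torch.tensor(0.0))  # pre-sigmoid probability

    def forward(self, x):
        p = torch.sigmoid(self.prob_logit)
        # Relax "apply with probability p" into a differentiable blend so that
        # gradients reach both the magnitude and the probability.
        perturbed = torch.clamp(x + self.magnitude, 0.0, 1.0)
        return (1 - p) * x + p * perturbed

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Standard temperature-scaled KL distillation loss."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

def train_step(x, y, teacher, student, aug, opt_student, opt_aug):
    # (1) Augmentation step: search for samples on which the teacher stays
    # strong (low cross-entropy) while the student diverges from it (high KL).
    # Assumes the teacher's parameters are frozen (requires_grad_(False)), so
    # gradients flow through its inputs to the augmentation parameters only.
    aug_x = aug(x)
    t_logits = teacher(aug_x)
    s_logits = student(aug_x)
    aug_loss = F.cross_entropy(t_logits, y) - kd_loss(s_logits, t_logits)
    opt_aug.zero_grad()
    aug_loss.backward()
    opt_aug.step()

    # (2) Distillation step: train the student on the freshly augmented samples.
    with torch.no_grad():
        aug_x = aug(x)
        t_logits = teacher(aug_x)
    s_logits = student(aug_x)
    loss = F.cross_entropy(s_logits, y) + kd_loss(s_logits, t_logits)
    opt_student.zero_grad()
    loss.backward()
    opt_student.step()
    return loss.item()
```

In this reading, step (1) pushes the augmentation toward samples that the teacher still classifies well but the student handles poorly, and step (2) distills on exactly those samples; repeating `train_step` over the training loader realizes the alternating schedule the abstract describes.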
