Paper Title
Stochastic Precision Ensemble: Self-Knowledge Distillation for Quantized Deep Neural Networks
Paper Authors
Paper Abstract
The quantization of deep neural networks (QDNNs) has been actively studied for deployment in edge devices. Recent studies employ the knowledge distillation (KD) method to improve the performance of quantized networks. In this study, we propose stochastic precision ensemble training for QDNNs (SPEQ). SPEQ is a knowledge distillation training scheme; however, the teacher is formed by sharing the model parameters of the student network. We obtain the soft labels of the teacher by changing the bit precision of the activation stochastically at each layer of the forward-pass computation. The student model is trained with these soft labels to reduce the activation quantization noise. The cosine similarity loss is employed, instead of the KL-divergence, for KD training. As the teacher model changes continuously by random bit-precision assignment, it exploits the effect of stochastic ensemble KD. SPEQ outperforms existing quantization training methods in various tasks, such as image classification, question-answering, and transfer learning, without the need for cumbersome teacher networks.
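The abstract describes the core training loop: the teacher shares the student's weights but runs the forward pass with a randomly chosen activation bit-width per layer, and the student (at the target low precision) is trained toward the teacher's logits with a cosine-similarity loss. Below is a minimal PyTorch-style sketch of that idea under stated assumptions: the toy MLP, the `quantize_activation` helper, the candidate bit-widths, and the addition of a plain cross-entropy term are illustrative choices, not the authors' exact implementation.

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

def quantize_activation(x, bits):
    """Uniform activation quantization with a straight-through estimator.
    (Hypothetical helper; the paper's exact quantizer may differ.)"""
    if bits >= 32:
        return x
    levels = 2 ** bits - 1
    x = torch.clamp(x, 0.0, 1.0)            # assume activations are bounded to [0, 1]
    q = torch.round(x * levels) / levels
    return x + (q - x).detach()             # straight-through gradient

class SPEQNet(nn.Module):
    """Toy MLP whose activation bit-width is chosen per layer at forward time."""
    def __init__(self, in_dim=784, hidden=256, classes=10):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Linear(in_dim, hidden),
            nn.Linear(hidden, hidden),
        ])
        self.head = nn.Linear(hidden, classes)

    def forward(self, x, bit_plan):
        # bit_plan[i] is the activation precision applied after layer i.
        for layer, bits in zip(self.layers, bit_plan):
            x = torch.sigmoid(layer(x))      # bounded activation, then quantize
            x = quantize_activation(x, bits)
        return self.head(x)

def speq_step(model, x, y, optimizer, student_bits=2,
              teacher_bit_choices=(2, 4, 8), kd_weight=1.0):
    """One SPEQ-style step: the 'teacher' is the same network run with a randomly
    drawn per-layer activation precision; the student runs at the target low
    precision and is pulled toward the teacher's logits via cosine similarity."""
    n_layers = len(model.layers)

    # Teacher pass: random bit-width per layer, shared weights, no gradient.
    teacher_plan = [random.choice(teacher_bit_choices) for _ in range(n_layers)]
    with torch.no_grad():
        teacher_logits = model(x, teacher_plan)

    # Student pass: fixed low precision at every layer.
    student_logits = model(x, [student_bits] * n_layers)

    # Cosine-similarity KD loss (instead of KL-divergence), plus ordinary CE.
    kd_loss = 1.0 - F.cosine_similarity(student_logits, teacher_logits, dim=1).mean()
    ce_loss = F.cross_entropy(student_logits, y)
    loss = ce_loss + kd_weight * kd_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage on random data.
model = SPEQNet()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.rand(32, 784), torch.randint(0, 10, (32,))
print(speq_step(model, x, y, opt))
```

Because a fresh bit-width plan is drawn at every step, the teacher's soft labels vary from iteration to iteration, which is the source of the stochastic-ensemble effect the abstract refers to.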