Paper Title
Structural Knowledge Distillation for Object Detection
Paper Authors
Paper Abstract
Knowledge Distillation (KD) is a well-known training paradigm for deep neural networks in which knowledge acquired by a large teacher model is transferred to a small student. KD has proven to be an effective technique to significantly improve the student's performance on various tasks, including object detection. KD techniques typically rely on guidance at the intermediate feature level, implemented by minimizing an lp-norm distance between teacher and student activations during training. In this paper, we propose a replacement for the pixel-wise independent lp-norm based on structural similarity (SSIM). By taking additional contrast and structural cues into account, the loss formulation captures feature importance, correlation and spatial dependence in the feature space. Extensive experiments on MSCOCO demonstrate the effectiveness of our method across different training schemes and architectures. Our method adds little computational overhead, is straightforward to implement, and significantly outperforms the standard lp-norms. Moreover, it outperforms more complex state-of-the-art KD methods that use attention-based sampling mechanisms, including a +3.5 AP gain for Faster R-CNN R-50 over the vanilla model.
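To illustrate the idea of the abstract, below is a minimal PyTorch-style sketch (not the authors' released implementation) of an SSIM-based imitation loss between teacher and student feature maps. The function name, window size, stability constants and the uniform averaging window are illustrative assumptions; the sketch only assumes that both feature tensors share the same (N, C, H, W) shape, e.g. after a channel-matching projection of the student features.

import torch
import torch.nn.functional as F

def ssim_distillation_loss(student_feat, teacher_feat, window_size=11,
                           c1=0.01 ** 2, c2=0.03 ** 2):
    # Return 1 - mean local SSIM between student and teacher activations.
    # c1, c2 are the usual SSIM stability constants; their scaling to the
    # dynamic range of the features is an assumption of this sketch.
    pad = window_size // 2
    # Local means via a uniform averaging window (a Gaussian window is a common alternative).
    mu_s = F.avg_pool2d(student_feat, window_size, stride=1, padding=pad)
    mu_t = F.avg_pool2d(teacher_feat, window_size, stride=1, padding=pad)
    # Local variances and covariance.
    var_s = F.avg_pool2d(student_feat * student_feat, window_size, stride=1, padding=pad) - mu_s ** 2
    var_t = F.avg_pool2d(teacher_feat * teacher_feat, window_size, stride=1, padding=pad) - mu_t ** 2
    cov_st = F.avg_pool2d(student_feat * teacher_feat, window_size, stride=1, padding=pad) - mu_s * mu_t
    # Standard SSIM map combining luminance, contrast and structure terms.
    ssim_map = ((2 * mu_s * mu_t + c1) * (2 * cov_st + c2)) / (
        (mu_s ** 2 + mu_t ** 2 + c1) * (var_s + var_t + c2)
    )
    return 1.0 - ssim_map.mean()

Minimizing 1 - SSIM in place of an lp-norm penalizes mismatches in local mean, contrast and structure between the two feature maps rather than independent per-pixel differences, which is the behavior the abstract describes.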