Paper Title
Multi-Teacher Knowledge Distillation for Incremental Implicitly-Refined Classification
Paper Authors
Paper Abstract
Incremental learning methods can learn new classes continually by distilling knowledge from the last model (as the teacher) into the current model (as the student) during sequential learning. However, these methods fail on Incremental Implicitly-Refined Classification (IIRC), an extension of incremental learning in which incoming classes may carry two levels of granularity: a superclass label and a subclass label. This is because the previously learned superclass knowledge may be overwritten by the subclass knowledge learned later. To solve this problem, we propose a novel Multi-Teacher Knowledge Distillation (MTKD) strategy. To preserve the subclass knowledge, we use the last model as a general teacher that distills the previously learned knowledge into the student model. To preserve the superclass knowledge, we use the initial model as a superclass teacher, since the initial model contains abundant superclass knowledge. However, distilling knowledge from two teacher models can lead the student model to make redundant predictions. We therefore propose a post-processing mechanism, called Top-k prediction restriction, to reduce these redundant predictions. Experimental results on IIRC-ImageNet120 and IIRC-CIFAR100 show that the proposed method achieves better classification accuracy than existing state-of-the-art methods.
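The two-teacher objective and the Top-k post-processing described above can be pictured as follows. This is only a minimal PyTorch-style sketch of the idea, assuming a temperature-scaled KL distillation term per teacher and a multi-hot classification loss as commonly used in IIRC benchmarks; the function names, index sets (old_idx, super_idx), and hyperparameters (T, alpha, beta, k, threshold) are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn.functional as F

def mtkd_loss(student_logits, last_logits, init_logits, targets,
              old_idx, super_idx, T=2.0, alpha=0.5, beta=0.5):
    """Hypothetical sketch of a two-teacher distillation objective.

    student_logits: logits of the current (student) model.
    last_logits:    logits of the previous-task model (general teacher).
    init_logits:    logits of the initial model (superclass teacher).
    targets:        multi-hot labels for the current task (float tensor).
    old_idx:        indices of previously seen classes.
    super_idx:      indices of superclass outputs.
    """
    # Classification loss on the current task's (multi-hot) labels.
    cls_loss = F.binary_cross_entropy_with_logits(student_logits, targets)

    # General teacher: preserve all previously learned (sub)class knowledge.
    kd_last = F.kl_div(
        F.log_softmax(student_logits[:, old_idx] / T, dim=1),
        F.softmax(last_logits[:, old_idx] / T, dim=1),
        reduction="batchmean") * T * T

    # Superclass teacher: preserve the superclass knowledge held by the
    # initial model, which was trained when the superclasses were introduced.
    kd_init = F.kl_div(
        F.log_softmax(student_logits[:, super_idx] / T, dim=1),
        F.softmax(init_logits[:, super_idx] / T, dim=1),
        reduction="batchmean") * T * T

    return cls_loss + alpha * kd_last + beta * kd_init


def topk_prediction_restriction(probs, k=2, threshold=0.5):
    """Keep at most the k most confident labels above `threshold`,
    suppressing the redundant predictions that two teachers can induce
    (k and threshold here are illustrative, not from the paper)."""
    topk_vals, topk_idx = probs.topk(k, dim=1)
    preds = torch.zeros_like(probs)
    preds.scatter_(1, topk_idx, (topk_vals > threshold).float())
    return preds
```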