Paper Title
Knowledge Transfer Based Fine-grained Visual Classification
Paper Authors
Abstract
Fine-grained visual classification (FGVC) aims to distinguish sub-classes of the same category, and its essential challenge is to mine subtle and discriminative regions. Convolutional neural networks (CNNs) trained with the cross-entropy loss (CE-loss) alone show poor performance, since the model learns only the most discriminative part and ignores other meaningful regions. Some existing works try to solve this problem by mining more discriminative regions with detection techniques or attention mechanisms; however, most of them suffer from background noise when searching for additional regions. In this paper, we address the problem in a knowledge transfer learning manner. Multiple models are trained one by one, and all previously trained models serve as teacher models that supervise the training of the current one. Specifically, an orthogonal loss (OR-loss) is proposed to encourage the network to find diverse and meaningful regions. The first model is trained with the CE-loss only. Finally, the outputs of all models, which carry complementary knowledge, are combined to produce the final prediction. We demonstrate the superiority of the proposed method and obtain state-of-the-art (SOTA) performance on three popular FGVC datasets.
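The abstract does not give the exact form of the OR-loss, but a common way to penalize overlap between the current model's features and each teacher's features is to drive their cosine similarity toward zero. The sketch below is a minimal, dependency-free illustration of that idea under this assumption; the function name `or_loss` and the squared-cosine formulation are hypothetical, not taken from the paper.

```python
# Hedged sketch of an orthogonality-style loss: penalize the squared cosine
# similarity between the current model's feature vector and each previously
# trained teacher's feature vector, so the loss is zero exactly when the
# features are orthogonal. (The paper's actual OR-loss may differ.)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return dot(u, u) ** 0.5

def or_loss(current_feat, teacher_feats, eps=1e-8):
    """Sum over teachers of squared cosine similarity with the current features."""
    loss = 0.0
    for t in teacher_feats:
        cos = dot(current_feat, t) / (norm(current_feat) * norm(t) + eps)
        loss += cos ** 2
    return loss
```

For example, `or_loss([1.0, 0.0], [[0.0, 1.0]])` is 0 (orthogonal features, no penalty), while `or_loss([1.0, 0.0], [[1.0, 0.0]])` is close to 1 (identical directions, maximal penalty), matching the stated goal of pushing each new model toward regions the teachers have not already covered.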