Paper Title
R2-Trans:Fine-Grained Visual Categorization with Redundancy Reduction
Authors
Abstract
Fine-grained visual categorization (FGVC) aims to discriminate between similar subcategories, whose main challenges are large intra-class diversity and subtle inter-class differences. Existing FGVC methods usually select discriminative regions found by a trained model, which tends to neglect other potentially discriminative information. On the other hand, the massive interactions among the sequence of image patches in ViT make the resulting class token contain much redundant information, which may also impact FGVC performance. In this paper, we present a novel approach for FGVC that can simultaneously make use of partial yet sufficient discriminative information in environmental cues and compress the redundant information in the class token with respect to the target. Specifically, our model calculates the ratio of high-weight regions in a batch, adaptively adjusts the masking threshold, and achieves moderate extraction of background information in the input space. Moreover, we use the Information Bottleneck (IB) principle to guide our network to learn a minimal sufficient representation in the feature space. Experimental results on three widely used benchmark datasets verify that our approach outperforms other state-of-the-art approaches and baseline models.
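The abstract's adaptive-masking step (compute the batch-wide ratio of high-weight regions, then adjust the masking threshold) can be sketched minimally as below. This is an illustrative NumPy sketch, not the authors' implementation: the function name `adaptive_mask`, the `target_ratio` parameter, and the additive threshold-update rule are all assumptions for exposition.

```python
import numpy as np

def adaptive_mask(attn, threshold=0.5, target_ratio=0.3, step=0.05):
    """Mask low-attention patches, nudging the threshold so that the
    batch-wide ratio of kept (high-weight) patches tracks target_ratio.

    attn: (batch, num_patches) attention weights, assumed to lie in [0, 1].
    Returns the boolean keep-mask and the updated threshold.
    (Hypothetical update rule; the paper's actual scheme may differ.)
    """
    keep = attn >= threshold          # high-weight regions survive masking
    ratio = keep.mean()               # ratio computed over the whole batch
    # Too many kept patches -> raise the threshold; too few -> lower it.
    if ratio > target_ratio:
        threshold += step
    elif ratio < target_ratio:
        threshold -= step
    return keep, threshold

rng = np.random.default_rng(0)
attn = rng.random((8, 196))           # e.g. a ViT-B/16 grid: 14 x 14 = 196 patches
mask, new_thr = adaptive_mask(attn, threshold=0.5, target_ratio=0.3)
```

With uniform random weights roughly half the patches exceed 0.5, so the kept ratio overshoots `target_ratio` and the threshold is raised for the next batch; masked patches would then expose only moderate background information to the network, as the abstract describes.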