Paper Title
R2-Trans:Fine-Grained Visual Categorization with Redundancy Reduction
Authors
Abstract
Fine-grained visual categorization (FGVC) aims to discriminate between similar subcategories, whose main challenges are large intra-class diversity and subtle inter-class differences. Existing FGVC methods usually select discriminative regions found by a trained model, which tends to neglect other potentially discriminative information. On the other hand, the massive interactions among the sequence of image patches in ViT make the resulting class token contain much redundant information, which may also impact FGVC performance. In this paper, we present a novel approach for FGVC that can simultaneously make use of partial yet sufficient discriminative information in environmental cues and compress the redundant information in the class token with respect to the target. Specifically, our model calculates the ratio of high-weight regions in a batch, adaptively adjusts the masking threshold, and achieves moderate extraction of background information in the input space. Moreover, we use the Information Bottleneck (IB) principle to guide our network to learn a minimal sufficient representation in the feature space. Experimental results on three widely used benchmark datasets verify that our approach outperforms other state-of-the-art approaches and baseline models.
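The abstract's adaptive-masking step (compute the batch-wide ratio of high-weight regions, then adjust the masking threshold) can be sketched minimally as below. This is an illustrative NumPy sketch, not the authors' implementation: the function name `adaptive_mask`, the `target_ratio` parameter, and the additive threshold-update rule are all assumptions for exposition.

```python
import numpy as np

def adaptive_mask(attn, threshold=0.5, target_ratio=0.3, step=0.05):
    """Mask low-attention patches, nudging the threshold so that the
    batch-wide ratio of kept (high-weight) patches tracks target_ratio.

    attn: (batch, num_patches) attention weights, assumed to lie in [0, 1].
    Returns the boolean keep-mask and the updated threshold.
    (Hypothetical update rule; the paper's actual scheme may differ.)
    """
    keep = attn >= threshold          # high-weight regions survive masking
    ratio = keep.mean()               # ratio computed over the whole batch
    # Too many kept patches -> raise the threshold; too few -> lower it.
    if ratio > target_ratio:
        threshold += step
    elif ratio < target_ratio:
        threshold -= step
    return keep, threshold

rng = np.random.default_rng(0)
attn = rng.random((8, 196))           # e.g. a ViT-B/16 grid: 14 x 14 = 196 patches
mask, new_thr = adaptive_mask(attn, threshold=0.5, target_ratio=0.3)
```

With uniform random weights roughly half the patches exceed 0.5, so the kept ratio overshoots `target_ratio` and the threshold is raised for the next batch; masked patches would then expose only moderate background information to the network, as the abstract describes.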