Paper Title
Few-shot Fine-grained Image Classification via Multi-Frequency Neighborhood and Double-cross Modulation
Paper Authors
Paper Abstract
Traditional fine-grained image classification typically relies on large-scale training samples with annotated ground truth. However, some sub-categories have few available samples in real-world applications, and current few-shot models still struggle to distinguish subtle differences among fine-grained categories. To address this challenge, we propose a novel few-shot fine-grained image classification network (FicNet) using a multi-frequency neighborhood (MFN) and double-cross modulation (DCM). MFN focuses on both the spatial and frequency domains to capture multi-frequency structural representations, which reduces the influence of appearance and background changes on the intra-class distance. DCM consists of a bi-crisscross component and a double 3D cross-attention component. It modulates the representations by considering global context information and inter-class relationships, respectively, which enables the support and query samples to respond to the same parts and to accurately identify subtle inter-class differences. Comprehensive experiments on three fine-grained benchmark datasets for two few-shot tasks verify that FicNet achieves excellent performance compared with state-of-the-art methods. In particular, the experiments on the "Caltech-UCSD Birds" and "Stanford Cars" datasets obtain classification accuracies of 93.17\% and 95.36\%, respectively, which are even higher than what general fine-grained image classification methods can achieve.
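The abstract does not specify how MFN splits a feature map into multiple frequency components, so the following is only a minimal illustrative sketch of the general idea (decomposing a spatial feature map into low-, mid-, and high-frequency structural parts): it applies a 2-D FFT and partitions the spectrum into radial bands, a common banding scheme that is an assumption here, not the paper's actual design.

```python
import numpy as np

def multi_frequency_decompose(feat, bands=3):
    """Split a 2-D feature map into radial frequency bands via the FFT.

    Illustrative only: the banding scheme is an assumption, since the
    abstract does not describe MFN's actual frequency decomposition.
    """
    H, W = feat.shape
    F = np.fft.fftshift(np.fft.fft2(feat))          # centered spectrum
    cy, cx = H // 2, W // 2
    yy, xx = np.ogrid[:H, :W]
    radius = np.hypot(yy - cy, xx - cx)             # distance from DC term
    r_max = radius.max()
    components = []
    for b in range(bands):
        lo, hi = b * r_max / bands, (b + 1) * r_max / bands
        # Last band keeps everything from `lo` outward so the bands
        # partition the whole spectrum.
        mask = (radius >= lo) & (radius < hi) if b < bands - 1 else (radius >= lo)
        band = np.fft.ifft2(np.fft.ifftshift(F * mask)).real
        components.append(band)
    return np.stack(components)                     # shape (bands, H, W)

rng = np.random.default_rng(0)
feat = rng.standard_normal((32, 32))
comps = multi_frequency_decompose(feat)
```

Because the radial masks partition the spectrum, the band components sum back to the original feature map, so no structural information is lost by the decomposition; a model can then weight or attend over the bands separately, e.g. to suppress background-dominated low frequencies.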