Retrieval Augmented Classification for Long-Tail Visual Recognition

Paper Title

Retrieval Augmented Classification for Long-Tail Visual Recognition

Authors

Alexander Long, Wei Yin, Thalaiyasingam Ajanthan, Vu Nguyen, Pulak Purkait, Ravi Garg, Alan Blair, Chunhua Shen, Anton van den Hengel

Abstract

We introduce Retrieval Augmented Classification (RAC), a generic approach to augmenting standard image classification pipelines with an explicit retrieval module. RAC consists of a standard base image encoder fused with a parallel retrieval branch that queries a non-parametric external memory of pre-encoded images and associated text snippets. We apply RAC to the problem of long-tail classification and demonstrate a significant improvement over previous state-of-the-art on Places365-LT and iNaturalist-2018 (14.5% and 6.7% respectively), despite using only the training datasets themselves as the external information source. We demonstrate that RAC's retrieval module, without prompting, learns a high level of accuracy on tail classes. This, in turn, frees the base encoder to focus on common classes, and improve its performance thereon. RAC represents an alternative approach to utilizing large, pretrained models without requiring fine-tuning, as well as a first step towards more effectively making use of external memory within common computer vision architectures.
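The fusion of a base encoder with a parallel retrieval branch can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes the images are already encoded as small vectors, stands in for the text-snippet branch with a similarity-weighted vote over the labels of the nearest memory entries, and fuses the two branches by simply adding their logits. All names (`cosine`, `retrieval_logits`, `rac_predict`) and all numbers are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def retrieval_logits(query, memory, num_classes, k):
    """Score classes by a similarity-weighted vote over the k nearest memory entries.

    `memory` is a list of (embedding, label) pairs: a stand-in for the
    non-parametric external memory of pre-encoded training images.
    """
    scored = sorted(((cosine(query, key), label) for key, label in memory),
                    reverse=True)
    logits = [0.0] * num_classes
    for sim, label in scored[:k]:
        logits[label] += sim
    return logits

def rac_predict(base_logits, query, memory, k=2):
    """Fuse base-encoder logits with retrieval logits by addition (an assumption)."""
    retrieved = retrieval_logits(query, memory, len(base_logits), k)
    fused = [b + r for b, r in zip(base_logits, retrieved)]
    return max(range(len(fused)), key=fused.__getitem__), fused

# Toy setup: class 0 is a "head" class, class 2 a "tail" class.
memory = [([1.0, 0.0], 0), ([0.9, 0.1], 0), ([0.0, 1.0], 2), ([0.1, 0.9], 2)]
base_logits = [0.5, 0.2, 0.0]   # the base branch alone would pick class 0
query = [0.0, 1.0]              # but the query sits next to the tail-class entries

pred, fused = rac_predict(base_logits, query, memory, k=2)
print(pred)  # the retrieval branch tips the decision to class 2
```

In this toy case the base branch favours the head class while the retrieved neighbours all belong to the tail class, so the fused prediction switches to the tail class. That mirrors, in a very simplified form, the division of labour the abstract describes between the retrieval module and the base encoder.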
