Unifying Deep Local and Global Features for Image Search

Paper title

Unifying Deep Local and Global Features for Image Search

Authors

Bingyi Cao, Andre Araujo, Jack Sim

Abstract

Image retrieval is the problem of searching an image database for items that are similar to a query image. To address this task, two main types of image representations have been studied: global and local image features. In this work, our key contribution is to unify global and local features into a single deep model, enabling accurate retrieval with efficient feature extraction. We refer to the new model as DELG, standing for DEep Local and Global features. We leverage lessons from recent feature learning work and propose a model that combines generalized mean pooling for global features and attentive selection for local features. The entire network can be learned end-to-end by carefully balancing the gradient flow between two heads -- requiring only image-level labels. We also introduce an autoencoder-based dimensionality reduction technique for local features, which is integrated into the model, improving training efficiency and matching performance. Comprehensive experiments show that our model achieves state-of-the-art image retrieval on the Revisited Oxford and Paris datasets, and state-of-the-art single-model instance-level recognition on the Google Landmarks dataset v2. Code and models are available at https://github.com/tensorflow/models/tree/master/research/delf .
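
The abstract names generalized mean (GeM) pooling as the way the global feature is formed. As a rough illustration of that one step only, the sketch below pools a feature map into a single descriptor using the standard GeM formula, f_c = ((1/N) * sum_i x_{i,c}^p)^(1/p). The function names `gem_pool` and `l2_normalize`, the list-of-lists input layout, the default `p = 3.0`, and the `eps` clamp are all assumptions made for this example; this is not the DELG implementation from the linked repository.

```python
from typing import List, Sequence


def gem_pool(feature_map: Sequence[Sequence[float]],
             p: float = 3.0,
             eps: float = 1e-6) -> List[float]:
    """Generalized mean pooling over spatial locations (illustrative sketch).

    feature_map: one entry per spatial location, each a list of
                 per-channel activations (hypothetical layout).
    p:           pooling exponent; p = 1 is average pooling, and larger
                 values move the result toward max pooling.
    eps:         lower clamp so the fractional power stays well defined.
    """
    if not feature_map:
        raise ValueError("feature_map must contain at least one location")
    n_locations = len(feature_map)
    n_channels = len(feature_map[0])
    pooled = []
    for c in range(n_channels):
        # Mean of the p-th powers across all locations for channel c.
        total = sum(max(location[c], eps) ** p for location in feature_map)
        pooled.append((total / n_locations) ** (1.0 / p))
    return pooled


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a descriptor to unit length so dot products act as cosine similarity."""
    norm = sum(v * v for v in vector) ** 0.5
    return [v / norm for v in vector] if norm > 0 else list(vector)


if __name__ == "__main__":
    # Two spatial locations, two channels.
    toy_map = [[1.0, 2.0], [3.0, 2.0]]
    print(gem_pool(toy_map, p=1.0))   # behaves like average pooling
    print(gem_pool(toy_map, p=50.0))  # close to max pooling
    print(l2_normalize(gem_pool(toy_map)))
```

With `p = 1` the function reduces to plain average pooling, and as `p` grows the result approaches the per-channel maximum, which is why a single exponent can interpolate between the two common pooling choices.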
