Learning to embed semantic similarity for joint image-text retrieval

Paper title

Learning to embed semantic similarity for joint image-text retrieval

Authors

Noam Malali, Yosi Keller

Abstract

We present a deep learning approach for learning joint semantic embeddings of images and captions in a Euclidean space, such that semantic similarity is approximated by the L2 distance in the embedding space. To that end, we introduce a metric learning scheme that uses multitask learning to learn the embedding of identical semantic concepts with a center loss. By introducing a differentiable quantization scheme into the end-to-end trainable network, we derive a semantic embedding of semantically similar concepts in Euclidean space. We also propose a novel metric learning formulation using an adaptive margin hinge loss that is refined during the training phase. The proposed scheme was applied to the MS-COCO, Flickr30K and Flickr8K datasets, and was shown to compare favorably with contemporary state-of-the-art approaches.
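
To make the two loss terms named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a standard triplet-style hinge loss over L2 distances and a standard center loss (mean squared distance to a concept center). The function names, the toy 2-D vectors, and the fixed margin values are hypothetical; the abstract does not say how the margin is adapted during training, so the margin is simply passed in as a parameter here.

```python
import math


def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hinge_triplet_loss(anchor, positive, negative, margin):
    """Hinge loss: the matching pair should be closer than the
    non-matching pair by at least `margin` (assumed formulation)."""
    gap = l2_distance(anchor, positive) - l2_distance(anchor, negative)
    return max(0.0, margin + gap)


def center_loss(embeddings, center):
    """Mean squared L2 distance of embeddings to their concept center."""
    total = sum(l2_distance(e, center) ** 2 for e in embeddings)
    return total / len(embeddings)


# Toy 2-D embeddings (made-up values, for illustration only).
image = [0.0, 0.0]
matching_caption = [3.0, 4.0]   # L2 distance 5 from the image
other_caption = [6.0, 8.0]      # L2 distance 10 from the image

# A small margin is already satisfied; a larger one is violated.
loss_small_margin = hinge_triplet_loss(image, matching_caption, other_caption, margin=2.0)
loss_large_margin = hinge_triplet_loss(image, matching_caption, other_caption, margin=7.0)

# Spread of the image and its caption around a shared concept center.
concept_spread = center_loss([image, matching_caption], center=[1.5, 2.0])
```

In this sketch, raising the margin from 2.0 to 7.0 turns a zero loss into a positive one for the same triplet, which illustrates why the choice of margin matters and why a scheme that refines it during training could be useful.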
