Paper Title

SIR: Similar Image Retrieval for Product Search in E-Commerce

Authors

Stanley, Theban; Vanjara, Nihar; Pan, Yanxin; Pirogova, Ekaterina; Chakraborty, Swagata; Chaudhuri, Abon

Abstract

We present a similar image retrieval (SIR) platform that is used to quickly discover visually similar products in a catalog of millions. Given the size, diversity, and dynamism of our catalog, product search poses many challenges. One way to address them is to build supervised models that tag product images with labels representing themes, and then retrieve the images by label. This approach suffices for common and perennial themes such as "white shirt" or "lifestyle image of TV". It does not work for new themes such as "e-cigarettes", hard-to-define themes such as "image with a promotional badge", or themes with a short relevance span such as "Halloween costumes". SIR is ideal for such cases because it allows us to search by example rather than by a pre-defined theme. We describe the steps (embedding computation, encoding, and indexing) that power the approximate nearest neighbor search back-end. We also highlight two applications of SIR. The first is the detection of products with various types of potentially objectionable themes. This application runs with a sense of urgency, so the time typically needed to train and bootstrap a model is not available. These themes are also often short-lived and driven by current trends, so spending resources on a lasting model is not justified. The second application is a variant item detection system in which SIR helps discover visual variants that are hard to find through text search. We analyze the performance of SIR in the context of these applications.
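The pipeline named in the abstract (embedding computation, encoding, indexing, then approximate nearest neighbor search) can be illustrated with a minimal sketch. The abstract does not say which encoding or index the platform uses, so the random-hyperplane hashing below is a swapped-in, textbook technique chosen only for illustration. The class `HyperplaneIndex`, its parameters, and the toy four-dimensional "embeddings" are all hypothetical; in a real system the embeddings would come from an image model.

```python
import math
import random
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class HyperplaneIndex:
    """Hypothetical ANN index using random-hyperplane hashing (an assumption,
    not the method described in the paper)."""

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        rng = random.Random(seed)
        # Each table has its own set of random hyperplanes and its own buckets.
        self.tables = []
        for _ in range(n_tables):
            planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
            self.tables.append((planes, defaultdict(list)))
        self.vectors = {}

    @staticmethod
    def _encode(planes, vec):
        # Encoding step: one bit per hyperplane (which side the vector falls on).
        return tuple(
            sum(p * v for p, v in zip(plane, vec)) >= 0 for plane in planes
        )

    def add(self, item_id, vec):
        # Indexing step: store the vector under its binary code in every table.
        self.vectors[item_id] = vec
        for planes, buckets in self.tables:
            buckets[self._encode(planes, vec)].append(item_id)

    def query(self, vec, k=3):
        # Search step: gather items sharing a bucket with the query in any
        # table, then re-rank those candidates by exact cosine similarity.
        candidates = set()
        for planes, buckets in self.tables:
            candidates.update(buckets.get(self._encode(planes, vec), []))
        ranked = sorted(
            candidates, key=lambda i: cosine(vec, self.vectors[i]), reverse=True
        )
        return ranked[:k]


# Toy catalog: made-up embeddings standing in for image-model outputs.
catalog = {
    "shirt_white": [0.9, 0.1, 0.0, 0.2],
    "shirt_cream": [0.8, 0.2, 0.1, 0.2],
    "tv_lifestyle": [0.0, 0.9, 0.8, 0.1],
    "costume": [-0.5, 0.1, -0.7, 0.9],
}

index = HyperplaneIndex(dim=4)
for item_id, embedding in catalog.items():
    index.add(item_id, embedding)

# Search by example: the query is an embedding, not a pre-defined label.
results = index.query(catalog["shirt_white"], k=2)
```

Querying with an indexed item's own embedding always returns that item first, because it hashes to the same bucket in every table. Whether near neighbors such as `shirt_cream` are also retrieved depends on the random hyperplanes, which is the usual recall trade-off of approximate search.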
