Paper Title
Reinforcement Routing on Proximity Graph for Efficient Recommendation
Paper Authors
Paper Abstract
We focus on Maximum Inner Product Search (MIPS), an essential problem in many machine learning communities. Given a query, MIPS finds the most similar items, i.e., those with the maximum inner products. Methods for Nearest Neighbor Search (NNS), which are usually defined on a metric space, do not exhibit satisfactory performance on the MIPS problem, since the inner product is a non-metric function. However, the inner product has many desirable properties compared with metric functions, such as avoiding vanishing and exploding gradients. As a result, the inner product is widely used in many recommendation systems, which makes efficient Maximum Inner Product Search a key to speeding up many recommendation systems. Graph-based methods for the NNS problem show advantages over other classes of methods. Each data point in the database is mapped to a node of a proximity graph, and nearest neighbor search in the database can then be converted into routing on the proximity graph to find the nearest neighbor of the query. The same technique can be applied to the MIPS problem: instead of searching for the query's nearest neighbor, we search on the proximity graph for the item with the maximum inner product with the query. In this paper, we propose a reinforcement learning model to train an agent to search the proximity graph automatically for the MIPS problem when the ground truths of training queries are unavailable. If the ground truths of some training queries are known, our model can also exploit them through imitation learning to improve the agent's search ability. Experiments show that our proposed model, which combines reinforcement learning with imitation learning, outperforms state-of-the-art methods.
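To make the routing idea concrete, below is a minimal sketch of greedy routing on a proximity graph where, unlike metric NNS, candidates are scored by their inner product with the query. This is the plain greedy baseline that a learned routing agent would improve upon, not the authors' implementation; the adjacency-list graph layout, the `vectors` array, and the fixed entry node are all illustrative assumptions.

```python
import numpy as np

def greedy_mips_route(graph, vectors, query, start):
    """Greedily walk a proximity graph, always moving to the neighbor
    with the largest inner product with the query. A sketch under
    assumed data structures, not the paper's learned (RL) agent.

    graph   : dict mapping node id -> iterable of neighbor node ids
    vectors : array of shape (n_items, d); vectors[i] is item i's embedding
    query   : array of shape (d,)
    start   : node id where routing begins (e.g. a fixed entry point)
    """
    current = start
    current_score = float(np.dot(vectors[current], query))
    while True:
        # Score every neighbor by inner product (not Euclidean distance).
        best, best_score = current, current_score
        for nbr in graph[current]:
            s = float(np.dot(vectors[nbr], query))
            if s > best_score:
                best, best_score = nbr, s
        if best == current:  # local maximum: no neighbor improves the score
            return current, current_score
        current, current_score = best, best_score

# Toy usage: 4 items in 2-D with a small hand-built proximity graph.
vecs = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [0.9, 0.9]])
g = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2]}
print(greedy_mips_route(g, vecs, np.array([1.0, 1.0]), start=0))  # -> (3, 1.8)
```

Because this greedy walk can stall at a local maximum of the inner product, the paper instead trains an agent (via reinforcement learning, optionally warm-started with imitation learning on queries with known ground truths) to choose routing steps.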