Probabilistic Rank and Reward: A Scalable Model for Slate Recommendation

Paper title

Probabilistic Rank and Reward: A Scalable Model for Slate Recommendation

Authors

Imad Aouali, Achraf Ait Sidi Hammou, Otmane Sakhi, David Rohde, Flavian Vasile

Abstract

We introduce Probabilistic Rank and Reward (PRR), a scalable probabilistic model for personalized slate recommendation. Our approach allows off-policy estimation of the reward in the scenario where the user interacts with at most one item from a slate of K items. We show that the probability of a slate being successful can be learned efficiently by combining two signals: the reward (whether the user successfully interacted with the slate) and the rank (which item within the slate was selected). PRR outperforms existing off-policy reward-optimizing methods and is far more scalable to large action spaces. Moreover, PRR allows fast delivery of recommendations powered by maximum inner product search (MIPS), making it suitable for low-latency domains such as computational advertising.
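To make the MIPS step mentioned in the abstract concrete, here is a minimal sketch of exact (brute-force) maximum inner product search used to pick a slate of K items. It is not the authors' implementation and does not reproduce the PRR model. The function name `top_k_by_inner_product`, the toy embeddings, and the use of NumPy are all assumptions made for illustration. A production system would more likely use an approximate MIPS index than an exhaustive scan.

```python
import numpy as np

def top_k_by_inner_product(user_vec, item_matrix, k):
    """Exact MIPS: return the indices of the k items whose embeddings have
    the largest inner product with user_vec, ordered best first."""
    scores = item_matrix @ user_vec              # one score per item
    # argpartition isolates the k highest-scoring items without a full sort
    top = np.argpartition(-scores, k - 1)[:k]
    # sort only those k candidates by descending score
    return top[np.argsort(-scores[top])]

# Hypothetical toy data: 4 items and 1 user in a 2-dimensional embedding space
item_embeddings = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [2.0, 2.0],
    [-1.0, 0.5],
])
user_embedding = np.array([1.0, 0.5])

slate = top_k_by_inner_product(user_embedding, item_embeddings, k=2)
print(slate.tolist())
```

With these toy vectors the item scores are 1.0, 0.5, 3.0 and -0.75, so the slate contains item 2 followed by item 0.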
