Paper Title

Deep Bayesian Bandits: Exploring in Online Personalized Recommendations

Paper Authors

Dalin Guo, Sofia Ira Ktena, Ferenc Huszar, Pranay Kumar Myana, Wenzhe Shi, Alykhan Tejani

Paper Abstract

Recommender systems trained in a continuous learning fashion are plagued by the feedback loop problem, also known as algorithmic bias. This causes a newly trained model to act greedily and favor items that have already been engaged by users. This behavior is particularly harmful in personalised ads recommendations, as it can also cause new campaigns to remain unexplored. Exploration aims to address this limitation by providing new information about the environment, which encompasses user preference, and can lead to higher long-term reward. In this work, we formulate a display advertising recommender as a contextual bandit and implement exploration techniques that require sampling from the posterior distribution of click-through-rates in a computationally tractable manner. Traditional large-scale deep learning models do not provide uncertainty estimates by default. We approximate these uncertainty measurements of the predictions by employing a bootstrapped model with multiple heads and dropout units. We benchmark a number of different models in an offline simulation environment using a publicly available dataset of user-ads engagements. We test our proposed deep Bayesian bandits algorithm in the offline simulation and online AB setting with large-scale production traffic, where we demonstrate a positive gain of our exploration model.
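
The abstract describes approximating the posterior over click-through rates with a bootstrapped model that has multiple prediction heads and dropout units, and exploring by sampling from that approximate posterior. Below is a minimal illustrative sketch of that idea, assuming PyTorch; the class name `BootstrappedCTRModel`, the `sample_ctr` helper, and the layer sizes are hypothetical and are not taken from the paper or its production system.

```python
# A minimal sketch (not the authors' production code): a shared network with
# several bootstrap heads and dropout. Exploration is done Thompson-style by
# keeping dropout active and picking a random head, so each call returns one
# draw from the approximate posterior over click-through rate (CTR).
import random
import torch
import torch.nn as nn

class BootstrappedCTRModel(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int = 128,
                 num_heads: int = 10, dropout: float = 0.1):
        super().__init__()
        # Shared representation learned from user/ad features (hypothetical sizes).
        self.shared = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
        )
        # Independent heads; each would be trained on a bootstrap resample of the data.
        self.heads = nn.ModuleList(
            nn.Linear(hidden_dim, 1) for _ in range(num_heads)
        )

    def forward(self, x: torch.Tensor, head_idx: int) -> torch.Tensor:
        # Predicted CTR from a single head.
        return torch.sigmoid(self.heads[head_idx](self.shared(x)))

    def sample_ctr(self, x: torch.Tensor) -> torch.Tensor:
        # Keep dropout active (train mode) and choose a random head, so repeated
        # calls give different samples from the approximate posterior.
        self.train()
        with torch.no_grad():
            head_idx = random.randrange(len(self.heads))
            return self.forward(x, head_idx)

# Usage sketch: score candidate ads with one posterior sample each and pick the argmax.
model = BootstrappedCTRModel(input_dim=64)
candidate_features = torch.randn(20, 64)   # 20 candidate ads, hypothetical features
sampled_ctr = model.sample_ctr(candidate_features).squeeze(-1)
chosen_ad = int(sampled_ctr.argmax())
```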
