Paper Title


Debiased Off-Policy Evaluation for Recommendation Systems

Authors

Yusuke Narita, Shota Yasui, Kohei Yata

Abstract


Efficient methods to evaluate new algorithms are critical for improving interactive bandit and reinforcement learning systems such as recommendation systems. A/B tests are reliable but costly in both time and money, and they carry a risk of failure. In this paper, we develop an alternative method that predicts the performance of an algorithm given historical data that may have been generated by a different algorithm. Our estimator has the property that its prediction converges in probability to the true performance of the counterfactual algorithm at a rate of $\sqrt{N}$ as the sample size $N$ increases. We also show a correct way to estimate the variance of our prediction, allowing the analyst to quantify the uncertainty in the prediction. These properties hold even when the analyst does not know which among a large number of potentially important state variables are actually important. We validate our method with a simulation experiment on reinforcement learning. Finally, we apply it to improve advertisement design at a major advertising company. We find that our method produces smaller mean squared errors than state-of-the-art methods.
