Reinforcement Learning-based Product Delivery Frequency Control

Paper Title

Reinforcement Learning-based Product Delivery Frequency Control

Authors

Liu, Yang; Chen, Zhengxing; Virochsiri, Kittipat; Wang, Juan; Wu, Jiahao; Liang, Feng

Abstract

Frequency control is an important problem in modern recommender systems. It dictates the delivery frequency of recommendations to maintain product quality and efficiency. For example, the frequency of delivering promotional notifications affects daily metrics as well as infrastructure resource consumption (e.g., CPU and memory usage). Open questions remain about which objective best represents long-term business value, and how to balance daily metrics against resource consumption in a dynamically fluctuating environment. We propose a personalized methodology for the frequency control problem that combines long-term value optimization using reinforcement learning (RL) with a robust volume control technique we term "Effective Factor". We demonstrate that our method yields statistically significant improvements in daily metrics and resource efficiency in several notification applications at a scale of billions of users. To the best of our knowledge, our study represents the first deep RL application to the frequency control problem at such an industrial scale.
