Paper Title

ROI-Constrained Bidding via Curriculum-Guided Bayesian Reinforcement Learning

Authors

Haozhe Wang, Chao Du, Panyan Fang, Shuo Yuan, Xuming He, Liang Wang, Bo Zheng

Abstract

Real-Time Bidding (RTB) is an important mechanism in modern online advertising systems. Advertisers employ bidding strategies in RTB to optimize their advertising effects subject to various financial requirements, especially the return-on-investment (ROI) constraint. ROIs change non-monotonically during the sequential bidding process, and often induce a see-saw effect between constraint satisfaction and objective optimization. While some existing approaches show promising results in static or mildly changing ad markets, they fail to generalize to highly dynamic ad markets with ROI constraints, due to their inability to adaptively balance constraints and objectives amidst non-stationarity and partial observability. In this work, we specialize in ROI-Constrained Bidding in non-stationary markets. Based on a Partially Observable Constrained Markov Decision Process, our method exploits an indicator-augmented reward function free of extra trade-off parameters and develops a Curriculum-Guided Bayesian Reinforcement Learning (CBRL) framework to adaptively control the constraint-objective trade-off in non-stationary ad markets. Extensive experiments on a large-scale industrial dataset with two problem settings reveal that CBRL generalizes well in both in-distribution and out-of-distribution data regimes, and enjoys superior learning efficiency and stability.
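The abstract describes a reward function augmented with a constraint-satisfaction indicator, avoiding a hand-tuned trade-off (e.g., Lagrangian) weight between the ROI constraint and the objective. As a rough illustration of that idea, the following is a minimal sketch in which the episode-level objective is gated by an indicator of whether the ROI target was met; the function name, signature, and exact reward form are assumptions for illustration, not the paper's actual formulation.

```python
def indicator_augmented_reward(total_value: float,
                               total_cost: float,
                               roi_target: float) -> float:
    """Hypothetical episodic reward: the objective (total value) counts
    only if the episode-level ROI constraint is satisfied. The indicator
    replaces an explicit trade-off hyperparameter between constraint
    and objective."""
    roi = total_value / total_cost if total_cost > 0 else 0.0
    satisfied = roi >= roi_target  # indicator I[ROI >= target]
    return total_value * float(satisfied)

# Constraint met (ROI = 1.2 >= 1.0): reward equals the objective.
print(indicator_augmented_reward(120.0, 100.0, 1.0))  # 120.0
# Constraint violated (ROI = 0.8 < 1.0): reward is zeroed out.
print(indicator_augmented_reward(80.0, 100.0, 1.0))   # 0.0
```

Such a sparse, indicator-gated signal is what makes the learning problem hard in non-stationary markets, motivating the curriculum guidance and Bayesian modeling described in the abstract.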
