Paper Title
Non-Stationary Bandits with Auto-Regressive Temporal Dependency
Paper Authors
Paper Abstract
Traditional multi-armed bandit (MAB) frameworks, predominantly examined under stochastic or adversarial settings, often overlook the temporal dynamics inherent in many real-world applications such as recommendation systems and online advertising. This paper introduces a novel non-stationary MAB framework that captures the temporal structure of these real-world dynamics through an auto-regressive (AR) reward structure. We propose an algorithm that integrates two key mechanisms: (i) an alternation mechanism adept at leveraging temporal dependencies to dynamically balance exploration and exploitation, and (ii) a restarting mechanism designed to discard out-of-date information. Our algorithm achieves a regret upper bound that nearly matches the lower bound, with regret measured against a robust dynamic benchmark. Finally, via a real-world case study on tourism demand prediction, we demonstrate both the efficacy of our algorithm and the broader applicability of our techniques to more complex, rapidly evolving time series.