Non-Stationary Bandits with Auto-Regressive Temporal Dependency

Paper title

Non-Stationary Bandits with Auto-Regressive Temporal Dependency

Authors

Qinyi Chen, Negin Golrezaei, Djallel Bouneffouf

Abstract

Traditional multi-armed bandit (MAB) frameworks, predominantly examined under stochastic or adversarial settings, often overlook the temporal dynamics inherent in many real-world applications such as recommendation systems and online advertising. This paper introduces a novel non-stationary MAB framework that captures the temporal structure of these real-world dynamics through an auto-regressive (AR) reward structure. We propose an algorithm that integrates two key mechanisms: (i) an alternation mechanism adept at leveraging temporal dependencies to dynamically balance exploration and exploitation, and (ii) a restarting mechanism designed to discard out-of-date information. Our algorithm achieves a regret upper bound that nearly matches the lower bound, with regret measured against a robust dynamic benchmark. Finally, via a real-world case study on tourism demand prediction, we demonstrate both the efficacy of our algorithm and the broader applicability of our techniques to more complex, rapidly evolving time series.
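
As a rough illustration of the auto-regressive reward structure the abstract mentions, the sketch below simulates per-arm rewards under an assumed AR(1) recursion and compares a per-round best-arm total with the best single fixed arm. The abstract does not state the AR order, the noise model, or the exact definition of the dynamic benchmark, so the function names, the parameters `alpha` and `noise_std`, and the benchmark used here are all hypothetical choices made for illustration, not the authors' formulation.

```python
import random


def simulate_ar1_rewards(num_arms, horizon, alpha=0.9, noise_std=0.1, seed=0):
    """Simulate rewards under an assumed AR(1) model (not taken from the paper):

        r_i(t+1) = alpha * r_i(t) + eps_i(t),  eps_i(t) ~ N(0, noise_std^2)

    Returns a list of `horizon` rows, each holding one reward per arm.
    """
    rng = random.Random(seed)
    # Hypothetical initial rewards, drawn uniformly from [-1, 1].
    current = [rng.uniform(-1.0, 1.0) for _ in range(num_arms)]
    rewards = []
    for _ in range(horizon):
        rewards.append(list(current))
        # Each arm's next reward depends on its own previous reward plus noise.
        current = [alpha * r + rng.gauss(0.0, noise_std) for r in current]
    return rewards


def per_round_best_total(rewards):
    """Total reward of an oracle that picks the best arm in every round
    (one simple stand-in for a dynamic benchmark)."""
    return sum(max(row) for row in rewards)


def best_fixed_arm_total(rewards):
    """Total reward of the single best arm held fixed over the whole horizon."""
    num_arms = len(rewards[0])
    return max(sum(row[i] for row in rewards) for i in range(num_arms))


if __name__ == "__main__":
    sim = simulate_ar1_rewards(num_arms=3, horizon=200)
    print("per-round best total:", per_round_best_total(sim))
    print("best fixed arm total:", best_fixed_arm_total(sim))
```

The per-round oracle can never do worse than the best fixed arm, which gives a sense of why regret against a dynamic benchmark is a more demanding target when the best arm changes over time.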
