论文标题

一种基于置信的新颖算法的结构匪徒

A Novel Confidence-Based Algorithm for Structured Bandits

论文作者

Tirinzoni, Andrea, Lazaric, Alessandro, Restelli, Marcello

论文摘要

我们研究有限臂的随机匪徒,其中每个手臂的奖励可能与其他手臂的奖励相关。我们介绍了一种新颖的分阶段算法,该算法利用给定的结构来建立对真实匪徒问题参数的置信度,并迅速丢弃所有亚最佳手臂。特别是,与没有结构的标准匪徒算法不同,我们表明,由于拉动其他手臂收集的信息,实际上可以减少次优臂的次数。此外,我们表明,在某些结构中,随着时间的流逝,我们算法的任何时间延伸的遗憾是统一的。对于这些恒定纤维结构,我们还得出了匹配的下限。最后,我们从数值上证明我们的方法比现有方法更好地利用了某些结构。

We study finite-armed stochastic bandits where the rewards of each arm might be correlated to those of other arms. We introduce a novel phased algorithm that exploits the given structure to build confidence sets over the parameters of the true bandit problem and rapidly discard all sub-optimal arms. In particular, unlike standard bandit algorithms with no structure, we show that the number of times a suboptimal arm is selected may actually be reduced thanks to the information collected by pulling other arms. Furthermore, we show that, in some structures, the regret of an anytime extension of our algorithm is uniformly bounded over time. For these constant-regret structures, we also derive a matching lower bound. Finally, we demonstrate numerically that our approach better exploits certain structures than existing methods.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源