Paper Title

Adaptation to the Range in $K$-Armed Bandits

Authors

Hédi Hadiji, Gilles Stoltz

Abstract

We consider stochastic bandit problems with $K$ arms, each associated with a bounded distribution supported on the range $[m,M]$. We do not assume that the range $[m,M]$ is known and show that there is a cost for learning this range. Indeed, a new trade-off between distribution-dependent and distribution-free regret bounds arises, which prevents the typical $\ln T$ and $\sqrt{T}$ bounds from being achieved simultaneously. For instance, a $\sqrt{T}$ distribution-free regret bound may only be achieved if the distribution-dependent regret bounds are at least of order $\sqrt{T}$. We exhibit a strategy achieving the rates for regret indicated by the new trade-off.
