Adaptation to the Range in $K$-Armed Bandits

Paper title

Adaptation to the Range in $K$-Armed Bandits

Authors

Hadiji, Hédi, Stoltz, Gilles

Abstract

We consider stochastic bandit problems with $K$ arms, each associated with a bounded distribution supported on the range $[m,M]$. We do not assume that the range $[m,M]$ is known and show that there is a cost for learning this range. Indeed, a new trade-off between distribution-dependent and distribution-free regret bounds arises, which prevents the typical $\ln T$ and $\sqrt{T}$ bounds from being achieved simultaneously. For instance, a $\sqrt{T}$ distribution-free regret bound may only be achieved if the distribution-dependent regret bounds are at least of order $\sqrt{T}$. We exhibit a strategy achieving the regret rates indicated by the new trade-off.
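To make concrete where the range enters, here is a minimal sketch of a textbook UCB index policy for rewards in a known range $[m,M]$, with the Hoeffding-style exploration bonus scaled by $M-m$. This is not the strategy exhibited in the paper; it only illustrates the known-range baseline that the abstract contrasts against. The function name `ucb_known_range`, its parameters, and the two-point reward distributions are assumptions made for illustration.

```python
import math
import random


def ucb_known_range(means, m, M, horizon, seed=0):
    """Textbook UCB on K arms whose rewards lie in the known range [m, M].

    Illustrative assumption: each arm pays M with probability
    (mean - m) / (M - m) and m otherwise, so its expectation is `mean`.
    Returns the number of pulls of each arm after `horizon` rounds.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # number of pulls per arm
    sums = [0.0] * k   # cumulative reward per arm

    def index(arm, t):
        # Empirical mean plus a bonus proportional to the range M - m.
        bonus = (M - m) * math.sqrt(2.0 * math.log(t) / counts[arm])
        return sums[arm] / counts[arm] + bonus

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialise
        else:
            arm = max(range(k), key=lambda a: index(a, t))
        p = (means[arm] - m) / (M - m)
        reward = M if rng.random() < p else m
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    print(ucb_known_range([0.2, 0.8], m=0.0, M=1.0, horizon=2000))
```

The bonus cannot be computed without $M-m$, which is why a player who does not know the range has to do something else; the abstract states that this comes at a cost in the achievable regret bounds.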
