Paper Title

On Thompson Sampling for Smoother-than-Lipschitz Bandits

Authors

Grant, James A.; Leslie, David S.

Abstract

Thompson Sampling is a well-established approach to bandit and reinforcement learning problems. However, its use in continuum armed bandit problems has received relatively little attention. We provide the first bounds on the regret of Thompson Sampling for continuum armed bandits under weak conditions on the function class containing the true function and sub-exponential observation noise. Our bounds are realised by analysis of the eluder dimension, a recently proposed measure of the complexity of a function class, which has been demonstrated to be useful in bounding the Bayesian regret of Thompson Sampling for simpler bandit problems under sub-Gaussian observation noise. We derive a new bound on the eluder dimension for classes of functions with Lipschitz derivatives, and generalise previous analyses in multiple regards.
