Paper Title
Asymptotic Convergence of Thompson Sampling
Authors
Abstract
Thompson sampling has been shown to be an effective policy across a variety of online learning tasks. Many works have analyzed its finite-time performance and proved that it achieves sub-linear regret under a broad range of probabilistic settings. However, its asymptotic behavior remains largely underexplored. In this paper, we prove an asymptotic convergence result for Thompson sampling under the assumption of sub-linear Bayesian regret, and show that the actions of a Thompson sampling agent provide a strongly consistent estimator of the optimal action. Our results rely on the martingale structure inherent in Thompson sampling.
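The convergence claim in the abstract can be illustrated informally with a small simulation. The sketch below is not taken from the paper: it assumes a Beta-Bernoulli multi-armed bandit with a uniform Beta(1, 1) prior, and the function names, arm means, and horizon are hypothetical choices made only for this example. It runs Thompson sampling and measures how often the agent's recent actions coincide with the optimal arm. A single finite run can only suggest, not establish, the almost-sure convergence that the abstract states.

```python
import random


def thompson_sampling_bernoulli(true_means, horizon, seed=0):
    """Run Beta-Bernoulli Thompson sampling and return the chosen arms.

    true_means: success probability of each arm (unknown to the agent).
    horizon: number of rounds to play.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    # Beta(1, 1) prior on every arm (an assumption of this sketch).
    alpha = [1] * n_arms
    beta = [1] * n_arms
    actions = []
    for _ in range(horizon):
        # Draw one posterior sample per arm and act greedily on the samples.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Observe a Bernoulli reward and update that arm's posterior.
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        actions.append(arm)
    return actions


def optimal_fraction(actions, best_arm, window):
    """Fraction of the last `window` actions that equal `best_arm`."""
    tail = actions[-window:]
    return sum(1 for a in tail if a == best_arm) / len(tail)


if __name__ == "__main__":
    means = [0.2, 0.5, 0.8]  # hypothetical arm means; arm 2 is optimal
    actions = thompson_sampling_bernoulli(means, horizon=5000, seed=0)
    print("optimal-arm share, first 100 rounds:",
          sum(1 for a in actions[:100] if a == 2) / 100)
    print("optimal-arm share, last 1000 rounds:",
          optimal_fraction(actions, best_arm=2, window=1000))
```

In this toy setting the share of optimal-arm pulls in the final window is typically much higher than in the first rounds, which is the qualitative behavior one would expect if the agent's actions act as a consistent estimator of the optimal action.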