Asymptotic Convergence of Thompson Sampling

Paper Title

Asymptotic Convergence of Thompson Sampling

Authors

Kalkanli, Cem; Ozgur, Ayfer

Abstract

Thompson sampling has been shown to be an effective policy across a variety of online learning tasks. Many works have analyzed the finite-time performance of Thompson sampling and proved that it achieves a sub-linear regret under a broad range of probabilistic settings. However, its asymptotic behavior remains mostly underexplored. In this paper, we prove an asymptotic convergence result for Thompson sampling under the assumption of a sub-linear Bayesian regret, and show that the actions of a Thompson sampling agent provide a strongly consistent estimator of the optimal action. Our results rely on the martingale structure inherent in Thompson sampling.
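
As an informal illustration of the convergence claim, the sketch below runs Thompson sampling on a small Bernoulli bandit and measures how often the agent's recent actions coincide with the optimal arm. It is not the authors' setting or proof: the Beta(1, 1) priors, the fixed arm means, the horizon, and the name `thompson_sampling` are all assumptions chosen for the example, and the paper's result is stated for a general setting with sub-linear Bayesian regret.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling; returns the list of chosen arms.

    Assumptions for illustration only: independent Beta(1, 1) priors and
    Bernoulli rewards with fixed (hypothetical) means `true_means`.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    actions = []
    for _ in range(horizon):
        # Draw one sample per arm from its current Beta posterior.
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(k)
        ]
        # Play the arm whose posterior sample is largest.
        arm = max(range(k), key=lambda i: samples[i])
        # Observe a Bernoulli reward and update that arm's posterior counts.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        actions.append(arm)
    return actions


# Hypothetical three-armed problem; arm 2 has the highest mean.
actions = thompson_sampling([0.2, 0.5, 0.8], horizon=5000)
tail = actions[-1000:]
frac_optimal = tail.count(2) / len(tail)
print(f"fraction of optimal pulls in the last 1000 rounds: {frac_optimal:.3f}")
```

In a run like this, the share of late rounds spent on the best arm should be close to one, which matches the intuition behind treating the agent's actions as an estimator of the optimal action. A single simulation only illustrates that intuition and does not establish almost-sure convergence.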
