Paper Title
Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems
Authors
Abstract
We propose a novel framework for structured bandits, which we call an influence diagram bandit. Our framework captures complex statistical dependencies between actions, latent variables, and observations, and thus unifies and extends many existing models, such as combinatorial semi-bandits, cascading bandits, and low-rank bandits. We develop novel online learning algorithms that learn to act efficiently in our models. The key idea is to track a structured posterior distribution of model parameters, either exactly or approximately. To act, we sample model parameters from their posterior and then use the structure of the influence diagram to find the most optimistic action under the sampled parameters. We empirically evaluate our algorithms in three structured bandit problems, and show that they perform as well as or better than problem-specific state-of-the-art baselines.
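The sample-then-act loop described in the abstract can be illustrated on one of the special cases it names, the cascading bandit. The sketch below is not the paper's algorithm: it assumes independent Beta posteriors over per-item attraction probabilities (a case where the posterior can be tracked exactly), and the function name `run_cascading_ts`, its parameters, and the simulated environment are hypothetical choices made for illustration.

```python
import random


def run_cascading_ts(attraction, k, rounds, seed=0):
    """Thompson sampling on a simulated cascading bandit (illustrative only).

    attraction: true click probability of each item (unknown to the learner,
                used here only to simulate feedback).
    k:          number of items shown per round.
    rounds:     number of interaction rounds.
    """
    rng = random.Random(seed)
    n = len(attraction)
    # Beta(1, 1) prior on every item's attraction probability.
    alpha = [1.0] * n
    beta = [1.0] * n
    clicks = 0

    for _ in range(rounds):
        # 1) Sample model parameters from the current posterior.
        sampled = [rng.betavariate(alpha[i], beta[i]) for i in range(n)]

        # 2) Act greedily under the sample: showing the k items with the
        #    highest sampled attraction maximizes the click probability.
        ranked = sorted(range(n), key=lambda i: sampled[i], reverse=True)[:k]

        # 3) Cascade feedback: the user scans the list top-down and stops at
        #    the first click, so items below the click are not observed.
        for item in ranked:
            if rng.random() < attraction[item]:
                alpha[item] += 1.0  # observed click
                clicks += 1
                break
            beta[item] += 1.0  # observed skip

    return alpha, beta, clicks
```

For example, `run_cascading_ts([0.8, 0.5, 0.2, 0.1], k=2, rounds=200)` returns the final Beta parameters and the total click count. In models with latent variables, where the posterior has no such closed form, the abstract indicates that it is tracked approximately instead; that case is not covered by this sketch.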