Paper Title

Information Directed Sampling for Linear Partial Monitoring

Authors

Johannes Kirschner, Tor Lattimore, Andreas Krause

Abstract

Partial monitoring is a rich framework for sequential decision making under uncertainty that generalizes many well-known bandit models, including linear, combinatorial and dueling bandits. We introduce information directed sampling (IDS) for stochastic partial monitoring with a linear reward and observation structure. IDS achieves adaptive worst-case regret rates that depend on precise observability conditions of the game. Moreover, we prove lower bounds that classify the minimax regret of all finite games into four possible regimes. IDS achieves the optimal rate in all cases up to logarithmic factors, without tuning any hyper-parameters. We further extend our results to the contextual and the kernelized setting, which significantly increases the range of possible applications.
