Paper title
Pessimism for Offline Linear Contextual Bandits using $\ell_p$ Confidence Sets
Paper authors
Paper abstract
We present a family $\{\hat{\pi}_p\}_{p\ge 1}$ of pessimistic learning rules for offline learning of linear contextual bandits, relying on confidence sets with respect to different $\ell_p$ norms, where $\hat{\pi}_2$ corresponds to Bellman-consistent pessimism (BCP), while $\hat{\pi}_\infty$ is a novel generalization of the lower confidence bound (LCB) to the linear setting. We show that the novel $\hat{\pi}_\infty$ learning rule is, in a sense, adaptively optimal, as it achieves the minimax performance (up to log factors) against all $\ell_q$-constrained problems, and as such it strictly dominates all other predictors in the family, including $\hat{\pi}_2$.
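To make the $\ell_2$ member of the family concrete, the following is a minimal sketch of pessimistic action selection in an offline linear bandit: a ridge estimate of the reward parameter is penalized by an $\ell_2$ (elliptic) confidence width before picking an action. All names, dimensions, and the constant confidence radius `beta` are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative offline data: features phi(x, a) in R^d with noisy linear rewards.
d, n, n_actions = 3, 200, 4
theta_star = rng.normal(size=d)            # unknown reward parameter (assumed)
Phi = rng.normal(size=(n, d))              # features of logged (context, action) pairs
rewards = Phi @ theta_star + 0.1 * rng.normal(size=n)

# Ridge regression estimate and regularized covariance (standard linear-LCB ingredients).
lam = 1.0
Sigma = Phi.T @ Phi + lam * np.eye(d)
theta_hat = np.linalg.solve(Sigma, Phi.T @ rewards)
Sigma_inv = np.linalg.inv(Sigma)
beta = 1.0                                 # confidence radius; a constant for illustration

def lcb_value(phi):
    """Pessimistic value: point estimate minus an ell_2 confidence width."""
    width = beta * np.sqrt(phi @ Sigma_inv @ phi)
    return phi @ theta_hat - width

# Pessimistic (LCB-style) action selection for a new context's action features.
action_features = rng.normal(size=(n_actions, d))
best_action = max(range(n_actions), key=lambda a: lcb_value(action_features[a]))
```

The penalty shrinks for directions well covered by the logged data and grows for poorly covered ones, which is the mechanism by which pessimism guards against distribution shift in offline learning.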