Paper Title

Pessimism for Offline Linear Contextual Bandits using $\ell_p$ Confidence Sets

Paper Authors

Gene Li, Cong Ma, Nathan Srebro

Paper Abstract

We present a family $\{\hat{\pi}_p\}_{p\ge 1}$ of pessimistic learning rules for offline learning of linear contextual bandits, relying on confidence sets with respect to different $\ell_p$ norms, where $\hat{\pi}_2$ corresponds to Bellman-consistent pessimism (BCP), while $\hat{\pi}_\infty$ is a novel generalization of lower confidence bound (LCB) to the linear setting. We show that the novel $\hat{\pi}_\infty$ learning rule is, in a sense, adaptively optimal, as it achieves the minimax performance (up to log factors) against all $\ell_q$-constrained problems, and as such it strictly dominates all other predictors in the family, including $\hat{\pi}_2$.
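As a hedged illustration of the kind of rule the abstract describes (the symbols $\phi$, $\hat{\theta}$, $\hat{\Sigma}$, and $\beta_p$ are introduced here for concreteness only; the exact normalization of the confidence set is as specified in the paper), suppose rewards are linear in a feature map $\phi(x,a)$, $\hat{\theta}$ is an estimate of the reward parameter from the offline data, $\hat{\Sigma}$ is the empirical feature covariance, and $\beta_p$ is a confidence radius. One natural form of an $\ell_p$ pessimistic rule is

$$
\hat{\pi}_p \in \arg\max_{\pi}\; \min_{\theta \in \Theta_p}\; \mathbb{E}_{x}\big[\langle \phi(x,\pi(x)),\, \theta\rangle\big],
\qquad
\Theta_p = \big\{\theta : \|\hat{\Sigma}^{1/2}(\theta - \hat{\theta})\|_p \le \beta_p\big\}.
$$

Under this particular form, Hölder duality gives the inner minimum in closed form as $\mathbb{E}_{x}[\langle \phi(x,\pi(x)), \hat{\theta}\rangle] - \beta_p\, \|\hat{\Sigma}^{-1/2}\,\mathbb{E}_{x}[\phi(x,\pi(x))]\|_q$ with $1/p + 1/q = 1$: for $p = 2$ this is the familiar elliptical (BCP-style) penalty, while $p = \infty$ yields an $\ell_1$ penalty on the whitened average feature, in line with the abstract's description of $\hat{\pi}_\infty$ as a generalization of LCB to the linear setting.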
