Paper title
Pessimism for Offline Linear Contextual Bandits using $\ell_p$ Confidence Sets
Paper authors
Paper abstract
We present a family $\{\hat{\pi}_p\}_{p\ge 1}$ of pessimistic learning rules for offline learning of linear contextual bandits, relying on confidence sets with respect to different $\ell_p$ norms, where $\hat{\pi}_2$ corresponds to Bellman-consistent pessimism (BCP), while $\hat{\pi}_\infty$ is a novel generalization of the lower confidence bound (LCB) to the linear setting. We show that the novel $\hat{\pi}_\infty$ learning rule is, in a sense, adaptively optimal, as it achieves the minimax performance (up to log factors) against all $\ell_q$-constrained problems, and as such it strictly dominates all other predictors in the family, including $\hat{\pi}_2$.
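To make the $\ell_2$ member of the family concrete, the following is a minimal sketch of pessimistic action selection in an offline linear bandit: a ridge estimate of the reward parameter is penalized by an $\ell_2$ (elliptic) confidence width before picking an action. All names, dimensions, and the constant confidence radius `beta` are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative offline data: features phi(x, a) in R^d with noisy linear rewards.
d, n, n_actions = 3, 200, 4
theta_star = rng.normal(size=d)            # unknown reward parameter (assumed)
Phi = rng.normal(size=(n, d))              # features of logged (context, action) pairs
rewards = Phi @ theta_star + 0.1 * rng.normal(size=n)

# Ridge regression estimate and regularized covariance (standard linear-LCB ingredients).
lam = 1.0
Sigma = Phi.T @ Phi + lam * np.eye(d)
theta_hat = np.linalg.solve(Sigma, Phi.T @ rewards)
Sigma_inv = np.linalg.inv(Sigma)
beta = 1.0                                 # confidence radius; a constant for illustration

def lcb_value(phi):
    """Pessimistic value: point estimate minus an ell_2 confidence width."""
    width = beta * np.sqrt(phi @ Sigma_inv @ phi)
    return phi @ theta_hat - width

# Pessimistic (LCB-style) action selection for a new context's action features.
action_features = rng.normal(size=(n_actions, d))
best_action = max(range(n_actions), key=lambda a: lcb_value(action_features[a]))
```

The penalty shrinks for directions well covered by the logged data and grows for poorly covered ones, which is the mechanism by which pessimism guards against distribution shift in offline learning.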