论文标题
随机线性匪徒对对抗性攻击
Stochastic Linear Bandits Robust to Adversarial Attacks
论文作者
论文摘要
我们考虑了一个随机线性匪徒问题,其中奖励不仅会受到随机噪声的影响,而且还受到适当的预算$ c $(即,在整个时间范围内的腐败大幅面)的对抗性攻击(即,上限)。我们提供了两个强大的分阶段消除算法的变体,一种知道$ c $,另一种不知道。在未腐败的情况$ C = 0 $中,这两种变体均表现出近乎最佳的遗憾,而总体上对$ C $具有线性和二次依赖性。我们提出算法独立的下限,表明这些加性项几乎是最佳的。此外,在上下文环境中,我们重新审视了各种不同上下文的设置,并表明一种简单的贪婪算法证明,尽管不进行明确的探索,并且不知道$ c $,却在近乎最理想的添加性遗憾术语中表现出色。
We consider a stochastic linear bandit problem in which the rewards are not only subject to random noise, but also adversarial attacks subject to a suitable budget $C$ (i.e., an upper bound on the sum of corruption magnitudes across the time horizon). We provide two variants of a Robust Phased Elimination algorithm, one that knows $C$ and one that does not. Both variants are shown to attain near-optimal regret in the non-corrupted case $C = 0$, while incurring additional additive terms respectively having a linear and quadratic dependency on $C$ in general. We present algorithm independent lower bounds showing that these additive terms are near-optimal. In addition, in a contextual setting, we revisit a setup of diverse contexts, and show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.