Paper Title
Second Order Regret Bounds Against Generalized Expert Sequences under Partial Bandit Feedback
Paper Authors
Paper Abstract
We study the problem of expert advice under the partial bandit feedback setting and create a sequential minimax optimal algorithm. Our algorithm works in a more general partial monitoring setting, where, in contrast to the classical bandit feedback, the losses can be revealed in an adversarial manner. Our algorithm adopts a universal prediction perspective, and its performance is analyzed via regret against a general expert selection sequence. The regret we study is against a general competition class that covers many settings (such as the switching or contextual experts settings), where the expert selection sequences in the competition class are determined by the application at hand. Our regret bounds are second-order bounds in terms of the sum of squared losses, and the normalized regret of our algorithm is invariant under arbitrary affine transforms of the loss sequence. Our algorithm is truly online and does not use any preliminary information about the loss sequence.
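To make the setting concrete, the following is a minimal sketch of the classical exponential-weights forecaster with importance-weighted loss estimates (Exp3-style) for expert advice under bandit feedback. This is a standard baseline for the problem, not the authors' minimax optimal algorithm; the function name, the fixed learning rate `eta`, and the loss range [0, 1] are illustrative assumptions.

```python
import math
import random

def exp3_bandit(n_experts, loss_rounds, eta):
    """Exponential weights over experts with unbiased importance-weighted
    loss estimates; only the chosen expert's loss is observed each round.
    (Illustrative Exp3-style baseline, not the paper's algorithm.)"""
    weights = [1.0] * n_experts
    total_loss = 0.0
    for losses in loss_rounds:          # losses: per-expert losses in [0, 1]
        total = sum(weights)
        probs = [w / total for w in weights]
        # Sample one expert according to the current distribution.
        chosen = random.choices(range(n_experts), weights=probs)[0]
        observed = losses[chosen]       # bandit feedback: one loss revealed
        total_loss += observed
        # Unbiased estimate of the chosen expert's loss; unobserved
        # experts implicitly receive an estimate of zero.
        estimate = observed / probs[chosen]
        weights[chosen] *= math.exp(-eta * estimate)
    return total_loss
```

The importance-weighted division by `probs[chosen]` keeps the loss estimates unbiased, which is what makes second-order (squared-loss) analyses of such forecasters possible; the paper's algorithm additionally handles adversarially revealed losses and general competition classes.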