Paper Title
PAC-Bayesian Lifelong Learning For Multi-Armed Bandits
Authors
Abstract
We present a PAC-Bayesian analysis of lifelong learning. In the lifelong learning problem, a sequence of learning tasks is observed one at a time, and the goal is to transfer information acquired from previous tasks to new learning tasks. We consider the case in which each learning task is a multi-armed bandit problem. We derive lower bounds on the expected average reward that would be obtained if a given multi-armed bandit algorithm were run in a new task with a particular prior and for a set number of steps. We propose lifelong learning algorithms that use our new bounds as learning objectives. Our proposed algorithms are evaluated in several lifelong multi-armed bandit problems and are found to perform better than a baseline method that does not use generalisation bounds.
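The abstract does not say which bandit algorithm or prior family is used. As a minimal sketch, assuming Bernoulli rewards and Thompson sampling with an independent Beta prior per arm (our illustrative choice, not necessarily the authors'), the quantity being bounded, namely the expected average reward of a bandit algorithm run with a given prior for a fixed number of steps, can be estimated by Monte Carlo over tasks drawn from a task distribution. The function names, the task distribution `sample_task`, and the prior values below are hypothetical.

```python
import random


def thompson_average_reward(arm_means, prior, horizon, rng):
    """Run Beta-Bernoulli Thompson sampling on one task for `horizon` steps
    and return the average reward collected.

    prior: list of (alpha, beta) pairs, one per arm (hypothetical
    parameterisation of the 'particular prior' mentioned in the abstract).
    """
    # Copy the prior so each task starts from the same prior.
    alphas = [a for a, _ in prior]
    betas = [b for _, b in prior]
    total = 0.0
    for _ in range(horizon):
        # Sample a plausible mean for each arm from its current Beta posterior.
        samples = [rng.betavariate(a, b) for a, b in zip(alphas, betas)]
        arm = max(range(len(samples)), key=samples.__getitem__)
        # Draw a Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < arm_means[arm] else 0
        # Conjugate Beta posterior update for the pulled arm.
        alphas[arm] += reward
        betas[arm] += 1 - reward
        total += reward
    return total / horizon


def estimate_expected_average_reward(sample_task, prior, horizon, n_tasks, seed=0):
    """Monte Carlo estimate of the expected average reward over tasks
    drawn from a (hypothetical) task distribution `sample_task`."""
    rng = random.Random(seed)
    rewards = [
        thompson_average_reward(sample_task(rng), prior, horizon, rng)
        for _ in range(n_tasks)
    ]
    return sum(rewards) / n_tasks


def sample_task(rng):
    """Hypothetical task distribution: arm 0 is always the best arm."""
    return [rng.uniform(0.8, 0.95), rng.uniform(0.1, 0.5), rng.uniform(0.1, 0.5)]


# Usage: compare an uninformative prior with one concentrated on arm 0.
uniform_prior = [(1, 1)] * 3
informed_prior = [(9, 1), (1, 9), (1, 9)]
uniform = estimate_expected_average_reward(sample_task, uniform_prior, 50, 300)
informed = estimate_expected_average_reward(sample_task, informed_prior, 50, 300)
print(f"uniform prior:  {uniform:.3f}")
print(f"informed prior: {informed:.3f}")
```

Because arm 0 is best in every sampled task, the informed prior avoids most exploration and should collect more reward than the uniform Beta(1, 1) prior. Learning such a prior from earlier tasks is the kind of transfer the abstract describes; the proposed algorithms, however, choose priors by optimising the PAC-Bayesian bounds as learning objectives rather than by hand as done here.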