PAC-Bayesian Lifelong Learning For Multi-Armed Bandits

Paper title

PAC-Bayesian Lifelong Learning For Multi-Armed Bandits

Authors

Hamish Flynn, David Reeb, Melih Kandemir, Jan Peters

Abstract

We present a PAC-Bayesian analysis of lifelong learning. In the lifelong learning problem, a sequence of learning tasks is observed one at a time, and the goal is to transfer information acquired from previous tasks to new learning tasks. We consider the case in which each learning task is a multi-armed bandit problem. We derive lower bounds on the expected average reward that would be obtained if a given multi-armed bandit algorithm were run in a new task with a particular prior and for a set number of steps. We propose lifelong learning algorithms that use our new bounds as learning objectives. Our proposed algorithms are evaluated on several lifelong multi-armed bandit problems and are found to perform better than a baseline method that does not use generalisation bounds.
