Paper Title

Sequential Transfer in Reinforcement Learning with a Generative Model

Authors

Andrea Tirinzoni, Riccardo Poiani, Marcello Restelli

Abstract

We are interested in how to design reinforcement learning agents that provably reduce the sample complexity for learning new tasks by transferring knowledge from previously-solved ones. The availability of solutions to related problems poses a fundamental trade-off: whether to seek policies that are expected to achieve high (yet sub-optimal) performance in the new task immediately or whether to seek information to quickly identify an optimal solution, potentially at the cost of poor initial behavior. In this work, we focus on the second objective when the agent has access to a generative model of state-action pairs. First, given a set of solved tasks containing an approximation of the target one, we design an algorithm that quickly identifies an accurate solution by seeking the state-action pairs that are most informative for this purpose. We derive PAC bounds on its sample complexity which clearly demonstrate the benefits of using this kind of prior knowledge. Then, we show how to learn these approximate tasks sequentially by reducing our transfer setting to a hidden Markov model and employing spectral methods to recover its parameters. Finally, we empirically verify our theoretical findings in simple simulated domains.
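The abstract describes an identification step: given candidate solutions from previously solved tasks, the agent queries a generative model at the state-action pairs most informative for telling the candidates apart, with a PAC guarantee (with probability at least 1 - δ, an ε-accurate solution is returned within a bounded number of queries). The toy Python sketch below illustrates only this idea and is not the paper's algorithm: the candidate tables, the generative_model function, and the fixed elimination threshold are hypothetical stand-ins for the confidence intervals a real PAC analysis would use.

import numpy as np

# Illustrative sketch only (not the paper's algorithm): given candidate
# mean-reward tables from previously solved tasks, identify which one the
# new task matches by querying a generative model at the state-action
# pairs where the surviving candidates disagree most.

rng = np.random.default_rng(0)
n_states, n_actions = 5, 3

# Hypothetical candidate tasks; the new task coincides with one of them.
candidates = [rng.uniform(0.0, 1.0, size=(n_states, n_actions))
              for _ in range(4)]
true_task = candidates[2]

def generative_model(s, a):
    # Noisy reward sample for (s, a) from the unknown target task.
    return true_task[s, a] + rng.normal(0.0, 0.1)

active = list(range(len(candidates)))
sums = np.zeros((n_states, n_actions))
counts = np.zeros((n_states, n_actions))

for _ in range(2000):
    if len(active) <= 1:
        break
    # Most informative query: the pair with the largest spread (range)
    # among the candidates that are still plausible.
    disagreement = np.ptp([candidates[i] for i in active], axis=0)
    s, a = np.unravel_index(np.argmax(disagreement), disagreement.shape)
    sums[s, a] += generative_model(s, a)
    counts[s, a] += 1
    if counts[s, a] >= 50:
        est = sums[s, a] / counts[s, a]
        # A fixed threshold stands in for a proper confidence interval.
        active = [i for i in active
                  if abs(candidates[i][s, a] - est) <= 0.15]

print("identified candidate:", active)  # with this seed, the true candidate should remain

In the paper, elimination is driven by rigorous confidence bounds, which is what yields the PAC sample-complexity guarantee; the sketch only conveys why concentrating queries where candidates disagree reduces the number of samples needed.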
