Paper Title
On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness
Paper Authors
Paper Abstract
Generalization in Reinforcement Learning (RL) aims to learn an agent during training that generalizes to the target environment. This paper studies RL generalization from a theoretical aspect: how much can we expect pre-training over training environments to be helpful? When the interaction with the target environment is not allowed, we certify that the best we can obtain is a near-optimal policy in an average sense, and we design an algorithm that achieves this goal. Furthermore, when the agent is allowed to interact with the target environment, we give a surprising result showing that asymptotically, the improvement from pre-training is at most a constant factor. On the other hand, in the non-asymptotic regime, we design an efficient algorithm and prove a distribution-based regret bound in the target environment that is independent of the state-action space.
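A minimal formalization of "near-optimal policy in an average sense", assuming the standard multi-environment setup where training environments $M$ are drawn from a distribution $\mathcal{D}$ and $V^{\pi}_{M}$ denotes the value of policy $\pi$ in environment $M$ (the symbols $\mathcal{D}$, $V^{\pi}_{M}$, and $\varepsilon$ are notation introduced here for illustration, not taken from the abstract):

$$
\pi^{\star} \in \arg\max_{\pi} \; \mathbb{E}_{M \sim \mathcal{D}}\big[ V^{\pi}_{M} \big],
\qquad
\hat{\pi} \text{ is } \varepsilon\text{-optimal on average if } \;
\mathbb{E}_{M \sim \mathcal{D}}\big[ V^{\pi^{\star}}_{M} - V^{\hat{\pi}}_{M} \big] \le \varepsilon .
$$

Under this reading, the first result says that without any interaction with the target environment, pre-training can at best return such an average-case near-optimal policy $\hat{\pi}$, rather than one that is near-optimal for every individual target environment.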