Paper Title
On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness
Paper Authors
Paper Abstract
Generalization in Reinforcement Learning (RL) aims to learn an agent during training that generalizes to the target environment. This paper studies RL generalization from a theoretical aspect: how much can we expect pre-training over training environments to be helpful? When the interaction with the target environment is not allowed, we certify that the best we can obtain is a near-optimal policy in an average sense, and we design an algorithm that achieves this goal. Furthermore, when the agent is allowed to interact with the target environment, we give a surprising result showing that asymptotically, the improvement from pre-training is at most a constant factor. On the other hand, in the non-asymptotic regime, we design an efficient algorithm and prove a distribution-based regret bound in the target environment that is independent of the state-action space.
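A minimal formalization of "near-optimal policy in an average sense", assuming the standard multi-environment setup where training environments $M$ are drawn from a distribution $\mathcal{D}$ and $V^{\pi}_{M}$ denotes the value of policy $\pi$ in environment $M$ (the symbols $\mathcal{D}$, $V^{\pi}_{M}$, and $\varepsilon$ are notation introduced here for illustration, not taken from the abstract):

$$
\pi^{\star} \in \arg\max_{\pi} \; \mathbb{E}_{M \sim \mathcal{D}}\big[ V^{\pi}_{M} \big],
\qquad
\hat{\pi} \text{ is } \varepsilon\text{-optimal on average if } \;
\mathbb{E}_{M \sim \mathcal{D}}\big[ V^{\pi^{\star}}_{M} - V^{\hat{\pi}}_{M} \big] \le \varepsilon .
$$

Under this reading, the first result says that without any interaction with the target environment, pre-training can at best return such an average-case near-optimal policy $\hat{\pi}$, rather than one that is near-optimal for every individual target environment.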