Paper Title

PLAS: Latent Action Space for Offline Reinforcement Learning

Paper Authors

Wenxuan Zhou, Sujay Bajracharya, David Held

Paper Abstract

The goal of offline reinforcement learning is to learn a policy from a fixed dataset, without further interactions with the environment. This setting will be an increasingly important paradigm for real-world applications of reinforcement learning such as robotics, in which data collection is slow and potentially dangerous. Existing off-policy algorithms have limited performance on static datasets due to extrapolation errors from out-of-distribution actions. This leads to the challenge of constraining the policy to select actions within the support of the dataset during training. We propose to simply learn the Policy in the Latent Action Space (PLAS) such that this requirement is naturally satisfied. We evaluate our method on continuous control benchmarks in simulation and a deformable object manipulation task with a physical robot. We demonstrate that our method provides competitive performance consistently across various continuous control tasks and different types of datasets, outperforming existing offline reinforcement learning methods with explicit constraints. Videos and code are available at https://sites.google.com/view/latent-policy.
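To make the core mechanism concrete, below is a minimal PyTorch sketch of acting through a latent action space, not the authors' exact implementation: a decoder trained as part of a conditional VAE on the offline dataset maps (state, latent) pairs to actions, and the policy outputs only a bounded latent vector. The class names, network sizes, and the latent bound `max_latent` are illustrative assumptions rather than details taken from the paper.

```python
# Hedged sketch of the PLAS idea: the policy acts in the latent space of a
# conditional VAE fit to the offline dataset, so decoded actions stay close
# to the support of the data without an explicit divergence penalty.
# Hidden sizes and `max_latent` are assumptions for illustration.

import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Maps (state, latent) -> action.

    Trained as the decoder of a conditional VAE that reconstructs dataset
    actions conditioned on states, then frozen for policy learning.
    """
    def __init__(self, state_dim, action_dim, latent_dim, max_action=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),
        )
        self.max_action = max_action

    def forward(self, state, z):
        return self.max_action * self.net(torch.cat([state, z], dim=-1))

class LatentPolicy(nn.Module):
    """Deterministic policy that outputs a bounded latent vector.

    Bounding the latent output keeps decoded actions in high-density
    regions of the dataset's action distribution.
    """
    def __init__(self, state_dim, latent_dim, max_latent=2.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, latent_dim), nn.Tanh(),
        )
        self.max_latent = max_latent

    def forward(self, state):
        return self.max_latent * self.net(state)

def select_action(policy, decoder, state):
    # The policy picks a latent vector; the frozen decoder maps it to an
    # in-support action. A critic (e.g., TD3-style) would be trained on
    # these decoded actions.
    with torch.no_grad():
        z = policy(state)
        return decoder(state, z)
```

Because the decoder is trained only on dataset actions, any bounded latent decodes to an action near the data support, which is the implicit constraint the abstract describes.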
