Paper Title


Model-Based Reinforcement Learning with SINDy

Paper Authors

Rushiv Arora, Bruno Castro da Silva, Eliot Moss

Paper Abstract


We draw on the latest advancements in the physics community to propose a novel method for discovering the governing non-linear dynamics of physical systems in reinforcement learning (RL). We establish that this method is capable of discovering the underlying dynamics using significantly fewer trajectories (as few as one rollout with $\leq 30$ time steps) than state-of-the-art model learning algorithms. Further, the technique learns a model that is accurate enough to induce near-optimal policies given significantly fewer trajectories than those required by model-free algorithms. It brings the benefits of model-based RL without requiring a model to be developed in advance, for systems that have physics-based dynamics. To establish the validity and applicability of this algorithm, we conduct experiments on four classic control tasks. We found that an optimal policy trained on the discovered dynamics of the underlying system can generalize well. Further, the learned policy performs well when deployed on the actual physical system, thus bridging the model-to-real-system gap. We further compare our method to state-of-the-art model-based and model-free approaches, and show that our method requires fewer trajectories sampled on the true physical system compared to other methods. Additionally, we explored approximate dynamics models and found that they can also perform well.
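
To make the idea described in the abstract concrete, the sketch below shows one way to fit a sparse dynamics model from a single short rollout and then use it as a surrogate simulator. This is not the authors' implementation: it assumes the `pysindy` and `gymnasium` packages, and the environment name, rollout length, time step, and library/optimizer settings are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's code): discover governing
# dynamics from one short rollout with SINDy, then use the fitted model
# as a stand-in for the real system.
import numpy as np
import gymnasium as gym
import pysindy as ps

env = gym.make("Pendulum-v1")          # illustrative classic control task
obs, _ = env.reset(seed=0)

# Collect a single rollout of at most 30 time steps with random actions.
states, actions = [obs], []
for _ in range(30):
    a = env.action_space.sample()
    obs, _, terminated, truncated, _ = env.step(a)
    states.append(obs)
    actions.append(a)
    if terminated or truncated:
        break

X = np.array(states)   # state trajectory, shape (T+1, state_dim)
U = np.array(actions)  # control inputs,  shape (T, action_dim)
dt = 0.05              # assumed integration step of the environment

# Fit a sparse model x_dot = f(x, u) over a polynomial feature library;
# the sparsity threshold and polynomial degree are hyperparameter guesses.
model = ps.SINDy(
    optimizer=ps.STLSQ(threshold=0.02),
    feature_library=ps.PolynomialLibrary(degree=3),
)
model.fit(X[:-1], u=U, t=dt)
model.print()  # inspect the discovered governing equations

# The fitted model can now serve as a surrogate simulator, e.g. via
# model.simulate(x0, t, u=...) or model.predict(x, u=...), on which a
# policy can be trained before deployment on the real system.
```

The design choice being illustrated is that a sparse regression over a fixed feature library needs very little data compared to fitting a neural dynamics model, which is what lets a single short rollout suffice.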
