Paper Title


Learning safety in model-based Reinforcement Learning using MPC and Gaussian Processes

Authors

Filippo Airaldi, Bart De Schutter, Azita Dabiri

Abstract


We propose a method to encourage safety in Model Predictive Control (MPC)-based Reinforcement Learning (RL) via Gaussian Process (GP) regression. This framework consists of 1) a parametric MPC scheme that is employed as model-based controller with approximate knowledge on the real system's dynamics, 2) an episodic RL algorithm tasked with adjusting the MPC parametrization in order to increase its performance, and lastly, 3) GP regressors used to estimate, directly from data, constraints on the MPC parameters capable of predicting, up to some probability, whether the parametrization is likely to yield a safe or unsafe policy. These constraints are then enforced onto the RL updates in an effort to enhance the learning method with a probabilistic safety mechanism. Compared to other recent publications combining safe RL with MPC, our method does not require further assumptions on, e.g., the prediction model in order to retain computational tractability. We illustrate the results of our method in a numerical example on the control of a quadrotor drone in a safety-critical environment.
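The three components described in the abstract can be illustrated with a small numerical sketch. The code below is a hypothetical toy example, not the authors' implementation: a GP regressor (plain NumPy, squared-exponential kernel) is fitted on observed pairs of MPC parameters and episode safety margins, and an RL-style gradient step on the parameters is accepted only when the GP predicts the candidate parametrization safe with high probability, i.e., when the posterior mean minus `beta` posterior standard deviations is nonnegative. All names, the scalar parametrization, and the toy safety margin are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, length=0.5, var=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / length ** 2)

class GPSafety:
    """GP regression from MPC parameters to an observed safety margin
    (margin >= 0 is interpreted as a safe episode)."""

    def __init__(self, noise=1e-3):
        self.noise = noise

    def fit(self, X, y):
        self.X = X
        K = rbf_kernel(X, X) + self.noise * np.eye(len(X))
        self.L = np.linalg.cholesky(K)
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, y))

    def predict(self, x):
        """Posterior mean and standard deviation at a single query point x."""
        k = rbf_kernel(self.X, x[None, :])[:, 0]
        mu = k @ self.alpha
        v = np.linalg.solve(self.L, k)
        var = rbf_kernel(x[None, :], x[None, :])[0, 0] - v @ v
        return mu, np.sqrt(max(var, 0.0))

def safe_update(theta, grad, gp, lr=0.1, beta=2.0):
    """Gradient step on the MPC parameters, accepted only if the GP
    predicts the candidate safe with high probability:
    mu(candidate) - beta * sigma(candidate) >= 0."""
    cand = theta - lr * grad
    mu, sd = gp.predict(cand)
    return cand if mu - beta * sd >= 0.0 else theta
```

With a toy margin h(θ) = 1 − θ² (safe for |θ| ≤ 1) providing the training data, a small step inside the well-explored region is accepted, while a large step into unexplored territory carries high GP uncertainty and is rejected, leaving the parameters unchanged.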
