Paper Title
Constrained Model-based Reinforcement Learning with Robust Cross-Entropy Method
Paper Authors
Paper Abstract
This paper studies the constrained/safe reinforcement learning (RL) problem with sparse indicator signals for constraint violations. We propose a model-based approach that enables RL agents to effectively explore environments with unknown system dynamics and environment constraints, given only a small budget of constraint violations. We employ a neural network ensemble model to estimate prediction uncertainty and use model predictive control as the basic control framework. We propose the robust cross-entropy method to optimize the control sequence while accounting for model uncertainty and constraints. We evaluate our method in the Safety Gym environment. The results show that our approach learns to complete the tasks with far fewer constraint violations than state-of-the-art baselines. Moreover, it achieves several orders of magnitude better sample efficiency than constrained model-free RL approaches. The code is available at \url{https://github.com/liuzuxin/safe-mbrl}.
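To make the high-level pipeline concrete, below is a minimal, illustrative Python sketch of a constraint-aware cross-entropy planner inside an MPC loop. It is not the authors' implementation (see the linked repository for that); the callbacks `ensemble_predict`, `reward_fn`, and `constraint_fn` are hypothetical placeholders standing in for the learned ensemble dynamics model, the predicted task return, and the predicted constraint cost.

```python
import numpy as np

def robust_cem(ensemble_predict, reward_fn, constraint_fn, state,
               horizon=10, act_dim=2, pop_size=400, elite_frac=0.1, n_iters=5):
    """Sketch of a robust cross-entropy search over MPC action sequences.

    Assumed (hypothetical) interfaces:
      ensemble_predict(state, action_seq) -> list of predicted trajectories,
          one per ensemble member (captures epistemic model uncertainty).
      reward_fn(traj) -> scalar predicted return of a trajectory.
      constraint_fn(traj) -> scalar predicted constraint cost (<= 0 means safe).
    """
    mean = np.zeros((horizon, act_dim))
    std = 0.5 * np.ones((horizon, act_dim))
    n_elite = max(1, int(pop_size * elite_frac))

    for _ in range(n_iters):
        # Sample candidate action sequences from the current Gaussian.
        samples = mean + std * np.random.randn(pop_size, horizon, act_dim)
        samples = np.clip(samples, -1.0, 1.0)

        returns, violations = [], []
        for seq in samples:
            trajs = ensemble_predict(state, seq)  # one rollout per ensemble member
            # Robustness: score each sequence by its worst-case prediction.
            returns.append(min(reward_fn(t) for t in trajs))
            violations.append(max(constraint_fn(t) for t in trajs))
        returns, violations = np.array(returns), np.array(violations)

        # Prefer feasible sequences; if none are feasible, minimize violation.
        feasible = violations <= 0.0
        scores = np.where(feasible, returns, -np.inf) if feasible.any() else -violations
        elite_idx = np.argsort(scores)[-n_elite:]

        # Refit the sampling distribution to the elite set.
        elites = samples[elite_idx]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6

    return mean[0]  # execute only the first action, MPC-style
```

The sketch reflects two ideas stated in the abstract: model uncertainty is handled by scoring each candidate sequence against the worst ensemble member, and constraints are handled by ranking feasible candidates by return while infeasible ones are ranked by how little they violate. Exact elite selection, penalty terms, and hyperparameters in the paper may differ.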