Paper Title

Model-free Neural Lyapunov Control for Safe Robot Navigation

Authors

Zikang Xiong, Joe Eappen, Ahmed H. Qureshi, Suresh Jagannathan

Abstract

Model-free Deep Reinforcement Learning (DRL) controllers have demonstrated promising results on various challenging non-linear control tasks. While a model-free DRL algorithm can handle unknown dynamics and high-dimensional problems, it lacks safety assurance. Although safety constraints can be encoded as part of a reward function, there still exists a large gap between an RL controller trained with this modified reward and a safe controller. In contrast, instead of implicitly encoding safety constraints with rewards, we explicitly co-learn a Twin Neural Lyapunov Function (TNLF) with the control policy in the DRL training loop and use the learned TNLF to build a runtime monitor. Combined with the path generated by a planner, the monitor chooses appropriate waypoints that guide the learned controller to produce collision-free control trajectories. Our approach inherits the scalability advantages of DRL while enhancing safety guarantees. Our experimental evaluation demonstrates the effectiveness of our approach compared to DRL with augmented rewards and to constrained DRL methods over a range of high-dimensional, safety-sensitive navigation tasks.
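To make the runtime-monitor idea from the abstract concrete, below is a minimal, hypothetical PyTorch sketch of how a learned Lyapunov-style value could gate waypoint selection along a planner's path. The twin-head max, network sizes, and the names `TNLF`, `select_waypoint`, and `safe_level` are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class TNLF(nn.Module):
    """Sketch of a Twin Neural Lyapunov Function V(s, w): two independent
    heads score a (state, waypoint) pair, and their element-wise max is
    taken as a conservative value (an assumption borrowed from twin-critic
    tricks; the paper's exact construction may differ)."""

    def __init__(self, state_dim: int, waypoint_dim: int, hidden: int = 64):
        super().__init__()

        def head() -> nn.Sequential:
            return nn.Sequential(
                nn.Linear(state_dim + waypoint_dim, hidden), nn.Tanh(),
                nn.Linear(hidden, hidden), nn.Tanh(),
                nn.Linear(hidden, 1),
            )

        self.v1, self.v2 = head(), head()

    def forward(self, state: torch.Tensor, waypoint: torch.Tensor) -> torch.Tensor:
        x = torch.cat([state, waypoint], dim=-1)
        return torch.max(self.v1(x), self.v2(x))


def select_waypoint(tnlf: TNLF, state: torch.Tensor,
                    path: list, safe_level: float) -> torch.Tensor:
    """Runtime monitor (sketch): scan the planner's path from the goal end
    and return the furthest waypoint whose Lyapunov value at the current
    state stays below `safe_level`; fall back to the nearest waypoint."""
    with torch.no_grad():
        for waypoint in reversed(path):
            if tnlf(state, waypoint).item() <= safe_level:
                return waypoint
    return path[0]


# Hypothetical usage: a 2-D state and a short planner path.
monitor = TNLF(state_dim=2, waypoint_dim=2)
path = [torch.tensor([1.0, 0.0]), torch.tensor([2.0, 1.0]), torch.tensor([3.0, 3.0])]
wp = select_waypoint(monitor, torch.tensor([0.0, 0.0]), path, safe_level=0.5)
```

Scanning from the goal end makes the monitor greedy toward progress: it commits to the most distant waypoint the learned certificate still deems reachable, and only retreats to nearer waypoints when the Lyapunov value signals risk.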
