Paper Title


On the Robustness of Safe Reinforcement Learning under Observational Perturbations

Authors

Zuxin Liu, Zijian Guo, Zhepeng Cen, Huan Zhang, Jie Tan, Bo Li, Ding Zhao

Abstract


Safe reinforcement learning (RL) trains a policy to maximize the task reward while satisfying safety constraints. While prior works focus on performance optimality, we find that the optimal solutions of many safe RL problems are not robust and safe against carefully designed observational perturbations. We formally analyze the unique properties of designing effective observational adversarial attackers in the safe RL setting. We show that baseline adversarial attack techniques for standard RL tasks are not always effective for safe RL and propose two new approaches - one maximizes the cost and the other maximizes the reward. One interesting and counter-intuitive finding is that the maximum reward attack is strong, as it can both induce unsafe behaviors and make the attack stealthy by maintaining the reward. We further propose a robust training framework for safe RL and evaluate it via comprehensive experiments. This paper provides pioneering work investigating the safety and robustness of RL under observational attacks for future safe RL studies. Code is available at: \url{https://github.com/liuzuxin/safe-rl-robustness}
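Both attackers described in the abstract perturb the agent's observation within a small L-infinity ball so that the policy, acting on the corrupted observation, is driven toward a chosen objective. Below is a minimal PGD-style sketch of this idea, assuming PyTorch, a differentiable deterministic policy network, and a learned state-action critic; passing a cost critic approximates the maximum-cost (MC) attack, and passing a reward critic approximates the maximum-reward (MR) attack. The function name, signature, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch


def observational_attack(obs, policy, critic, epsilon=0.05, steps=10, step_size=0.01):
    """Sketch of an observational adversarial attack (assumed interface).

    Perturbs `obs` within an L-infinity ball of radius `epsilon` so that the
    action the policy takes under the perturbed observation maximizes
    `critic(obs, action)`. Use a cost critic for a max-cost (MC) style attack,
    or a reward critic for a max-reward (MR) style attack.
    """
    obs = obs.detach()
    delta = torch.zeros_like(obs, requires_grad=True)
    for _ in range(steps):
        corrupted_obs = obs + delta
        action = policy(corrupted_obs)           # action chosen under the perturbed observation
        objective = critic(obs, action).mean()   # critic evaluated at the true state
        grad = torch.autograd.grad(objective, delta)[0]
        with torch.no_grad():
            delta += step_size * grad.sign()     # gradient-ascent (PGD) step on the perturbation
            delta.clamp_(-epsilon, epsilon)      # project back into the L-infinity ball
    return (obs + delta).detach()
```

In the paper's terms, the MC-style objective drives up the expected cost directly, while the MR-style objective exploits the counter-intuitive finding above: near the constrained optimum, chasing extra reward pushes the agent across the safety constraint while keeping the reward high, which makes the attack stealthy.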
