Paper Title
Trustworthy Reinforcement Learning Against Intrinsic Vulnerabilities: Robustness, Safety, and Generalizability
Paper Authors
Paper Abstract
A trustworthy reinforcement learning algorithm should be competent in solving challenging real-world problems, including robustly handling uncertainties, satisfying safety constraints to avoid catastrophic failures, and generalizing to unseen scenarios during deployment. This study aims to overview these main perspectives of trustworthy reinforcement learning considering its intrinsic vulnerabilities in robustness, safety, and generalizability. In particular, we give rigorous formulations, categorize corresponding methodologies, and discuss benchmarks for each perspective. Moreover, we provide an outlook section to spur promising future directions, with a brief discussion of extrinsic vulnerabilities arising from human feedback. We hope this survey can bring separate threads of studies together in a unified framework and promote the trustworthiness of reinforcement learning.