Paper Title

Towards a Theoretical Foundation of Policy Optimization for Learning Control Policies

Authors

Bin Hu, Kaiqing Zhang, Na Li, Mehran Mesbahi, Maryam Fazel, Tamer Başar

Abstract

Gradient-based methods have been widely used for system design and optimization in diverse application domains. Recently, there has been a renewed interest in studying theoretical properties of these methods in the context of control and reinforcement learning. This article surveys some of the recent developments on policy optimization, a gradient-based iterative approach for feedback control synthesis, popularized by successes of reinforcement learning. We take an interdisciplinary perspective in our exposition that connects control theory, reinforcement learning, and large-scale optimization. We review a number of recently-developed theoretical results on the optimization landscape, global convergence, and sample complexity of gradient-based methods for various continuous control problems such as the linear quadratic regulator (LQR), $\mathcal{H}_\infty$ control, risk-sensitive control, linear quadratic Gaussian (LQG) control, and output feedback synthesis. In conjunction with these optimization results, we also discuss how direct policy optimization handles stability and robustness concerns in learning-based control, two main desiderata in control engineering. We conclude the survey by pointing out several challenges and opportunities at the intersection of learning and control.
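
To make the surveyed setting concrete, below is a minimal sketch (assuming a Python/NumPy environment) of direct policy optimization on the simplest problem the abstract mentions, the LQR: plain gradient descent on the LQR cost $J(K)$ over static state-feedback gains $u = -Kx$, using the standard closed-form cost and gradient expressions from the LQR policy-gradient literature. The toy system matrices, the initial gain, the step size, and the helper name `lqr_cost_and_grad` are all illustrative assumptions, not values from the paper.

```python
import numpy as np

def lqr_cost_and_grad(K, A, B, Q, R, Sigma0, horizon=2000):
    """Cost J(K) and exact policy gradient for discrete-time LQR with u = -K x.

    Uses the well-known closed forms (as in the LQR policy-gradient literature):
      P_K solves   P = Q + K'RK + (A - BK)' P (A - BK)
      Sigma_K    = sum_t (A - BK)^t Sigma0 ((A - BK)')^t
      grad J(K)  = 2 ((R + B'PB) K - B'PA) Sigma_K
    """
    Acl = A - B @ K
    # Solve both discrete Lyapunov equations by truncated series summation
    # (valid only while A - BK is stable, i.e., spectral radius < 1).
    P = np.zeros_like(Q)
    Sigma = np.zeros_like(Sigma0)
    M_P, M_S = Q + K.T @ R @ K, Sigma0.copy()
    for _ in range(horizon):
        P += M_P
        Sigma += M_S
        M_P = Acl.T @ M_P @ Acl
        M_S = Acl @ M_S @ Acl.T
    J = np.trace(P @ Sigma0)  # J(K) = E[x0' P x0] for x0 ~ (0, Sigma0)
    grad = 2 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ Sigma
    return J, grad

# Toy 2-state, 1-input system (illustrative values, not from the paper).
A = np.array([[1.0, 0.2], [0.0, 1.0]])
B = np.array([[0.0], [0.2]])
Q, R = np.eye(2), np.eye(1)
Sigma0 = np.eye(2)

K = np.array([[0.5, 0.5]])   # an initial stabilizing gain (assumed)
eta = 1e-3                   # step size, hand-chosen for this toy example
for it in range(500):
    J, g = lqr_cost_and_grad(K, A, B, Q, R, Sigma0)
    K -= eta * g             # plain gradient descent on J(K)
print("final cost:", J)
```

Although $J(K)$ is nonconvex in $K$, results surveyed in the paper show that it satisfies a gradient-dominance property over the set of stabilizing gains, which is why gradient descent of this form converges to the global optimum for sufficiently small step sizes.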
