Paper Title
Global Convergence of Direct Policy Search for State-Feedback $\mathcal{H}_\infty$ Robust Control: A Revisit of Nonsmooth Synthesis with Goldstein Subdifferential
Authors
Abstract
Direct policy search has been widely applied in modern reinforcement learning and continuous control. However, the theoretical properties of direct policy search on nonsmooth robust control synthesis are not yet fully understood. The optimal $\mathcal{H}_\infty$ control framework aims to design a policy that minimizes the closed-loop $\mathcal{H}_\infty$ norm, and is arguably the most fundamental robust control paradigm. In this work, we show that direct policy search is guaranteed to find the global solution of the robust $\mathcal{H}_\infty$ state-feedback control design problem. Note that policy search for optimal $\mathcal{H}_\infty$ control leads to a constrained nonconvex nonsmooth optimization problem, where the nonconvex feasible set consists of all policies that stabilize the closed-loop dynamics. We show that for this nonsmooth optimization problem, all Clarke stationary points are global minima. Next, we establish the coerciveness of the closed-loop $\mathcal{H}_\infty$ objective function, and prove that all sublevel sets of the resulting policy search problem are compact. Based on these properties, we show that Goldstein's subgradient method and its implementable variants are guaranteed to stay in the nonconvex feasible set and eventually find the globally optimal solution of the $\mathcal{H}_\infty$ state-feedback synthesis problem. Our work builds a new connection between nonconvex nonsmooth optimization theory and robust control, leading to an interesting global convergence result for direct policy search on optimal $\mathcal{H}_\infty$ synthesis.
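For orientation, the optimization problem and the Goldstein step described in the abstract can be sketched as follows. The notation is an assumption made here for illustration and is not taken from the abstract: $K$ denotes the state-feedback gain, $\mathcal{K}$ the set of stabilizing gains, $J(K)$ the closed-loop $\mathcal{H}_\infty$ norm, $\partial_C J$ the Clarke subdifferential, $\mathbb{B}_\delta(K)$ the closed ball of radius $\delta$ around $K$, and $\|\cdot\|_F$ the Frobenius norm.

$$
\begin{aligned}
&\min_{K \in \mathcal{K}} \; J(K), \qquad \mathcal{K} = \{K : \text{the closed loop under } K \text{ is stable}\}, \\
&\partial_\delta J(K) = \operatorname{conv}\Big( \bigcup_{K' \in \mathbb{B}_\delta(K)} \partial_C J(K') \Big), \\
&g_k = \operatorname*{arg\,min}_{g \in \partial_\delta J(K_k)} \|g\|_F, \qquad K_{k+1} = K_k - \delta \, \frac{g_k}{\|g_k\|_F}.
\end{aligned}
$$

The second line is the standard definition of the Goldstein $\delta$-subdifferential, and the third is the usual form of Goldstein's subgradient step with the minimum-norm element; the exact variant and step-size rule analyzed in the paper may differ. In the standard analysis for locally Lipschitz functions, this step decreases the objective by at least $\delta \|g_k\|_F$ provided $\mathbb{B}_\delta(K_k)$ lies inside the domain where $J$ is defined, which is broadly why compact sublevel sets help keep the iterates feasible.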