Stochastic Optimal Control for Multivariable Dynamical Systems Using Expectation Maximization

Paper Title

Stochastic Optimal Control for Multivariable Dynamical Systems Using Expectation Maximization

Authors

Mallick, Prakash; Chen, Zhiyong

Abstract

Trajectory optimization is a fundamental stochastic optimal control problem. This paper deals with a trajectory optimization approach for dynamical systems subject to measurement noise that can be fitted into linear time-varying stochastic models. Exact/complete solutions to these kinds of control problems have been deemed analytically intractable in the literature because they fall into the category of Partially Observable Markov Decision Processes (POMDPs). Therefore, effective solutions with reasonable approximations are widely sought. We propose a reformulation of stochastic control in a reinforcement learning setting. This type of formulation combines the benefits of conventional optimal control procedures with the advantages of maximum likelihood approaches. Finally, an iterative trajectory optimization paradigm called Stochastic Optimal Control - Expectation Maximization (SOC-EM) is put forth. This trajectory optimization procedure exhibits better performance in terms of reducing the cumulative cost-to-go, which is proved both theoretically and empirically. Furthermore, we provide novel theoretical results on the uniqueness of the control parameter estimates. We also analyze the control covariance matrix, which handles stochasticity by efficiently balancing exploration and exploitation.
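As a rough illustration of the problem class described in the abstract, and explicitly not the SOC-EM algorithm itself, the following minimal Python sketch simulates a hypothetical two-state linear time-varying system with process noise and measurement noise under a Gaussian linear feedback policy, and estimates the expected cumulative cost-to-go by Monte Carlo. All matrices, noise levels, cost weights, the fixed gain, and the function names (`rollout_cost`, `expected_cost`) are assumptions chosen for illustration only. The control noise `sigma_u` stands in for an exploration covariance: with the gain held fixed, larger exploration noise raises the expected cost, which is the price any exploration/exploitation trade-off has to weigh.

```python
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rollout_cost(T, A, B, C, K, sigma_u, sigma_w, sigma_v, Q, R, x0, rng):
    """Cumulative quadratic cost of one rollout (hypothetical setup).

    x_{t+1} = A_t x_t + B_t u_t + w_t,   y_t = C_t x_t + v_t,
    u_t ~ N(K_t y_t, sigma_u^2 I): a Gaussian policy acting on noisy measurements.
    """
    x = list(x0)
    cost = 0.0
    for t in range(T):
        # Noisy measurement of the state
        y = [c + rng.gauss(0.0, sigma_v) for c in matvec(C[t], x)]
        # Gaussian control: linear feedback mean plus exploration noise
        u = [m + rng.gauss(0.0, sigma_u) for m in matvec(K[t], y)]
        # Stage cost x^T Q x + u^T R u with diagonal Q and R
        cost += sum(q * xi * xi for q, xi in zip(Q, x))
        cost += sum(r * ui * ui for r, ui in zip(R, u))
        # Linear time-varying transition with additive process noise
        ax, bu = matvec(A[t], x), matvec(B[t], u)
        x = [a + b + rng.gauss(0.0, sigma_w) for a, b in zip(ax, bu)]
    # Terminal cost
    return cost + sum(q * xi * xi for q, xi in zip(Q, x))


def expected_cost(sigma_u, n_rollouts=2000, T=20, seed=0):
    """Monte Carlo estimate of the expected cumulative cost-to-go."""
    rng = random.Random(seed)
    # Hypothetical slowly time-varying dynamics (assumed values)
    A = [[[1.0, 0.1], [0.0, 1.0 + 0.005 * t]] for t in range(T)]
    B = [[[0.0], [0.1]] for _ in range(T)]
    C = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(T)]  # full but noisy observation
    K = [[[-1.0, -1.5]] for _ in range(T)]  # fixed, hand-picked stabilizing gain
    total = sum(
        rollout_cost(T, A, B, C, K, sigma_u, 0.01, 0.01,
                     [1.0, 0.1], [0.1], [1.0, 0.0], rng)
        for _ in range(n_rollouts)
    )
    return total / n_rollouts


if __name__ == "__main__":
    for s in (0.0, 0.5):
        print(f"sigma_u={s}: expected cost ~ {expected_cost(s):.3f}")
```

Because the system is linear and the exploration noise is zero-mean and independent, it leaves the mean trajectory unchanged and only adds variance. With positive cost weights, that extra variance can only increase the expected cost, so the comparison printed above is expected to favor `sigma_u=0.0` for a fixed gain.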
