Paper Title
Average Cost Optimal Control of Stochastic Systems Using Reinforcement Learning
Paper Authors
Abstract
This paper addresses the average cost minimization problem for discrete-time systems with multiplicative and additive noise via reinforcement learning. Using the Q-function, we propose an online learning scheme that estimates the kernel matrix of the Q-function and updates the control gain using data collected along the system trajectories. The resulting control gain and kernel matrix are proved to converge to their optimal values. To implement the proposed learning scheme, an online model-free reinforcement learning algorithm is given, in which the recursive least squares method is used to estimate the kernel matrix of the Q-function. A numerical example is presented to illustrate the proposed approach.
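The abstract gives no equations, so the following is only a minimal sketch of the recursive least squares step it mentions, not the paper's algorithm. It assumes a scalar state x and scalar input u, a purely quadratic Q-function with a hypothetical kernel `true_theta`, and regression targets generated directly from that kernel plus small noise (in the actual scheme the targets would come from data along the system trajectories). The helper name `rls_update` and all numerical values are illustrative assumptions.

```python
import random


def rls_update(theta, P, phi, y):
    """One recursive least squares step for the linear model y ~ phi . theta.

    theta: current parameter estimate (list of floats)
    P:     current symmetric covariance-like matrix (list of lists)
    phi:   regressor vector for this sample
    y:     observed target for this sample
    """
    n = len(theta)
    # P_phi = P @ phi
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    # Scalar normalizer 1 + phi^T P phi
    denom = 1.0 + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [v / denom for v in P_phi]
    # Prediction error under the current estimate
    err = y - sum(phi[i] * theta[i] for i in range(n))
    theta_new = [theta[i] + gain[i] * err for i in range(n)]
    # P_new = P - gain * (phi^T P); P is symmetric, so phi^T P = P_phi^T
    P_new = [[P[i][j] - gain[i] * P_phi[j] for j in range(n)] for i in range(n)]
    return theta_new, P_new


random.seed(0)

# Hypothetical kernel entries (h_xx, h_xu, h_uu) of Q(x, u) = [x u] H [x u]^T.
true_theta = [2.0, 0.5, 1.0]

theta_hat = [0.0, 0.0, 0.0]
# A large initial P means little confidence in the initial estimate.
P = [[1e6 if i == j else 0.0 for j in range(3)] for i in range(3)]

for _ in range(200):
    x = random.uniform(-1.0, 1.0)
    u = random.uniform(-1.0, 1.0)
    # Quadratic features matching h_xx * x^2 + 2 * h_xu * x * u + h_uu * u^2
    phi = [x * x, 2.0 * x * u, u * u]
    # Synthetic noisy target (assumption: stands in for trajectory data)
    y = sum(p * t for p, t in zip(phi, true_theta)) + random.gauss(0.0, 0.01)
    theta_hat, P = rls_update(theta_hat, P, phi, y)

# Greedy gain for a quadratic Q-function: u = -(h_xu / h_uu) * x
h_xu, h_uu = theta_hat[1], theta_hat[2]
control_gain = h_xu / h_uu

print("estimated kernel entries:", theta_hat)
print("control gain:", control_gain)
```

Uniformly sampled x and u keep the three quadratic features linearly independent, which is what lets the estimate settle near the assumed kernel. In a closed-loop setting some form of exploration in the input would typically play that role.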