Paper Title

On Linear Convergence of Policy Gradient Methods for Finite MDPs

Paper Authors

Jalaj Bhandari, Daniel Russo

Paper Abstract

We revisit the finite time analysis of policy gradient methods in one of the simplest settings: finite state and action MDPs with a policy class consisting of all stochastic policies and with exact gradient evaluations. There has been some recent work viewing this setting as an instance of smooth non-linear optimization problems and showing sub-linear convergence rates with small step-sizes. Here, we take a different perspective based on connections with policy iteration and show that many variants of policy gradient methods succeed with large step-sizes and attain a linear rate of convergence.
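To make the setting concrete, below is a minimal sketch (not from the paper) of exact policy gradient ascent on a small finite MDP with the tabular softmax parameterization over all stochastic policies. The transition kernel `P`, reward `R`, step size, and function names are illustrative assumptions; the sketch only shows the kind of update being analyzed, including the use of a large constant step size.

```python
import numpy as np

# Illustrative finite MDP (randomly generated; not from the paper).
n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(size=(n_states, n_actions))                       # R[s, a]

def policy_from_logits(theta):
    """Tabular softmax policy pi[s, a] from logits theta[s, a]."""
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def evaluate(pi):
    """Exact policy evaluation: solve (I - gamma * P_pi) V = r_pi."""
    P_pi = np.einsum("sa,sat->st", pi, P)
    r_pi = np.einsum("sa,sa->s", pi, R)
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    Q = R + gamma * np.einsum("sat,t->sa", P, V)
    return V, Q

def exact_gradient(theta, rho):
    """Exact gradient of V_rho(pi_theta) via the policy gradient theorem."""
    pi = policy_from_logits(theta)
    V, Q = evaluate(pi)
    # Discounted state-occupancy measure d_rho under pi (unnormalized).
    P_pi = np.einsum("sa,sat->st", pi, P)
    d = np.linalg.solve(np.eye(n_states) - gamma * P_pi.T, rho)
    A = Q - V[:, None]                # advantage function
    return d[:, None] * pi * A        # gradient w.r.t. softmax logits

theta = np.zeros((n_states, n_actions))
rho = np.ones(n_states) / n_states    # initial state distribution
for _ in range(200):
    theta += 1.0 * exact_gradient(theta, rho)  # large constant step size
V, _ = evaluate(policy_from_logits(theta))
print("V under learned policy:", V)
```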
