Paper Title

On Linear Convergence of Policy Gradient Methods for Finite MDPs

Paper Authors

Jalaj Bhandari, Daniel Russo

Paper Abstract

We revisit the finite time analysis of policy gradient methods in one of the simplest settings: finite state and action MDPs with a policy class consisting of all stochastic policies and with exact gradient evaluations. There has been some recent work viewing this setting as an instance of smooth non-linear optimization problems and showing sub-linear convergence rates with small step-sizes. Here, we take a different perspective based on connections with policy iteration and show that many variants of policy gradient methods succeed with large step-sizes and attain a linear rate of convergence.
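To make the setting concrete, below is a minimal sketch (not from the paper) of exact policy gradient ascent on a small finite MDP with the tabular softmax parameterization over all stochastic policies. The transition kernel `P`, reward `R`, step size, and function names are illustrative assumptions; the sketch only shows the kind of update being analyzed, including the use of a large constant step size.

```python
import numpy as np

# Illustrative finite MDP (randomly generated; not from the paper).
n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(size=(n_states, n_actions))                       # R[s, a]

def policy_from_logits(theta):
    """Tabular softmax policy pi[s, a] from logits theta[s, a]."""
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def evaluate(pi):
    """Exact policy evaluation: solve (I - gamma * P_pi) V = r_pi."""
    P_pi = np.einsum("sa,sat->st", pi, P)
    r_pi = np.einsum("sa,sa->s", pi, R)
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    Q = R + gamma * np.einsum("sat,t->sa", P, V)
    return V, Q

def exact_gradient(theta, rho):
    """Exact gradient of V_rho(pi_theta) via the policy gradient theorem."""
    pi = policy_from_logits(theta)
    V, Q = evaluate(pi)
    # Discounted state-occupancy measure d_rho under pi (unnormalized).
    P_pi = np.einsum("sa,sat->st", pi, P)
    d = np.linalg.solve(np.eye(n_states) - gamma * P_pi.T, rho)
    A = Q - V[:, None]                # advantage function
    return d[:, None] * pi * A        # gradient w.r.t. softmax logits

theta = np.zeros((n_states, n_actions))
rho = np.ones(n_states) / n_states    # initial state distribution
for _ in range(200):
    theta += 1.0 * exact_gradient(theta, rho)  # large constant step size
V, _ = evaluate(policy_from_logits(theta))
print("V under learned policy:", V)
```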
