Krylov-Bellman boosting: Super-linear policy evaluation in general state spaces

Paper title

Krylov-Bellman boosting: Super-linear policy evaluation in general state spaces

Authors

Eric Xia, Martin J. Wainwright

Abstract

We present and analyze the Krylov-Bellman Boosting (KBB) algorithm for policy evaluation in general state spaces. It alternates between fitting the Bellman residual using non-parametric regression (as in boosting), and estimating the value function via the least-squares temporal difference (LSTD) procedure applied with a feature set that grows adaptively over time. By exploiting the connection to Krylov methods, we equip this method with two attractive guarantees. First, we provide a general convergence bound that allows for separate estimation errors in residual fitting and LSTD computation. Consistent with our numerical experiments, this bound shows that convergence rates depend on the restricted spectral structure, and are typically super-linear. Second, by combining this meta-result with sample-size dependent guarantees for residual fitting and LSTD computation, we obtain concrete statistical guarantees that depend on the sample size along with the complexity of the function class used to fit the residuals. We illustrate the behavior of the KBB algorithm for various types of policy evaluation problems, and typically find large reductions in sample complexity relative to the standard approach of fitted value iteration.
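
The alternation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a finite-state Markov reward process with a known transition matrix `P` and reward vector `r`, so the Bellman residual is computed exactly rather than fitted by non-parametric regression, and the LSTD step is solved at the population level rather than from samples. The names `bellman_residual`, `lstd_solve`, `kbb_exact`, and the uniform default weighting `mu` are hypothetical choices made for this illustration.

```python
import numpy as np


def bellman_residual(P, r, gamma, v):
    """Bellman residual r + gamma * P v - v of a candidate value function v."""
    return r + gamma * (P @ v) - v


def lstd_solve(Phi, P, r, gamma, mu):
    """Population LSTD: solve the projected Bellman equation on span(Phi).

    Finds w with Phi^T D (Phi - gamma P Phi) w = Phi^T D r, D = diag(mu).
    """
    weighted = Phi.T * mu  # equals Phi^T D, shape (k, n)
    A = weighted @ (Phi - gamma * (P @ Phi))
    b = weighted @ r
    w = np.linalg.solve(A, b)
    return Phi @ w


def kbb_exact(P, r, gamma, num_iters, mu=None, tol=1e-10):
    """Noise-free sketch of the residual-fit / LSTD alternation.

    Assumption: the exact residual stands in for the regression fit.
    Returns the value estimate and the residual norm seen at each step.
    """
    n = len(r)
    mu = np.full(n, 1.0 / n) if mu is None else mu  # assumed weighting
    v = np.zeros(n)
    features, history = [], []
    for _ in range(num_iters):
        res = bellman_residual(P, r, gamma, v)
        res_norm = np.linalg.norm(res)
        history.append(res_norm)
        if res_norm <= tol * np.linalg.norm(r):
            break  # residual is numerically zero; no new feature to add
        features.append(res / res_norm)  # grow the feature set
        Phi = np.column_stack(features)
        v = lstd_solve(Phi, P, r, gamma, mu)  # re-solve on all features
    return v, history


if __name__ == "__main__":
    # Toy problem (an assumption for illustration): random walk on a cycle.
    n, gamma = 8, 0.9
    P = np.zeros((n, n))
    for i in range(n):
        P[i, (i - 1) % n] = 0.5
        P[i, (i + 1) % n] = 0.5
    r = np.arange(n, dtype=float)
    v, history = kbb_exact(P, r, gamma, num_iters=n)
    v_star = np.linalg.solve(np.eye(n) - gamma * P, r)
    print("max abs error:", np.max(np.abs(v - v_star)))
    print("residual norms:", history)
```

In this noise-free setting each new residual is orthogonal (in the `mu`-weighted inner product) to the features already collected, so the features span a Krylov subspace generated by `I - gamma * P` and `r`. That is the connection to Krylov methods the abstract refers to; the sample-based algorithm replaces both steps with estimates.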
