Paper Title
KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal
Paper Authors
Paper Abstract
In this work, we consider and analyze the sample complexity of model-free reinforcement learning with a generative model. Particularly, we analyze mirror descent value iteration (MDVI) by Geist et al. (2019) and Vieillard et al. (2020a), which uses the Kullback-Leibler divergence and entropy regularization in its value and policy updates. Our analysis shows that it is nearly minimax-optimal for finding an $\varepsilon$-optimal policy when $\varepsilon$ is sufficiently small. This is the first theoretical result that demonstrates that a simple model-free algorithm without variance-reduction can be nearly minimax-optimal under the considered setting.
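To make the kind of update described above concrete, here is a minimal tabular sketch of a KL-entropy-regularized value-iteration loop (MDVI-style) with a generative model. The sampler interface, the coefficient names `lam` and `tau`, and the number of samples per state-action pair are illustrative assumptions, not the paper's exact procedure or analysis; the policy step uses the standard closed-form maximizer of the KL- and entropy-regularized objective.

```python
import numpy as np

def mdvi_generative(sampler, r, gamma=0.99, lam=0.1, tau=0.01,
                    n_samples=10, n_iters=100):
    """Sketch of KL-entropy-regularized value iteration with a generative model.

    sampler(s, a, n) -> array of n next-state indices drawn from P(.|s, a)
    r: reward table of shape (S, A)
    lam: KL-regularization coefficient, tau: entropy coefficient
    (coefficients, sample counts, and the sampler interface are illustrative).
    """
    S, A = r.shape
    q = np.zeros((S, A))
    pi = np.full((S, A), 1.0 / A)          # start from the uniform policy

    for _ in range(n_iters):
        # Policy update: closed-form maximizer of
        #   <pi, q> - lam * KL(pi || pi_old) + tau * H(pi),
        # i.e. pi_new ∝ pi_old^(lam / (lam + tau)) * exp(q / (lam + tau)).
        logits = (lam * np.log(pi + 1e-12) + q) / (lam + tau)
        logits -= logits.max(axis=1, keepdims=True)    # numerical stability
        pi_new = np.exp(logits)
        pi_new /= pi_new.sum(axis=1, keepdims=True)

        # Regularized state value under the new policy.
        kl = np.sum(pi_new * (np.log(pi_new + 1e-12) - np.log(pi + 1e-12)), axis=1)
        ent = -np.sum(pi_new * np.log(pi_new + 1e-12), axis=1)
        v = np.sum(pi_new * q, axis=1) - lam * kl + tau * ent

        # Value update: sampled Bellman backup using the generative model.
        q_next = np.empty_like(q)
        for s in range(S):
            for a in range(A):
                next_states = sampler(s, a, n_samples)
                q_next[s, a] = r[s, a] + gamma * v[next_states].mean()

        q, pi = q_next, pi_new

    return pi, q
```

As a usage note, `sampler` stands in for the generative-model assumption in the abstract: it returns i.i.d. next states for any queried state-action pair, and the sample average replaces the exact expectation in the Bellman backup.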