Paper Title
KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal
Paper Authors
Paper Abstract
In this work, we consider and analyze the sample complexity of model-free reinforcement learning with a generative model. Particularly, we analyze mirror descent value iteration (MDVI) by Geist et al. (2019) and Vieillard et al. (2020a), which uses the Kullback-Leibler divergence and entropy regularization in its value and policy updates. Our analysis shows that it is nearly minimax-optimal for finding an $\varepsilon$-optimal policy when $\varepsilon$ is sufficiently small. This is the first theoretical result that demonstrates that a simple model-free algorithm without variance-reduction can be nearly minimax-optimal under the considered setting.
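To make the kind of update described above concrete, here is a minimal tabular sketch of a KL-entropy-regularized value-iteration loop (MDVI-style) with a generative model. The sampler interface, the coefficient names `lam` and `tau`, and the number of samples per state-action pair are illustrative assumptions, not the paper's exact procedure or analysis; the policy step uses the standard closed-form maximizer of the KL- and entropy-regularized objective.

```python
import numpy as np

def mdvi_generative(sampler, r, gamma=0.99, lam=0.1, tau=0.01,
                    n_samples=10, n_iters=100):
    """Sketch of KL-entropy-regularized value iteration with a generative model.

    sampler(s, a, n) -> array of n next-state indices drawn from P(.|s, a)
    r: reward table of shape (S, A)
    lam: KL-regularization coefficient, tau: entropy coefficient
    (coefficients, sample counts, and the sampler interface are illustrative).
    """
    S, A = r.shape
    q = np.zeros((S, A))
    pi = np.full((S, A), 1.0 / A)          # start from the uniform policy

    for _ in range(n_iters):
        # Policy update: closed-form maximizer of
        #   <pi, q> - lam * KL(pi || pi_old) + tau * H(pi),
        # i.e. pi_new ∝ pi_old^(lam / (lam + tau)) * exp(q / (lam + tau)).
        logits = (lam * np.log(pi + 1e-12) + q) / (lam + tau)
        logits -= logits.max(axis=1, keepdims=True)    # numerical stability
        pi_new = np.exp(logits)
        pi_new /= pi_new.sum(axis=1, keepdims=True)

        # Regularized state value under the new policy.
        kl = np.sum(pi_new * (np.log(pi_new + 1e-12) - np.log(pi + 1e-12)), axis=1)
        ent = -np.sum(pi_new * np.log(pi_new + 1e-12), axis=1)
        v = np.sum(pi_new * q, axis=1) - lam * kl + tau * ent

        # Value update: sampled Bellman backup using the generative model.
        q_next = np.empty_like(q)
        for s in range(S):
            for a in range(A):
                next_states = sampler(s, a, n_samples)
                q_next[s, a] = r[s, a] + gamma * v[next_states].mean()

        q, pi = q_next, pi_new

    return pi, q
```

As a usage note, `sampler` stands in for the generative-model assumption in the abstract: it returns i.i.d. next states for any queried state-action pair, and the sample average replaces the exact expectation in the Bellman backup.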