Paper Title
Regularized Q-learning
Paper Authors
Paper Abstract
Q-learning is a widely used algorithm in the reinforcement learning community. In the tabular (lookup-table) setting, its convergence is well established. However, its behavior is known to be unstable when linear function approximation is used. This paper develops a new Q-learning algorithm that converges under linear function approximation. We prove that simply adding an appropriate regularization term ensures convergence of the algorithm, and we establish its stability using a recent analysis tool based on switching system models. Moreover, we experimentally show that it converges in environments where Q-learning with linear function approximation is known to diverge. We also provide an error bound on the solution to which the algorithm converges.
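As a rough illustration of the idea described in the abstract, the sketch below shows a semi-gradient Q-learning update with linear function approximation plus an added L2-style regularization term that shrinks the weights toward zero. The feature map `phi`, the step size `alpha`, and the regularization coefficient `eta` are illustrative assumptions; the paper's exact regularizer and convergence analysis are not reproduced here.

```python
import numpy as np

def regularized_q_update(theta, phi, s, a, r, s_next, actions,
                         alpha=0.05, gamma=0.99, eta=0.1):
    """One regularized Q-learning step with linear function approximation.

    theta : weight vector of the linear approximation Q(s, a) ~ theta @ phi(s, a)
    phi   : feature map, phi(state, action) -> np.ndarray with the same length as theta
    eta   : regularization coefficient; eta = 0 recovers plain semi-gradient Q-learning
    """
    q_sa = theta @ phi(s, a)
    # Greedy bootstrapped target over the next state's actions.
    target = r + gamma * max(theta @ phi(s_next, b) for b in actions)
    td_error = target - q_sa
    # The extra -eta * theta term is the kind of regularization the abstract
    # credits with stabilizing Q-learning under linear function approximation.
    return theta + alpha * (td_error * phi(s, a) - eta * theta)

# Toy usage on a two-state, two-action problem with one-hot (state, action) features.
n_states, n_actions = 2, 2

def phi(s, a):
    v = np.zeros(n_states * n_actions)
    v[s * n_actions + a] = 1.0
    return v

theta = np.zeros(n_states * n_actions)
theta = regularized_q_update(theta, phi, s=0, a=1, r=1.0, s_next=1,
                             actions=range(n_actions))
print(theta)
```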