Paper Title
Langevin DQN
Paper Authors
Paper Abstract
Algorithms that tackle deep exploration -- an important challenge in reinforcement learning -- have relied on epistemic uncertainty representation through ensembles or other hypermodels, exploration bonuses, or visitation count distributions. An open question is whether deep exploration can be achieved by an incremental reinforcement learning algorithm that tracks a single point estimate, without the additional complexity required to account for epistemic uncertainty. We answer this question in the affirmative. In particular, we develop Langevin DQN, a variation of DQN that differs only in perturbing parameter updates with Gaussian noise, and demonstrate through a computational study that the proposed algorithm achieves deep exploration. We also offer some intuition for how Langevin DQN achieves deep exploration. In addition, we present a modification of the Langevin DQN algorithm to improve computational efficiency.
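The key mechanism described in the abstract, perturbing each parameter update with Gaussian noise, corresponds to a Langevin-style gradient step. Below is a minimal sketch assuming the stochastic gradient Langevin dynamics (SGLD) noise scaling of sqrt(2 * lr * temperature); the function name and `temperature` parameter are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def langevin_update(theta, grad, lr, temperature=1.0, rng=None):
    """One Langevin-style parameter update: a plain SGD step on the
    loss gradient plus an isotropic Gaussian perturbation."""
    if rng is None:
        rng = np.random.default_rng()
    # Noise scaled as sqrt(2 * lr * temperature), the standard SGLD
    # scaling (an assumption -- the paper's exact variance may differ).
    noise = np.sqrt(2.0 * lr * temperature) * rng.normal(size=theta.shape)
    return theta - lr * grad + noise
```

In a DQN training loop, `theta` would be the Q-network parameters and `grad` the minibatch gradient of the temporal-difference loss; per the abstract, everything else matches vanilla DQN.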