Fast and Data Efficient Reinforcement Learning from Pixels via Non-Parametric Value Approximation

Paper Title

Fast and Data Efficient Reinforcement Learning from Pixels via Non-Parametric Value Approximation

Authors

Alexander Long, Alan Blair, Herke van Hoof

Abstract

We present Nonparametric Approximation of Inter-Trace returns (NAIT), a Reinforcement Learning algorithm for discrete action, pixel-based environments that is both highly sample and computation efficient. NAIT is a lazy-learning approach with an update that is equivalent to episodic Monte-Carlo on episode completion, but that allows the stable incorporation of rewards while an episode is ongoing. We make use of a fixed domain-agnostic representation, simple distance based exploration and a proximity graph-based lookup to facilitate extremely fast execution. We empirically evaluate NAIT on both the 26 and 57 game variants of ATARI100k where, despite its simplicity, it achieves competitive performance in the online setting with greater than 100x speedup in wall-time.
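
To make the idea of a lazy, non-parametric value estimate more concrete, the following is a minimal sketch and not the authors' implementation. It stores Monte-Carlo returns per action at the end of an episode and estimates a value by averaging the returns of the k nearest stored embeddings. The class name `KNNValueMemory`, the helper `discounted_returns`, and the parameters `k` and `gamma` are all hypothetical. A brute-force nearest-neighbour search stands in for the proximity graph-based lookup mentioned in the abstract, and the in-episode reward incorporation and distance-based exploration are not modelled.

```python
import math
from collections import defaultdict


def discounted_returns(rewards, gamma=0.99):
    """Monte-Carlo returns G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


class KNNValueMemory:
    """Hypothetical per-action memory of (embedding, return) pairs.

    Lookup is a brute-force k-nearest-neighbour average; this is an
    assumption made for brevity, not the lookup used in the paper.
    """

    def __init__(self, k=2):
        self.k = k
        self.memory = defaultdict(list)  # action -> [(embedding, return)]

    def add_episode(self, embeddings, actions, rewards, gamma=0.99):
        # Lazy learning: no parameters are fitted, transitions are only stored.
        returns = discounted_returns(rewards, gamma)
        for z, a, g in zip(embeddings, actions, returns):
            self.memory[a].append((tuple(z), g))

    def value(self, z, action):
        entries = self.memory[action]
        if not entries:
            return 0.0  # assumed default for actions never tried
        nearest = sorted(entries, key=lambda e: math.dist(e[0], z))[: self.k]
        return sum(g for _, g in nearest) / len(nearest)

    def greedy_action(self, z, actions):
        return max(actions, key=lambda a: self.value(z, a))


if __name__ == "__main__":
    mem = KNNValueMemory(k=1)
    mem.add_episode([(0.0, 0.0), (1.0, 0.0)], [0, 1], [0.0, 1.0], gamma=0.5)
    print(mem.greedy_action((0.1, 0.0), [0, 1]))
```

In this sketch the embeddings are plain tuples; in a pixel-based setting they would come from whatever fixed representation is applied to the frames.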
