Paper Title

Characterizing the Action-Generalization Gap in Deep Q-Learning

Paper Authors

Zhiyuan Zhou, Cameron Allen, Kavosh Asadi, George Konidaris

Paper Abstract

We study the action generalization ability of deep Q-learning in discrete action spaces. Generalization is crucial for efficient reinforcement learning (RL) because it allows agents to use knowledge learned from past experiences on new tasks. But while function approximation provides deep RL agents with a natural way to generalize over state inputs, the same generalization mechanism does not apply to discrete action outputs. And yet, surprisingly, our experiments indicate that Deep Q-Networks (DQN), which use exactly this type of function approximator, are still able to achieve modest action generalization. Our main contribution is twofold: first, we propose a method of evaluating action generalization using expert knowledge of action similarity, and empirically confirm that action generalization leads to faster learning; second, we characterize the action-generalization gap (the difference in learning performance between DQN and the expert) in different domains. We find that DQN can indeed generalize over actions in several simple domains, but that its ability to do so decreases as the action space grows larger.
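The abstract's central observation is that a DQN's function approximator shares representation across state inputs but not across discrete action outputs. The minimal sketch below (an illustrative PyTorch example, not code from the paper) shows the standard architecture behind this point: states pass through shared hidden layers, so similar states map to similar features, while each discrete action gets its own independent output unit, so nothing in the architecture ties the Q-value of one action to that of a similar action.

```python
# Illustrative sketch (not from the paper): a standard DQN-style Q-network.
# State generalization comes from the shared feature layers; each discrete
# action has its own untied output unit, so there is no built-in mechanism
# for generalizing across actions.
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    def __init__(self, state_dim: int, num_actions: int, hidden_dim: int = 128):
        super().__init__()
        # Shared state representation: inputs with similar features
        # produce similar hidden activations.
        self.features = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        # One output per discrete action, with separate parameters per action.
        self.q_head = nn.Linear(hidden_dim, num_actions)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Returns Q(s, a) for every discrete action a.
        return self.q_head(self.features(state))


if __name__ == "__main__":
    net = QNetwork(state_dim=4, num_actions=8)
    q_values = net(torch.randn(1, 4))
    print(q_values.shape)  # torch.Size([1, 8])
```

Any action generalization such a network exhibits must arise indirectly, for example through the shared state features feeding all action outputs, which is consistent with the modest action generalization the abstract reports.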
