Paper Title
Dealing with Sparse Rewards Using Graph Neural Networks
Paper Authors
Paper Abstract
Deep reinforcement learning in partially observable environments is a difficult task in itself, and can be further complicated by a sparse reward signal. Most tasks involving navigation in three-dimensional environments provide the agent with extremely limited information. Typically, the agent receives a visual observation input from the environment and is rewarded once at the end of the episode. A good reward function could substantially improve the convergence of reinforcement learning algorithms for such tasks. The classic approach to increasing the density of the reward signal is to augment it with supplementary rewards. This technique is called reward shaping. In this study, we propose two modifications of one of the recent reward shaping methods based on graph convolutional networks: the first involving advanced aggregation functions, and the second utilizing the attention mechanism. We empirically validate the effectiveness of our solutions for the task of navigation in a 3D environment with sparse rewards. For the solution featuring the attention mechanism, we are also able to show that the learned attention is concentrated on edges corresponding to important transitions in the 3D environment.
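As background for the shaping technique the abstract refers to, the sketch below illustrates classic potential-based reward shaping (Ng et al., 1999), where the sparse environment reward is augmented with the term F(s, s') = γΦ(s') − Φ(s), which is known to preserve the optimal policy. This is a generic illustration of reward shaping, not the paper's GCN-based method; the potential function `phi` and the gymnasium-style environment interface are illustrative assumptions.

```python
import gymnasium as gym

GAMMA = 0.99  # discount factor, assumed for illustration


def phi(obs) -> float:
    """Hypothetical potential function over observations.

    In practice this could be, e.g., negative estimated distance to the
    goal; the paper's methods instead derive the shaping signal from
    graph convolutional networks. Here it is a stub.
    """
    return 0.0


def shaped_step(env: gym.Env, obs, action):
    """Take one environment step and densify the sparse reward.

    Adds the potential-based shaping term
        F(s, s') = GAMMA * phi(s') - phi(s)
    to the raw reward before returning it to the learning algorithm.
    """
    next_obs, reward, terminated, truncated, info = env.step(action)
    shaping = GAMMA * phi(next_obs) - phi(obs)
    return next_obs, reward + shaping, terminated, truncated, info
```

Because the shaping term telescopes over a trajectory, it changes only the density of the reward signal, not which policies are optimal, which is what makes it a safe way to speed up convergence on sparse-reward navigation tasks like those studied here.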