Paper Title

Knowledge Transfer in Deep Reinforcement Learning via an RL-Specific GAN-Based Correspondence Function

Paper Authors

Marko Ruman and Tatiana V. Guy

Paper Abstract

Deep reinforcement learning has demonstrated superhuman performance in complex decision-making tasks, but it struggles with generalization and knowledge reuse, key aspects of true intelligence. This article introduces a novel approach that modifies Cycle Generative Adversarial Networks (CycleGANs) specifically for reinforcement learning, enabling effective one-to-one knowledge transfer between two tasks. Our method augments the loss function with two new components: a model loss, which captures the dynamic relationship between the source and target tasks, and a Q-loss, which identifies the states that significantly influence the target decision policy. Tested on the 2-D Atari game Pong, our method achieved 100% knowledge transfer on an identical task and, depending on the network architecture, either 100% knowledge transfer or a 30% reduction in training time on a rotated task. In contrast, standard Generative Adversarial Networks or CycleGANs performed worse than training from scratch in the majority of cases. The results demonstrate that the proposed method yields improved knowledge generalization in deep reinforcement learning.
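
The abstract only names the two RL-specific loss terms, so the following is a minimal sketch of how such an augmented CycleGAN objective could be wired together. All identifiers (`G_st`, `G_ts`, `model_t`, `q_t`, the `lam_*` weights) and the concrete forms of the model loss and Q-loss are illustrative assumptions, not the paper's exact formulation; the adversarial and identity terms of the standard CycleGAN objective are omitted for brevity.

```python
# Hedged sketch of a CycleGAN loss augmented with a model loss and a Q-loss,
# as described in the abstract. Concrete forms are assumptions, not the paper's.
import torch
import torch.nn.functional as F


def correspondence_loss(G_st, G_ts, model_t, q_t,
                        s_src, a_src, s_src_next,
                        lam_cycle=10.0, lam_model=1.0, lam_q=1.0):
    """G_st / G_ts map source states to target states and back; model_t is a
    (learned) dynamics model of the target task; q_t is the target task's
    Q-network. All of these, and the weighting scheme, are hypothetical."""
    mapped = G_st(s_src)            # source states mapped into the target task
    mapped_next = G_st(s_src_next)  # mapped successor states
    recon = G_ts(mapped)            # mapped back to the source task

    # Standard cycle-consistency: mapping there and back should recover s_src.
    cycle = F.l1_loss(recon, s_src)

    # Model loss: a mapped transition should agree with the target dynamics,
    # i.e. model_t(mapped, a) should predict the mapped successor state.
    model_term = F.mse_loss(model_t(mapped, a_src), mapped_next)

    # Q-loss: emphasize states that matter for the target policy. Here the
    # spread of Q-values over actions serves as a proxy for how strongly a
    # state influences the decision (one plausible reading of the abstract).
    q_vals = q_t(mapped)            # shape: (batch, n_actions)
    weight = (q_vals.max(dim=1).values - q_vals.mean(dim=1)).detach()
    per_state_err = (recon - s_src).abs().flatten(1).mean(dim=1)
    q_term = (weight * per_state_err).mean()

    return lam_cycle * cycle + lam_model * model_term + lam_q * q_term
```

Detaching the Q-based weight keeps the Q-loss from back-propagating into the target Q-network, so it only shapes the state mapping; in a full training loop this term would be added to the generators' usual adversarial losses.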
