Paper Title
Memory-efficient Reinforcement Learning with Value-based Knowledge Consolidation
Paper Authors
Paper Abstract
Artificial neural networks are promising for general function approximation but challenging to train on non-independent or non-identically distributed data due to catastrophic forgetting. The experience replay buffer, a standard component in deep reinforcement learning, is often used to reduce forgetting and improve sample efficiency by storing experiences in a large buffer and using them for training later. However, a large replay buffer results in a heavy memory burden, especially for onboard and edge devices with limited memory capacities. We propose memory-efficient reinforcement learning algorithms based on the deep Q-network algorithm to alleviate this problem. Our algorithms reduce forgetting and maintain high sample efficiency by consolidating knowledge from the target Q-network to the current Q-network. Compared to baseline methods, our algorithms achieve comparable or better performance in both feature-based and image-based tasks while easing the burden of large experience replay buffers.
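The abstract describes consolidating knowledge from the target Q-network into the current Q-network as an added training objective alongside the usual DQN update. The following is a minimal sketch of that idea under stated assumptions, not the authors' exact method: it assumes PyTorch, a standard temporal-difference loss, and a hypothetical consolidation weight `consol_coef` that penalizes divergence between the two networks' value estimates on sampled states.

```python
# Sketch only: illustrative value-based consolidation on top of a DQN-style
# update. The consolidation term and `consol_coef` are assumptions, not the
# paper's published formulation.
import torch
import torch.nn.functional as F


def dqn_loss_with_consolidation(q_net, target_net, batch, gamma=0.99, consol_coef=1.0):
    states, actions, rewards, next_states, dones = batch

    # Standard DQN temporal-difference loss on the sampled transitions.
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        td_target = rewards + gamma * (1.0 - dones) * next_q
    td_loss = F.mse_loss(q_values, td_target)

    # Consolidation term: keep the current Q-network close to the target
    # Q-network's value estimates, so knowledge acquired earlier is retained
    # even when the replay buffer is small.
    with torch.no_grad():
        consol_target = target_net(states)
    consol_loss = F.mse_loss(q_net(states), consol_target)

    return td_loss + consol_coef * consol_loss
```

In this reading, the consolidation term plays the role that a large replay buffer usually plays (revisiting old experience), which is why the abstract claims comparable performance with a much smaller memory footprint.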