Paper Title


Evolutionary Deep Reinforcement Learning Using Elite Buffer: A Novel Approach Towards DRL Combined with EA in Continuous Control Tasks

Authors

Marzieh Sadat Esmaeeli, Hamed Malek

Abstract


Despite the numerous applications and successes of deep reinforcement learning in many control tasks, it still suffers from several crucial problems and limitations, including temporal credit assignment under sparse rewards, a lack of effective exploration, and brittle convergence that is extremely sensitive to the hyperparameters of the problem. These problems of deep reinforcement learning in continuous control, together with the success of evolutionary algorithms in addressing some of them, have given rise to the idea of evolutionary reinforcement learning, which has attracted considerable attention. Despite successful results in a few studies in this field, a proper and fitting solution to these problems and limitations has yet to be presented. The present study aims to further investigate the efficiency of combining the two fields of deep reinforcement learning and evolutionary computation, and to take a step towards improving existing methods and addressing their challenges. The "Evolutionary Deep Reinforcement Learning Using Elite Buffer" algorithm introduces a novel mechanism inspired by the interactive learning capability and hypothetical outcomes in the human brain. In this method, the use of an elite buffer (inspired by learning based on experience generalization in the human mind), together with crossover and mutation operators and interactive learning across successive generations, improves efficiency, convergence, and proper advancement in the field of continuous control. According to the experimental results, the proposed method surpasses other well-known methods in environments with high complexity and dimensionality, and is superior in resolving the aforementioned problems and limitations.
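The abstract does not include pseudocode for the proposed algorithm. As a rough illustration of the general elite-buffer idea in an evolutionary loop — not the authors' actual method — the following toy sketch keeps a buffer of the best individuals seen across generations and breeds offspring from it with crossover and mutation. The fitness function, dimensions, and all parameter names here are hypothetical stand-ins for a real continuous-control objective.

```python
import random

# Hypothetical stand-in for a continuous-control objective:
# negative squared distance of a parameter vector to a hidden target.
TARGET = [0.5, -0.3, 0.8, 0.1]

def fitness(params):
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))

def crossover(a, b):
    # Uniform crossover: each gene is taken from a randomly chosen parent.
    return [random.choice(pair) for pair in zip(a, b)]

def mutate(params, sigma=0.1):
    # Gaussian perturbation of every gene.
    return [p + random.gauss(0, sigma) for p in params]

def evolve(pop_size=20, dim=4, generations=50, elite_buffer_size=5, seed=0):
    random.seed(seed)
    population = [[random.uniform(-1, 1) for _ in range(dim)]
                  for _ in range(pop_size)]
    elite_buffer = []  # persists across generations, unlike the population
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        # Merge this generation's best into the buffer, keep the overall best.
        elite_buffer = sorted(elite_buffer + scored[:3], key=fitness,
                              reverse=True)[:elite_buffer_size]
        # Elites survive unchanged; the rest are offspring of elite parents.
        next_pop = [list(e) for e in elite_buffer]
        while len(next_pop) < pop_size:
            a, b = random.sample(elite_buffer, 2)
            next_pop.append(mutate(crossover(a, b)))
        population = next_pop
    return max(population, key=fitness)
```

In a full evolutionary-DRL hybrid, the fitness evaluation would come from environment rollouts and a gradient-trained RL agent would typically share experience with the population; this sketch only shows the buffer-and-operators skeleton.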
