Empirical analysis of PGA-MAP-Elites for Neuroevolution in Uncertain Domains

Paper title

Empirical analysis of PGA-MAP-Elites for Neuroevolution in Uncertain Domains

Authors

Manon Flageat, Felix Chalumeau, Antoine Cully

Abstract

Quality-Diversity algorithms, among them MAP-Elites, have emerged as powerful alternatives to performance-only optimisation approaches because they generate collections of diverse and high-performing solutions to an optimisation problem. However, they are often limited to low-dimensional search spaces and deterministic environments. The recently introduced Policy Gradient Assisted MAP-Elites (PGA-MAP-Elites) algorithm overcomes this limitation by pairing the traditional genetic operator of MAP-Elites with a gradient-based operator inspired by Deep Reinforcement Learning. This new operator uses policy gradients to guide mutations toward high-performing solutions. In this work, we present an in-depth study of PGA-MAP-Elites. We demonstrate the benefits of policy gradients for the performance of the algorithm and for the reproducibility of the generated solutions in uncertain domains. First, we show that PGA-MAP-Elites performs well in both deterministic and uncertain high-dimensional environments, decorrelating the two challenges it tackles. Second, we show that, in addition to outperforming all the considered baselines, PGA-MAP-Elites generates collections of solutions that are highly reproducible in uncertain environments, approaching the reproducibility of solutions found by Quality-Diversity approaches built specifically for uncertain applications. Finally, we present an ablation and an in-depth analysis of the dynamics of the policy-gradient-based variation. We demonstrate that the policy-gradient variation operator is decisive for the performance of PGA-MAP-Elites but is essential only during the early stage of the process, when it finds high-performing regions of the search space.
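The pairing of a genetic operator with a gradient-based operator described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the fitness function, the descriptor, the grid size, the batch size, and the even split between the two operators are all assumptions chosen for illustration. The analytic gradient of a toy objective stands in for the policy gradient, Gaussian mutation stands in for the genetic operator, and the toy environment is deterministic.

```python
import random

GRID = 5        # cells per descriptor dimension (assumed)
BATCH = 20      # offspring per iteration (assumed)
PG_SHARE = 0.5  # share of offspring from the gradient operator (assumed)

def fitness(x):
    # Toy objective with a known gradient; its maximum is at (0.5, 0.5).
    return -sum((xi - 0.5) ** 2 for xi in x)

def descriptor(x):
    # Toy descriptor: the solution itself, clipped to [0, 1].
    return tuple(min(max(xi, 0.0), 1.0) for xi in x)

def cell_of(desc):
    # Map a descriptor in [0, 1]^2 to a grid cell index.
    return tuple(min(int(d * GRID), GRID - 1) for d in desc)

def genetic_variation(x, rng, sigma=0.1):
    # Stand-in genetic operator: Gaussian mutation of each parameter.
    return [xi + rng.gauss(0.0, sigma) for xi in x]

def gradient_variation(x, lr=0.2):
    # Stand-in for the policy-gradient operator: one ascent step
    # along the analytic gradient of the toy fitness, -2 * (xi - 0.5).
    return [xi + lr * (-2.0 * (xi - 0.5)) for xi in x]

def try_insert(archive, x):
    # MAP-Elites rule: keep the best solution found in each cell.
    cell = cell_of(descriptor(x))
    f = fitness(x)
    if cell not in archive or f > archive[cell][0]:
        archive[cell] = (f, x)

def run(iterations=200, seed=0):
    rng = random.Random(seed)
    archive = {}
    for _ in range(BATCH):
        try_insert(archive, [rng.random(), rng.random()])
    for _ in range(iterations):
        elites = list(archive.values())
        for i in range(BATCH):
            _, parent = rng.choice(elites)
            if i < PG_SHARE * BATCH:
                child = gradient_variation(parent)
            else:
                child = genetic_variation(parent, rng)
            try_insert(archive, child)
    return archive
```

Calling `run()` returns a dictionary that maps each filled grid cell to its best `(fitness, solution)` pair. In this sketch the gradient operator pulls offspring toward the high-fitness region, while the mutation operator spreads them across cells.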
