Paper Title

Quantity vs. Quality: On Hyperparameter Optimization for Deep Reinforcement Learning

Authors

Lars Hertel, Pierre Baldi, Daniel L. Gillen

Abstract

Reinforcement learning algorithms can show strong variation in performance between training runs with different random seeds. In this paper we explore how this affects hyperparameter optimization when the goal is to find hyperparameter settings that perform well across random seeds. In particular, we benchmark whether it is better to explore a large quantity of hyperparameter settings via pruning of bad performers, or if it is better to aim for quality of collected results by using repetitions. For this we consider the Successive Halving, Random Search, and Bayesian Optimization algorithms, the latter two with and without repetitions. We apply these to tuning the PPO2 algorithm on the Cartpole balancing task and the Inverted Pendulum Swing-up task. We demonstrate that pruning may negatively affect the optimization and that repeated sampling does not help in finding hyperparameter settings that perform better across random seeds. From our experiments we conclude that Bayesian optimization with a noise robust acquisition function is the best choice for hyperparameter optimization in reinforcement learning tasks.
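
To make the "quality via repetitions" idea concrete, the following is a minimal Python sketch (not the authors' code) of random search in which each candidate hyperparameter setting is scored by its mean performance across several random seeds rather than a single run. The `train_and_evaluate` function and the search ranges are illustrative assumptions; in the paper's setting it would correspond to training PPO2 on the CartPole task with the given setting and seed.

```python
import math
import random
import statistics

def train_and_evaluate(config, seed):
    """Hypothetical stand-in for training PPO2 on CartPole with `config` and
    random seed `seed`, returning the trained agent's mean episode return.
    A synthetic noisy objective is used here so the sketch runs end to end."""
    rng = random.Random(seed)
    # Score peaks near a learning rate of 3e-4; seed-dependent noise mimics
    # the run-to-run variance of reinforcement learning training.
    score = -abs(math.log10(config["learning_rate"]) - math.log10(3e-4))
    return score + rng.gauss(0.0, 0.5)

def random_search_with_repetitions(n_configs=20, n_seeds=5):
    """Score each sampled setting by its mean return across several seeds
    ("quality" through repetitions) instead of trusting one noisy run."""
    best_config, best_score = None, float("-inf")
    for _ in range(n_configs):
        # Illustrative PPO-style search ranges; not the ranges from the paper.
        config = {
            "learning_rate": 10 ** random.uniform(-5, -2),
            "gamma": random.uniform(0.9, 0.9999),
            "n_steps": random.choice([64, 128, 256, 512]),
        }
        repeats = [train_and_evaluate(config, seed) for seed in range(n_seeds)]
        mean_score = statistics.mean(repeats)
        if mean_score > best_score:
            best_config, best_score = config, mean_score
    return best_config, best_score

if __name__ == "__main__":
    config, score = random_search_with_repetitions()
    print("best config:", config, "mean score across seeds:", round(score, 3))
```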
