Paper Title

Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers

Paper Authors

Robin M. Schmidt, Frank Schneider, Philipp Hennig

Paper Abstract

Choosing the optimizer is considered to be among the most crucial design decisions in deep learning, and it is not an easy one. The growing literature now lists hundreds of optimization methods. In the absence of clear theoretical guidance and conclusive empirical evidence, the decision is often made based on anecdotes. In this work, we aim to replace these anecdotes, if not with a conclusive ranking, then at least with evidence-backed heuristics. To do so, we perform an extensive, standardized benchmark of fifteen particularly popular deep learning optimizers while giving a concise overview of the wide range of possible choices. Analyzing more than 50,000 individual runs, we contribute the following three points: (i) Optimizer performance varies greatly across tasks. (ii) We observe that evaluating multiple optimizers with default parameters works approximately as well as tuning the hyperparameters of a single, fixed optimizer. (iii) While we cannot discern an optimization method clearly dominating across all tested tasks, we identify a significantly reduced subset of specific optimizers and parameter choices that generally lead to competitive results in our experiments: Adam remains a strong contender, with newer methods failing to significantly and consistently outperform it. Our open-sourced results are available as challenging and well-tuned baselines for more meaningful evaluations of novel optimization methods without requiring any further computational efforts.
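Point (ii) suggests a practical heuristic: simply trying several optimizers at their library defaults can work roughly as well as tuning a single one. The sketch below is only an illustration of that protocol, not the paper's actual benchmark setup; the toy task, model, and optimizer list are assumptions chosen for brevity, using PyTorch optimizers that define default learning rates.

```python
# Illustrative sketch only (NOT the paper's benchmark): compare a few PyTorch
# optimizers, each with its library-default hyperparameters, on a toy task.
import torch
import torch.nn as nn

def final_loss(optimizer_cls, steps=200, seed=0):
    torch.manual_seed(seed)
    # Small synthetic regression problem stands in for a real training task.
    x = torch.randn(512, 10)
    y = x.sum(dim=1, keepdim=True)
    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
    opt = optimizer_cls(model.parameters())  # default hyperparameters only
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

# Each of these optimizers ships with a default learning rate, so no tuning is done.
for opt_cls in (torch.optim.Adam, torch.optim.AdamW,
                torch.optim.RMSprop, torch.optim.Adagrad, torch.optim.Adadelta):
    print(f"{opt_cls.__name__:>10s}  final loss: {final_loss(opt_cls):.4f}")
```

In practice one would replace the toy task with the real training problem and compare validation metrics rather than training loss.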
