Paper Title
Traditional and accelerated gradient descent for neural architecture search
Paper Authors
Paper Abstract
In this paper we introduce two algorithms for neural architecture search (NASGD and NASAGD), following theoretical work by two of the authors [5] that used the geometric structure of optimal transport to lay the conceptual foundation for new notions of traditional and accelerated gradient descent for optimizing a function over a semi-discrete space. Our algorithms, which use the network morphism framework introduced in [2] as a baseline, can analyze forty times as many architectures as the hill-climbing methods of [2, 14] while using the same computational resources and time, and they achieve comparable accuracy. For example, using NASGD on CIFAR-10, our method designs and trains networks with an error rate of 4.06% in only 12 hours on a single GPU.
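To make the search scheme concrete, the following is a minimal, self-contained sketch of a morphism-based descent loop of the kind the abstract describes. It is not the authors' implementation: the architecture encoding, the `morphisms` function, and the `toy_error` scoring function are hypothetical stand-ins so the example runs on its own, whereas the actual NASGD/NASAGD algorithms descend a functional on a semi-discrete space of architectures and weights.

```python
# Illustrative NASGD-style search loop (hypothetical toy version, not the
# paper's implementation). Architectures are encoded as (depth, width) pairs.
import random

def toy_error(arch):
    """Hypothetical stand-in for the validation error of an architecture
    after a short training run (lower is better)."""
    depth, width = arch
    return (depth - 6) ** 2 * 0.01 + (width - 64) ** 2 * 1e-5 + random.uniform(0, 0.02)

def morphisms(arch):
    """Function-preserving edits in the spirit of network morphisms [2]:
    each child network starts from the parent's learned function."""
    depth, width = arch
    return [(depth + 1, width), (depth, width * 2), (depth, max(8, width // 2))]

def nasgd_step(population, n_keep=8):
    """One descent step: expand every architecture by its morphism
    neighborhood, evaluate cheaply, and keep the best candidates."""
    candidates = list(population)
    for arch in population:
        candidates.extend(morphisms(arch))
    candidates.sort(key=toy_error)   # proxy for brief training + evaluation
    return candidates[:n_keep]

population = [(2, 16)]               # seed with a small network
for _ in range(20):
    population = nasgd_step(population)
print("best architecture found:", min(population, key=toy_error))
```

Because morphisms preserve the parent's function, each candidate inherits trained weights rather than starting from scratch, which is what lets a search of this form evaluate many more architectures than hill climbing within the same compute budget.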