Paper Title
Heed the Noise in Performance Evaluations in Neural Architecture Search
Paper Authors
Paper Abstract
Neural Architecture Search (NAS) has recently become a topic of great interest. However, there is a potentially impactful issue within NAS that remains largely unrecognized: noise. Due to stochastic factors in neural network initialization, training, and the chosen train/validation dataset split, the performance evaluation of a neural network architecture, which is often based on a single learning run, is also stochastic. This may have a particularly large impact if a dataset is small. We therefore propose to reduce this noise by evaluating architectures based on average performance over multiple network training runs, using different random seeds and cross-validation. We perform experiments for a combinatorial optimization formulation of NAS in which we vary the noise reduction level. We use the same computational budget for each noise level in terms of network training runs, i.e., we allow fewer architecture evaluations when averaging over more training runs. Multiple search algorithms are considered, including evolutionary algorithms, which generally perform well for NAS. We use two publicly available datasets from the medical image segmentation domain, where datasets are often limited and variability among samples is often high. Our results show that reducing noise in architecture evaluations enables all considered search algorithms to find better architectures.
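The core trade-off described in the abstract can be illustrated with a minimal sketch: under a fixed budget of network training runs, averaging the evaluation of each architecture over more seeds leaves room for fewer architecture evaluations overall. The `train_and_evaluate` function below is a hypothetical stand-in (not the paper's actual training pipeline), modeling a run's score as a true architecture quality plus per-run noise; the search space of integer-coded architectures is likewise an assumption for illustration.

```python
import random
import statistics

def train_and_evaluate(architecture, seed):
    # Hypothetical stand-in for one network training run: the "true"
    # quality of an architecture plus stochastic noise from, e.g.,
    # initialization, training, and the train/validation split.
    rng = random.Random(architecture * 10_000 + seed)
    true_quality = architecture / 100.0
    noise = rng.gauss(0.0, 0.2)
    return true_quality + noise

def noisy_random_search(budget_runs, runs_per_eval, search_seed=0):
    # Fixed budget in *training runs*: averaging over more runs per
    # architecture (higher runs_per_eval) means fewer architectures
    # can be evaluated in total.
    rng = random.Random(search_seed)
    n_evals = budget_runs // runs_per_eval
    best_arch, best_score = None, float("-inf")
    for _ in range(n_evals):
        arch = rng.randrange(100)  # toy search space: integers 0..99
        # Noise-reduced evaluation: mean score over several seeds.
        score = statistics.mean(
            train_and_evaluate(arch, s) for s in range(runs_per_eval)
        )
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch
```

Comparing, say, `noisy_random_search(100, 1)` (100 noisy single-run evaluations) against `noisy_random_search(100, 5)` (20 evaluations averaged over 5 runs each) mirrors the paper's experimental setup of varying the noise reduction level at constant compute.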