Paper Title
Understanding and Optimizing Packed Neural Network Training for Hyper-Parameter Tuning
Paper Authors
Paper Abstract
As neural networks are increasingly employed in machine learning practice, how to efficiently share limited training resources among a diverse set of model training tasks becomes a crucial issue. To achieve better utilization of the shared resources, in this paper we explore the idea of jointly training multiple neural network models on a single GPU. We realize this idea by proposing a primitive called pack. We further present a comprehensive empirical study of pack and end-to-end experiments that suggest significant improvements for hyper-parameter tuning. The results suggest: (1) packing two models can bring up to a 40% performance improvement over unpacked setups for a single training step, and the improvement grows when more models are packed; (2) the benefit of the pack primitive largely depends on a number of factors, including memory capacity, chip architecture, neural network structure, and batch size; (3) there exists a trade-off between packing and unpacking when training multiple neural network models on limited resources; (4) a pack-aware Hyperband is up to 2.7x faster than the original Hyperband, with this improvement growing as memory size, and hence the density of packed models, increases.
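As a rough illustration of the packing idea described in the abstract, the sketch below jointly trains two small models in a single GPU training step by sharing the input batch and summing their losses, so one backward pass updates both sets of parameters. This is only a minimal sketch under assumed PyTorch conventions; the model architectures, the optimizer choice, and the `packed_step` helper are hypothetical and are not taken from the paper's actual implementation of the pack primitive.

```python
# Illustrative sketch: "packing" two models into one GPU training step by
# sharing the input batch and fusing both losses into a single backward pass.
# Model definitions and hyper-parameters here are hypothetical, not from the paper.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Two hypothetical models with different hyper-parameters (e.g., hidden sizes).
model_a = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
model_b = nn.Sequential(nn.Linear(784, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)

# One optimizer over both parameter sets, so a single step updates both models.
opt = torch.optim.SGD(list(model_a.parameters()) + list(model_b.parameters()), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def packed_step(x, y):
    """One packed training step: one data transfer, one fused backward pass."""
    x, y = x.to(device), y.to(device)
    opt.zero_grad()
    # Both models consume the same batch; summing the losses lets a single
    # backward pass compute gradients for both models on the same GPU.
    loss = loss_fn(model_a(x), y) + loss_fn(model_b(x), y)
    loss.backward()
    opt.step()
    return loss.item()

# Example usage with random data standing in for a real data loader.
x = torch.randn(64, 784)
y = torch.randint(0, 10, (64,))
print(packed_step(x, y))
```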