Paper Title

Optimising the Performance of Convolutional Neural Networks across Computing Systems using Transfer Learning

Paper Authors

Rik Mulder, Valentin Radu, Christophe Dubach

Paper Abstract

The choice of convolutional routines (primitives) to implement neural networks has a tremendous impact on their inference performance (execution speed) on a given hardware platform. To optimise a neural network by primitive selection, the optimal primitive is identified for each layer of the network. This process requires a lengthy profiling stage, iterating over all the available primitives for each layer configuration, to measure their execution time on the target platform. Because each primitive exploits the hardware in different ways, new profiling is needed to obtain the best performance when moving to another platform. In this work, we propose to replace this prohibitively expensive profiling stage with a machine learning-based approach to performance modelling. Our approach speeds up the optimisation time drastically. After training, our performance model can estimate the performance of convolutional primitives in any layer configuration. The time to optimise the execution of large neural networks via primitive selection is reduced from hours to just seconds. Our performance model is easily transferable to other target platforms. We demonstrate this by training a performance model on an Intel platform and performing transfer learning to AMD and ARM processor devices with minimal profiled samples.
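For a concrete picture of the approach, the sketch below illustrates in PyTorch the two ingredients the abstract describes: a learned model that predicts per-primitive execution time from a layer configuration, and transfer to a new platform by fine-tuning on a few profiled samples. The primitive set, feature encoding, network size, and synthetic stand-in data are all illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the paper's code): predict per-primitive execution time
# from a layer configuration, then select the fastest predicted primitive.
import math
import torch
import torch.nn as nn

PRIMITIVES = ["direct", "im2col_gemm", "winograd", "fft"]  # assumed primitive set

def encode(layer_cfg, prim_idx):
    """Encode (H, W, C_in, C_out, kernel, stride) + primitive as features."""
    feats = [math.log(v) for v in layer_cfg]      # log-scale dims (assumed normalisation)
    one_hot = [0.0] * len(PRIMITIVES)
    one_hot[prim_idx] = 1.0
    return torch.tensor(feats + one_hot, dtype=torch.float32)

class PerfModel(nn.Module):
    def __init__(self, in_dim):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                  nn.Linear(64, 64), nn.ReLU())
        self.head = nn.Linear(64, 1)              # predicts log(execution time)
    def forward(self, x):
        return self.head(self.body(x)).squeeze(-1)

def fit(model, X, y, epochs=200, lr=1e-3, params=None):
    opt = torch.optim.Adam(params if params is not None else model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()

def best_primitive(model, layer_cfg):
    """Primitive selection: lowest predicted time wins, no profiling needed."""
    with torch.no_grad():
        preds = [model(encode(layer_cfg, p).unsqueeze(0)).item()
                 for p in range(len(PRIMITIVES))]
    return PRIMITIVES[min(range(len(PRIMITIVES)), key=preds.__getitem__)]

# --- Usage with synthetic placeholders (real use would profile on hardware) ---
torch.manual_seed(0)
cfgs = [(56, 56, 64, 64, 3, 1), (28, 28, 128, 128, 3, 1), (14, 14, 256, 256, 3, 2)]
X = torch.stack([encode(c, p) for c in cfgs for p in range(len(PRIMITIVES))])
y = torch.randn(len(X))                     # placeholder for profiled log-runtimes

model = PerfModel(X.shape[1])
fit(model, X, y)                            # train on the source (e.g. Intel) platform
print(best_primitive(model, cfgs[0]))

# Transfer: fine-tune only the output head on a few samples profiled on the
# target platform (e.g. an ARM device), keeping the shared body fixed.
few_X, few_y = X[:4], torch.randn(4)        # minimal profiled samples (placeholder)
fit(model, few_X, few_y, epochs=50, params=model.head.parameters())
```

Re-fitting only the output head while keeping the shared body is one common way to realise the "minimal profiled samples" transfer the abstract mentions; the paper's actual architecture and feature set may differ.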
