Benchmark Tests of Convolutional Neural Network and Graph Convolutional Network on HorovodRunner Enabled Spark Clusters

Paper Title

Benchmark Tests of Convolutional Neural Network and Graph Convolutional Network on HorovodRunner Enabled Spark Clusters

Authors

Pan, Jing; Liu, Wendao; Zhou, Jing

Abstract

The freedom to iterate quickly on distributed deep learning tasks is crucial for smaller companies seeking to gain competitive advantages and market share from big tech giants. HorovodRunner brings this process to relatively accessible Spark clusters. There have been, however, no benchmark tests on HorovodRunner per se, nor on graph convolutional networks (GCN, hereafter) specifically, and only very limited scalability benchmark tests on Horovod, the predecessor that requires custom-built GPU clusters. For the first time, we show that Databricks' HorovodRunner achieves a significant lift in scaling efficiency for convolutional neural network (CNN, hereafter) based tasks on both GPU and CPU clusters, but not for the original GCN task. We also implemented the Rectified Adam optimizer in HorovodRunner for the first time.
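The abstract reports its results in terms of scaling efficiency without defining the metric. A minimal Python sketch of one common definition, assuming efficiency is the throughput measured on N workers divided by N times the single-worker throughput (the paper's exact formulation may differ). The function names and the sample numbers below are hypothetical and are not results from the paper:

```python
def scaling_efficiency(throughput_1: float, throughput_n: float, n_workers: int) -> float:
    """Observed speedup divided by ideal linear speedup.

    throughput_1: samples/second with a single worker (baseline).
    throughput_n: samples/second with n_workers workers.
    Returns 1.0 for perfect linear scaling; lower values mean
    communication and synchronization overhead eat into the gain.
    """
    if throughput_1 <= 0 or throughput_n <= 0:
        raise ValueError("throughputs must be positive")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return throughput_n / (n_workers * throughput_1)


def scaling_efficiency_from_times(time_1: float, time_n: float, n_workers: int) -> float:
    """Same metric computed from wall-clock time per epoch instead of throughput."""
    if time_1 <= 0 or time_n <= 0:
        raise ValueError("times must be positive")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    # For a fixed amount of work per epoch, throughput is inversely
    # proportional to time, so T_1 / (N * T_N) equals the ratio above.
    return time_1 / (n_workers * time_n)


# Hypothetical numbers for illustration only (not from the paper):
# 1 worker processes 100 images/s; 4 workers process 320 images/s.
print(scaling_efficiency(100.0, 320.0, 4))  # 0.8
```

An efficiency of 0.8 means the four workers deliver 3.2x the single-worker throughput rather than the ideal 4x; a "lift" in scaling efficiency would then be a higher value of this ratio at the same worker count.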
