Paper Title

Partitioning and Placement of Deep Neural Networks on Distributed Edge Devices to Maximize Inference Throughput

Authors

Parthasarathy, Arjun, Krishnamachari, Bhaskar

Abstract

Edge inference has become more widespread, as its diverse applications range from retail to wearable technology. Clusters of networked, resource-constrained edge devices are becoming common, yet no system exists to split a DNN across these clusters while maximizing the inference throughput of the system. We present an algorithm which partitions DNNs and distributes them across a set of edge devices with the goal of minimizing the bottleneck latency and therefore maximizing inference throughput. The system scales well to systems of different node memory capacities and numbers of nodes. We find that we can reduce the bottleneck latency by 10x over a random algorithm and 35% over a greedy joint partitioning-placement algorithm. Furthermore, we find empirically that for the set of representative models we tested, the algorithm produces results within 9.2% of the optimal bottleneck latency.
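The abstract's core objective, that minimizing bottleneck latency maximizes inference throughput, can be illustrated with a minimal sketch. This is not the paper's algorithm; the function names and the example stage latencies are hypothetical, assuming a pipelined deployment where each device hosts one partition of the DNN:

```python
# Illustrative sketch (not the paper's algorithm): in a pipelined
# deployment each device forms one stage, and steady-state throughput
# is limited by the slowest stage. Minimizing the bottleneck latency
# therefore maximizes inference throughput.

def bottleneck_latency(stage_latencies):
    """Bottleneck latency of a pipeline = latency of its slowest stage."""
    return max(stage_latencies)

def inference_throughput(stage_latencies):
    """Steady-state inferences per second = 1 / bottleneck latency."""
    return 1.0 / bottleneck_latency(stage_latencies)

# Hypothetical per-device stage latencies in seconds (compute + transfer).
balanced   = [0.010, 0.011, 0.010]  # well-balanced partition
imbalanced = [0.002, 0.028, 0.002]  # similar total work, one slow stage

print(inference_throughput(balanced))    # higher throughput
print(inference_throughput(imbalanced))  # lower throughput
```

Both partitions do comparable total work, but the balanced one achieves roughly 2.5x the throughput because no single stage dominates, which is why the partitioning-placement objective targets the maximum per-node latency rather than the sum.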
