Paper Title

Energy-efficient Deployment of Deep Learning Applications on Cortex-M based Microcontrollers using Deep Compression

Paper Authors

Mark Deutel, Philipp Woller, Christopher Mutschler, Jürgen Teich

Paper Abstract

Large Deep Neural Networks (DNNs) are the backbone of today's artificial intelligence due to their ability to make accurate predictions when trained on huge datasets. With advancing technologies such as the Internet of Things, interpreting the large quantities of data generated by sensors is becoming an increasingly important task. However, in many applications not only the predictive performance but also the energy consumption of deep learning models is of major interest. This paper investigates the efficient deployment of deep learning models on resource-constrained microcontroller architectures via network compression. We present a methodology for the systematic exploration of different DNN pruning, quantization, and deployment strategies, targeting different ARM Cortex-M based low-power systems. The exploration allows analyzing trade-offs between key metrics such as accuracy, memory consumption, execution time, and power consumption. We discuss experimental results on three different DNN architectures and show that we can compress them to below 10% of their original parameter count before their predictive quality decreases. This also allows us to deploy and evaluate them on Cortex-M based microcontrollers.
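
The abstract names pruning and quantization as the compression techniques but does not specify the exact procedure. As a rough illustration of the reported compression level, the following PyTorch sketch applies global L1 magnitude pruning to remove 90% of the weights (i.e., keeping below 10% of the original parameter count) and then quantizes the remaining linear layers to 8-bit integers. The toy model, the choice of unstructured L1 pruning, and dynamic post-training quantization are all illustrative assumptions, not the authors' pipeline.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Hypothetical small CNN standing in for one of the paper's DNN architectures.
model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 28 * 28, 10),
)

# Collect all weight-bearing modules as pruning targets.
params_to_prune = [
    (m, "weight") for m in model.modules()
    if isinstance(m, (nn.Conv2d, nn.Linear))
]

# Global unstructured L1 magnitude pruning: zero out the 90% of weights
# with the smallest absolute value across the whole network, leaving
# below 10% of the original weight count as reported in the abstract.
prune.global_unstructured(
    params_to_prune,
    pruning_method=prune.L1Unstructured,
    amount=0.9,
)
for module, name in params_to_prune:
    prune.remove(module, name)  # fold the pruning mask into the weights

# Post-training dynamic quantization of linear layers to int8; a stand-in
# for whatever quantization strategy the paper's exploration selects.
quantized_model = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Sanity check: fraction of nonzero parameters remaining after pruning
# (slightly above 10% here because biases are left unpruned).
total = sum(p.numel() for p in model.parameters())
nonzero = sum((p != 0).sum().item() for p in model.parameters())
print(f"remaining nonzero parameters: {nonzero / total:.1%}")
```

Note that on a real Cortex-M target the sparse, quantized model would still have to be exported to a deployable format (for example a C array or a TFLite Micro flatbuffer); that deployment step is outside the scope of this sketch.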
