Title


QAPPA: Quantization-Aware Power, Performance, and Area Modeling of DNN Accelerators

Authors

Ahmet Inci, Siri Garudanagiri Virupaksha, Aman Jain, Venkata Vivek Thallam, Ruizhou Ding, Diana Marculescu

Abstract


As the machine learning and systems community strives to achieve higher energy efficiency through custom DNN accelerators and model compression techniques, there is a need for a design space exploration framework that incorporates quantization-aware processing elements into the accelerator design space while having accurate and fast power, performance, and area models. In this work, we present QAPPA, a highly parameterized quantization-aware power, performance, and area modeling framework for DNN accelerators. Our framework can facilitate future research on design space exploration of DNN accelerators across various design choices such as bit precision, processing element type, scratchpad sizes of processing elements, global buffer size, device bandwidth, number of total processing elements in the design, and DNN workloads. Our results show that different bit precisions and processing element types lead to significant differences in terms of performance per area and energy. Specifically, our proposed lightweight processing elements achieve up to 4.9x higher performance per area and improved energy efficiency when compared to an INT16-based implementation.
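The abstract enumerates several design-space axes (bit precision, processing element type, scratchpad and buffer sizes, bandwidth, PE count). As a rough illustration of what such a parameterized design space looks like, here is a minimal sketch in Python; QAPPA's actual interface is not shown in this text, so every name, field, and value below is a hypothetical assumption, not the framework's API.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical design-point record mirroring the axes listed in the
# abstract; field names and values are illustrative assumptions only.
@dataclass(frozen=True)
class AcceleratorConfig:
    bit_precision: str      # e.g. "INT4", "INT8", "INT16"
    pe_type: str            # processing element microarchitecture
    pe_scratchpad_kb: int   # per-PE scratchpad size
    global_buffer_kb: int   # shared global buffer size
    bandwidth_gbps: int     # device bandwidth
    num_pes: int            # total processing elements in the design

def enumerate_design_space(precisions, pe_types, num_pes_options):
    """Cross-product over a few axes; a real DSE framework would then
    evaluate power, performance, and area models at each point."""
    return [
        AcceleratorConfig(p, t, 16, 512, 64, n)
        for p, t, n in product(precisions, pe_types, num_pes_options)
    ]

space = enumerate_design_space(
    precisions=["INT4", "INT8", "INT16"],
    pe_types=["lightweight", "bit_parallel"],
    num_pes_options=[256, 1024],
)
print(len(space))  # 3 precisions x 2 PE types x 2 PE counts = 12 points
```

A real framework would pair each such point with fast analytical PPA models, which is what lets it sweep many configurations quickly.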
