End-to-End DNN Inference on a Massively Parallel Analog In Memory Computing Architecture

Paper Title

End-to-End DNN Inference on a Massively Parallel Analog In Memory Computing Architecture

Authors

Nazareno Bruschi, Giuseppe Tagliavini, Angelo Garofalo, Francesco Conti, Irem Boybat, Luca Benini, Davide Rossi

Abstract

The demand for computation resources and energy efficiency of Convolutional Neural Network (CNN) applications requires a new paradigm to overcome the "Memory Wall". Analog In-Memory Computing (AIMC) is a promising paradigm since it performs matrix-vector multiplications, the critical kernel of many ML applications, in-place in the analog domain within memory arrays structured as crossbars of memory cells. However, several factors limit the full exploitation of this technology, including the physical fabrication of the crossbar devices, which constrains the memory capacity of a single array. Multi-AIMC architectures have been proposed to overcome this limitation, but they have been demonstrated only for tiny and custom CNNs or with some layers performed off-chip. In this work, we present the full inference of an end-to-end ResNet-18 DNN on a 512-cluster heterogeneous architecture coupling a mix of AIMC cores and digital RISC-V cores, achieving up to 20.2 TOPS. Moreover, we analyze the mapping of the network on the available non-volatile cells, compare it with state-of-the-art models, and derive guidelines for next-generation many-core architectures based on AIMC devices.
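To illustrate why the limited capacity of a single crossbar leads to multi-array designs, the following minimal Python sketch splits a weight matrix into fixed-size tiles, computes each tile's matrix-vector product separately, and accumulates the partial sums digitally. It is an idealized, noise-free illustration and not the mapping used in the paper. The function names (`split_into_tiles`, `crossbar_mvm`, `tiled_mvm`) and the tile dimensions are hypothetical, chosen only for this example.

```python
def split_into_tiles(weights, tile_rows, tile_cols):
    """Yield (row_offset, col_offset, tile) blocks no larger than tile_rows x tile_cols.

    Each tile stands in for the weights one crossbar array could hold
    (an assumption for illustration; real array sizes are device-specific).
    """
    n_rows, n_cols = len(weights), len(weights[0])
    for r in range(0, n_rows, tile_rows):
        for c in range(0, n_cols, tile_cols):
            tile = [row[c:c + tile_cols] for row in weights[r:r + tile_rows]]
            yield r, c, tile


def crossbar_mvm(tile, x_slice):
    """Idealized crossbar: exact dot product of each tile row with the input slice."""
    return [sum(w * xi for w, xi in zip(row, x_slice)) for row in tile]


def tiled_mvm(weights, x, tile_rows, tile_cols):
    """Compute weights @ x by summing the partial results of all tiles."""
    y = [0] * len(weights)
    for r, c, tile in split_into_tiles(weights, tile_rows, tile_cols):
        partial = crossbar_mvm(tile, x[c:c + tile_cols])
        for i, p in enumerate(partial):
            y[r + i] += p  # digital accumulation across tiles in the same row block
    return y


if __name__ == "__main__":
    W = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12]]
    x = [1, 0, -1, 2]
    print(tiled_mvm(W, x, tile_rows=2, tile_cols=2))
```

In this sketch, tiles that share the same row block contribute partial sums to the same outputs, so some form of accumulation outside the analog arrays is needed; here it is modeled as plain integer addition.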
