Paper Title

Convolutional Neural Networks Quantization with Attention

Authors

Binyi Wu, Bernd Waschneck, Christian Georg Mayr

Abstract

It has been shown that Deep Convolutional Neural Networks (DCNNs) can run inference at low precision, rather than the 32-bit floating point used during training, thereby saving memory and power. However, quantizing a network generally comes with a drop in accuracy. Here, we propose a method, double-stage Squeeze-and-Threshold (double-stage ST), which uses an attention mechanism to quantize networks and achieves state-of-the-art results. With our method, a 3-bit model can reach accuracy exceeding that of the full-precision baseline. The proposed double-stage ST activation quantization is easy to apply: it is simply inserted before the convolution.
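
The abstract only states that the double-stage ST module sits directly before the convolution and uses an attention mechanism to quantize activations. The sketch below illustrates that insertion point in PyTorch; the module internals (global average pooling, a two-stage bottleneck, sigmoid-gated per-channel thresholds, a straight-through estimator) and names such as `STQuant` are assumptions made for illustration, not the authors' exact design.

```python
import torch
import torch.nn as nn

class STQuant(nn.Module):
    """Hypothetical attention-based activation quantizer (unsigned, n-bit).

    The squeeze/threshold internals below are assumptions for this sketch;
    only the placement before the convolution comes from the abstract.
    """
    def __init__(self, channels: int, bits: int = 3, reduction: int = 4):
        super().__init__()
        self.levels = 2 ** bits - 1
        # "Squeeze": pool each channel to a single descriptor, then let a
        # small two-stage bottleneck predict a per-channel threshold in (0, 1).
        self.squeeze = nn.AdaptiveAvgPool2d(1)
        self.threshold = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Per-channel clipping level, scaled to the current activation range.
        t = self.threshold(self.squeeze(x))
        scale = t * x.detach().abs().amax().clamp_min(1e-8)
        x = torch.minimum(torch.relu(x), scale)
        # Uniform quantization; the straight-through estimator passes the
        # gradient through the non-differentiable rounding step unchanged.
        q = torch.round(x / scale * self.levels) / self.levels * scale
        return x + (q - x).detach()

# Applying the quantizer is a matter of inserting it before the convolution:
block = nn.Sequential(
    STQuant(channels=64, bits=3),
    nn.Conv2d(64, 128, kernel_size=3, padding=1),
)
print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 128, 32, 32])
```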
