Paper Title

Precision Gating: Improving Neural Network Efficiency with Dynamic Dual-Precision Activations

Paper Authors

Yichi Zhang, Ritchie Zhao, Weizhe Hua, Nayun Xu, G. Edward Suh, Zhiru Zhang

Paper Abstract

We propose precision gating (PG), an end-to-end trainable dynamic dual-precision quantization technique for deep neural networks. PG computes most features in a low precision and only a small proportion of important features in a higher precision to preserve accuracy. The proposed approach is applicable to a variety of DNN architectures and significantly reduces the computational cost of DNN execution with almost no accuracy loss. Our experiments indicate that PG achieves excellent results on CNNs, including statically compressed mobile-friendly networks such as ShuffleNet. Compared to the state-of-the-art prediction-based quantization schemes, PG achieves the same or higher accuracy with 2.4$\times$ less compute on ImageNet. PG furthermore applies to RNNs. Compared to 8-bit uniform quantization, PG obtains a 1.2% improvement in perplexity per word with 2.7$\times$ computational cost reduction on LSTM on the Penn Tree Bank dataset. Code is available at: https://github.com/cornell-zhang/dnn-gating
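The abstract describes the core mechanism: every feature is first computed at low precision, and only the small fraction of features judged important is recomputed at higher precision. Below is a minimal PyTorch sketch of that dual-precision gating idea, based only on the abstract's description. The module name `PrecisionGatedConv2d`, the helper `quantize_uniform`, and the fixed `threshold` buffer are illustrative assumptions rather than the authors' API; PG learns the gating threshold end-to-end, and the reference implementation is in the linked repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def quantize_uniform(x, num_bits):
    """Uniformly quantize a tensor to num_bits (symmetric, per-tensor scale)."""
    scale = x.abs().max().clamp(min=1e-8)
    levels = 2 ** (num_bits - 1) - 1
    return torch.round(x / scale * levels) / levels * scale


class PrecisionGatedConv2d(nn.Module):
    """Dual-precision convolution sketch: compute all outputs at low precision,
    then keep the high-precision result only where the low-precision response
    exceeds a gating threshold (the "important" features)."""

    def __init__(self, in_ch, out_ch, kernel_size,
                 low_bits=4, high_bits=8, threshold=0.0, **conv_kwargs):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, **conv_kwargs)
        self.low_bits = low_bits
        self.high_bits = high_bits
        # PG learns this threshold end-to-end; a fixed buffer is used here.
        self.register_buffer("threshold", torch.tensor(float(threshold)))

    def forward(self, x):
        w = self.conv.weight
        b = self.conv.bias
        stride, padding = self.conv.stride, self.conv.padding

        # Cheap pass: low-precision activations and weights.
        y_low = F.conv2d(quantize_uniform(x, self.low_bits),
                         quantize_uniform(w, self.low_bits),
                         b, stride, padding)

        # Gate: outputs whose low-precision response clears the threshold
        # are treated as important and take the high-precision result.
        gate = (y_low > self.threshold).to(y_low.dtype)

        # High-precision pass (computed densely here for clarity; the savings
        # come from skipping this update wherever gate == 0).
        y_high = F.conv2d(quantize_uniform(x, self.high_bits),
                          quantize_uniform(w, self.high_bits),
                          b, stride, padding)

        return gate * y_high + (1.0 - gate) * y_low


if __name__ == "__main__":
    layer = PrecisionGatedConv2d(3, 16, 3, padding=1)
    out = layer(torch.randn(1, 3, 32, 32))
    print(out.shape)  # torch.Size([1, 16, 32, 32])
```

Note that this sketch evaluates the high-precision path densely for readability; the compute savings reported in the abstract come from skipping the high-precision update for features that fail the gate, which requires sparse or specialized kernels (or hardware support) to realize in practice.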
