Paper Title

Cross-filter compression for CNN inference acceleration

Authors

Fuyuan Lyu, Shien Zhu, Weichen Liu

Abstract

Convolutional neural networks have demonstrated strong capability on multiple tasks, such as image classification and many others. However, training a network requires substantial resources, so much effort has been devoted to accelerating neural networks by reducing the precision of weights, activations, and gradients. These filter-wise quantization methods, however, have a natural upper limit on compression determined by the kernel size, and with the growing popularity of small kernels this limit drops further. To address this issue, we propose a new cross-filter compression method that provides $\sim32\times$ memory savings and $122\times$ speedup in convolution operations. In our method, all convolution filters are quantized to a given number of bits, and spatially adjacent filters share the same scaling factor. Our compression method, based on Binary-Weight and XNOR-Net respectively, is evaluated on the CIFAR-10 and ImageNet datasets with widely used network structures such as ResNet and VGG, and shows tolerable accuracy loss compared to state-of-the-art quantization methods.
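
To make the core idea concrete, below is a minimal NumPy sketch of binarization with a shared scaling factor across a group of filters. It assumes the grouping runs along the output-channel dimension and that the shared scale is the mean absolute value over the whole group (the per-filter rule from Binary-Weight-Networks applied group-wise); the function name and the `group_size` parameter are illustrative, not taken from the paper.

```python
import numpy as np

def cross_filter_binarize(weights, group_size):
    """Binarize conv filters of shape (out_ch, in_ch, kH, kW).

    Filters within a group of `group_size` adjacent output channels
    share a single scaling factor, instead of one factor per filter.
    """
    out_ch = weights.shape[0]
    signs = np.sign(weights)            # 1-bit weights in {-1, +1}
    signs[signs == 0] = 1.0             # map exact zeros to +1 by convention
    quantized = np.empty_like(weights)
    scales = []
    for start in range(0, out_ch, group_size):
        group = weights[start:start + group_size]
        alpha = np.mean(np.abs(group))  # one shared scale per filter group
        scales.append(alpha)
        quantized[start:start + group_size] = alpha * signs[start:start + group_size]
    return quantized, np.array(scales)

# Example: 64 filters of shape 3x3x3, grouped 4 at a time -> 16 scaling factors
w = np.random.randn(64, 3, 3, 3).astype(np.float32)
w_q, alphas = cross_filter_binarize(w, group_size=4)
```

With one scale per group rather than per filter, the number of stored floating-point factors shrinks by the group size, which is where the extra savings beyond filter-wise binarization come from.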
