Paper Title

Layer Pruning via Fusible Residual Convolutional Block for Deep Neural Networks

Authors

Pengtao Xu, Jian Cao, Fanhua Shang, Wenyu Sun, Pu Li

Abstract

In order to deploy deep convolutional neural networks (CNNs) on resource-limited devices, many model pruning methods for filters and weights have been developed, while only a few target layer pruning. However, compared with filter pruning and weight pruning, the compact model obtained by layer pruning has lower inference time and run-time memory usage when the same FLOPs and number of parameters are pruned, because less data is moved in memory. In this paper, we propose a simple layer pruning method using a fusible residual convolutional block (ResConv), which is implemented by inserting a shortcut connection with a trainable information control parameter into a single convolutional layer. Using ResConv structures in training improves network accuracy and makes it possible to train deep plain networks, and it adds no extra computation during inference because ResConv is fused into an ordinary convolutional layer after training. For layer pruning, we convert the convolutional layers of a network into ResConv blocks with layer scaling factors. During training, L1 regularization is applied to make the scaling factors sparse, so that unimportant layers are automatically identified and then removed, yielding a model with fewer layers. Our pruning method achieves excellent compression and acceleration performance over the state of the art on different datasets, and needs no retraining at low pruning rates. For example, with ResNet-110, we achieve a 65.5% FLOPs reduction by removing 55.5% of the parameters, with only a small loss of 0.13% in top-1 accuracy on CIFAR-10.
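To make the mechanism in the abstract more concrete, the sketch below shows one possible PyTorch rendering of a ResConv-style block. It assumes the scaling factor (named `k` here) gates the convolutional branch, i.e. y = x + k * conv(x), so that a layer whose factor shrinks toward zero reduces to an identity mapping and can be removed, while kept layers can fold the shortcut into the kernel and run as plain convolutions at inference time. The parameter name `k`, the `fuse()` helper, and the `scaling_factor_l1` penalty are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


class ResConv(nn.Module):
    """Minimal sketch of a fusible residual convolutional block: a single
    convolution with an identity shortcut, where a trainable scaling factor
    gates the convolutional branch. Assumes stride 1 and equal in/out
    channels so the shortcut can later be folded into the kernel."""

    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size,
                              padding=kernel_size // 2, bias=False)
        # Layer scaling factor (trainable information control parameter).
        self.k = nn.Parameter(torch.ones(1))

    def forward(self, x):
        # Identity shortcut plus scaled conv branch; when k -> 0 the block
        # becomes an identity mapping and the layer is a pruning candidate.
        return x + self.k * self.conv(x)

    def fuse(self):
        """Fold the shortcut into the kernel so inference uses an ordinary
        convolution: conv'(x) with weight k*W + I equals x + k*conv(x)."""
        with torch.no_grad():
            c, _, kh, kw = self.conv.weight.shape
            fused = self.k * self.conv.weight
            # Add the identity kernel (1 at the spatial center, per channel).
            fused[torch.arange(c), torch.arange(c), kh // 2, kw // 2] += 1.0
            self.conv.weight.copy_(fused)
        # Replace the ResConv module with this plain conv in the network.
        return self.conv


def scaling_factor_l1(model, lam=1e-4):
    """Sparsity-inducing L1 penalty on the layer scaling factors; `lam` is a
    hypothetical regularization strength added to the task loss."""
    return lam * sum(m.k.abs().sum() for m in model.modules()
                     if isinstance(m, ResConv))
```

In such a setup, the L1 penalty would be added to the task loss during training; after training, blocks whose scaling factor falls below a threshold would be dropped, and the remaining blocks fused into plain convolutional layers for inference.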
