Paper Title

Boosting Pruned Networks with Linear Over-parameterization

Authors

Yu Qian, Jian Cao, Xiaoshuang Li, Jie Zhang, Hufei Li, Jue Chen

Abstract

Structured pruning compresses neural networks by reducing channels (filters) for fast inference and a low run-time footprint. To restore accuracy after pruning, fine-tuning is usually applied to the pruned network. However, the few parameters remaining in a pruned network inevitably make it challenging for fine-tuning to recover accuracy. To address this challenge, we propose a novel method that first linearly over-parameterizes the compact layers in the pruned network to enlarge the number of fine-tuning parameters, and then re-parameterizes them back to the original layers after fine-tuning. Specifically, we equivalently expand each convolution/linear layer into several consecutive convolution/linear layers that do not alter the current output feature maps. Furthermore, we utilize similarity-preserving knowledge distillation, which encourages the over-parameterized block to learn the immediate data-to-data similarities of the corresponding dense layer so as to maintain its feature learning ability. The proposed method is comprehensively evaluated on CIFAR-10 and ImageNet, where it significantly outperforms the vanilla fine-tuning strategy, especially for large pruning ratios.
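The mechanism rests on a simple identity: a linear (or convolutional) map can be factored into consecutive layers whose composition equals the original map, so both the expansion before fine-tuning and the collapse afterwards are exact. The sketch below, assuming a PyTorch-style setup, illustrates this identity for the linear case, plus a similarity-preserving distillation loss in the spirit of the abstract. The helper names `expand_linear`, `collapse_linear`, and `sp_similarity_loss` are hypothetical and for illustration only, not the authors' implementation; the paper's actual expansion width, initialization, and convolutional case may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def expand_linear(layer: nn.Linear) -> nn.Sequential:
    """Expand one pruned linear layer into two consecutive linear layers
    whose composition initially equals the original mapping (outputs unchanged)."""
    in_f, out_f = layer.in_features, layer.out_features
    w1 = nn.Linear(in_f, in_f, bias=False)                    # starts as identity
    w2 = nn.Linear(in_f, out_f, bias=layer.bias is not None)  # carries the original weights
    with torch.no_grad():
        w1.weight.copy_(torch.eye(in_f))
        w2.weight.copy_(layer.weight)
        if layer.bias is not None:
            w2.bias.copy_(layer.bias)
    return nn.Sequential(w1, w2)

def collapse_linear(block: nn.Sequential) -> nn.Linear:
    """Re-parameterize the fine-tuned pair back into a single linear layer."""
    w1, w2 = block[0], block[1]
    merged = nn.Linear(w1.in_features, w2.out_features, bias=w2.bias is not None)
    with torch.no_grad():
        merged.weight.copy_(w2.weight @ w1.weight)   # (out, in) = (out, h) @ (h, in)
        if w2.bias is not None:
            merged.bias.copy_(w2.bias)
    return merged

def sp_similarity_loss(f_student: torch.Tensor, f_teacher: torch.Tensor) -> torch.Tensor:
    """Similarity-preserving distillation loss (sketch): match row-normalized
    batch similarity (Gram) matrices of student and teacher features."""
    gs = F.normalize(f_student.flatten(1) @ f_student.flatten(1).t(), p=2, dim=1)
    gt = F.normalize(f_teacher.flatten(1) @ f_teacher.flatten(1).t(), p=2, dim=1)
    return (gs - gt).pow(2).mean()

# Sanity check: expansion and collapse both preserve the original mapping.
layer = nn.Linear(16, 8)
x = torch.randn(4, 16)
block = expand_linear(layer)
assert torch.allclose(layer(x), block(x), atol=1e-6)
assert torch.allclose(layer(x), collapse_linear(block)(x), atol=1e-6)
```

In this sketch, the expanded block has strictly more trainable parameters than the single pruned layer it replaces, while the final assertions confirm that nothing changes functionally at expansion or collapse time; only the fine-tuning in between alters the network.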
