Paper Title

Boosting Pruned Networks with Linear Over-parameterization

Authors

Yu Qian, Jian Cao, Xiaoshuang Li, Jie Zhang, Hufei Li, Jue Chen

Abstract

Structured pruning compresses neural networks by reducing channels (filters) for fast inference and a low run-time footprint. To restore accuracy after pruning, fine-tuning is usually applied to the pruned network. However, the few parameters remaining in a pruned network inevitably make it challenging for fine-tuning to recover accuracy. To address this challenge, we propose a novel method that first linearly over-parameterizes the compact layers in the pruned network to enlarge the number of fine-tuning parameters, and then re-parameterizes them back to the original layers after fine-tuning. Specifically, we equivalently expand each convolution/linear layer into several consecutive convolution/linear layers that do not alter the current output feature maps. Furthermore, we utilize similarity-preserving knowledge distillation, which encourages the over-parameterized block to learn the immediate data-to-data similarities of the corresponding dense layer so as to maintain its feature learning ability. The proposed method is comprehensively evaluated on CIFAR-10 and ImageNet, where it significantly outperforms the vanilla fine-tuning strategy, especially for large pruning ratios.
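The mechanism rests on a simple identity: a linear (or convolutional) map can be factored into consecutive layers whose composition equals the original map, so both the expansion before fine-tuning and the collapse afterwards are exact. The sketch below, assuming a PyTorch-style setup, illustrates this identity for the linear case, plus a similarity-preserving distillation loss in the spirit of the abstract. The helper names `expand_linear`, `collapse_linear`, and `sp_similarity_loss` are hypothetical and for illustration only, not the authors' implementation; the paper's actual expansion width, initialization, and convolutional case may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def expand_linear(layer: nn.Linear) -> nn.Sequential:
    """Expand one pruned linear layer into two consecutive linear layers
    whose composition initially equals the original mapping (outputs unchanged)."""
    in_f, out_f = layer.in_features, layer.out_features
    w1 = nn.Linear(in_f, in_f, bias=False)                    # starts as identity
    w2 = nn.Linear(in_f, out_f, bias=layer.bias is not None)  # carries the original weights
    with torch.no_grad():
        w1.weight.copy_(torch.eye(in_f))
        w2.weight.copy_(layer.weight)
        if layer.bias is not None:
            w2.bias.copy_(layer.bias)
    return nn.Sequential(w1, w2)

def collapse_linear(block: nn.Sequential) -> nn.Linear:
    """Re-parameterize the fine-tuned pair back into a single linear layer."""
    w1, w2 = block[0], block[1]
    merged = nn.Linear(w1.in_features, w2.out_features, bias=w2.bias is not None)
    with torch.no_grad():
        merged.weight.copy_(w2.weight @ w1.weight)   # (out, in) = (out, h) @ (h, in)
        if w2.bias is not None:
            merged.bias.copy_(w2.bias)
    return merged

def sp_similarity_loss(f_student: torch.Tensor, f_teacher: torch.Tensor) -> torch.Tensor:
    """Similarity-preserving distillation loss (sketch): match row-normalized
    batch similarity (Gram) matrices of student and teacher features."""
    gs = F.normalize(f_student.flatten(1) @ f_student.flatten(1).t(), p=2, dim=1)
    gt = F.normalize(f_teacher.flatten(1) @ f_teacher.flatten(1).t(), p=2, dim=1)
    return (gs - gt).pow(2).mean()

# Sanity check: expansion and collapse both preserve the original mapping.
layer = nn.Linear(16, 8)
x = torch.randn(4, 16)
block = expand_linear(layer)
assert torch.allclose(layer(x), block(x), atol=1e-6)
assert torch.allclose(layer(x), collapse_linear(block)(x), atol=1e-6)
```

In this sketch, the expanded block has strictly more trainable parameters than the single pruned layer it replaces, while the final assertions confirm that nothing changes functionally at expansion or collapse time; only the fine-tuning in between alters the network.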
