Paper Title
Learned Threshold Pruning
Paper Authors
Paper Abstract
This paper presents a novel differentiable method for unstructured weight pruning of deep neural networks. Our learned-threshold pruning (LTP) method learns per-layer thresholds via gradient descent, unlike conventional methods where they are set as input. Making thresholds trainable also makes LTP computationally efficient, hence scalable to deeper networks. For example, it takes $30$ epochs for LTP to prune ResNet50 on ImageNet by a factor of $9.1$. This is in contrast to other methods that search for per-layer thresholds via a computationally intensive iterative pruning and fine-tuning process. Additionally, with a novel differentiable $L_0$ regularization, LTP is able to operate effectively on architectures with batch-normalization. This is important since $L_1$ and $L_2$ penalties lose their regularizing effect in networks with batch-normalization. Finally, LTP generates a trail of progressively sparser networks from which the desired pruned network can be picked based on sparsity and performance requirements. These features allow LTP to achieve competitive compression rates on ImageNet networks such as AlexNet ($26.4\times$ compression with $79.1\%$ Top-5 accuracy) and ResNet50 ($9.1\times$ compression with $92.0\%$ Top-5 accuracy). We also show that LTP effectively prunes modern \textit{compact} architectures, such as EfficientNet, MobileNetV2 and MixNet.
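To make the idea in the abstract concrete, below is a minimal PyTorch-style sketch of soft pruning with a per-layer learned threshold and a differentiable $L_0$-style penalty, both updated by gradient descent. The sigmoid soft mask, the `temperature` constant, and the `l0_lambda` coefficient are illustrative assumptions for this sketch and are not details taken from the paper.

```python
# Minimal sketch: a linear layer whose weights are softly masked by a trainable
# per-layer threshold, plus a differentiable L0-style penalty that counts surviving
# weights. The sigmoid surrogate and the constants below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoftThresholdLinear(nn.Module):
    """Linear layer with a learned pruning threshold and a soft (differentiable) mask."""

    def __init__(self, in_features, out_features, temperature=1e-3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        # One trainable threshold per layer, parameterized in the squared-weight domain.
        self.log_threshold = nn.Parameter(torch.tensor(-9.0))
        self.temperature = temperature

    def soft_mask(self):
        # Differentiable surrogate of the hard mask 1[w^2 > threshold].
        threshold = self.log_threshold.exp()
        return torch.sigmoid((self.weight ** 2 - threshold) / self.temperature)

    def l0_penalty(self):
        # Differentiable L0-style count of weights that survive pruning.
        return self.soft_mask().sum()

    def forward(self, x):
        return F.linear(x, self.weight * self.soft_mask(), self.bias)


# Toy training step: task loss plus the L0-style penalty; gradient descent updates
# both the weights and the layer's threshold.
if __name__ == "__main__":
    layer = SoftThresholdLinear(32, 10)
    optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)
    x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))

    l0_lambda = 1e-4  # illustrative sparsity/accuracy trade-off coefficient
    logits = layer(x)
    loss = F.cross_entropy(logits, y) + l0_lambda * layer.l0_penalty()
    loss.backward()
    optimizer.step()
```

In this sketch, lowering the penalty pushes the learned threshold up (pruning more weights), while the task loss pushes back, which mirrors the sparsity/accuracy trade-off the abstract describes; the actual masking and regularization used by LTP are defined in the paper itself.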