Paper Title

Enabling Retrain-free Deep Neural Network Pruning using Surrogate Lagrangian Relaxation

Paper Authors

Deniz Gurevin, Shanglin Zhou, Lynn Pepin, Bingbing Li, Mikhail Bragin, Caiwen Ding, Fei Miao

Paper Abstract

Network pruning is a widely used technique to reduce computation cost and model size for deep neural networks. However, the typical three-stage pipeline, i.e., training, pruning, and retraining (fine-tuning), significantly increases the overall training time. In this paper, we develop a systematic weight-pruning optimization approach based on Surrogate Lagrangian Relaxation (SLR), which is tailored to overcome difficulties caused by the discrete nature of the weight-pruning problem while ensuring fast convergence. We further accelerate the convergence of SLR by using quadratic penalties. Model parameters obtained by SLR during the training phase are much closer to their optimal values than those obtained by other state-of-the-art methods. We evaluate the proposed method on image classification tasks, i.e., ResNet-18 and ResNet-50 using ImageNet, and ResNet-18, ResNet-50 and VGG-16 using CIFAR-10, as well as object detection tasks, i.e., YOLOv3 and YOLOv3-tiny using COCO 2014 and Ultra-Fast-Lane-Detection using the TuSimple lane detection dataset. Experimental results demonstrate that our SLR-based weight-pruning optimization approach achieves a higher compression rate than state-of-the-art methods under the same accuracy requirement. It also achieves high model accuracy even at the hard-pruning stage without retraining, reducing the traditional three-stage pruning pipeline to two stages. Given a limited budget of retraining epochs, our approach quickly recovers the model accuracy.
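For readers unfamiliar with this family of methods, the sketch below illustrates the general structure of an augmented-Lagrangian weight-pruning iteration: the dense weights W are constrained to equal a sparse auxiliary copy Z, the constraint is relaxed with Lagrange multipliers plus a quadratic penalty, and the multipliers are updated with a decaying step size. This is only a minimal illustration under assumed names and rules (the magnitude-based projection, the fixed keep ratio, and the simple step-size decay are illustrative), not the paper's exact SLR updates or its step-size conditions.

```python
import numpy as np

def project_sparse(w, keep_ratio=0.1):
    """Project onto the sparsity set by keeping the largest-magnitude entries (illustrative)."""
    k = max(1, int(keep_ratio * w.size))
    thresh = np.sort(np.abs(w).ravel())[-k]
    return np.where(np.abs(w) >= thresh, w, 0.0)

def pruning_iteration(w, z, lam, grad_loss, lr=1e-2, rho=1e-3, step_size=1e-3):
    """One illustrative augmented-Lagrangian pruning step for the constraint w = z.

    w: dense weights, z: sparse auxiliary copy, lam: Lagrange multipliers,
    grad_loss: gradient of the task loss at w, rho: quadratic-penalty coefficient.
    """
    # Primal update on w: descend the task loss plus the relaxed constraint terms.
    w = w - lr * (grad_loss + lam + rho * (w - z))
    # Auxiliary update: project the multiplier-shifted weights onto the sparse set.
    z = project_sparse(w + lam / rho)
    # Multiplier update; SLR imposes conditions on this step size to guarantee
    # convergence, whereas here a plain small constant is used for illustration.
    lam = lam + step_size * (w - z)
    return w, z, lam
```

In practice the primal step would be replaced by one or more stochastic-gradient passes over the training data for each layer's weight tensor; the sketch above only shows how the quadratic penalty and multiplier terms enter the update.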
