Paper Title

Comprehensive Online Network Pruning via Learnable Scaling Factors

Paper Authors

Muhammad Umair Haider, Murtaza Taj

Paper Abstract

One of the major challenges in deploying deep neural network architectures is their size, which adversely affects their inference time and memory requirements. Deep CNNs can be pruned either width-wise, by removing filters based on their importance, or depth-wise, by removing layers and blocks. Width-wise pruning (filter pruning) is commonly performed via learnable gates or switches and sparsity regularizers, whereas pruning of layers has so far been performed arbitrarily by manually designing a smaller network, usually referred to as a student network. We propose a comprehensive pruning strategy that can perform both width-wise and depth-wise pruning. This is achieved by introducing gates at different granularities (neuron, filter, layer, block), which are then controlled via an objective function that simultaneously performs pruning at different granularities during each forward pass. Our approach is applicable to a wide variety of architectures without any constraints on spatial dimensions or connection type (sequential, residual, parallel, or inception). Our method achieves a compression ratio of 70% to 90% without noticeable loss in accuracy when evaluated on benchmark datasets.
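
The abstract describes learnable gates (scaling factors) attached to units at different granularities and trained jointly with the network through a sparsity term in the objective. The code below is only a minimal, hypothetical sketch of that general idea in PyTorch; the module name `PruningGate`, the penalty weight `lam`, and the L1 form of the penalty are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class PruningGate(nn.Module):
    """Learnable scaling factors (gates) multiplied onto the output of a
    filter, layer, or block; gates driven toward zero mark units to prune.
    (Illustrative sketch, not the authors' exact implementation.)"""
    def __init__(self, num_gates=1):
        super().__init__()
        # One learnable scaling factor per gated unit, initialized to 1.
        self.scale = nn.Parameter(torch.ones(num_gates))

    def forward(self, x):
        # Broadcast the scaling factors over the channel dimension
        # of a (batch, channels, height, width) tensor.
        return x * self.scale.view(1, -1, 1, 1)

def sparsity_penalty(model, lam=1e-4):
    """L1 penalty on all gate values, added to the task loss so that
    pruning decisions are learned during each forward/backward pass."""
    reg = sum(g.scale.abs().sum()
              for g in model.modules() if isinstance(g, PruningGate))
    return lam * reg
```

In such a setup the total training objective would combine the task loss with the gate penalty, e.g. `loss = criterion(output, target) + sparsity_penalty(model)`, and units whose gate values collapse toward zero can be removed after training.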
