Paper Title

AP: Selective Activation for De-sparsifying Pruned Neural Networks

Authors

Shiyu Liu, Rohan Ghosh, Dylan Tan, Mehul Motani

Abstract

The rectified linear unit (ReLU) is a highly successful activation function in neural networks as it allows networks to easily obtain sparse representations, which reduces overfitting in overparameterized networks. However, in network pruning, we find that the sparsity introduced by ReLU, which we quantify by a term called dynamic dead neuron rate (DNR), is not beneficial for the pruned network. Interestingly, the more the network is pruned, the smaller the dynamic DNR becomes during optimization. This motivates us to propose a method to explicitly reduce the dynamic DNR for the pruned network, i.e., de-sparsify the network. We refer to our method as Activating-while-Pruning (AP). We note that AP does not function as a stand-alone method, as it does not evaluate the importance of weights. Instead, it works in tandem with existing pruning methods and aims to improve their performance by selective activation of nodes to reduce the dynamic DNR. We conduct extensive experiments using popular networks (e.g., ResNet, VGG) via two classical and three state-of-the-art pruning methods. The experimental results on public datasets (e.g., CIFAR-10/100) suggest that AP works well with existing pruning methods and improves the performance by 3% - 4%. For larger scale datasets (e.g., ImageNet) and state-of-the-art networks (e.g., vision transformer), we observe an improvement of 2% - 3% with AP as opposed to without. Lastly, we conduct an ablation study to examine the effectiveness of the components comprising AP.
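
The abstract does not define the dynamic DNR formally. As a rough illustration only, the sketch below assumes it can be approximated as the fraction of ReLU outputs in a layer that are exactly zero, averaged over a batch of inputs. The function names (`layer_outputs`, `dead_neuron_rate`) and the toy weights are hypothetical and are not taken from the paper.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def relu(x: float) -> float:
    """Rectified linear unit: pass positive values, clamp the rest to zero."""
    return x if x > 0.0 else 0.0


def layer_outputs(weights: Matrix, biases: Vector, inputs: Vector) -> Vector:
    """ReLU outputs of one dense layer for a single input vector."""
    return [
        relu(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def dead_neuron_rate(weights: Matrix, biases: Vector, batch: List[Vector]) -> float:
    """Fraction of (input, neuron) pairs whose ReLU output is exactly zero.

    Assumption: this stands in for the paper's dynamic DNR, which the
    abstract names but does not define.
    """
    total = 0
    dead = 0
    for inputs in batch:
        for out in layer_outputs(weights, biases, inputs):
            total += 1
            dead += out == 0.0
    return dead / total


# Toy layer with three neurons and two inputs (hypothetical values).
toy_weights = [[1.0, -1.0], [-1.0, -1.0], [0.5, 0.5]]
toy_biases = [0.0, 0.0, 0.0]
toy_batch = [[1.0, 2.0], [2.0, 1.0]]

toy_rate = dead_neuron_rate(toy_weights, toy_biases, toy_batch)
print(toy_rate)
```

In this toy layer the second neuron is inactive for both inputs and the first is inactive for one of them, so half of the outputs are zero. A method in the spirit of AP would aim to lower this rate in a pruned network; how it selects which nodes to activate is not described in the abstract and is not sketched here.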
