Paper Title
Automatic Sparse Connectivity Learning for Neural Networks
Paper Authors
Paper Abstract
Since sparse neural networks usually contain many zero weights, these unnecessary network connections can potentially be eliminated without degrading network performance. Therefore, well-designed sparse neural networks have the potential to significantly reduce FLOPs and computational resources. In this work, we propose a new automatic pruning method, Sparse Connectivity Learning (SCL). Specifically, each weight is re-parameterized as the element-wise multiplication of a trainable weight variable and a binary mask. Network connectivity is thus fully described by the binary masks, which are modulated by a unit step function. We theoretically prove the fundamental principle of using a straight-through estimator (STE) for network pruning: the proxy gradient of the STE should be positive, which ensures that the mask variables converge to their minima. After finding that the Leaky ReLU, Softplus, and Identity STEs satisfy this principle, we propose to adopt the Identity STE in SCL for discrete mask relaxation. We find that the mask gradients of different features are highly imbalanced; hence, we propose to normalize the mask gradients of each feature to optimize mask-variable training. To train sparse masks automatically, we include the total number of network connections as a regularization term in our objective function. Because SCL does not require pruning criteria or hyper-parameters to be defined by designers for individual network layers, the network is explored in a larger hypothesis space to achieve optimized sparse connectivity for the best performance. SCL thereby overcomes the limitations of existing automatic pruning methods. Experimental results demonstrate that SCL can automatically learn and select important network connections for various baseline network structures. Deep learning models trained by SCL outperform SOTA human-designed and automatic pruning methods in sparsity, accuracy, and FLOPs reduction.
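To make the re-parameterization concrete, below is a minimal PyTorch-style sketch, not the authors' implementation, of a layer whose weights are masked by a unit step function trained with an Identity STE, plus the connection-count regularization term described in the abstract. The module name MaskedLinear and the loss snippet are hypothetical illustrations, and the per-feature mask-gradient normalization is omitted for brevity.

```python
import torch
import torch.nn as nn

class BinaryStep(torch.autograd.Function):
    """Unit step in the forward pass; Identity STE in the backward pass."""
    @staticmethod
    def forward(ctx, m):
        # Binary mask: 1 keeps a connection, 0 prunes it.
        return (m >= 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        # Identity STE: the proxy gradient of the step function is 1 (> 0),
        # consistent with the positivity principle stated in the abstract.
        return grad_output

class MaskedLinear(nn.Module):
    """Weight = trainable weight variable * binary mask (hypothetical layer name)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.mask_var = nn.Parameter(torch.zeros(out_features, in_features))  # mask variable m
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        mask = BinaryStep.apply(self.mask_var)
        return nn.functional.linear(x, self.weight * mask, self.bias)

    def num_connections(self):
        # Number of active connections; gradients flow to mask_var via the STE,
        # so this sum can serve as the sparsity regularization term.
        return BinaryStep.apply(self.mask_var).sum()

# Hypothetical objective: task loss plus a penalty on total network connections.
# loss = criterion(model(x), y) + lam * sum(m.num_connections() for m in masked_layers)
```

In this sketch, minimizing the regularized objective drives mask variables of unimportant connections negative, which zeroes the corresponding mask entries, so sparsity emerges from training rather than from a hand-designed, per-layer pruning criterion.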