Paper Title

DiffPrune: Neural Network Pruning with Deterministic Approximate Binary Gates and $L_0$ Regularization

Paper Authors

Shulman, Yaniv

Paper Abstract

Modern neural network architectures typically have many millions of parameters and can be pruned significantly without substantial loss in effectiveness, which demonstrates that they are over-parameterized. The contribution of this work is two-fold. The first is a method for approximating a multivariate Bernoulli random variable by means of a deterministic and differentiable transformation of any real-valued multivariate random variable. The second is a method for model selection by element-wise multiplication of parameters with approximate binary gates that may be computed deterministically or stochastically and can take on exact zero values. Sparsity is encouraged by the inclusion of a surrogate regularization for the $L_0$ loss. Since the method is differentiable, it enables straightforward and efficient learning of model architectures by an empirical risk minimization procedure with stochastic gradient descent, and it theoretically enables conditional computation during training. The method also supports arbitrary group sparsity over parameters or activations and therefore offers a framework for unstructured or flexible structured model pruning. To conclude, experiments are performed to demonstrate the effectiveness of the proposed approach.
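The following is a minimal sketch of the general idea described in the abstract: weights are multiplied element-wise by approximate binary gates that can reach exact zeros, and a differentiable surrogate for the $L_0$ penalty is added to the loss. It assumes a PyTorch-style setup; the hard-sigmoid gate and the class name `GatedLinear` are illustrative stand-ins, not the paper's exact transformation.

```python
# Sketch only: element-wise gating of weights with approximate binary gates
# plus a differentiable surrogate for the L0 penalty. Not the paper's exact
# construction; the hard-sigmoid gate here is an illustrative stand-in.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        # One real-valued gate parameter per weight (unstructured sparsity);
        # sharing gate parameters across rows/columns would give structured pruning.
        self.gate_logits = nn.Parameter(torch.zeros(out_features, in_features))

    def gates(self):
        # Stretched hard sigmoid: clamping lets gates take exact 0 (and 1) values,
        # so pruned weights are truly zero, while the map stays differentiable
        # on the interior of the interval.
        return torch.clamp(torch.sigmoid(self.gate_logits) * 1.2 - 0.1, 0.0, 1.0)

    def forward(self, x):
        return F.linear(x, self.weight * self.gates(), self.bias)

    def l0_surrogate(self):
        # Differentiable surrogate for the number of non-zero gates.
        return self.gates().sum()

# Usage: add the surrogate, scaled by a sparsity coefficient, to the task loss.
layer = GatedLinear(784, 128)
x = torch.randn(32, 784)
loss = layer(x).pow(2).mean() + 1e-4 * layer.l0_surrogate()
loss.backward()
```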
