Paper Title

Training Sparse Neural Networks using Compressed Sensing

Paper Authors

Jonathan W. Siegel, Jianhong Chen, Pengchuan Zhang, Jinchao Xu

Paper Abstract

Pruning the weights of neural networks is an effective and widely-used technique for reducing model size and inference complexity. We develop and test a novel method based on compressed sensing which combines the pruning and training into a single step. Specifically, we utilize an adaptively weighted $\ell^1$ penalty on the weights during training, which we combine with a generalization of the regularized dual averaging (RDA) algorithm in order to train sparse neural networks. The adaptive weighting we introduce corresponds to a novel regularizer based on the logarithm of the absolute value of the weights. We perform a series of ablation studies demonstrating the improvement provided by the adaptive weighting and generalized RDA algorithm. Furthermore, numerical experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets demonstrate that our method 1) trains sparser, more accurate networks than existing state-of-the-art methods; 2) can be used to train sparse networks from scratch, i.e. from a random initialization, as opposed to initializing with a well-trained base model; 3) acts as an effective regularizer, improving generalization accuracy.
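
The two ingredients named in the abstract, the adaptive $\ell^1$ weighting and the RDA-style update, can be made concrete. One standard reading (the abstract does not give the exact formulas) is the reweighted-$\ell^1$ scheme from compressed sensing: penalizing $\lambda \sum_i \log(\epsilon + |w_i|)$ and linearizing it around the current weights $w^{(t)}$ yields the weighted $\ell^1$ penalty

$$\lambda \sum_i \frac{|w_i|}{\epsilon + |w_i^{(t)}|},$$

so weights that are already small receive a larger penalty and are pushed harder toward zero. The sketch below plugs such per-weight penalties into the closed-form $\ell^1$-RDA update of Xiao (2010); the function names, the constant $\epsilon$, and the step-size rule are illustrative assumptions, not the authors' exact generalized algorithm.

```python
import numpy as np

def adaptive_l1_penalties(w, lam=1e-4, eps=1e-3):
    # Per-weight l1 strengths from linearizing lam * sum(log(eps + |w|));
    # lam and eps are illustrative hyperparameters, not values from the paper.
    return lam / (eps + np.abs(w))

def rda_l1_step(avg_grad, t, gamma, penalties):
    # Closed-form l1-RDA update (Xiao, 2010) with per-weight penalties:
    # soft-threshold the running gradient average, then rescale by sqrt(t)/gamma.
    shrunk = np.maximum(np.abs(avg_grad) - penalties, 0.0)
    return -(np.sqrt(t) / gamma) * np.sign(avg_grad) * shrunk

# Toy usage: one update on a small weight vector.
w = np.array([0.5, -0.01, 0.0, 0.2])
avg_grad = np.array([0.02, 0.3, -0.001, -0.15])  # running average of gradients
w_new = rda_l1_step(avg_grad, t=100, gamma=10.0,
                    penalties=adaptive_l1_penalties(w))
print(w_new)  # coordinates whose |avg_grad| falls below their penalty stay exactly zero
```

Because the thresholding is applied to the averaged gradient rather than to the weights themselves, coordinates whose accumulated gradient signal stays below the (adaptively enlarged) threshold are set exactly to zero, which is what produces sparsity during training rather than as a post-hoc pruning step.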
