Paper Title

Learning k-Level Structured Sparse Neural Networks Using Group Envelope Regularization

Paper Authors

Yehonathan Refael, Iftach Arbel, Wasim Huleihel

Paper Abstract

The extensive need for computational resources poses a significant obstacle to deploying large-scale Deep Neural Networks (DNNs) on devices with constrained resources. At the same time, studies have demonstrated that a significant number of these DNN parameters are redundant and extraneous. In this paper, we introduce a novel approach for learning structured sparse neural networks, aimed at bridging the hardware deployment challenges of DNNs. We develop a novel regularization technique, termed the Weighted Group Sparse Envelope Function (WGSEF), which generalizes the Sparse Envelope Function (SEF) to select (or nullify) groups of neurons, thereby reducing redundancy and enhancing computational efficiency. The method speeds up inference and aims to reduce memory demand and power consumption, thanks to its adaptability: group definitions can be tailored to any hardware, e.g., filters, channels, filter shapes, layer depths, or individual parameters (unstructured). The properties of the WGSEF enable a desired sparsity level to be pre-defined and attained at training convergence. When the pruned parameters are indeed redundant, the approach incurs negligible degradation in network accuracy and can even improve it. Our method computes the WGSEF regularizer and its proximal operator efficiently, with worst-case complexity linear in the number of group variables. Employing a proximal-gradient-based optimization technique to train the model, it tackles the non-convex minimization problem that combines the neural network loss with the WGSEF. Finally, we experimentally illustrate the efficiency of our proposed method in terms of compression ratio, accuracy, and inference latency.
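To make the described training procedure concrete, below is a minimal sketch of proximal-gradient optimization with a group-structured proximal step. It is not the paper's implementation: the least-squares loss, the prox_topk_groups helper, and the top-k group-selection rule are illustrative stand-ins for the neural network loss and the exact WGSEF proximal operator derived in the paper.

# A minimal sketch (assumed stand-ins, not the paper's WGSEF prox) of
# proximal-gradient training with a group-structured proximal step.
import numpy as np

def prox_topk_groups(w, group_ids, k):
    """Keep the k groups with the largest l2 norm and zero out the rest
    (a simple stand-in for the WGSEF proximal operator)."""
    groups = np.unique(group_ids)
    norms = np.array([np.linalg.norm(w[group_ids == g]) for g in groups])
    keep = groups[np.argsort(norms)[-k:]]
    return np.where(np.isin(group_ids, keep), w, 0.0)

def train_prox_grad(X, y, group_ids, k, lr=1e-2, steps=500):
    """Proximal gradient descent on a toy least-squares loss with group sparsity."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)                   # gradient of the smooth loss
        w = prox_topk_groups(w - lr * grad, group_ids, k)   # proximal (group-selection) step
    return w

# Toy usage: 12 weights in 4 groups of 3, only 2 groups truly active.
rng = np.random.default_rng(0)
group_ids = np.repeat(np.arange(4), 3)
w_true = np.where(group_ids < 2, rng.normal(size=12), 0.0)
X = rng.normal(size=(200, 12))
y = X @ w_true + 0.01 * rng.normal(size=200)
w_hat = train_prox_grad(X, y, group_ids, k=2)
print("nonzero groups:", np.unique(group_ids[w_hat != 0]))

The stand-in proximal step mirrors the key property stated in the abstract: the number of surviving groups (k) is fixed in advance, so the target structured-sparsity level is met at convergence.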
