Paper Title

Training Recipe for N:M Structured Sparsity with Decaying Pruning Mask

Paper Authors

Sheng-Chun Kao, Amir Yazdanbakhsh, Suvinay Subramanian, Shivani Agrawal, Utku Evci, Tushar Krishna

Paper Abstract

Sparsity has become one of the promising methods to compress and accelerate Deep Neural Networks (DNNs). Among different categories of sparsity, structured sparsity has gained more attention due to its efficient execution on modern accelerators. Particularly, N:M sparsity is attractive because there are already hardware accelerator architectures that can leverage certain forms of N:M structured sparsity to yield higher compute efficiency. In this work, we focus on N:M sparsity and extensively study and evaluate various training recipes for N:M sparsity in terms of the trade-off between model accuracy and compute cost (FLOPs). Building upon this study, we propose two new decay-based pruning methods, namely "pruning mask decay" and "sparse structure decay". Our evaluations indicate that these proposed methods consistently deliver state-of-the-art (SOTA) model accuracy, comparable to unstructured sparsity, on a Transformer-based model for a translation task. The increase in the accuracy of the sparse model using the new training recipes comes at the cost of a marginal increase in the total training compute (FLOPs).

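To make the mechanics concrete, the short Python sketch below (NumPy-based, for illustration only) shows the standard N:M masking step described in the abstract: within every block of M consecutive weights, only the N largest-magnitude entries are kept. The apply_decaying_mask helper and its linear schedule are hypothetical, added here only to convey the flavor of a "pruning mask decay" style recipe; they are not the authors' exact formulation.

import numpy as np

def nm_prune_mask(weights: np.ndarray, n: int = 2, m: int = 4) -> np.ndarray:
    """Binary mask keeping the N largest-magnitude weights in each
    contiguous block of M along the last axis (standard N:M pruning)."""
    assert weights.shape[-1] % m == 0, "last axis must be divisible by M"
    blocks = np.abs(weights).reshape(-1, m)         # one row per block of M
    drop = np.argsort(blocks, axis=1)[:, : m - n]   # the (M - N) smallest entries
    mask = np.ones_like(blocks)
    np.put_along_axis(mask, drop, 0.0, axis=1)      # zero out the dropped positions
    return mask.reshape(weights.shape)

def apply_decaying_mask(weights, mask, step, total_steps):
    """Hypothetical 'mask decay' schedule: pruned weights are scaled by a
    factor that shrinks linearly from 1 to 0 over training, instead of
    being zeroed immediately (illustrative only)."""
    decay = max(0.0, 1.0 - step / total_steps)
    return weights * (mask + (1.0 - mask) * decay)

# Example: 2:4 sparsity on a small weight matrix
w = np.random.randn(4, 8)
mask = nm_prune_mask(w, n=2, m=4)                   # exactly 2 nonzeros per block of 4
w_soft = apply_decaying_mask(w, mask, step=100, total_steps=1000)

Multiplying by such a soft mask early in training lets pruned weights keep contributing small gradients before the structure is finally hardened, which is one intuition behind decay-based recipes outperforming one-shot N:M pruning.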