Paper Title

Convolutional neural networks compression with low rank and sparse tensor decompositions

Paper Authors

Kaloshin, Pavel

Paper Abstract

Convolutional neural networks show outstanding results in a variety of computer vision tasks. However, neural network architecture design usually faces a trade-off between model performance and computational/memory complexity. For some real-world applications, it is crucial to develop models that are fast and light enough to run on edge systems and mobile devices. However, many modern architectures that demonstrate good performance do not satisfy inference time and storage limitations. Thus arises the problem of neural network compression: obtaining a smaller and faster model that performs on par with the initial one. In this work, we consider a neural network compression method based on tensor decompositions. Namely, we propose to approximate the convolutional layer weights with a tensor that can be represented as a sum of low-rank and sparse components. The motivation for such an approximation is the assumption that the low-rank and sparse terms eliminate two different types of redundancy and thus yield a better compression rate. An efficient CPU implementation of the proposed method has been developed. When compressing the ResNet50 architecture for the image classification task, our algorithm has demonstrated up to 3.5x per-layer CPU speedup and 11x per-layer size reduction.
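
To make the core idea concrete: the abstract describes approximating a convolutional weight as a sum of a low-rank term and a sparse term. Below is a minimal NumPy sketch of one such decomposition via alternating projections (truncated SVD for the low-rank part, magnitude thresholding for the sparse part), applied to a kernel matricized into a 2-D array. The rank, sparsity level, and matricization are illustrative assumptions, not the authors' exact tensor-based algorithm.

import numpy as np

def low_rank_plus_sparse(W, rank, sparsity, n_iter=50):
    """Approximate W as L + S, where rank(L) <= rank and S keeps only
    the `sparsity` fraction of largest-magnitude residual entries.
    Simple alternating projections; a stand-in for the paper's method."""
    S = np.zeros_like(W)
    for _ in range(n_iter):
        # Low-rank step: truncated SVD of the residual W - S.
        U, sigma, Vt = np.linalg.svd(W - S, full_matrices=False)
        L = (U[:, :rank] * sigma[:rank]) @ Vt[:rank]
        # Sparse step: keep the largest-magnitude entries of W - L.
        R = W - L
        k = int(sparsity * R.size)
        thresh = np.partition(np.abs(R).ravel(), -k)[-k] if k > 0 else np.inf
        S = np.where(np.abs(R) >= thresh, R, 0.0)
    return L, S

# Hypothetical example: a 64x64x3x3 conv kernel reshaped to
# (C_out, C_in * k * k), one plausible matricization.
W = np.random.randn(64, 64 * 3 * 3).astype(np.float32)
L, S = low_rank_plus_sparse(W, rank=16, sparsity=0.05)
err = np.linalg.norm(W - (L + S)) / np.linalg.norm(W)
print(f"relative approximation error: {err:.3f}")

At inference time, such a split pays off because the low-rank factor can be applied as two small dense convolutions while the sparse term uses sparse kernels, which is consistent with the per-layer CPU speedups reported above.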
