Paper Title

Compressing CNN Kernels for Videos Using Tucker Decompositions: Towards Lightweight CNN Applications

Paper Authors

Tobias Engelhardt Rasmussen, Line H. Clemmensen, Andreas Baum

Paper Abstract

Convolutional Neural Networks (CNNs) are the state of the art in the field of visual computing. However, a major problem with CNNs is the large number of floating point operations (FLOPs) required to perform convolutions on large inputs. When CNNs are applied to video data, the convolutional filters become even more complex due to the extra temporal dimension. This becomes a problem when such applications are to be deployed on mobile devices, such as smartphones, tablets, or microcontrollers, which offer less computational power. Kim et al. (2016) proposed using a Tucker decomposition to compress the convolutional kernels of a pre-trained network for images in order to reduce the complexity of the network, i.e., the number of FLOPs. In this paper, we generalize this method to videos (and other 3D signals) and evaluate it on a modified version of the THETIS data set, which contains videos of individuals performing tennis shots. We show that the compressed network reaches comparable accuracy while achieving a memory compression by a factor of 51. However, the actual computational speed-up (a factor of 1.4) does not meet our theoretically derived expectation (a factor of 6).
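
The core idea can be illustrated with a truncated higher-order SVD (HOSVD), one common way to compute a Tucker decomposition. The sketch below is a minimal NumPy illustration, not the authors' implementation: the kernel shape, the chosen ranks, and the helper functions (`unfold`, `mode_dot`, `tucker_hosvd`) are assumptions made for this example. It only shows how factoring a spatio-temporal convolution kernel into a small core tensor plus factor matrices reduces the number of stored parameters.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring `mode` to the front and flatten the remaining axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_dot(tensor, matrix, mode):
    """n-mode product: multiply `tensor` along `mode` by `matrix`."""
    moved = np.moveaxis(tensor, mode, 0)
    out = matrix @ moved.reshape(moved.shape[0], -1)
    out = out.reshape((matrix.shape[0],) + moved.shape[1:])
    return np.moveaxis(out, 0, mode)

def tucker_hosvd(tensor, ranks):
    """Truncated HOSVD, one way to obtain a Tucker decomposition.

    Each factor matrix holds the leading left singular vectors of the
    corresponding mode unfolding; the core is the tensor projected onto
    all factor matrices.
    """
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, factor in enumerate(factors):
        core = mode_dot(core, factor.T, mode)
    return core, factors

# Illustrative 3D-CNN kernel: (out channels, in channels, time, height, width).
kernel = np.random.randn(64, 32, 3, 3, 3)

# Compress only the channel modes; keep the small 3x3x3 spatio-temporal support at full rank.
ranks = (16, 8, 3, 3, 3)
core, factors = tucker_hosvd(kernel, ranks)

params_full = kernel.size
params_tucker = core.size + sum(f.size for f in factors)
print(f"full kernel parameters:   {params_full}")
print(f"Tucker parameters:        {params_tucker}")
print(f"memory compression ratio: {params_full / params_tucker:.1f}x")

# Reconstruction check: project the core back through the factor matrices.
approx = core
for mode, factor in enumerate(factors):
    approx = mode_dot(approx, factor, mode)
rel_err = np.linalg.norm(kernel - approx) / np.linalg.norm(kernel)
print(f"relative reconstruction error: {rel_err:.3f}")
```

In practice (as in Kim et al., 2016), the factorization is used to replace one large convolution with a sequence of smaller convolutions rather than to reconstruct the kernel explicitly, which is where the reduction in FLOPs comes from.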
