Paper Title

UWC: Unit-wise Calibration Towards Rapid Network Compression

Authors

Chen Lin, Zheyang Li, Bo Peng, Haoji Hu, Wenming Tan, Ye Ren, Shiliang Pu

Abstract

This paper introduces a post-training quantization (PTQ) method that achieves highly efficient Convolutional Neural Network (CNN) quantization with high performance. Previous PTQ methods usually reduce compression error by performing layer-by-layer parameter calibration. However, the low representational ability of extremely compressed parameters (e.g., a bit-width below 4) makes it hard to eliminate all the layer-wise errors. This work addresses the issue by proposing a unit-wise feature reconstruction algorithm, based on an observation from the second-order Taylor series expansion of the unit-wise error: leveraging the interaction between adjacent layers' parameters can better compensate for layer-wise errors. In this paper, we define several adjacent layers as a Basic-Unit and present a unit-wise post-training algorithm that minimizes the quantization error. This method achieves near-original accuracy on ImageNet and COCO when quantizing FP32 models to INT4 and INT3.
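To make the Taylor-expansion observation concrete, here is a standard second-order expansion in our own notation (a sketch, not the paper's derivation). Let $\Delta\mathbf{w}_i$ be the weight perturbation that quantization introduces in layer $i$ of a unit, and let $\Delta\mathcal{L}$ be the resulting unit-wise output error:

$$
\Delta \mathcal{L} \;\approx\; \sum_i \mathbf{g}_i^{\top} \Delta\mathbf{w}_i \;+\; \frac{1}{2} \sum_{i,j} \Delta\mathbf{w}_i^{\top} \mathbf{H}_{ij}\, \Delta\mathbf{w}_j,
$$

where $\mathbf{g}_i$ is the gradient with respect to layer $i$'s weights and $\mathbf{H}_{ij}$ is the Hessian block coupling layers $i$ and $j$. Layer-by-layer calibration only accounts for the diagonal blocks $\mathbf{H}_{ii}$; the off-diagonal blocks $\mathbf{H}_{ij}$ ($i \neq j$) are precisely the adjacent-layer interactions that unit-wise calibration can additionally exploit.

Below is a minimal PyTorch sketch of the unit-wise reconstruction idea. The names (fake_quant, QuantConv, calibrate_unit) and hyperparameters are illustrative assumptions, not the authors' released implementation; the sketch only shows the core loop of jointly tuning every layer in a Basic-Unit so that the unit's output, rather than each layer's output, matches the full-precision reference.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quant(w: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    """Uniform symmetric fake quantization with a straight-through estimator."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.detach().abs().max() / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax, qmax) * scale
    # Forward pass sees the quantized weight; backward passes gradients to w.
    return w + (w_q - w).detach()

class QuantConv(nn.Module):
    """Wraps a Conv2d so its weight is fake-quantized on every forward."""
    def __init__(self, conv: nn.Conv2d, n_bits: int = 4):
        super().__init__()
        self.conv, self.n_bits = conv, n_bits

    def forward(self, x):
        w = fake_quant(self.conv.weight, self.n_bits)
        return F.conv2d(x, w, self.conv.bias,
                        self.conv.stride, self.conv.padding)

def calibrate_unit(fp_unit: nn.Module, q_unit: nn.Module,
                   calib_batches, steps: int = 100, lr: float = 1e-4):
    """Jointly tune all layers of one Basic-Unit so the quantized unit
    reconstructs the full-precision unit's output (unit-wise error)."""
    opt = torch.optim.Adam(q_unit.parameters(), lr=lr)
    for _ in range(steps):
        for x in calib_batches:
            with torch.no_grad():
                target = fp_unit(x)  # full-precision reference features
            loss = F.mse_loss(q_unit(x), target)
            opt.zero_grad()
            loss.backward()
            opt.step()

# Usage: two adjacent conv layers form one hypothetical Basic-Unit.
fp_unit = nn.Sequential(nn.Conv2d(8, 8, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(8, 8, 3, padding=1)).eval()
q_copy = copy.deepcopy(fp_unit)  # independent weights for the quantized copy
q_unit = nn.Sequential(QuantConv(q_copy[0]), nn.ReLU(), QuantConv(q_copy[2]))
calib_batches = [torch.randn(4, 8, 16, 16) for _ in range(4)]
calibrate_unit(fp_unit, q_unit, calib_batches)
```

Calibrating one Basic-Unit at a time keeps the cost profile of PTQ (a small calibration set, no end-to-end retraining) while the joint loss lets a later layer absorb part of the error introduced by an earlier one.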
