Paper Title

UWC: Unit-wise Calibration Towards Rapid Network Compression

Authors

Chen Lin, Zheyang Li, Bo Peng, Haoji Hu, Wenming Tan, Ye Ren, Shiliang Pu

Abstract

This paper introduces a post-training quantization (PTQ) method that achieves highly efficient Convolutional Neural Network (CNN) quantization with high performance. Previous PTQ methods usually reduce compression error by performing layer-by-layer parameter calibration. However, the low representational ability of extremely compressed parameters (e.g., a bit-width below 4) makes it hard to eliminate all the layer-wise errors. This work addresses the issue by proposing a unit-wise feature reconstruction algorithm, based on an observation from the second-order Taylor series expansion of the unit-wise error: leveraging the interaction between adjacent layers' parameters can better compensate for layer-wise errors. In this paper, we define several adjacent layers as a Basic-Unit and present a unit-wise post-training algorithm that minimizes the quantization error. This method achieves near-original accuracy on ImageNet and COCO when quantizing FP32 models to INT4 and INT3.
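To make the Taylor-expansion observation concrete, here is a standard second-order expansion in our own notation (a sketch, not the paper's derivation). Let $\Delta\mathbf{w}_i$ be the weight perturbation that quantization introduces in layer $i$ of a unit, and let $\Delta\mathcal{L}$ be the resulting unit-wise output error:

$$
\Delta \mathcal{L} \;\approx\; \sum_i \mathbf{g}_i^{\top} \Delta\mathbf{w}_i \;+\; \frac{1}{2} \sum_{i,j} \Delta\mathbf{w}_i^{\top} \mathbf{H}_{ij}\, \Delta\mathbf{w}_j,
$$

where $\mathbf{g}_i$ is the gradient with respect to layer $i$'s weights and $\mathbf{H}_{ij}$ is the Hessian block coupling layers $i$ and $j$. Layer-by-layer calibration only accounts for the diagonal blocks $\mathbf{H}_{ii}$; the off-diagonal blocks $\mathbf{H}_{ij}$ ($i \neq j$) are precisely the adjacent-layer interactions that unit-wise calibration can additionally exploit.

Below is a minimal PyTorch sketch of the unit-wise reconstruction idea. The names (fake_quant, QuantConv, calibrate_unit) and hyperparameters are illustrative assumptions, not the authors' released implementation; the sketch only shows the core loop of jointly tuning every layer in a Basic-Unit so that the unit's output, rather than each layer's output, matches the full-precision reference.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quant(w: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    """Uniform symmetric fake quantization with a straight-through estimator."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.detach().abs().max() / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax, qmax) * scale
    # Forward pass sees the quantized weight; backward passes gradients to w.
    return w + (w_q - w).detach()

class QuantConv(nn.Module):
    """Wraps a Conv2d so its weight is fake-quantized on every forward."""
    def __init__(self, conv: nn.Conv2d, n_bits: int = 4):
        super().__init__()
        self.conv, self.n_bits = conv, n_bits

    def forward(self, x):
        w = fake_quant(self.conv.weight, self.n_bits)
        return F.conv2d(x, w, self.conv.bias,
                        self.conv.stride, self.conv.padding)

def calibrate_unit(fp_unit: nn.Module, q_unit: nn.Module,
                   calib_batches, steps: int = 100, lr: float = 1e-4):
    """Jointly tune all layers of one Basic-Unit so the quantized unit
    reconstructs the full-precision unit's output (unit-wise error)."""
    opt = torch.optim.Adam(q_unit.parameters(), lr=lr)
    for _ in range(steps):
        for x in calib_batches:
            with torch.no_grad():
                target = fp_unit(x)  # full-precision reference features
            loss = F.mse_loss(q_unit(x), target)
            opt.zero_grad()
            loss.backward()
            opt.step()

# Usage: two adjacent conv layers form one hypothetical Basic-Unit.
fp_unit = nn.Sequential(nn.Conv2d(8, 8, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(8, 8, 3, padding=1)).eval()
q_copy = copy.deepcopy(fp_unit)  # independent weights for the quantized copy
q_unit = nn.Sequential(QuantConv(q_copy[0]), nn.ReLU(), QuantConv(q_copy[2]))
calib_batches = [torch.randn(4, 8, 16, 16) for _ in range(4)]
calibrate_unit(fp_unit, q_unit, calib_batches)
```

Calibrating one Basic-Unit at a time keeps the cost profile of PTQ (a small calibration set, no end-to-end retraining) while the joint loss lets a later layer absorb part of the error introduced by an earlier one.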
