Paper Title

Efficient Bitwidth Search for Practical Mixed Precision Neural Network

Paper Authors

Yuhang Li, Wei Wang, Haoli Bai, Ruihao Gong, Xin Dong, Fengwei Yu

Paper Abstract

Network quantization has rapidly become one of the most widely used methods to compress and accelerate deep neural networks. Recent efforts propose to quantize the weights and activations of different layers with different precisions to improve the overall performance. However, it is challenging to find the optimal bitwidth (i.e., precision) for the weights and activations of each layer efficiently. Meanwhile, it is yet unclear how to perform convolution efficiently on generic hardware platforms when weights and activations have different precisions. To resolve these two issues, in this paper we first propose an Efficient Bitwidth Search (EBS) algorithm, which reuses the meta weights across different quantization bitwidths, so that the strength of each candidate precision can be optimized directly w.r.t. the objective without superfluous weight copies, significantly reducing both the memory and computational cost. Second, we propose a binary decomposition algorithm that converts weights and activations of different precisions into binary matrices, making mixed precision convolution efficient and practical. Experimental results on the CIFAR10 and ImageNet datasets demonstrate that our mixed precision QNN outperforms handcrafted uniform-bitwidth counterparts and other mixed precision techniques.
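To make the meta-weight reuse concrete, below is a minimal NumPy sketch of a differentiable bitwidth search in the spirit of EBS: a single shared meta weight tensor is quantized to each candidate bitwidth, and a learnable strength per candidate mixes the results. The names (`meta_weight`, `candidate_bits`, `alpha`), the uniform quantizer, and the softmax mixing are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def quantize(w, bits):
    """Uniform symmetric quantization of w to the given bitwidth in [-1, 1].
    Forward pass only; gradient handling (e.g., straight-through) is omitted."""
    w = np.clip(w, -1.0, 1.0)
    levels = 2 ** (bits - 1) - 1
    return np.round(w * levels) / levels

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# One shared meta weight tensor serves every candidate bitwidth, so no
# per-candidate weight copies need to be stored or trained.
meta_weight = np.random.randn(64, 3, 3, 3) * 0.1
candidate_bits = [2, 4, 8]              # illustrative search space
alpha = np.zeros(len(candidate_bits))   # learnable strength per bitwidth

# Forward pass: mix the quantized versions of the same meta weights with
# softmax strengths; alpha is optimized w.r.t. the task objective, and the
# argmax over alpha picks each layer's final bitwidth after the search.
p = softmax(alpha)
mixed_weight = sum(pi * quantize(meta_weight, b)
                   for pi, b in zip(p, candidate_bits))
```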

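The binary decomposition idea can be illustrated with bit-planes: a k-bit non-negative integer tensor equals a power-of-two weighted sum of {0, 1} matrices, so a mixed precision product reduces to binary (popcount-friendly) products that generic hardware handles well. A minimal sketch under that assumption, using a dot product as a stand-in for convolution:

```python
import numpy as np

def bit_planes(x_int, bits):
    """Decompose a non-negative integer tensor into binary bit-planes:
    x = sum_i 2**i * planes[i], with every planes[i] in {0, 1}."""
    return [((x_int >> i) & 1).astype(np.uint8) for i in range(bits)]

rng = np.random.default_rng(0)
w_bits, a_bits = 2, 4                         # illustrative mixed precisions
w = rng.integers(0, 2 ** w_bits, size=128)    # quantized weights (integers)
a = rng.integers(0, 2 ** a_bits, size=128)    # quantized activations

# Mixed precision dot product = weighted sum of binary dot products
# between bit-planes: w . a = sum_{i,j} 2**(i+j) * (Bw_i . Ba_j).
result = sum((2 ** (i + j)) * int(np.dot(bw.astype(np.int64), ba.astype(np.int64)))
             for i, bw in enumerate(bit_planes(w, w_bits))
             for j, ba in enumerate(bit_planes(a, a_bits)))

assert result == int(np.dot(w.astype(np.int64), a.astype(np.int64)))
```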