Paper Title

BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation

Paper Authors

Geon Park, Jaehong Yoon, Haiyang Zhang, Xing Zhang, Sung Ju Hwang, Yonina C. Eldar

Paper Abstract

Neural network quantization aims to transform the high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation, while preserving the performance of the original model. However, extreme quantization (1-bit weights/1-bit activations) of compactly designed backbone architectures (e.g., MobileNets), which are often used for edge-device deployment, results in severe performance degeneration. This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate this degeneration even under extreme quantization by focusing on inter-weight dependencies, both between weights within each layer and across consecutive layers. To minimize the quantization impact of each weight on the others, we perform an orthonormal transformation of the weights at each layer by training an input-dependent correlation matrix and importance vector, such that each weight is disentangled from the others. We then quantize the weights according to their importance, minimizing the loss of information from the original weights/activations. We further perform progressive layer-wise quantization from the bottom layer to the top, so that quantization at each layer reflects the quantized distributions of weights and activations at the previous layers. We validate the effectiveness of our method on various benchmark datasets against strong neural quantization baselines, demonstrating that it alleviates performance degeneration on ImageNet and successfully preserves full-precision model performance on CIFAR-100 with compact backbone networks.
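To make the pipeline described in the abstract concrete, below is a minimal PyTorch sketch of its three ingredients: an orthonormal transform of each layer's weights, importance-weighted 1-bit quantization with a straight-through estimator, and bottom-up progressive layer-wise calibration. This is not the authors' BiTAT implementation; the names (`OrthoBinarize`, `progressive_quantize`) and the skew-symmetric matrix-exponential parameterization of the orthonormal transform are assumptions chosen for brevity, standing in for the paper's input-dependent correlation matrix and importance vector.

```python
# Illustrative sketch only -- not the authors' BiTAT implementation.
# All names and parameterization choices here are assumptions for exposition.
import torch
import torch.nn as nn
import torch.nn.functional as F


class OrthoBinarize(nn.Module):
    """1-bit linear layer that binarizes weights in a learned orthonormal basis.

    Q = exp(A - A^T) is orthonormal for any square matrix A, so weight
    dimensions can be decorrelated before sign() quantization; a learned
    per-dimension importance vector rescales them afterwards. (The paper
    derives the transform and importance from input-dependent weight
    correlations; this learnable stand-in only mirrors that structure.)
    """

    def __init__(self, out_features: int, in_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.skew = nn.Parameter(torch.zeros(in_features, in_features))
        self.log_importance = nn.Parameter(torch.zeros(in_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        Q = torch.matrix_exp(self.skew - self.skew.t())  # orthonormal transform
        w_rot = self.weight @ Q                          # rotate into decorrelated basis
        scale = w_rot.abs().mean()                       # layer-wise binarization scale
        # Straight-through estimator: sign() in the forward pass, identity gradient.
        w_bin = w_rot + (scale * torch.sign(w_rot) - w_rot).detach()
        w_q = (w_bin * self.log_importance.exp()) @ Q.t()  # importance-scaled, rotated back
        return F.linear(x, w_q)


def progressive_quantize(layers, finetune_layer):
    """Bottom-up progressive quantization: each layer is calibrated on
    activations already shaped by the quantized layers below it."""
    for layer in layers:
        finetune_layer(layer)  # e.g., briefly train this layer's Q / importance
        for p in layer.parameters():
            p.requires_grad_(False)  # freeze before moving to the next layer
```

In an actual QAT run, `finetune_layer` (a hypothetical callback here) would briefly train the current layer's transform, importance, and weights against the task loss while the already-quantized layers below stay frozen, which is what lets each layer adapt to the quantized activation distribution of its predecessors.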
