Paper Title
Parallelized Rate-Distortion Optimized Quantization Using Deep Learning
Paper Authors
Paper Abstract
Rate-Distortion Optimized Quantization (RDOQ) has played an important role in the coding performance of recent video compression standards such as H.264/AVC, H.265/HEVC, VP9, and AV1. This scheme yields significant reductions in bit-rate at the expense of relatively small increases in distortion. Typically, RDOQ algorithms are prohibitively expensive to implement on real-time hardware encoders due to their sequential nature and their need to frequently obtain entropy coding costs. This work addresses this limitation using a neural network-based approach, which learns to trade off rate and distortion during offline supervised training. As these networks are based solely on standard arithmetic operations that can be executed on existing neural network hardware, no additional area-on-chip needs to be reserved for dedicated RDOQ circuitry. We train two classes of neural networks, a fully-convolutional network and an auto-regressive network, and evaluate each as a post-quantization step designed to refine cheap quantization schemes such as scalar quantization (SQ). Both network architectures are designed to have a low computational overhead. After training, they are integrated into the HM 16.20 implementation of HEVC, and their video coding performance is evaluated on a subset of the H.266/VVC SDR common test sequences. Comparisons are made to the RDOQ and SQ implementations in HM 16.20. Our method achieves 1.64% BD-rate savings on the luma component compared to the HM SQ anchor, and on average reaches 45% of the performance of the iterative HM RDOQ algorithm.
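To make the terms in the abstract concrete, the following is a minimal illustrative sketch (not the paper's method or HM's implementation) of plain scalar quantization and the Lagrangian rate-distortion cost J = D + λ·R that RDOQ-style schemes minimize when choosing quantized coefficient levels. The step size, λ, and the toy rate model are all hypothetical stand-ins; a real encoder derives the rate term from its entropy coder.

```python
import numpy as np

def scalar_quantize(coeffs, step):
    """Plain scalar quantization (SQ): round each transform
    coefficient to the nearest multiple of the step size."""
    return np.round(coeffs / step).astype(int)

def rd_cost(coeffs, levels, step, rate_fn, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R.
    `rate_fn` is a hypothetical stand-in for the entropy-coding
    cost model a real RDOQ would query."""
    recon = levels * step                       # dequantized values
    distortion = np.sum((coeffs - recon) ** 2)  # squared error D
    rate = sum(rate_fn(l) for l in levels)      # coding cost R
    return distortion + lam * rate

# Toy usage: quantize a few coefficients and score the result.
coeffs = np.array([10.2, -3.7, 0.6, 0.4])
step = 2.0
sq_levels = scalar_quantize(coeffs, step)
toy_rate = lambda l: abs(int(l)) + 1  # crude rate model: bigger levels cost more
j_sq = rd_cost(coeffs, sq_levels, step, toy_rate, lam=8.0)
```

An RDOQ-style refinement would perturb `sq_levels` (e.g. nudging small coefficients toward zero) and keep any change that lowers `j_sq`; the networks described above learn to predict such adjustments in a single pass instead of searching iteratively.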