Paper Title
Leveraging progressive model and overfitting for efficient learned image compression
Paper Authors
Paper Abstract
Deep learning has been overwhelmingly dominant in computer vision and image/video processing over the last decade. For image and video compression, however, it still lags behind traditional techniques based on the discrete cosine transform (DCT) and linear filters. Learned image compression (LIC) systems, built on top of an autoencoder architecture, have drawn enormous attention in recent years. Nevertheless, the proposed LIC systems remain inferior to state-of-the-art traditional techniques such as the Versatile Video Coding (VVC/H.266) standard, in either compression performance or decoding complexity. Although claimed to outperform VVC/H.266 over a limited bit rate range, some proposed LIC systems take over 40 seconds to decode a 2K image on a GPU system. In this paper, we introduce a powerful and flexible LIC framework with a multi-scale progressive (MSP) probability model and a latent representation overfitting (LOF) technique. With different predefined profiles, the proposed framework can achieve various balance points between compression efficiency and computational complexity. Experiments show that the proposed framework achieves 2.5%, 1.0%, and 1.3% Bjøntegaard delta bit rate (BD-rate) reduction over the VVC/H.266 standard on three benchmark datasets over a wide bit rate range. More importantly, the decoding complexity is reduced from O(n) to O(1) compared to many other LIC systems, resulting in a speedup of over 20 times when decoding 2K images.
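The headline results above are stated as BD-rate reductions. As a rough illustration of what that metric measures, the following is a minimal Python sketch of one common way to compute it: fit a cubic polynomial to log bit rate as a function of PSNR for each codec, integrate both fits over the shared PSNR range, and convert the average log-rate gap into a percentage. The abstract does not describe the exact evaluation procedure, so the function name `bd_rate`, the use of PSNR as the quality measure, and the sample rate/PSNR points are all assumptions made for illustration, not values from the paper.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bit rate difference (in percent) of a test codec vs. an anchor.

    Negative values mean the test codec needs fewer bits for the same PSNR.
    Hypothetical helper: cubic-fit variant of the Bjontegaard delta metric.
    """
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    pa = np.asarray(psnr_anchor, dtype=float)
    pt = np.asarray(psnr_test, dtype=float)

    # Fit log(rate) as a cubic polynomial of quality for each codec.
    poly_a = np.polyfit(pa, log_ra, 3)
    poly_t = np.polyfit(pt, log_rt, 3)

    # Only compare over the PSNR interval covered by both curves.
    lo = max(pa.min(), pt.min())
    hi = min(pa.max(), pt.max())

    # Integrate each fitted curve over the shared interval.
    int_a = np.polyint(poly_a)
    int_t = np.polyint(poly_t)
    area_a = np.polyval(int_a, hi) - np.polyval(int_a, lo)
    area_t = np.polyval(int_t, hi) - np.polyval(int_t, lo)

    # Mean gap in log-rate, converted back to a relative rate change.
    avg_log_diff = (area_t - area_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0


# Made-up rate-distortion points (bits per pixel, PSNR in dB), not paper data.
anchor_bpp = [0.2, 0.4, 0.8, 1.6]
anchor_psnr = [30.0, 33.0, 36.0, 39.0]
# A test codec that spends 10% fewer bits at every quality level.
test_bpp = [0.9 * r for r in anchor_bpp]

print(bd_rate(anchor_bpp, anchor_psnr, test_bpp, anchor_psnr))
```

With these synthetic points the test codec uses 10% fewer bits at every quality level, so the result should come out at roughly -10, meaning a 10% BD-rate reduction. The 2.5%, 1.0%, and 1.3% figures in the abstract would be read the same way, with VVC/H.266 as the anchor.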