Paper Title
Entroformer: A Transformer-based Entropy Model for Learned Image Compression
Paper Authors
Paper Abstract
One critical component in lossy deep image compression is the entropy model, which predicts the probability distribution of the quantized latent representation in the encoding and decoding modules. Previous works build entropy models upon convolutional neural networks, which are inefficient at capturing global dependencies. In this work, we propose a novel transformer-based entropy model, termed Entroformer, to capture long-range dependencies in probability distribution estimation effectively and efficiently. Unlike vision transformers used for image classification, the Entroformer is highly optimized for image compression, featuring a top-k self-attention mechanism and a diamond relative position encoding. Meanwhile, we further extend this architecture with a parallel bidirectional context model to speed up the decoding process. The experiments show that the Entroformer achieves state-of-the-art performance on image compression while being time-efficient.
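To make the top-k self-attention mentioned in the abstract concrete, the sketch below shows one plausible formulation: each query attends only to its k highest-scoring keys, with the remaining logits masked out before the softmax. This is a minimal illustration assuming standard scaled-dot-product attention; the function name `topk_self_attention`, the `topk` parameter, and the tensor layout are illustrative assumptions, not the paper's actual implementation.

```python
import torch


def topk_self_attention(q, k, v, topk=32):
    """Sketch of top-k self-attention (hypothetical, not the paper's code).

    q, k, v: tensors of shape (batch, heads, seq_len, dim).
    Assumes seq_len >= topk.
    """
    scale = q.size(-1) ** -0.5
    # Standard scaled dot-product attention logits: (batch, heads, seq_len, seq_len).
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale
    # For each query row, find the k-th largest score and mask everything below it,
    # so only the top-k keys survive the softmax.
    kth_score = scores.topk(topk, dim=-1).values[..., -1:]
    scores = scores.masked_fill(scores < kth_score, float("-inf"))
    attn = scores.softmax(dim=-1)
    return torch.matmul(attn, v)


# Usage example with arbitrary sizes (batch=1, heads=4, seq_len=64, dim=32).
q = torch.randn(1, 4, 64, 32)
out = topk_self_attention(q, q, q, topk=8)
print(out.shape)  # torch.Size([1, 4, 64, 32])
```

The design intuition is that sparsifying each query's attention to its strongest matches suppresses noisy long-range interactions while keeping the global receptive field, which is what the abstract's "effectively and efficiently" claim refers to.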