Paper Title
Streaming-capable High-performance Architecture of Learned Image Compression Codecs
Paper Authors
Paper Abstract
Learned image compression achieves state-of-the-art accuracy and compression ratios, but its relatively slow runtime performance limits its usage. While previous attempts at optimizing learned image codecs focused mostly on the neural model and entropy coding, we present an alternative method for improving the runtime performance of various learned image compression models. We introduce multi-threaded pipelining and an optimized memory model that enable asynchronous execution of GPU and CPU workloads, taking full advantage of the available computational resources. Our architecture alone already produces excellent performance without any change to the neural model itself. We also demonstrate that combining our architecture with previous tweaks to the neural models can further improve runtime performance. We show that our implementations excel in throughput and latency compared to the baseline, and we demonstrate their performance by creating a real-time video streaming encoder-decoder sample application, with the encoder running on an embedded device.
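The multi-threaded pipelining idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `analysis_stage`, `entropy_stage`, and `run_pipeline` are hypothetical, the "GPU" stage is a toy byte quantizer standing in for a neural analysis transform, and `zlib` stands in for a learned entropy coder. The only point it shows is that two stages connected by a bounded queue can work on different frames at the same time.

```python
import queue
import threading
import zlib

_SENTINEL = object()  # marks the end of the frame stream


def analysis_stage(frame: bytes) -> bytes:
    """Hypothetical stand-in for the GPU-side neural analysis transform."""
    return bytes(value // 4 for value in frame)  # toy quantization


def entropy_stage(latent: bytes) -> bytes:
    """Hypothetical stand-in for the CPU-side entropy coder."""
    return zlib.compress(latent)


def run_pipeline(frames, queue_size=4):
    """Encode frames with two overlapping stages joined by a bounded queue."""
    handoff = queue.Queue(maxsize=queue_size)  # bounded, so stage 1 cannot run far ahead
    results = []

    def producer():
        # Stage 1: transform each frame and hand the latent to stage 2.
        for index, frame in enumerate(frames):
            handoff.put((index, analysis_stage(frame)))
        handoff.put(_SENTINEL)

    def consumer():
        # Stage 2: entropy-code latents while stage 1 works on later frames.
        while True:
            item = handoff.get()
            if item is _SENTINEL:
                break
            index, latent = item
            results.append((index, entropy_stage(latent)))

    workers = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    # Return payloads in frame order.
    return [payload for _, payload in sorted(results, key=lambda pair: pair[0])]


if __name__ == "__main__":
    demo_frames = [bytes([i * 10] * 64) for i in range(8)]
    encoded = run_pipeline(demo_frames)
    print(len(encoded), "frames encoded")
```

In a real codec the first stage would launch GPU kernels asynchronously, so the overlap would come from the device working while the CPU thread entropy-codes the previous frame; in this pure-Python sketch the two threads mainly illustrate the hand-off structure.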