Paper Title
Performance Analysis of Optimized Versatile Video Coding Software Decoders on Embedded Platforms
Authors
Abstract
In recent years, the global demand for high-resolution video and the emergence of new multimedia applications have created the need for a new video coding standard. In response, the Versatile Video Coding (VVC) standard was released in July 2020, providing up to 50% bit-rate savings at the same video quality compared to its predecessor, High Efficiency Video Coding (HEVC). However, this bit-rate saving comes at the cost of high computational complexity, which is particularly problematic for live applications and resource-constrained embedded devices. This paper presents two optimized VVC software decoders, OpenVVC and Versatile Video deCoder (VVdeC), designed for low-resource platforms. They exploit optimization techniques such as data-level parallelism using Single Instruction Multiple Data (SIMD) instructions and functional-level parallelism using frame-, tile- and slice-based parallelism. Furthermore, the two decoders are compared in terms of decoding run time, energy consumption and memory consumption on two different resource-constrained embedded devices. The results show that both decoders achieve real-time decoding at Full High Definition (FHD) resolution on the first platform using 8 cores, and real-time decoding at High Definition (HD) resolution on the second platform using only 4 cores. The two decoders give comparable results in terms of average consumed energy: around 26 J and 15 J on the 8-core and 4-core embedded platforms, respectively. Regarding memory usage, OpenVVC performs better, with a lower average maximum memory consumption during run time than VVdeC.