Paper Title

Saliency-Driven Versatile Video Coding for Neural Object Detection

Authors

Kristian Fischer, Felix Fleckenstein, Christian Herglotz, André Kaup

Abstract

Saliency-driven image and video coding for humans has gained importance in the recent past. In this paper, we propose such a saliency-driven coding framework for the video coding for machines task using the latest video coding standard Versatile Video Coding (VVC). To determine the salient regions before encoding, we employ the real-time-capable object detection network You Only Look Once (YOLO) in combination with a novel decision criterion. To measure the coding quality for a machine, the state-of-the-art object segmentation network Mask R-CNN was applied to the decoded frame. From extensive simulations we find that, compared to the reference VVC with a constant quality, up to 29 % of bitrate can be saved with the same detection accuracy at the decoder side by applying the proposed saliency-driven framework. Besides, we compare YOLO against other, more traditional saliency detection methods.
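
The abstract only outlines the pipeline at a high level. As a rough illustration, the sketch below maps YOLO-style detection boxes onto a CTU-wise QP grid, which is one common way to realize saliency-driven coding with a VVC encoder: salient CTUs keep the base QP while the remaining CTUs are coded coarser. The CTU size of 128, the confidence threshold standing in for the paper's novel decision criterion, and the fixed QP offset are assumptions for illustration, not the authors' published configuration.

```python
"""Minimal sketch: turn YOLO-style detections into a CTU-wise QP map for a
saliency-driven VVC encode. CTU size, threshold, and QP offset are
illustrative assumptions, not the configuration used in the paper."""

import math
from typing import List, Tuple

# A detection box: x1, y1, x2, y2 in pixels plus a confidence score.
Box = Tuple[float, float, float, float, float]


def ctu_qp_map(frame_w: int, frame_h: int, detections: List[Box],
               base_qp: int = 32, non_salient_offset: int = 10,
               conf_thresh: float = 0.1, ctu_size: int = 128) -> List[List[int]]:
    """Return a per-CTU QP grid: salient CTUs keep base_qp, all others are
    coded coarser by non_salient_offset (clipped to VVC's QP range 0..63)."""
    cols = math.ceil(frame_w / ctu_size)
    rows = math.ceil(frame_h / ctu_size)
    # Start with every CTU marked non-salient (higher QP -> fewer bits).
    qp_map = [[min(base_qp + non_salient_offset, 63) for _ in range(cols)]
              for _ in range(rows)]

    for x1, y1, x2, y2, conf in detections:
        if conf < conf_thresh:  # stand-in for the paper's decision criterion
            continue
        # Mark every CTU overlapped by the detection box as salient.
        c0, c1 = int(x1 // ctu_size), min(int(x2 // ctu_size), cols - 1)
        r0, r1 = int(y1 // ctu_size), min(int(y2 // ctu_size), rows - 1)
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                qp_map[r][c] = base_qp
    return qp_map


if __name__ == "__main__":
    # One detection near the top-left corner of a 1920x1080 frame.
    boxes = [(100.0, 80.0, 500.0, 400.0, 0.87)]
    for row in ctu_qp_map(1920, 1080, boxes):
        print(row)
```

Such a per-CTU QP map would then be passed to the encoder's rate-control or delta-QP interface; the evaluation side (running Mask R-CNN on the decoded frames) is independent of this step.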
