FAST-VQA: Efficient End-to-end Video Quality Assessment with Fragment Sampling

Paper Title

FAST-VQA: Efficient End-to-end Video Quality Assessment with Fragment Sampling

Authors

Haoning Wu, Chaofeng Chen, Jingwen Hou, Liang Liao, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin

Abstract

Current deep video quality assessment (VQA) methods usually incur high computational costs when evaluating high-resolution videos. This cost hinders them from learning better video-quality-related representations via end-to-end training. Existing approaches typically reduce the computational cost with naive sampling, such as resizing and cropping. However, these operations corrupt quality-related information in videos and are thus not optimal for learning good representations for VQA. A new quality-retaining sampling scheme for VQA is therefore needed. In this paper, we propose Grid Mini-patch Sampling (GMS), which accounts for local quality by sampling patches at their raw resolution and covers global quality with contextual relations via mini-patches sampled in uniform grids. These mini-patches are spliced and temporally aligned, and the result is named fragments. We further build the Fragment Attention Network (FANet), specially designed to accommodate fragments as inputs. Consisting of fragments and FANet, the proposed FrAgment Sample Transformer for VQA (FAST-VQA) enables efficient end-to-end deep VQA and learns effective video-quality-related representations. It improves state-of-the-art accuracy by around 10% while reducing FLOPs by 99.5% on 1080P high-resolution videos. The newly learned video-quality-related representations can also be transferred to smaller VQA datasets, boosting performance in these scenarios. Extensive experiments show that FAST-VQA performs well on inputs of various resolutions while retaining high efficiency. We publish our code at https://github.com/timothyhtimothy/FAST-VQA.
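
The sampling idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `sample_fragments`, the `grid` and `patch` defaults, and the choice of a uniformly random offset inside each grid cell are all assumptions made for illustration. The sketch only shows the three properties the abstract names: mini-patches are cut at raw resolution (no resizing), one is taken from every cell of a uniform grid, and the same location is reused in every frame so the spliced result stays temporally aligned.

```python
import numpy as np


def sample_fragments(video, grid=7, patch=32, rng=None):
    """Hypothetical sketch of grid mini-patch sampling.

    video: array of shape (T, H, W, C).
    Returns an array of shape (T, grid * patch, grid * patch, C).
    The grid and patch defaults are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    t, h, w, c = video.shape
    if h < grid * patch or w < grid * patch:
        raise ValueError("frame too small for the requested grid and patch size")

    # Integer boundaries of a uniform grid over the frame.
    ys = [i * h // grid for i in range(grid + 1)]
    xs = [j * w // grid for j in range(grid + 1)]

    out = np.empty((t, grid * patch, grid * patch, c), dtype=video.dtype)
    for i in range(grid):
        for j in range(grid):
            # One offset per grid cell (assumed random), shared by all frames
            # so that the mini-patch stays aligned over time.
            y0 = rng.integers(ys[i], ys[i + 1] - patch + 1)
            x0 = rng.integers(xs[j], xs[j + 1] - patch + 1)
            # Copy raw pixels (no resizing) and splice them into the output.
            out[:, i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] = \
                video[:, y0:y0 + patch, x0:x0 + patch]
    return out
```

Under these assumed defaults, a 1080P clip of shape `(T, 1080, 1920, 3)` is reduced to `(T, 224, 224, 3)`, which suggests where the computational savings mentioned in the abstract come from; the actual sizes used by FAST-VQA should be taken from the paper or the released code.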

Paper Title