Paper Title
Sports Video Analysis on Large-Scale Data
Paper Authors
Paper Abstract
This paper investigates the modeling of automated machine description of sports video, an area that has seen much progress recently. Nevertheless, state-of-the-art approaches still fall well short of capturing how human experts analyze sports scenes. There are several major reasons: (1) the datasets used are collected from non-official providers, which naturally creates a gap between models trained on those datasets and real-world applications; (2) previously proposed methods require extensive annotation effort (i.e., player and ball segmentation at the pixel level) to localize useful visual features and yield acceptable results; (3) very few public datasets are available. In this paper, we propose a novel large-scale NBA dataset for Sports Video Analysis (NSVA) with a focus on captioning, to address the above challenges. We also design a unified approach that processes raw videos into a stack of meaningful features with minimal labeling effort, and show that cross modeling on such features using a transformer architecture leads to strong performance. In addition, we demonstrate the broad applicability of NSVA by addressing two additional tasks, namely fine-grained sports action recognition and salient player identification. Code and dataset are available at https://github.com/jackwu502/NSVA.