Paper title
A Multi-stage deep architecture for summary generation of soccer videos
Paper authors
Paper abstract
Video content is present in an ever-increasing number of fields, both scientific and commercial. Sports, and soccer in particular, is one of the industries that has invested the most in video analytics, owing to the massive popularity of the game and the emergence of new markets. Previous state-of-the-art methods for soccer match video summarization rely on handcrafted heuristics and produce summaries that generalize poorly; nevertheless, these works have shown that multiple modalities help detect the best actions of the game. On the other hand, machine learning models with higher generalization potential have entered the field of general-purpose video summarization, offering several deep learning approaches. However, most of them exploit content specificities that are not appropriate for full-match sports videos. Although video content has for many years been the main source for automating knowledge extraction in soccer, data that records all the events happening on the field has recently become very important in sports analytics, since this event data provides richer context information and requires less processing. We propose a method to generate the summary of a soccer match that exploits both the audio and the event metadata. The results show that our method can detect the actions of the match, identify which of these actions should belong to the summary, and then propose multiple candidate summaries that are similar enough to one another yet vary enough to give the final editor different options. Furthermore, we show the generalization capability of our work, since it can transfer knowledge between datasets from different broadcasting companies and different competitions, acquired under different conditions, and corresponding to summaries of different lengths.