Paper Title

GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval

Authors

Yuxuan Wang, Difei Gao, Licheng Yu, Stan Weixian Lei, Matt Feiszli, Mike Zheng Shou

Abstract

Cognitive science has shown that humans perceive videos in terms of events separated by the state changes of dominant subjects. State changes trigger new events and are among the most useful pieces of the large amount of redundant information perceived. However, previous research focuses on the overall understanding of segments without evaluating the fine-grained status changes inside them. In this paper, we introduce a new dataset called Kinetic-GEB+. The dataset consists of over 170K boundaries associated with captions describing status changes in the generic events in 12K videos. On this new dataset, we propose three tasks (boundary captioning, grounding, and retrieval) that support the development of a more fine-grained, robust, and human-like understanding of videos through status changes. We evaluate many representative baselines on our dataset, and we also design a new TPD (Temporal-based Pairwise Difference) Modeling method for representing visual difference, which achieves significant performance improvements. The results also show that current methods still face formidable challenges in the utilization of different granularities, the representation of visual difference, and the accurate localization of status changes. Further analysis shows that our dataset can drive the development of more powerful methods for understanding status changes and thus improve video-level comprehension. The dataset, including both videos and boundaries, is available at https://yuxuan-w.github.io/GEB-plus/
