Paper Title
Visual Summarization of Lecture Video Segments for Enhanced Navigation
Paper Authors
Abstract
Lecture videos are an increasingly important learning resource for higher education. However, the challenge of quickly finding the content of interest in a lecture video is an important limitation of this format. This paper introduces visual summarization of lecture video segments to enhance navigation. A lecture video is divided into segments based on the frame-to-frame similarity of content. The user navigates the lecture video content by viewing a single-frame visual and textual summary of each segment. The paper presents a novel methodology to generate the visual summary of a lecture video segment by computing similarities between images extracted from the segment and employing a graph-based algorithm to identify the subset of most representative images. The results from this research are integrated into a real-world lecture video management portal called Videopoints. To collect ground truth for evaluation, a survey was conducted in which multiple users manually provided visual summaries for 40 lecture video segments. The users also stated whether any images were not selected for the summary because they were similar to other selected images. The graph-based algorithm for identifying summary images achieves 78% precision and 72% F1-measure with frequently selected images as the ground truth, and 94% precision and 72% F1-measure with the union of all user-selected images as the ground truth. For 98% of algorithm-selected visual summary images, at least one user also selected that image for their summary or considered it similar to another image they selected. Over 65% of automatically generated summaries were rated as good or very good by the users on a 4-point scale from poor to very good. Overall, the results establish that the methodology introduced in this paper produces good-quality visual summaries that are practically useful for lecture video navigation.
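The abstract describes computing pairwise similarities between images extracted from a segment and then using a graph-based algorithm to pick the most representative subset. The paper does not spell out the algorithm here, so the following is only a minimal, hypothetical sketch of one common graph-based approach: connect images whose similarity exceeds a threshold, then greedily select images that "cover" the most still-unrepresented similar images (a greedy dominating-set pass). The similarity scores, threshold, and function names are illustrative assumptions, not the paper's actual formulation.

```python
def build_similarity_graph(similarities, threshold):
    """similarities: dict mapping (i, j) image-index pairs to a score in [0, 1].
    Returns an adjacency map joining images whose similarity meets the threshold.
    (Threshold-based graph construction is an assumption for illustration.)"""
    nodes = set()
    for i, j in similarities:
        nodes.update((i, j))
    adj = {n: set() for n in nodes}
    for (i, j), score in similarities.items():
        if score >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def select_summary_images(adj):
    """Greedy pass: repeatedly pick the image adjacent to the most uncovered
    images; that image's neighbours are then considered represented by it."""
    uncovered = set(adj)
    summary = []
    while uncovered:
        best = max(uncovered, key=lambda n: len(adj[n] & uncovered))
        summary.append(best)
        uncovered -= adj[best] | {best}
    return sorted(summary)

# Toy example: images 0-2 are near-duplicate slide frames, image 3 stands alone.
sims = {(0, 1): 0.9, (0, 2): 0.85, (1, 2): 0.8,
        (0, 3): 0.1, (1, 3): 0.05, (2, 3): 0.0}
adj = build_similarity_graph(sims, threshold=0.7)
# Yields one representative of the near-duplicate cluster plus image 3.
print(select_summary_images(adj))
```

In this sketch the summary size falls out of the graph structure rather than being fixed in advance, which matches the abstract's framing of identifying a "subset of most representative images" per segment.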