Video Question Answering: Datasets, Algorithms and Challenges

Paper Title

Video Question Answering: Datasets, Algorithms and Challenges

Authors

Yaoyao Zhong, Junbin Xiao, Wei Ji, Yicong Li, Weihong Deng, Tat-Seng Chua

Abstract

Video Question Answering (VideoQA) aims to answer natural language questions according to the given videos. It has earned increasing attention with recent research trends in joint vision and language understanding. Yet, compared with ImageQA, VideoQA is largely underexplored and progresses slowly. Although different algorithms have continually been proposed and shown success on different VideoQA datasets, we find that a meaningful survey that categorizes them is lacking, which seriously impedes the field's advancement. This paper thus provides a clear taxonomy and comprehensive analyses of VideoQA, focusing on the datasets, algorithms, and unique challenges. We then point out the research trend of moving beyond factoid QA to inference QA, towards the cognition of video contents. Finally, we conclude with some promising directions for future exploration.
