Paper Title
Themes Informed Audio-visual Correspondence Learning
Paper Authors
Paper Abstract
Applications of short user-generated videos (UGVs), such as Snapchat and YouTube short videos, have boomed recently, giving rise to many multimodal machine learning tasks. Among them, learning the correspondence between audio and visual information in videos is a challenging one. Most previous work on audio-visual correspondence (AVC) learning investigated only constrained videos or simple settings, which may not fit UGV applications. In this paper, we propose new principles for AVC and introduce a new framework that takes the themes of videos into account to facilitate AVC learning. We also release the KWAI-AD-AudVis corpus, which contains 85,432 short advertisement videos (around 913 hours) made by users. We evaluate our proposed approach on this corpus, where it outperforms the baseline by an absolute difference of 23.15%.