Watch, read and lookup: learning to spot signs from multiple supervisors

Paper title

Watch, read and lookup: learning to spot signs from multiple supervisors

Authors

Liliane Momeni, Gül Varol, Samuel Albanie, Triantafyllos Afouras, Andrew Zisserman

Abstract

The focus of this work is sign spotting - given a video of an isolated sign, our task is to identify whether and where it has been signed in a continuous, co-articulated sign language video. To achieve this sign spotting task, we train a model using multiple types of available supervision by: (1) watching existing sparsely labelled footage; (2) reading associated subtitles (readily available translations of the signed content) which provide additional weak-supervision; (3) looking up words (for which no co-articulated labelled examples are available) in visual sign language dictionaries to enable novel sign spotting. These three tasks are integrated into a unified learning framework using the principles of Noise Contrastive Estimation and Multiple Instance Learning. We validate the effectiveness of our approach on low-shot sign spotting benchmarks. In addition, we contribute a machine-readable British Sign Language (BSL) dictionary dataset of isolated signs, BSLDict, to facilitate study of this task. The dataset, models and code are available at our project page.
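
The abstract names Noise Contrastive Estimation and Multiple Instance Learning as the principles that tie the three supervision sources together. The sketch below shows one common way such a combined objective can be written: each query (for example, a dictionary sign embedding) has a bag of candidate positives (for example, temporal windows of a continuous video whose subtitle contains the word), and the loss rewards the bag as a whole rather than any single window. This is an illustrative assumption, not the authors' implementation; the function name `mil_nce_loss`, the argument names, and the temperature value are hypothetical.

```python
import numpy as np

def mil_nce_loss(sim, pos_mask, temperature=0.07):
    """Illustrative MIL-NCE-style loss (a sketch, not the paper's code).

    sim:      (Q, C) array of similarities between Q query embeddings and
              C candidate embeddings.
    pos_mask: (Q, C) boolean array; True marks candidates that belong to the
              positive bag of each query. Every row needs at least one True.
    """
    sim = np.asarray(sim, dtype=np.float64)
    pos_mask = np.asarray(pos_mask, dtype=bool)
    if not pos_mask.any(axis=1).all():
        raise ValueError("each query needs at least one positive candidate")

    logits = sim / temperature
    # Subtract the row maximum for numerical stability; the ratio is unchanged.
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)

    # Multiple instance learning: sum over the whole positive bag.
    pos = (exp * pos_mask).sum(axis=1)
    # Noise contrastive estimation: contrast against all candidates.
    total = exp.sum(axis=1)
    return float(-np.log(pos / total).mean())
```

With uninformative (all-equal) similarities, the loss for a query with one positive among four candidates equals log 4, and it shrinks as the positive bag grows, which reflects the bag-level rather than instance-level supervision.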
