Paper Title
BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues
Paper Authors
Abstract
Recent progress in fine-grained gesture and action classification, and machine translation, points to the possibility of automated sign language recognition becoming a reality. A key stumbling block in making progress towards this goal is a lack of appropriate training data, stemming from the high complexity of sign annotation and a limited supply of qualified annotators. In this work, we introduce a new scalable approach to data collection for sign recognition in continuous videos. We make use of weakly-aligned subtitles for broadcast footage together with a keyword spotting method to automatically localise sign instances for a vocabulary of 1,000 signs in 1,000 hours of video. We make the following contributions: (1) We show how to use mouthing cues from signers to obtain high-quality annotations from video data - the result is the BSL-1K dataset, a collection of British Sign Language (BSL) signs of unprecedented scale; (2) We show that we can use BSL-1K to train strong sign recognition models for co-articulated signs in BSL and that these models additionally form excellent pretraining for other sign languages and benchmarks - we exceed the state of the art on both the MSASL and WLASL benchmarks. Finally, (3) we propose new large-scale evaluation sets for the tasks of sign recognition and sign spotting and provide baselines which we hope will serve to stimulate research in this area.
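The annotation pipeline described above (weakly-aligned subtitles combined with a keyword spotter) can be sketched roughly as follows. This is an illustrative outline only, not the authors' implementation: the `spotter` callable stands in for the mouthing-based visual keyword spotting model, and the padding and confidence-threshold values are assumptions for the example.

```python
# Illustrative sketch: localise sign instances by searching a padded window
# around each weakly-aligned subtitle with a (stand-in) keyword spotter.
from dataclasses import dataclass

@dataclass
class Subtitle:
    text: str
    start: float  # seconds
    end: float    # seconds

def localise_signs(video_id, subtitles, vocabulary, spotter,
                   pad=4.0, threshold=0.5):
    """For each subtitle, check every vocabulary word it mentions.

    `spotter(video_id, word, t0, t1)` is a hypothetical stand-in for the
    mouthing-based keyword spotting model; it returns (time, confidence)
    for the most likely occurrence of `word` in [t0, t1], or None.
    Detections below `threshold` are discarded.
    """
    annotations = []
    for sub in subtitles:
        words = {w.strip(".,!?").lower() for w in sub.text.split()}
        for word in words & vocabulary:
            # Subtitles are only weakly aligned with the signing,
            # so search a window padded around the subtitle timing.
            hit = spotter(video_id, word, sub.start - pad, sub.end + pad)
            if hit is not None:
                time, conf = hit
                if conf >= threshold:
                    annotations.append((word, time, conf))
    return annotations
```

With a 1,000-sign vocabulary applied over 1,000 hours of footage, a loop of this shape yields the automatically localised sign instances that form the dataset; the confidence threshold trades annotation yield against precision.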