Signing at Scale: Learning to Co-Articulate Signs for Large-Scale Photo-Realistic Sign Language Production

Paper title

Signing at Scale: Learning to Co-Articulate Signs for Large-Scale Photo-Realistic Sign Language Production

Authors

Ben Saunders, Necati Cihan Camgoz, Richard Bowden

Abstract

Sign languages are visual languages, with vocabularies as rich as their spoken language counterparts. However, current deep-learning based Sign Language Production (SLP) models produce under-articulated skeleton pose sequences from constrained vocabularies and this limits applicability. To be understandable and accepted by the deaf, an automatic SLP system must be able to generate co-articulated photo-realistic signing sequences for large domains of discourse. In this work, we tackle large-scale SLP by learning to co-articulate between dictionary signs, a method capable of producing smooth signing while scaling to unconstrained domains of discourse. To learn sign co-articulation, we propose a novel Frame Selection Network (FS-Net) that improves the temporal alignment of interpolated dictionary signs to continuous signing sequences. Additionally, we propose SignGAN, a pose-conditioned human synthesis model that produces photo-realistic sign language videos direct from skeleton pose. We propose a novel keypoint-based loss function which improves the quality of synthesized hand images. We evaluate our SLP model on the large-scale meineDGS (mDGS) corpus, conducting extensive user evaluation showing our FS-Net approach improves co-articulation of interpolated dictionary signs. Additionally, we show that SignGAN significantly outperforms all baseline methods for quantitative metrics, human perceptual studies and native deaf signer comprehension.
