Towards MOOCs for Lipreading: Using Synthetic Talking Heads to Train Humans in Lipreading at Scale

Paper Title

Towards MOOCs for Lipreading: Using Synthetic Talking Heads to Train Humans in Lipreading at Scale

Authors

Aditya Agarwal, Bipasha Sen, Rudrabha Mukhopadhyay, Vinay Namboodiri, C. V. Jawahar

Abstract

Many people with some form of hearing loss consider lipreading their primary mode of day-to-day communication. However, finding resources to learn or improve one's lipreading skills can be challenging. This was further exacerbated during the COVID-19 pandemic by restrictions on direct interactions with peers and speech therapists. Today, online MOOC platforms like Coursera and Udemy have become the most effective form of training for many types of skill development. However, online lipreading resources are scarce, as creating such resources is an extensive process needing months of manual effort to record hired actors. Because of the manual pipeline, such platforms are also limited in vocabulary, supported languages, accents, and speakers, and have a high usage cost. In this work, we investigate the possibility of replacing real human talking videos with synthetically generated videos. Synthetic data can easily incorporate larger vocabularies, variations in accent, and even local languages and many speakers. We propose an end-to-end automated pipeline to develop such a platform using state-of-the-art talking head video generator networks, text-to-speech models, and computer vision techniques. We then perform an extensive human evaluation using carefully thought-out lipreading exercises to validate the quality of our designed platform against existing lipreading platforms. Our studies concretely point toward the potential of our approach in developing a large-scale lipreading MOOC platform that can impact millions of people with hearing loss.
