Consistent Transcription and Translation of Speech

Paper Title

Consistent Transcription and Translation of Speech

Authors

Matthias Sperber, Hendra Setiawan, Christian Gollan, Udhyakumar Nallasamy, Matthias Paulik

Abstract

The conventional paradigm in speech translation starts with a speech recognition step to generate transcripts, followed by a translation step with the automatic transcripts as input. To address various shortcomings of this paradigm, recent work explores end-to-end trainable direct models that translate without transcribing. However, transcripts can be an indispensable output in practical applications, which often display transcripts alongside the translations to users. We make this common requirement explicit and explore the task of jointly transcribing and translating speech. While high accuracy of transcript and translation are crucial, even highly accurate systems can suffer from inconsistencies between both outputs that degrade the user experience. We introduce a methodology to evaluate consistency and compare several modeling approaches, including the traditional cascaded approach and end-to-end models. We find that direct models are poorly suited to the joint transcription/translation task, but that end-to-end models that feature a coupled inference procedure are able to achieve strong consistency. We further introduce simple techniques for directly optimizing for consistency, and analyze the resulting trade-offs between consistency, transcription accuracy, and translation accuracy.
