Paper Title
Simple and Effective Unsupervised Speech Translation
Paper Authors
Paper Abstract
The amount of labeled data available to train models for speech tasks is limited for most languages; this data scarcity is exacerbated for speech translation, which requires labeled data covering two different languages. To address this issue, we study a simple and effective approach to building speech translation systems without labeled data by leveraging recent advances in unsupervised speech recognition, machine translation, and speech synthesis, either in a pipeline approach or to generate pseudo-labels for training end-to-end speech translation models. Furthermore, we present an unsupervised domain adaptation technique for pre-trained speech models which improves the performance of downstream unsupervised speech recognition, especially in low-resource settings. Experiments show that unsupervised speech-to-text translation outperforms the previous unsupervised state of the art by 3.2 BLEU on the Libri-Trans benchmark; on CoVoST 2, our best systems outperform the best supervised end-to-end models (without pre-training) from only two years ago by an average of 5.0 BLEU over five X-En directions. We also report competitive results on the MuST-C and CVSS benchmarks.
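To make the pseudo-labeling route described above concrete, here is a minimal sketch of how unsupervised ASR and unsupervised MT could be cascaded to produce (speech, translation) training pairs for an end-to-end model. The class and method names (`asr.transcribe`, `mt.translate`, `EndToEndST`, `PseudoLabeledExample`) are hypothetical placeholders for illustration, not the authors' actual implementation or API.

```python
# Hypothetical sketch of the pseudo-labeling pipeline from the abstract:
# cascade unsupervised speech recognition and unsupervised machine translation
# to label raw source-language audio, then train end-to-end ST on the result.
# None of the names below come from the paper; they are illustrative only.

from dataclasses import dataclass
from typing import List


@dataclass
class PseudoLabeledExample:
    audio_path: str   # source-language speech (no human labels)
    translation: str  # target-language pseudo-label


def generate_pseudo_labels(audio_paths: List[str], asr, mt) -> List[PseudoLabeledExample]:
    """Create (speech, translation) pairs without any labeled data by
    chaining an unsupervised ASR model and an unsupervised MT model."""
    examples = []
    for path in audio_paths:
        transcript = asr.transcribe(path)       # unsupervised speech recognition
        translation = mt.translate(transcript)  # unsupervised machine translation
        examples.append(PseudoLabeledExample(path, translation))
    return examples


# The pseudo-labeled set can then stand in for supervised data, e.g.:
#   st_model = EndToEndST()  # hypothetical end-to-end speech translation model
#   st_model.train(generate_pseudo_labels(unlabeled_audio, asr, mt))
```

In this reading, the pipeline approach simply serves the cascade's output directly at inference time, while the pseudo-labeling approach reuses the same cascade offline to supervise a single end-to-end model.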