A Textless Metric for Speech-to-Speech Comparison

Paper Title

A Textless Metric for Speech-to-Speech Comparison

Authors

Besacier, Laurent; Ribeiro, Swen; Galibert, Olivier; Calapodescu, Ioan

Abstract

In this paper, we introduce a new and simple method for comparing speech utterances without relying on text transcripts. Our speech-to-speech comparison metric uses state-of-the-art speech2unit encoders such as HuBERT to convert speech utterances into discrete acoustic units. We then propose a simple and easily replicable neural architecture that learns a speech-based metric that closely corresponds to its text-based counterpart. This textless metric has numerous potential applications, including evaluating speech-to-speech translation for oral languages, evaluating languages that lack dependable ASR systems, and avoiding the need for ASR transcription altogether. This paper also shows that, for speech-to-speech translation evaluation, ASR-BLEU (which consists of automatically transcribing both the speech hypothesis and the speech reference, then computing sentence-level BLEU between the transcripts) is a poor proxy for real text-BLEU, even when the ASR system is strong.
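To make the distinction between text-BLEU and ASR-BLEU concrete, the following is a minimal sketch in Python. It computes a sentence-level BLEU score twice: once on ground-truth text and once on strings standing in for ASR transcripts of the same utterances. The function names, the whitespace tokenization, the add-one smoothing, and all example strings are assumptions made for illustration; they are not the paper's evaluation setup, and no real ASR system is called.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Sentence-level BLEU with add-one smoothing (an illustrative choice)."""
    hyp = hypothesis.split()
    ref = reference.split()
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        # Clipped overlap: each hypothesis n-gram counts at most as often
        # as it occurs in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        # Add-one smoothing keeps the logarithm defined when overlap is zero.
        log_precision_sum += math.log((overlap + 1) / (total + 1))
    # Brevity penalty for hypotheses shorter than the reference.
    if len(hyp) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(hyp))
    return brevity_penalty * math.exp(log_precision_sum / max_n)


# Hypothetical ground-truth text of a translation hypothesis and its reference.
text_hyp = "the cat sat on the mat"
text_ref = "the cat is sitting on the mat"

# Hypothetical ASR transcripts of the spoken versions (hand-written stand-ins
# containing recognition errors; no ASR system is run here).
asr_hyp = "the cat sat on a mat"
asr_ref = "the cat is sitting on the mat"

text_bleu = sentence_bleu(text_hyp, text_ref)
asr_bleu = sentence_bleu(asr_hyp, asr_ref)
print(f"text-BLEU: {text_bleu:.3f}  ASR-BLEU: {asr_bleu:.3f}")
```

In this toy case a single recognition error in the hypothesis transcript is enough to move ASR-BLEU away from text-BLEU, which is the kind of mismatch the abstract refers to.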
