Title
A Textless Metric for Speech-to-Speech Comparison
Authors
Abstract
In this paper, we introduce a new, simple method for comparing speech utterances without relying on text transcripts. Our speech-to-speech comparison metric uses state-of-the-art speech2unit encoders such as HuBERT to convert speech utterances into discrete acoustic units. We then propose a simple, easily replicable neural architecture that learns a speech-based metric closely matching its text-based counterpart. This textless metric has many potential applications, including evaluating speech-to-speech translation for oral languages or for languages without dependable ASR systems, and avoiding the need for ASR transcription altogether. This paper also shows that, for speech-to-speech translation evaluation, ASR-BLEU (which consists of automatically transcribing both the speech hypothesis and the reference, then computing sentence-level BLEU between the transcripts) is a poor proxy for real text-BLEU, even when the ASR system is strong.
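As a rough illustration of what ASR-BLEU computes once both utterances are transcribed, here is a minimal sentence-level BLEU sketch in Python. It assumes BLEU+1-style add-one smoothing for n-gram orders above 1 (the abstract does not state which smoothing is used) and includes a hypothetical `dedup_units` helper that collapses repeated consecutive unit IDs, a common post-processing step for HuBERT units that the abstract does not specify. Applying BLEU directly to unit sequences is shown only as a naive contrast; it is not the learned neural metric proposed in the paper.

```python
import math
from collections import Counter
from itertools import groupby


def dedup_units(units):
    """Hypothetical helper: collapse runs of identical consecutive unit IDs."""
    return [unit for unit, _ in groupby(units)]


def ngram_counts(tokens, n):
    """Count every n-gram of order n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Sentence-level BLEU with add-one smoothing for n > 1 (an assumption).

    Works on any token sequence: words from ASR transcripts (ASR-BLEU)
    or discrete acoustic unit IDs (naive unit-level baseline).
    """
    if not hypothesis or not reference:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams = ngram_counts(hypothesis, n)
        ref_ngrams = ngram_counts(reference, n)
        # Clipped matches: a hypothesis n-gram counts at most as often as it appears in the reference.
        matches = sum(min(count, ref_ngrams[g]) for g, count in hyp_ngrams.items())
        total = max(len(hypothesis) - n + 1, 0)
        if n == 1:
            if matches == 0:
                return 0.0
            precision = matches / total
        else:
            precision = (matches + 1) / (total + 1)
        log_precision_sum += math.log(precision)
    # Brevity penalty: penalize hypotheses shorter than the reference.
    c, r = len(hypothesis), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    # ASR-BLEU style: compare two (assumed) ASR transcripts word by word.
    hyp_text = "the cat sat on the mat".split()
    ref_text = "the cat is on the mat".split()
    print("transcript BLEU:", sentence_bleu(hyp_text, ref_text))

    # Naive textless contrast: made-up unit IDs standing in for HuBERT output.
    hyp_units = dedup_units([5, 5, 12, 12, 7, 7, 7, 30, 2, 2])
    ref_units = dedup_units([5, 12, 12, 7, 30, 30, 9, 2])
    print("unit BLEU:", sentence_bleu(hyp_units, ref_units))
```

The unit IDs above are invented for illustration. The point of the contrast is that surface n-gram overlap on units is a crude signal, which is part of why the abstract proposes learning a metric that tracks text-BLEU instead.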