Title
Evaluating context-invariance in unsupervised speech representations
Authors
Abstract
Unsupervised speech representations have taken off, with benchmarks (SUPERB, ZeroSpeech) demonstrating major progress on semi-supervised speech recognition, speech synthesis, and speech-only language modelling. Inspiration comes from the promise of ``discovering the phonemes'' of a language or a similar low-bitrate encoding. However, one of the critical properties of phoneme transcriptions is context-invariance: the phonetic context of a speech sound can have massive influence on the way it is pronounced, while the text remains stable. This is what allows tokens of the same word to have the same transcriptions -- key to language understanding. Current benchmarks do not measure context-invariance. We develop a new version of the ZeroSpeech ABX benchmark that measures context-invariance, and apply it to recent self-supervised representations. We demonstrate that the context-independence of representations is predictive of the stability of word-level representations. We suggest research concentrate on improving context-independence of self-supervised and unsupervised representations.
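The ABX metric mentioned above asks whether a representation places a token x of category A closer to another token of A than to a token of B; averaging over many triples gives an error rate (0 is perfect discrimination, 0.5 is chance). A minimal sketch of this idea, assuming each token has already been pooled into a single vector and using cosine distance (the actual ZeroSpeech benchmark compares frame sequences with DTW-aligned distances, so this is a simplification, not the benchmark's exact procedure):

```python
import numpy as np

def cosine_distance(u, v):
    """1 - cosine similarity between two pooled token representations."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def abx_error_rate(A, B, X):
    """ABX discrimination error over all (a, b, x) triples.

    A and X hold tokens of the same category, B tokens of a contrasting
    category. A triple counts as an error when x lies closer to b than
    to a; ties count as half an error. Returns a rate in [0, 1], where
    0.0 is perfect discrimination and 0.5 is chance level.
    """
    errors, total = 0.0, 0
    for x in X:
        for a in A:
            for b in B:
                d_a, d_b = cosine_distance(x, a), cosine_distance(x, b)
                errors += 1.0 if d_a > d_b else (0.5 if d_a == d_b else 0.0)
                total += 1
    return errors / total

# Toy illustration with hand-picked 2-D "representations":
A = [np.array([1.0, 0.0])]   # category A token
B = [np.array([0.0, 1.0])]   # category B token
X = [np.array([0.9, 0.1])]   # another A token, near A's region
print(abx_error_rate(A, B, X))  # → 0.0 (x is closer to a than to b)
```

The paper's context-invariance variant changes which triples are compared (e.g. requiring A and X to come from different phonetic contexts), not the scoring rule sketched here.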