Variational Auto-Encoder Based Variability Encoding for Dysarthric Speech Recognition

Paper title

Variational Auto-Encoder Based Variability Encoding for Dysarthric Speech Recognition

Authors

Xurong Xie, Rukiye Ruzi, Xunying Liu, Lan Wang

Abstract

Dysarthric speech recognition is a challenging task due to acoustic variability and the limited amount of available data. The diverse conditions of dysarthric speakers account for the acoustic variability and make it difficult to model precisely. This paper presents a variational auto-encoder based variability encoder (VAEVE) to explicitly encode such variability for dysarthric speech. The VAEVE uses both phoneme information and a low-dimensional latent variable to reconstruct the input acoustic features, so the latent variable is forced to encode the phoneme-independent variability. The stochastic gradient variational Bayes algorithm is applied to model the distribution for generating variability encodings, which are further used as auxiliary features for DNN acoustic modeling. Experimental results on the UASpeech corpus show that the VAEVE based variability encodings are complementary to learning hidden unit contributions (LHUC) speaker adaptation. Systems using the variability encodings consistently outperform comparable baseline systems without them, and obtain absolute word error rate (WER) reductions of up to 2.2% on dysarthric speech with the "Very low" intelligibility level, and up to 2% on the "Mixed" type of dysarthric speech with diverse or uncertain conditions.
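
The abstract does not give the network architecture or the exact training objective, so the following is only a minimal sketch of the general idea it describes: a latent variable sampled with the reparameterization trick used in stochastic gradient variational Bayes, a decoder input that combines the latent code with phoneme information, and a loss made of a reconstruction term plus a KL term. The function names, the one-hot phoneme representation, the standard normal prior, and the squared-error reconstruction term are all assumptions made for illustration, not details taken from the paper.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)); the prior is an assumption."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def decoder_input(z, phoneme_id, num_phonemes):
    """Concatenate the latent code with a one-hot phoneme vector (hypothetical encoding)."""
    one_hot = [0.0] * num_phonemes
    one_hot[phoneme_id] = 1.0
    return list(z) + one_hot


def negative_elbo(x, x_hat, mu, log_var):
    """Single-sample loss: squared-error reconstruction term plus the KL regularizer."""
    recon = 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + kl_to_standard_normal(mu, log_var)


# Hypothetical usage with a 2-dimensional latent code and 4 phoneme classes.
rng = random.Random(0)
z = sample_latent([0.0, 0.0], [0.0, 0.0], rng)
dec_in = decoder_input(z, phoneme_id=2, num_phonemes=4)
```

Because the decoder in this sketch already receives the phoneme identity, the latent code has little incentive to carry phonetic content, which matches the abstract's statement that the latent variable is forced to encode phoneme-independent variability. In the described system, such encodings would then be appended to the acoustic features as auxiliary inputs to the DNN acoustic model.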
