Paper Title


DNN-Based Semantic Model for Rescoring N-best Speech Recognition List

Authors

Dominique Fohr, Irina Illina

Abstract


The word error rate (WER) of an automatic speech recognition (ASR) system increases when a mismatch occurs between the training and the testing conditions, for example due to noise. In this case, the acoustic information can be less reliable. This work aims to improve ASR by modeling long-term semantic relations to compensate for distorted acoustic features. We propose to perform this through rescoring of the ASR N-best hypothesis list. To achieve this, we train a deep neural network (DNN). Our DNN rescoring model is aimed at selecting hypotheses that have better semantic consistency and therefore lower WER. We investigate two types of representations as part of the input features to our DNN model: static word embeddings (from word2vec) and dynamic contextual embeddings (from BERT). Acoustic and linguistic features are also included. We perform experiments on the publicly available TED-LIUM dataset mixed with real noise. The proposed rescoring approaches yield significant WER improvements over the ASR system without rescoring, in two noisy conditions and with both n-gram and RNNLM language models.
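To make the rescoring idea concrete, here is a minimal sketch of how an N-best list can be reranked by combining an ASR score with a semantic score. All names, the interpolation weight, and the toy scorer are assumptions for illustration; the paper's actual model is a trained DNN over word2vec/BERT embeddings plus acoustic and linguistic features.

```python
def rescore_nbest(nbest, semantic_score, weight=0.5):
    """Pick the hypothesis maximizing asr_score + weight * semantic score.

    nbest: list of (text, asr_score) pairs from an ASR decoder, where
           asr_score already combines acoustic and language model scores.
    semantic_score: callable mapping text -> float; a stand-in for a
           trained DNN that measures semantic consistency.
    """
    return max(nbest, key=lambda h: h[1] + weight * semantic_score(h[0]))


# Toy stand-in semantic scorer: rewards hypotheses containing an
# in-domain keyword. A real system would use a trained neural model.
def toy_semantic_score(text):
    return 1.0 if "speech" in text else 0.0


# The acoustically best hypothesis ("peach") is wrong; the semantic
# score lets the semantically consistent hypothesis win after rescoring.
nbest = [("recognize peach", -2.0), ("recognize speech", -2.3)]
best = rescore_nbest(nbest, toy_semantic_score)
```

The interpolation weight would in practice be tuned on a development set, balancing trust in the acoustic evidence against the semantic model.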
