Paper Title

Investigation of Speaker-adaptation methods in Transformer based ASR

Authors

Vishwas M. Shetty, Metilda Sagaya Mary N. J., S. Umesh

Abstract

End-to-end models are fast replacing the conventional hybrid models in automatic speech recognition. The Transformer, a sequence-to-sequence model based on self-attention popularly used in machine translation tasks, has given promising results when used for automatic speech recognition. This paper explores different ways of incorporating speaker information at the encoder input while training a Transformer-based model to improve its speech recognition performance. We present speaker information in the form of speaker embeddings for each of the speakers. We experiment with two types of speaker embeddings: x-vectors and the novel s-vectors proposed in our previous work. We report results on two datasets: a) the NPTEL lecture database and b) the LibriSpeech 500-hour split. NPTEL is an open-source e-learning portal providing lectures from top Indian universities. We obtain improvements in the word error rate over the baseline through our approach of integrating speaker embeddings into the model.
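One common way to incorporate a speaker embedding at the encoder input, as the abstract describes, is to append the fixed per-speaker vector to every acoustic frame before the encoder. The sketch below illustrates this with NumPy; the dimensions (80-dim filterbank frames, 512-dim x-vector) and the concatenation strategy are illustrative assumptions, not the paper's exact configuration, since the paper compares several integration methods.

```python
import numpy as np

def add_speaker_embedding(frames: np.ndarray, spk_emb: np.ndarray) -> np.ndarray:
    """Concatenate a fixed speaker embedding to every acoustic frame.

    frames:  (T, d_feat) acoustic features for one utterance
    spk_emb: (d_spk,) speaker embedding (e.g. an x-vector or s-vector)
    returns: (T, d_feat + d_spk) array fed to the Transformer encoder
    """
    tiled = np.tile(spk_emb, (frames.shape[0], 1))  # repeat the embedding per frame
    return np.concatenate([frames, tiled], axis=1)

# Hypothetical shapes: 80-dim filterbank frames, 512-dim x-vector
frames = np.random.randn(100, 80)
xvec = np.random.randn(512)
enc_in = add_speaker_embedding(frames, xvec)
print(enc_in.shape)  # (100, 592)
```

In practice the concatenated input is usually projected back down to the encoder's model dimension by a linear layer before the self-attention blocks.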
