Paper Title

Channel-Aware Pretraining of Joint Encoder-Decoder Self-Supervised Model for Telephonic-Speech ASR

Authors

Sukhadia, Vrunda N., Arunkumar, A., Umesh, S.

Abstract

This paper proposes a novel technique to obtain better downstream ASR performance from a joint encoder-decoder self-supervised model when it is trained on speech pooled from two different channels (narrowband and wideband). The joint encoder-decoder self-supervised model extends the HuBERT model with a Transformer decoder. HuBERT clusters acoustic features and predicts the cluster ID of every input frame. With simple pooling, which is our baseline, the model has no way to identify channel information. To incorporate channel information, we propose assigning non-overlapping cluster IDs to speech from the different channels. Our method gives a relative improvement of ~4% over the joint encoder-decoder self-supervised model built with simple pooling of the data, which serves as our baseline.
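The channel-aware targets can be pictured concretely: cluster each channel's frame-level features separately, then shift one channel's label range so the two sets of pseudo-labels never collide. Below is a minimal sketch of that idea; the helper name `channel_aware_targets`, the cluster counts, and the use of scikit-learn k-means are illustrative assumptions, not the authors' actual pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

# Assumed number of k-means clusters per channel (illustrative, not from the paper).
N_CLUSTERS = 500

def channel_aware_targets(wideband_feats, narrowband_feats, n_clusters=N_CLUSTERS):
    """Return per-frame pseudo-labels with disjoint ID ranges per channel.

    wideband_feats / narrowband_feats: (num_frames, feat_dim) arrays of
    frame-level acoustic features (e.g. MFCCs or intermediate HuBERT features).
    """
    # Cluster each channel's features independently.
    wb_km = KMeans(n_clusters=n_clusters, n_init=10).fit(wideband_feats)
    nb_km = KMeans(n_clusters=n_clusters, n_init=10).fit(narrowband_feats)

    wb_ids = wb_km.labels_               # IDs in [0, n_clusters)
    nb_ids = nb_km.labels_ + n_clusters  # IDs in [n_clusters, 2 * n_clusters)
    return wb_ids, nb_ids

# Toy usage with random "features" standing in for real acoustic features.
wb = np.random.randn(1000, 39)
nb = np.random.randn(1000, 39)
wb_ids, nb_ids = channel_aware_targets(wb, nb, n_clusters=50)
assert set(wb_ids).isdisjoint(set(nb_ids))  # the two label spaces never overlap
```

During masked-prediction pretraining, these offset IDs would serve as the targets, so the model must implicitly distinguish the channels to predict the correct cluster, which is the intuition the abstract describes.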
