Paper Title
Accelerating RNN-T Training and Inference Using CTC Guidance
Paper Authors
Paper Abstract
We propose a novel method to accelerate the training and inference of the recurrent neural network transducer (RNN-T), based on guidance from a co-trained connectionist temporal classification (CTC) model. We make a key assumption: if an encoder embedding frame is classified as blank by the CTC model, it is likely to be aligned to blank in all partial alignments or hypotheses in RNN-T, and it can therefore be discarded from the decoder input. We also show that this frame-reduction operation can be applied in the middle of the encoder, which results in a significant speedup for both training and inference in RNN-T. We further show that the CTC alignment, a by-product of the CTC decoder, can be used to perform lattice reduction for RNN-T during training. Our method is evaluated on the LibriSpeech and SpeechStew tasks. We demonstrate that the proposed method accelerates RNN-T inference by a factor of 2.2 with similar or slightly better word error rates (WER).
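To make the core idea concrete, here is a minimal PyTorch sketch of CTC-guided frame reduction, not the authors' implementation. The function name, the `blank_id` argument, and the blank-posterior threshold are illustrative assumptions; the abstract only states that frames the CTC model classifies as blank are discarded before the decoder input.

```python
import torch

def ctc_guided_frame_reduction(encoder_out, ctc_logits,
                               blank_id=0, blank_threshold=0.95):
    """Drop encoder frames that the co-trained CTC head classifies as blank.

    encoder_out: (T, D) encoder embeddings for one utterance.
    ctc_logits:  (T, V) per-frame CTC logits computed from the same embeddings.

    blank_id and blank_threshold are illustrative assumptions; the abstract
    only says that blank-classified frames are discarded.
    """
    posteriors = torch.softmax(ctc_logits, dim=-1)         # (T, V)
    keep_mask = posteriors[:, blank_id] < blank_threshold  # True for non-blank frames
    return encoder_out[keep_mask]                          # (T', D) with T' <= T

# Toy usage: 100 frames, 256-dim embeddings, 30-symbol vocabulary (index 0 = blank).
enc = torch.randn(100, 256)
logits = torch.randn(100, 30)
reduced = ctc_guided_frame_reduction(enc, logits)
print(enc.shape, "->", reduced.shape)  # fewer frames reach the RNN-T decoder
```

Because the reduced sequence is shorter, applying the same masking after an intermediate encoder layer, as the abstract describes, would also shrink the computation of the remaining encoder layers, not just that of the decoder.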