Paper Title
Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict
Paper Authors
Paper Abstract
We present Mask CTC, a novel non-autoregressive end-to-end automatic speech recognition (ASR) framework, which generates a sequence by refining outputs of the connectionist temporal classification (CTC). Neural sequence-to-sequence models are usually \textit{autoregressive}: each output token is generated by conditioning on previously generated tokens, at the cost of requiring as many iterations as the output length. On the other hand, non-autoregressive models can generate tokens simultaneously within a constant number of iterations, which results in a significant reduction in inference time and better suits end-to-end ASR models for real-world scenarios. In this work, the Mask CTC model is trained using a Transformer encoder-decoder with joint training of mask prediction and CTC. During inference, the target sequence is initialized with the greedy CTC outputs, and low-confidence tokens are masked based on the CTC probabilities. Exploiting the conditional dependence between output tokens, these masked low-confidence tokens are then predicted by conditioning on the high-confidence tokens. Experimental results on different speech recognition tasks show that Mask CTC outperforms the standard CTC model (e.g., 17.9% -> 12.1% WER on WSJ) and approaches the autoregressive model, while requiring much less inference time on CPUs (0.07 RTF in a Python implementation). All of our code will be publicly available.
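To make the described inference procedure concrete, below is a minimal sketch in Python/PyTorch, assuming a greedy-CTC front end and a conditional masked-LM style decoder exposed as a callable `decoder(tokens)` that returns per-position logits. The names `blank_id`, `mask_id`, the confidence threshold, and the single-pass mask filling are illustrative assumptions, not the paper's exact implementation (which, for example, can refill masked positions over several iterations).

```python
import torch


def mask_ctc_decode(ctc_log_probs, decoder, blank_id=0, mask_id=1,
                    threshold=0.999, num_iterations=1):
    """Sketch of Mask CTC inference (assumed interfaces, simplified confidence).

    ctc_log_probs: (T, V) frame-level CTC log-probabilities from the encoder.
    decoder:       callable taking (1, L) token ids and returning (1, L, V) logits.
    """
    # 1) Greedy CTC decoding: take the best token per frame, then collapse
    #    repeated tokens and remove blanks.
    frame_best = ctc_log_probs.argmax(dim=-1)            # (T,)
    frame_conf = ctc_log_probs.max(dim=-1).values.exp()  # per-frame confidence

    tokens, confs = [], []
    prev = None
    for t, tok in enumerate(frame_best.tolist()):
        if tok != blank_id and tok != prev:
            tokens.append(tok)
            # Simplified stand-in for the token-level CTC posterior:
            # use the probability of the first frame emitting this token.
            confs.append(frame_conf[t].item())
        prev = tok

    y = torch.tensor(tokens, dtype=torch.long)
    c = torch.tensor(confs)

    # 2) Mask low-confidence tokens based on the CTC probabilities.
    low_conf = c < threshold
    y_masked = y.clone()
    y_masked[low_conf] = mask_id

    # 3) Predict the masked tokens conditioned on the high-confidence tokens.
    for _ in range(num_iterations):
        logits = decoder(y_masked.unsqueeze(0))          # (1, L, V), assumed call
        pred = logits.argmax(dim=-1).squeeze(0)          # (L,)
        y_masked[low_conf] = pred[low_conf]

    return y_masked
```

Because the number of decoder passes is a small constant rather than the output length, this refinement loop is what gives the non-autoregressive speedup reported in the abstract.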