Paper Title

Thutmose Tagger: Single-pass neural model for Inverse Text Normalization

Authors

Alexandra Antonova, Evelina Bakhturina, Boris Ginsburg

Abstract

Inverse text normalization (ITN) is an essential post-processing step in automatic speech recognition (ASR). It converts numbers, dates, abbreviations, and other semiotic classes from the spoken form generated by ASR to their written forms. One can consider ITN as a Machine Translation task and use neural sequence-to-sequence models to solve it. Unfortunately, such neural models are prone to hallucinations that could lead to unacceptable errors. To mitigate this issue, we propose a single-pass token classifier model that regards ITN as a tagging task. The model assigns a replacement fragment to every input token or marks it for deletion or copying without changes. We present a dataset preparation method based on the granular alignment of ITN examples. The proposed model is less prone to hallucination errors. The model is trained on the Google Text Normalization dataset and achieves state-of-the-art sentence accuracy on both English and Russian test sets. One-to-one correspondence between tags and input words improves the interpretability of the model's predictions, simplifies debugging, and allows for post-processing corrections. The model is simpler than sequence-to-sequence models and easier to optimize in production settings. The model and the code to prepare the dataset are published as part of the NeMo project.
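To make the tagging formulation concrete, below is a minimal sketch of the tag-application step the abstract describes: the classifier assigns one tag per input token (a replacement fragment, or a mark for copying or deletion), and the written form is obtained by applying those tags in order. The tag names <SELF> and <DELETE> and the example tag sequence are illustrative assumptions, not the exact tag vocabulary used by Thutmose Tagger.

```python
# Illustrative sketch of applying per-token ITN tags to a spoken-form sentence.
# Tag names and the example tagging below are assumptions for illustration,
# not the actual Thutmose Tagger tag set.

SELF = "<SELF>"      # copy the input token unchanged
DELETE = "<DELETE>"  # drop the input token from the output

def apply_itn_tags(tokens: list[str], tags: list[str]) -> str:
    """Produce the written form by applying exactly one tag per input token."""
    assert len(tokens) == len(tags), "tagging requires one tag per input token"
    out = []
    for token, tag in zip(tokens, tags):
        if tag == SELF:
            out.append(token)   # copy without changes
        elif tag == DELETE:
            continue            # marked for deletion
        else:
            out.append(tag)     # replacement fragment predicted by the classifier
    return " ".join(out)

# Example: spoken form "on may third twenty twenty" -> written form "on may 3 2020"
tokens = ["on", "may", "third", "twenty", "twenty"]
tags = [SELF, SELF, "3", "2020", DELETE]
print(apply_itn_tags(tokens, tags))  # -> "on may 3 2020"
```

Because each output decision is tied to a specific input word, a wrong prediction can be traced back to its source token and corrected in post-processing, which is the interpretability and debugging advantage the abstract claims over sequence-to-sequence decoding.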
