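r-G2P: Evaluating and Enhancing Robustness of Grapheme to Phoneme Conversion by Controlled noise introducing and Contextual information incorporation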

Paper Title

r-G2P: Evaluating and Enhancing Robustness of Grapheme to Phoneme Conversion by Controlled noise introducing and Contextual information incorporation

Authors

Zhao, Chendong, Wang, Jianzong, Qu, Xiaoyang, Wang, Haoqian, Xiao, Jing

Abstract

Grapheme-to-phoneme (G2P) conversion is the process of converting the written form of words to their pronunciations. It plays an important role in text-to-speech (TTS) synthesis and automatic speech recognition (ASR) systems. In this paper, we aim to evaluate and enhance the robustness of G2P models. We show that neural G2P models are extremely sensitive to orthographical variations in graphemes, such as spelling mistakes. To solve this problem, we propose three controlled noise-introducing methods to synthesize noisy training data. Moreover, we incorporate contextual information into the baseline and propose a robust training strategy to stabilize the training process. The experimental results demonstrate that our proposed robust G2P model (r-G2P) outperforms the baseline significantly (-2.73% WER on Dict-based benchmarks and -9.09% WER on Real-world sources).
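
The abstract does not name the three noise-introducing methods, so the following is only a minimal sketch of the general idea of synthesizing noisy training data at a controlled rate. The three operations used here (substitution, deletion, adjacent swap), the function names `add_noise` and `make_noisy_pairs`, and the `rate` parameter are all assumptions made for illustration; they are not taken from the paper.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def add_noise(word, rate, rng):
    """Return `word` with character-level noise applied with probability `rate` per position.

    The three operations are illustrative assumptions, not the paper's methods.
    """
    chars = list(word)
    out = []
    i = 0
    while i < len(chars):
        if rng.random() < rate:
            op = rng.choice(("substitute", "delete", "swap"))
            if op == "substitute":
                # Replace the character with a random letter.
                out.append(rng.choice(ALPHABET))
            elif op == "swap" and i + 1 < len(chars):
                # Transpose this character with the next one.
                out.extend((chars[i + 1], chars[i]))
                i += 1
            elif op == "delete" and len(chars) > 1:
                # Drop the character.
                pass
            else:
                # Operation not applicable here: keep the character.
                out.append(chars[i])
        else:
            out.append(chars[i])
        i += 1
    # Never return an empty spelling.
    return "".join(out) or word


def make_noisy_pairs(lexicon, rate=0.1, seed=0):
    """Pair each noisy spelling with the unchanged phoneme sequence of its source word."""
    rng = random.Random(seed)
    return [(add_noise(word, rate, rng), phonemes) for word, phonemes in lexicon]


# Illustrative lexicon entry; the phoneme string is an example, not data from the paper.
lexicon = [("hello", "HH AH0 L OW1")]
noisy = make_noisy_pairs(lexicon, rate=0.2, seed=42)
```

Keeping the phoneme target fixed while perturbing only the spelling is what would let a model learn to map misspelled input to the intended pronunciation; `rate` is the knob that keeps the amount of noise controlled.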
