Hierarchical Pronunciation Assessment with Multi-Aspect Attention

Paper title

Hierarchical Pronunciation Assessment with Multi-Aspect Attention

Authors

Do, Heejin; Kim, Yunsu; Lee, Gary Geunbae

Abstract

Automatic pronunciation assessment is a major component of a computer-assisted pronunciation training system. To provide in-depth feedback, it is essential to score pronunciation at several levels of granularity, such as phoneme, word, and utterance, and along diverse aspects, such as accuracy, fluency, and completeness. However, existing multi-aspect multi-granularity methods predict all aspects at all granularity levels simultaneously; as a result, they have difficulty capturing the linguistic hierarchy of phoneme, word, and utterance. This limitation further leads them to neglect the close cross-aspect relations within the same linguistic unit. In this paper, we propose a Hierarchical Pronunciation Assessment with Multi-aspect Attention (HiPAMA) model, which represents the granularity levels hierarchically to directly capture their linguistic structure and introduces multi-aspect attention that reflects associations across aspects at the same level to create more connotative representations. By obtaining relational information from both the granularity side and the aspect side, HiPAMA can take full advantage of multi-task learning. Remarkable improvements in the experimental results on the speechocean762 dataset demonstrate the robustness of HiPAMA, particularly in the difficult-to-assess aspects.
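The two ideas named in the abstract can be illustrated with a toy sketch. It is not the HiPAMA implementation: the abstract does not specify the layers, so mean pooling for the phoneme-to-word-to-utterance hierarchy and plain scaled dot-product attention across aspect vectors are assumptions made here, and every function name and number below is hypothetical.

```python
import math


def mean_pool(vectors):
    """Average a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def pool_by_group(features, group_ids):
    """Pool lower-level features into one vector per higher-level unit.

    group_ids[i] is the index of the unit (e.g. the word) that
    features[i] (e.g. a phoneme) belongs to. Mean pooling is an
    assumption of this sketch, not a detail taken from the paper.
    """
    groups = {}
    for feature, group_id in zip(features, group_ids):
        groups.setdefault(group_id, []).append(feature)
    return [mean_pool(groups[g]) for g in sorted(groups)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aspect_attention(aspect_vectors):
    """Scaled dot-product self-attention over the aspect vectors of one unit.

    Each output vector is a weighted mix of all aspect vectors, so every
    aspect can draw on the others at the same granularity level.
    """
    dim = len(aspect_vectors[0])
    mixed = []
    for query in aspect_vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in aspect_vectors
        ]
        weights = softmax(scores)
        mixed.append([
            sum(w * value[d] for w, value in zip(weights, aspect_vectors))
            for d in range(dim)
        ])
    return mixed


# Toy hierarchy: four phonemes, the first two in word 0, the last two in word 1.
phoneme_feats = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0], [2.0, 0.0]]
word_ids = [0, 0, 1, 1]
word_feats = pool_by_group(phoneme_feats, word_ids)
utterance_feat = mean_pool(word_feats)

# Hypothetical per-aspect vectors for a single unit.
aspects = {
    "accuracy": [1.0, 0.0],
    "fluency": [0.0, 1.0],
    "completeness": [1.0, 1.0],
}
mixed_aspects = aspect_attention(list(aspects.values()))
```

In this sketch the higher-level vectors are built only from the units below them, which is the sense in which the hierarchy is explicit, and `mixed_aspects` holds one attention-mixed vector per aspect. A trained model would use learned projections and regression heads in place of these fixed operations.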
