Title
A Character-level Span-based Model for Mandarin Prosodic Structure Prediction
Authors
Abstract
The accuracy of prosodic structure prediction is crucial to the naturalness of synthesized speech in Mandarin text-to-speech (TTS) systems, but it is currently limited by the widely used sequence-to-sequence framework and by error accumulation from preceding word segmentation. In this paper, we propose a span-based Mandarin prosodic structure prediction model that obtains an optimal prosodic structure tree, which can be converted into the corresponding prosodic label sequence. Instead of requiring word segmentation as a prerequisite, the model obtains rich linguistic features from a Chinese character-level BERT and feeds them to an encoder with a self-attention architecture. On top of this encoder, span representations and label scoring are used to describe all possible prosodic structure trees, each of which has a corresponding score. To find the optimal tree with the highest score for a given sentence, a bottom-up CKY-style algorithm is then applied. The proposed method can predict prosodic labels of different levels simultaneously and performs the process directly from Chinese characters in an end-to-end manner. Experimental results on two real-world datasets show that our span-based method outperforms all sequence-to-sequence baseline approaches.
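The decoding step summarized above (span-label scores combined by a bottom-up CKY-style search, followed by conversion of the tree into a label sequence) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the common span-based chart-parsing formulation in which a tree's score is the sum of the scores of its labeled spans and each span's label is chosen independently of its split point. The label inventory `LABELS` (PW/PPH/IPH), the `#k` boundary notation, the functions `cky_decode`, `to_boundary_levels`, `render` and `toy_scores`, and the toy scores themselves are all hypothetical. The rule that the boundary after a character takes the highest level of any span ending there is likewise an assumed conversion.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical label inventory (an assumption, not taken from the paper):
# index 0 is the empty label; larger indices are higher prosodic levels.
LABELS = ["", "PW", "PPH", "IPH"]

Span = Tuple[int, int, int]                 # (start, end, label index); end is exclusive
ScoreFn = Callable[[int, int], List[float]]  # scores of every label for span [i, j)


def cky_decode(span_scores: ScoreFn, n: int) -> Tuple[float, List[Span]]:
    """Find the binary tree over n >= 1 characters that maximizes the summed span-label scores."""
    best: Dict[Tuple[int, int], float] = {}
    back: Dict[Tuple[int, int], Tuple[int, Optional[int]]] = {}
    for length in range(1, n + 1):  # bottom-up: shorter spans are solved first
        for i in range(n - length + 1):
            j = i + length
            scores = span_scores(i, j)
            # Assumed formulation: the best label is picked independently of the split.
            label = max(range(len(scores)), key=lambda l: scores[l])
            split: Optional[int] = None
            split_score = 0.0
            if length > 1:
                split = max(range(i + 1, j), key=lambda k: best[(i, k)] + best[(k, j)])
                split_score = best[(i, split)] + best[(split, j)]
            best[(i, j)] = scores[label] + split_score
            back[(i, j)] = (label, split)

    spans: List[Span] = []

    def collect(i: int, j: int) -> None:
        # Follow back-pointers; spans with the empty label are omitted from the output.
        label, split = back[(i, j)]
        if label != 0:
            spans.append((i, j, label))
        if split is not None:
            collect(i, split)
            collect(split, j)

    collect(0, n)
    return best[(0, n)], spans


def to_boundary_levels(spans: List[Span], n: int) -> List[int]:
    """Assumed conversion: boundary after character j-1 = highest level of any span ending at j."""
    levels = [0] * n
    for _, end, label in spans:
        levels[end - 1] = max(levels[end - 1], label)
    return levels


def render(chars: str, levels: List[int]) -> str:
    """Write a '#k' marker after each character whose boundary level k is nonzero."""
    return "".join(ch + (f"#{lv}" if lv > 0 else "") for ch, lv in zip(chars, levels))


def toy_scores(i: int, j: int) -> List[float]:
    # Toy scores standing in for a trained label scorer; unlisted spans prefer the empty label.
    table = {
        (0, 2): [0.0, 2.0, -1.0, -1.0],
        (2, 4): [0.0, 2.0, -1.0, -1.0],
        (0, 4): [0.0, -1.0, 3.0, -1.0],
    }
    return table.get((i, j), [0.0, -1.0, -1.0, -1.0])


if __name__ == "__main__":
    text = "今天天气"
    score, spans = cky_decode(toy_scores, len(text))
    print(score, spans)
    print(render(text, to_boundary_levels(spans, len(text))))
```

With these toy scores the search keeps the two PW spans and the enclosing PPH span (total score 7.0), and the rendered output is `今天#1天气#2`. The chart is filled in O(n³) split evaluations for n characters, which is the usual cost of CKY-style decoding.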