Paper Title

Morphological Analysis of Japanese Hiragana Sentences using the BI-LSTM CRF Model

Authors

Jun Izutsu, Kanako Komiya

Abstract

This study proposes a method for building neural morphological analyzers for Japanese Hiragana sentences using the Bi-LSTM CRF model. Morphological analysis divides text into words and assigns information such as parts of speech to each word. It plays an essential role in downstream applications of Japanese natural language processing because Japanese text has no delimiters between words. Hiragana is a Japanese phonetic script used in texts for children and for people who cannot read Chinese characters. Morphological analysis of Hiragana sentences is more difficult than that of ordinary Japanese sentences because there is less information available for segmentation. For Hiragana sentences, we demonstrate the effectiveness of fine-tuning a model trained on ordinary Japanese text and examine the influence of the training data on texts of various genres.
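To make the task concrete, the sketch below shows one common way to cast word segmentation plus part-of-speech assignment as per-character sequence labeling, which is the kind of input/output a Bi-LSTM CRF tagger works with. The abstract does not state the tagging scheme the authors use, so the B-/I- prefix scheme, the function names `encode` and `decode`, and the part-of-speech labels are all assumptions made for illustration only.

```python
from typing import List, Tuple

# (surface form, part-of-speech label); the labels are illustrative only
Token = Tuple[str, str]


def encode(tokens: List[Token]) -> Tuple[str, List[str]]:
    """Turn a segmented sentence into raw text plus one tag per character.

    Assumed scheme: the first character of a word gets "B-<POS>",
    every following character of the same word gets "I-<POS>".
    """
    chars: List[str] = []
    tags: List[str] = []
    for surface, pos in tokens:
        for i, ch in enumerate(surface):
            chars.append(ch)
            tags.append(("B-" if i == 0 else "I-") + pos)
    return "".join(chars), tags


def decode(text: str, tags: List[str]) -> List[Token]:
    """Rebuild (word, POS) pairs from per-character tags.

    A "B-" tag starts a new word; an "I-" tag extends the previous word
    and keeps the POS of that word's first character.
    """
    tokens: List[Token] = []
    for ch, tag in zip(text, tags):
        prefix, pos = tag.split("-", 1)
        if prefix == "B" or not tokens:
            tokens.append((ch, pos))
        else:
            surface, first_pos = tokens[-1]
            tokens[-1] = (surface + ch, first_pos)
    return tokens


# Hiragana-only sentence: "watashi wa gakusei desu" ("I am a student")
example: List[Token] = [
    ("わたし", "PRON"),
    ("は", "ADP"),
    ("がくせい", "NOUN"),
    ("です", "AUX"),
]
text, tags = encode(example)
print(text)
print(tags)
print(decode(text, tags))
```

Under this framing, a tagger only has to predict one label per Hiragana character, and the word boundaries and parts of speech are recovered by `decode`. The difficulty noted in the abstract shows up here directly: with no kanji in `text`, the character sequence alone gives fewer cues about where a "B-" tag should occur.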
