Paper title
StyleBERT: Chinese pretraining by font style information
Paper authors
Paper abstract
With the success of English pre-trained language models on downstream tasks, a pre-trained Chinese language model is likewise necessary to achieve better performance on Chinese NLP tasks. Unlike English, Chinese has special characteristics such as glyph information. In this article, we therefore propose StyleBERT, a Chinese pre-trained language model that incorporates additional embedding information, such as word, pinyin, five-stroke, and chaizi embeddings, to enhance the language model's understanding of Chinese. Experiments show that the model achieves good performance on a wide range of Chinese NLP tasks.
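The abstract describes combining several embedding sources (word, pinyin, five-stroke, chaizi) into one character representation. Below is a minimal sketch of one plausible fusion scheme, element-wise summation of the per-source vectors; the abstract does not specify StyleBERT's actual fusion method, and all table contents and indices here are hypothetical illustrations.

```python
DIM = 4  # tiny embedding dimension, for illustration only

# Hypothetical lookup tables: one per information source named in the abstract.
# Real tables would map full vocabularies to learned vectors.
tables = {
    "word":        {3: [0.1] * DIM},
    "pinyin":      {7: [0.2] * DIM},
    "five_stroke": {1: [0.3] * DIM},
    "chaizi":      {9: [0.4] * DIM},
}

def fused_embedding(ids):
    """Fuse per-source embeddings by element-wise summation (one possible design;
    concatenation followed by a projection would be an equally plausible choice)."""
    vecs = [tables[name][idx] for name, idx in ids.items()]
    return [sum(components) for components in zip(*vecs)]

# A character is represented by one id per source; the fused vector keeps DIM.
vec = fused_embedding({"word": 3, "pinyin": 7, "five_stroke": 1, "chaizi": 9})
```

Summation keeps the model's hidden size unchanged regardless of how many style sources are added, which is one reason it is a common fusion choice in BERT-style models.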