Paper Title

Chinese Word Sense Embedding with SememeWSD and Synonym Set

Authors

Yangxi Zhou, Junping Du, Zhe Xue, Ang Li, Zeli Guan

Abstract

Word embedding is a fundamental natural language processing task that learns features of words. However, most word embedding methods assign only one vector to a word, even though polysemous words have multiple senses. To address this limitation, we propose the SememeWSD Synonym (SWSDS) model, which assigns a different vector to each sense of a polysemous word with the help of word sense disambiguation (WSD) and the synonym set in OpenHowNet. We use the SememeWSD model, an unsupervised word sense disambiguation model based on OpenHowNet, to perform word sense disambiguation and annotate each polysemous word with a sense ID. Then, we obtain the top 10 synonyms of the word sense from OpenHowNet and calculate the average vector of the synonyms as the vector of the word sense. In experiments, we evaluate the SWSDS model on semantic similarity calculation with Gensim's wmdistance method, and it achieves an improvement in accuracy. We also examine the SememeWSD model with different BERT models to find a more effective one.
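The sense-vector construction described above reduces to averaging the embeddings of a sense's synonyms. A minimal sketch of that step is shown below with toy vectors; the function name `sense_vector` and the sample inputs are illustrative, not the authors' code, and in the paper the inputs would be the embeddings of the top-10 OpenHowNet synonyms of the disambiguated sense:

```python
import numpy as np

def sense_vector(synonym_vectors):
    """Average the embedding vectors of a sense's synonyms
    to obtain a single vector representing that word sense."""
    return np.mean(np.asarray(synonym_vectors), axis=0)

# Toy example: three 4-dimensional "synonym" embeddings
# standing in for vectors of OpenHowNet synonyms.
syns = [
    np.array([1.0, 0.0, 0.0, 2.0]),
    np.array([0.0, 1.0, 0.0, 2.0]),
    np.array([0.0, 0.0, 1.0, 2.0]),
]
vec = sense_vector(syns)
print(vec)
```

Each polysemous word thus gets one such averaged vector per sense ID, instead of a single vector shared across all senses.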
