Unsupervised Sentiment Analysis for Code-mixed Data

Title

Unsupervised Sentiment Analysis for Code-mixed Data

Authors

Siddharth Yadav, Tanmoy Chakraborty

Abstract

Code-mixing is the practice of alternating between two or more languages. It is observed mostly in multilingual societies, and its occurrence, and therefore its importance, is increasing. Much of sentiment analysis research has been monolingual, and most such systems perform poorly on code-mixed text. In this work, we introduce methods that use different kinds of multilingual and cross-lingual embeddings to efficiently transfer knowledge from monolingual text to code-mixed text for sentiment analysis. Our methods can handle code-mixed text through zero-shot learning. They beat the state of the art on English-Spanish code-mixed sentiment analysis by an absolute 3% F1-score. On the same benchmark, in a zero-shot setting, we achieve an F1-score of 0.58 without a parallel corpus and 0.62 with a parallel corpus, compared to 0.68 in the supervised setting. Our code is publicly available.
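The abstract does not spell out the pipeline, so the following is only a minimal toy sketch of the general idea of zero-shot transfer through a shared cross-lingual embedding space, not the authors' implementation. It assumes a tiny hypothetical embedding table in which English and Spanish words with similar meanings already lie close together. A sentence is represented as the average of its word vectors, a logistic-regression classifier is trained on English-only labelled sentences, and the classifier then labels English-Spanish code-mixed sentences whose Spanish words it never saw. All names (`EMB`, `embed`, `train_logreg`, `predict`, `f1`) and all data are hypothetical. The binary positive-class F1 is also a simplification, since the abstract does not state the label set or the F1 averaging used on the benchmark.

```python
import math

# HYPOTHETICAL toy embeddings standing in for a pretrained cross-lingual space:
# English and Spanish words with similar meanings get nearby vectors.
EMB = {
    "good": [1.0, 0.1], "bueno": [0.9, 0.2],
    "great": [1.0, 0.0], "genial": [0.95, 0.05],
    "bad": [-1.0, 0.1], "malo": [-0.9, 0.2],
    "awful": [-1.0, 0.0], "horrible": [-0.95, 0.05],
    "movie": [0.0, 1.0], "pelicula": [0.0, 0.9],
    "the": [0.0, 0.3], "la": [0.0, 0.3],
}
DIM = 2


def embed(sentence):
    """Average the shared-space vectors of known tokens; unknown tokens are skipped."""
    vecs = [EMB[t] for t in sentence.lower().split() if t in EMB]
    if not vecs:
        return [0.0] * DIM
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]


def logit(w, b, x):
    """Linear score w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_logreg(data, epochs=200, lr=0.5):
    """Batch gradient descent on the logistic loss over sentence embeddings."""
    w, b = [0.0] * DIM, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * DIM, 0.0
        for text, y in data:
            x = embed(text)
            p = 1.0 / (1.0 + math.exp(-logit(w, b, x)))
            for i in range(DIM):
                gw[i] += (p - y) * x[i]
            gb += p - y
        w = [wi - lr * g / len(data) for wi, g in zip(w, gw)]
        b -= lr * gb / len(data)
    return w, b


def predict(model, text):
    """Label 1 (positive) if the logit is positive, else 0 (negative)."""
    w, b = model
    return 1 if logit(w, b, embed(text)) > 0 else 0


def f1(gold, pred, positive=1):
    """Binary F1 for the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Labelled training data is English-only (monolingual).
mono_train = [("good movie", 1), ("great movie", 1),
              ("bad movie", 0), ("awful movie", 0)]

# Code-mixed English-Spanish test sentences; the Spanish words never appear in training.
code_mixed_test = [("the movie bueno", 1), ("la movie genial", 1),
                   ("the pelicula malo", 0), ("la movie horrible", 0)]

model = train_logreg(mono_train)
gold = [y for _, y in code_mixed_test]
preds = [predict(model, t) for t, _ in code_mixed_test]
zero_shot_f1 = f1(gold, preds)
print("zero-shot predictions:", preds, "F1:", zero_shot_f1)
```

Because none of the Spanish sentiment words in the test sentences occur in the training data, any correct prediction here comes only from their vectors sitting near their English counterparts. With real pretrained multilingual or cross-lingual embeddings, the quality of that alignment is a key factor in how well such zero-shot transfer works.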
