Block-Segmentation Vectors for Arousal Prediction using Semi-supervised Learning

Paper title

Block-Segmentation Vectors for Arousal Prediction using Semi-supervised Learning

Authors

Yuki Odaka, Ken Kaneiwa

Abstract

To handle emotional expressions in computer applications, Russell's circumplex model has been useful for representing emotions according to valence and arousal. In SentiWordNet, the level of valence is automatically assigned to a large number of synsets (groups of synonyms in WordNet) using semi-supervised learning. However, when assigning the level of arousal, the existing method proposed for SentiWordNet reduces the accuracy of sentiment prediction. In this paper, we propose a block-segmentation vector for predicting the arousal levels of many synsets from a small number of labeled words using semi-supervised learning. We analyze the distribution of arousal and non-arousal words in a corpus of sentences by comparing it with the distribution of valence words. We address the problem that arousal level prediction fails when arousal and non-arousal words are mixed together in some sentences. To capture the features of such arousal and non-arousal words, we generate word vectors based on inverted indexes by block IDs, where the corpus is divided into blocks in the flow of sentences. In the evaluation experiment, we show that the results of arousal prediction with the block-segmentation vectors outperform the results of the previous method in SentiWordNet.
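
The abstract's idea of word vectors built from an inverted index over block IDs can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed block size, the whitespace tokenizer, the binary weighting, and the function names `build_block_index` and `block_vector` are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import defaultdict


def build_block_index(sentences, block_size):
    """Build an inverted index: word -> set of block IDs containing it.

    Consecutive sentences are grouped into blocks of `block_size`
    (an assumed, fixed size), following the order of the corpus.
    """
    index = defaultdict(set)
    for i, sentence in enumerate(sentences):
        block_id = i // block_size
        for word in sentence.lower().split():  # naive tokenizer (assumption)
            index[word].add(block_id)
    return index


def block_vector(word, index, num_blocks):
    """Return a binary vector with one dimension per block."""
    blocks = index.get(word, set())
    return [1 if b in blocks else 0 for b in range(num_blocks)]


# Toy corpus: two high-arousal sentences followed by two low-arousal ones.
sentences = [
    "the crowd was excited and thrilled",
    "everyone felt excited",
    "the room was calm and quiet",
    "a calm evening",
]
block_size = 2
num_blocks = (len(sentences) + block_size - 1) // block_size  # ceiling division

index = build_block_index(sentences, block_size)
excited_vec = block_vector("excited", index, num_blocks)
calm_vec = block_vector("calm", index, num_blocks)
```

In this toy corpus, "excited" and "calm" fall into different blocks and so receive different vectors, while a word such as "the" that appears in both blocks is marked in every dimension. Vectors of this kind could then be passed to a semi-supervised learner together with a small set of labeled seed words.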
