Survey of Aspect-based Sentiment Analysis Datasets

Paper Title

Survey of Aspect-based Sentiment Analysis Datasets

Authors

Chebolu, Siva Uday Sampreeth; Dernoncourt, Franck; Lipka, Nedim; Solorio, Thamar

Abstract

Aspect-based sentiment analysis (ABSA) is a natural language processing problem that requires analyzing user-generated reviews to determine: a) the target entity being reviewed, b) the high-level aspect to which it belongs, and c) the sentiment expressed toward the targets and the aspects. Numerous yet scattered corpora for ABSA make it difficult for researchers to quickly identify the corpora best suited to a specific ABSA subtask. This study aims to present a database of corpora that can be used to train and assess autonomous ABSA systems. Additionally, we provide an overview of the major corpora for ABSA and its subtasks and highlight several features that researchers should consider when selecting a corpus. Finally, we discuss the advantages and disadvantages of current collection approaches and make recommendations for future corpus creation. This survey examines 65 publicly available ABSA datasets covering over 25 domains, including 45 English datasets and 20 datasets in other languages.
