CoRAL: a Context-aware Croatian Abusive Language Dataset

Title

CoRAL: a Context-aware Croatian Abusive Language Dataset

Authors

Shekhar, Ravi; Karan, Mladen; Purver, Matthew

Abstract

In light of unprecedented increases in the popularity of the internet and social media, comment moderation has never been a more relevant task. Semi-automated comment moderation systems greatly aid human moderators by either automatically classifying the examples or allowing the moderators to prioritize which comments to consider first. However, the concept of inappropriate content is often subjective, and such content can be conveyed in many subtle and indirect ways. In this work, we propose CoRAL -- a language and culturally aware Croatian Abusive dataset covering phenomena of implicitness and reliance on local and global context. We show experimentally that current models degrade when comments are not explicit and further degrade when language skill and context knowledge are required to interpret the comment.
