Paper Title

Towards Responsible Natural Language Annotation for the Varieties of Arabic

Authors

A. Stevie Bergman, Mona T. Diab

Abstract

When building NLP models, there is a tendency to aim for broader coverage, often overlooking cultural and (socio)linguistic nuance. In this position paper, we make the case for care and attention to such nuances, particularly in dataset annotation, as well as the inclusion of cultural and linguistic expertise in the process. We present a playbook for responsible dataset creation for polyglossic, multidialectal languages. This work is informed by a study on Arabic annotation of social media content.
