Paper Title

Corpus-Guided Contrast Sets for Morphosyntactic Feature Detection in Low-Resource English Varieties

Authors

Tessa Masis, Anissa Neal, Lisa Green, Brendan O'Connor

Abstract

The study of language variation examines how language varies between and within different groups of speakers, shedding light on how we use language to construct identities and how social contexts affect language use. A common method is to identify instances of a certain linguistic feature - say, the zero copula construction - in a corpus, and analyze the feature's distribution across speakers, topics, and other variables, to either gain a qualitative understanding of the feature's function or systematically measure variation. In this paper, we explore the challenging task of automatic morphosyntactic feature detection in low-resource English varieties. We present a human-in-the-loop approach to generate and filter effective contrast sets via corpus-guided edits. We show that our approach improves feature detection for both Indian English and African American English, demonstrate how it can assist linguistic research, and release our fine-tuned models for use by other researchers.
