Paper title

Language technology practitioners as language managers: arbitrating data bias and predictive bias in ASR

Authors

Nina Markl, Stephen Joseph McNulty

Abstract

Despite the fact that variation is a fundamental characteristic of natural language, automatic speech recognition systems perform systematically worse on non-standardised and marginalised language varieties. In this paper we use the lens of language policy to analyse how current practices in training and testing ASR systems in industry lead to the data bias giving rise to these systematic error differences. We believe that this is a useful perspective for speech and language technology practitioners to understand the origins and harms of algorithmic bias, and how they can mitigate it. We also propose a re-framing of language resources as (public) infrastructure which should not solely be designed for markets, but for, and with the meaningful cooperation of, speech communities.