Paper Title

Data Governance in the Age of Large-Scale Data-Driven Language Technology

Authors

Jernite, Yacine; Nguyen, Huu; Biderman, Stella; Rogers, Anna; Masoud, Maraim; Danchev, Valentin; Tan, Samson; Luccioni, Alexandra Sasha; Subramani, Nishant; Dupont, Gérard; Dodge, Jesse; Lo, Kyle; Talat, Zeerak; Johnson, Isaac; Radev, Dragomir; Nikpoor, Somaieh; Frohberg, Jörg; Gokaslan, Aaron; Henderson, Peter; Bommasani, Rishi; Mitchell, Margaret

Abstract

The recent emergence and adoption of Machine Learning technology, and specifically of Large Language Models, has drawn attention to the need for systematic and transparent management of language data. This work proposes an approach to global language data governance that attempts to organize data management amongst stakeholders, values, and rights. Our proposal is informed by prior work on distributed governance that accounts for human values and grounded by an international research collaboration that brings together researchers and practitioners from 60 countries. The framework we present is a multi-party international governance structure focused on language data, and incorporating technical and organizational tools needed to support its work.
