Paper Title

Generating Full Length Wikipedia Biographies: The Impact of Gender Bias on the Retrieval-Based Generation of Women Biographies

Authors

Angela Fan, Claire Gardent

Abstract

Generating factual, long-form text such as Wikipedia articles raises three key challenges: how to gather relevant evidence, how to structure information into well-formed text, and how to ensure that the generated text is factually correct. We address these by developing a model for English text that uses a retrieval mechanism to identify relevant supporting information on the web and a cache-based pre-trained encoder-decoder to generate long-form biographies section by section, including citation information. To assess the impact of available web evidence on the output text, we compare the performance of our approach when generating biographies about women (for which less information is available on the web) vs. biographies generally. To this end, we curate a dataset of 1,500 biographies about women. We analyze our generated text to understand how differences in available web evidence data affect generation. We evaluate the factuality, fluency, and quality of the generated texts using automatic metrics and human evaluation. We hope that these techniques can be used as a starting point for human writers, to aid in reducing the complexity inherent in the creation of long-form, factual text.
