Paper Title
Do Children Texts Hold The Key To Commonsense Knowledge?
Paper Authors
Paper Abstract
Compiling comprehensive repositories of commonsense knowledge is a long-standing problem in AI. Many concerns revolve around the issue of reporting bias, i.e., that frequency in text sources is not a good proxy for relevance or truth. This paper explores whether children's texts hold the key to commonsense knowledge compilation, based on the hypothesis that such content makes fewer assumptions about the reader's knowledge, and therefore spells out commonsense more explicitly. An analysis of several corpora shows that children's texts indeed contain many more, and more typical, commonsense assertions. Moreover, experiments show that this advantage can be leveraged in popular language-model-based commonsense knowledge extraction settings, where task-unspecific fine-tuning on small amounts of children's texts (childBERT) already yields significant improvements. This provides a refreshing perspective, differing from the common trend of deriving progress from ever-larger models and corpora.
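The corpus analysis mentioned above compares how often texts state commonsense facts explicitly. As a minimal sketch of that idea (the regex patterns and the two snippets below are purely illustrative assumptions, not the paper's actual method or data), one can count surface patterns such as "X is a Y" or "X can Y" in different text samples:

```python
import re

# Hypothetical assertion patterns: crude proxies for explicitly stated
# commonsense ("X is a Y", "X can Y"). The paper's real analysis is more
# sophisticated; this only illustrates the intuition.
ASSERTION_PATTERNS = [
    re.compile(r"\b\w+ (?:is|are) (?:a |an )?\w+"),
    re.compile(r"\b\w+ can \w+"),
]

def count_assertions(text: str) -> int:
    """Count non-overlapping matches of any simple assertion pattern."""
    return sum(len(p.findall(text.lower())) for p in ASSERTION_PATTERNS)

# Toy snippets (invented for illustration): children's prose tends to
# spell commonsense out; adult prose presupposes it.
children_snippet = "An elephant is a big animal. Elephants can swim. Birds can fly."
adult_snippet = "The quarterly report indicated moderate growth across sectors."

print(count_assertions(children_snippet))  # higher count: explicit assertions
print(count_assertions(adult_snippet))     # lower count: commonsense presupposed
```

Under these toy patterns, the children's snippet yields several matches while the adult snippet yields none, mirroring the paper's finding that children's texts state commonsense more explicitly.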