Paper title
JECC: Commonsense Reasoning Tasks Derived from Interactive Fictions
Paper authors
Paper abstract
Commonsense reasoning simulates the human ability to make presumptions about our physical world, and it is an essential cornerstone in building general AI systems. We propose a new commonsense reasoning dataset based on human players' Interactive Fiction (IF) gameplay walkthroughs, since human players demonstrate plentiful and diverse commonsense reasoning. The new dataset provides a natural mixture of various reasoning types and requires multi-hop reasoning. Moreover, the IF game-based construction procedure requires much less human intervention than previous construction procedures. Unlike existing benchmarks, our dataset focuses on assessing functional commonsense knowledge rules rather than factual knowledge. Hence, to achieve higher performance on our tasks, models need to effectively utilize such functional knowledge to infer the outcomes of actions, rather than relying solely on memorized facts. Experiments show that the introduced dataset is challenging for previous machine reading models as well as new large language models, with a significant 20% performance gap compared to human experts.