Paper Title

Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs

Paper Authors

Maarten Sap, Ronan Le Bras, Daniel Fried, Yejin Choi

Paper Abstract

Social intelligence and Theory of Mind (ToM), i.e., the ability to reason about the different mental states, intents, and reactions of all people involved, allow humans to effectively navigate and understand everyday social interactions. As NLP systems are used in increasingly complex social situations, their ability to grasp social dynamics becomes crucial. In this work, we examine the open question of social intelligence and Theory of Mind in modern NLP systems from an empirical and theory-based perspective. We show that one of today's largest language models (GPT-3; Brown et al., 2020) lacks this kind of social intelligence out of the box, using two tasks: SocialIQa (Sap et al., 2019), which measures models' ability to understand the intents and reactions of participants in social interactions, and ToMi (Le et al., 2019), which measures whether models can infer the mental states and realities of participants in situations. Our results show that models struggle substantially at these Theory of Mind tasks, with well-below-human accuracies of 55% and 60% on SocialIQa and ToMi, respectively. To conclude, we draw on theories from pragmatics to contextualize this shortcoming of large language models, examining the limitations stemming from their data, neural architecture, and training paradigms. Challenging the prevalent narrative that only scale is needed, we posit that person-centric NLP approaches might be more effective towards neural Theory of Mind. In our updated version, we also analyze newer instruction-tuned and RLHF models for neural ToM. We find that even ChatGPT and GPT-4 do not display emergent Theory of Mind; strikingly, even GPT-4 reaches only 60% accuracy on the ToMi questions related to mental states and realities.
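The evaluation the abstract describes is standard zero-shot multiple-choice probing: each candidate answer is scored by the language model, the highest-scoring candidate is taken as the prediction, and accuracy is the fraction of items where the prediction matches the gold answer. A minimal sketch, where the `score` callable is a hypothetical stand-in for an LM likelihood call (not any specific API from the paper):

```python
from typing import Callable, List

def choose(context: str, question: str, candidates: List[str],
           score: Callable[[str], float]) -> str:
    # Append each candidate answer to the context+question prompt,
    # score every completed prompt, and return the best candidate.
    prompts = [f"{context} {question} {c}" for c in candidates]
    best = max(range(len(candidates)), key=lambda i: score(prompts[i]))
    return candidates[best]

def accuracy(preds: List[str], golds: List[str]) -> float:
    # Fraction of items where the chosen candidate matches the gold label.
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)
```

With a real model, `score` would return something like the model's log-probability of the full prompt; here any scoring function can be plugged in for testing.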
