Title

I Cast Detect Thoughts: Learning to Converse and Guide with Intents and Theory-of-Mind in Dungeons and Dragons

Authors

Pei Zhou, Andrew Zhu, Jennifer Hu, Jay Pujara, Xiang Ren, Chris Callison-Burch, Yejin Choi, Prithviraj Ammanabrolu

Abstract

We propose a novel task, G4C, to study teacher-student natural language interactions in a goal-driven and grounded environment. Dungeons and Dragons (D&D), a role-playing game, provides an ideal setting to investigate such interactions. Here, the Dungeon Master (DM), i.e., the teacher, guides the actions of several players -- students, each with their own personas and abilities -- to achieve shared goals grounded in a fantasy world. Our approach is to decompose and model these interactions into (1) the DM's intent to guide players toward a given goal; (2) the DM's guidance utterance to the players expressing this intent; and (3) a theory-of-mind (ToM) model that anticipates the players' reaction to the guidance one turn into the future. We develop a novel reinforcement learning (RL) method for training a DM that generates guidance for players by rewarding utterances where the intent matches the ToM-anticipated player actions. Human and automated evaluations show that a DM trained to explicitly model intents and incorporate ToM of the players using RL generates better-quality guidance that is 3x more likely to fulfill the DM's intent than a vanilla natural language generation (NLG) approach.
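The core training signal described above can be sketched in miniature: the DM's utterance earns a reward when a theory-of-mind model's anticipated player action matches the DM's intent. The following toy Python sketch is purely illustrative; the names (`predict_player_action`, `intent_match_reward`) and the keyword-based "ToM model" are assumptions for exposition, not the paper's actual implementation.

```python
# Illustrative sketch of the intent-matching reward from the abstract:
# reward the DM's guidance utterance when a ToM model's predicted
# player action fulfills the DM's intent. The keyword matcher below
# is a toy stand-in for a learned ToM model.

def predict_player_action(utterance: str) -> str:
    """Toy ToM stand-in: guess the player's next action from
    keywords in the DM's guidance utterance."""
    text = utterance.lower()
    if "door" in text:
        return "open the door"
    if "goblin" in text:
        return "attack the goblin"
    return "wait"

def intent_match_reward(intent: str, utterance: str) -> float:
    """Return 1.0 when the ToM-anticipated action matches the DM's
    intent, else 0.0; this scalar would drive the RL update."""
    anticipated = predict_player_action(utterance)
    return 1.0 if anticipated == intent else 0.0

# A guidance utterance that implicitly steers the player toward the door
# is rewarded; an unrelated description is not.
reward = intent_match_reward(
    intent="open the door",
    utterance="You notice a faint light flickering under the old door.",
)
print(reward)  # 1.0
```

In the paper's actual setup, the ToM component is a learned model over grounded D&D dialogue rather than keyword rules, but the reward structure (match between intent and anticipated reaction) is the same.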
