Title

I Cast Detect Thoughts: Learning to Converse and Guide with Intents and Theory-of-Mind in Dungeons and Dragons

Authors

Pei Zhou, Andrew Zhu, Jennifer Hu, Jay Pujara, Xiang Ren, Chris Callison-Burch, Yejin Choi, Prithviraj Ammanabrolu

Abstract

We propose a novel task, G4C, to study teacher-student natural language interactions in a goal-driven and grounded environment. Dungeons and Dragons (D&D), a role-playing game, provides an ideal setting to investigate such interactions. Here, the Dungeon Master (DM), i.e., the teacher, guides the actions of several players -- students, each with their own personas and abilities -- to achieve shared goals grounded in a fantasy world. Our approach is to decompose and model these interactions into (1) the DM's intent to guide players toward a given goal; (2) the DM's guidance utterance to the players expressing this intent; and (3) a theory-of-mind (ToM) model that anticipates the players' reaction to the guidance one turn into the future. We develop a novel reinforcement learning (RL) method for training a DM that generates guidance for players by rewarding utterances where the intent matches the ToM-anticipated player actions. Human and automated evaluations show that a DM trained to explicitly model intents and incorporate ToM of the players using RL generates better-quality guidance that is 3x more likely to fulfill the DM's intent than a vanilla natural language generation (NLG) approach.
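The core training signal described above can be sketched in miniature: the DM's utterance earns a reward when a theory-of-mind model's anticipated player action matches the DM's intent. The following toy Python sketch is purely illustrative; the names (`predict_player_action`, `intent_match_reward`) and the keyword-based "ToM model" are assumptions for exposition, not the paper's actual implementation.

```python
# Illustrative sketch of the intent-matching reward from the abstract:
# reward the DM's guidance utterance when a ToM model's predicted
# player action fulfills the DM's intent. The keyword matcher below
# is a toy stand-in for a learned ToM model.

def predict_player_action(utterance: str) -> str:
    """Toy ToM stand-in: guess the player's next action from
    keywords in the DM's guidance utterance."""
    text = utterance.lower()
    if "door" in text:
        return "open the door"
    if "goblin" in text:
        return "attack the goblin"
    return "wait"

def intent_match_reward(intent: str, utterance: str) -> float:
    """Return 1.0 when the ToM-anticipated action matches the DM's
    intent, else 0.0; this scalar would drive the RL update."""
    anticipated = predict_player_action(utterance)
    return 1.0 if anticipated == intent else 0.0

# A guidance utterance that implicitly steers the player toward the door
# is rewarded; an unrelated description is not.
reward = intent_match_reward(
    intent="open the door",
    utterance="You notice a faint light flickering under the old door.",
)
print(reward)  # 1.0
```

In the paper's actual setup, the ToM component is a learned model over grounded D&D dialogue rather than keyword rules, but the reward structure (match between intent and anticipated reaction) is the same.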
