Paper Title
Don't Copy the Teacher: Data and Model Challenges in Embodied Dialogue
Paper Authors
Paper Abstract
Embodied dialogue instruction following requires an agent to complete a complex sequence of tasks from a natural language exchange. The recent introduction of benchmarks (Padmakumar et al., 2022) raises the question of how best to train and evaluate models for this multi-turn, multi-agent, long-horizon task. This paper contributes to that conversation by arguing that imitation learning (IL) and related low-level metrics are misleading: they do not align with the goals of embodied dialogue research and may hinder progress. We provide an empirical comparison of metrics and an analysis of three models, and make suggestions for how the field might best progress. First, we observe that models trained with IL take spurious actions during evaluation. Second, we find that existing models fail to ground query utterances, which are essential for task completion. Third, we argue that evaluation should focus on higher-level semantic goals.
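To make the contrast between low-level and semantic evaluation concrete, here is a minimal, hypothetical Python sketch. It is not the benchmark's actual evaluation code; the function names, action strings, and state predicates are invented for illustration. It compares a metric that rewards copying the teacher's action sequence against one that checks whether the task's goal conditions hold in the final state.

```python
# Hypothetical illustration: trajectory imitation vs. goal-condition success.
from typing import List, Set, Tuple

def trajectory_match(predicted: List[str], expert: List[str]) -> float:
    """Low-level metric: fraction of positions where the predicted action
    exactly copies the expert (teacher) action."""
    if not expert:
        return 0.0
    hits = sum(p == e for p, e in zip(predicted, expert))
    return hits / len(expert)

def goal_condition_success(final_state: Set[Tuple[str, str]],
                           goal_conditions: Set[Tuple[str, str]]) -> float:
    """Higher-level metric: fraction of semantic goal conditions satisfied
    in the final environment state, regardless of the action path taken."""
    if not goal_conditions:
        return 0.0
    return len(goal_conditions & final_state) / len(goal_conditions)

# Toy episode: the agent reaches the same end state by a different route.
expert_actions = ["goto(sink)", "pickup(mug)", "goto(coffee_maker)", "place(mug)"]
agent_actions  = ["goto(counter)", "pickup(mug)", "goto(coffee_maker)", "place(mug)"]

final_state = {("mug", "in_coffee_maker"), ("coffee_maker", "on")}
goal        = {("mug", "in_coffee_maker"), ("coffee_maker", "on")}

print(trajectory_match(agent_actions, expert_actions))    # 0.75: penalized for not copying
print(goal_condition_success(final_state, goal))          # 1.0: the task is actually done
```

In this toy episode the agent satisfies every goal condition but takes a slightly different route than the teacher, so the trajectory-matching score penalizes it even though the task is fully completed.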