Paper Title
Bipartite-play Dialogue Collection for Practical Automatic Evaluation of Dialogue Systems
Paper Authors
Abstract
Automation of dialogue system evaluation is a driving force for the efficient development of dialogue systems. This paper introduces the bipartite-play method, a dialogue collection method for automating dialogue system evaluation. It addresses the limitations of existing dialogue collection methods: (i) inability to compare with systems that are not publicly available, and (ii) vulnerability to cheating by intentionally selecting systems to be compared. Experimental results show that the automatic evaluation using the bipartite-play method mitigates these two drawbacks and correlates as strongly with human subjectivity as existing methods.