Rethinking Supervised Learning and Reinforcement Learning in Task-Oriented Dialogue Systems

Paper Title

Rethinking Supervised Learning and Reinforcement Learning in Task-Oriented Dialogue Systems

Authors

Ziming Li, Julia Kiseleva, Maarten de Rijke

Abstract

Dialogue policy learning for task-oriented dialogue systems has recently made great progress, mostly through reinforcement learning (RL) methods. However, these approaches have become very sophisticated, and it is time to re-evaluate them. Are we really making progress by developing dialogue agents based only on reinforcement learning? We demonstrate how (1) traditional supervised learning together with (2) a simulator-free adversarial learning method can achieve performance comparable to state-of-the-art RL-based methods. First, we introduce a simple dialogue action decoder to predict the appropriate actions. Then, we extend the traditional multi-label classification solution for dialogue policy learning by adding dense layers to improve the dialogue agent's performance. Finally, we employ the Gumbel-Softmax estimator to alternately train the dialogue agent and the dialogue reward model without using reinforcement learning. Based on our extensive experimentation, we conclude that the proposed methods achieve more stable and higher performance with less effort, such as the domain knowledge required to design a user simulator and the intractable parameter tuning in reinforcement learning. Our main goal is not to beat reinforcement learning with supervised learning, but to demonstrate the value of rethinking the role of reinforcement learning and supervised learning in optimizing task-oriented dialogue systems.
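
The abstract names the Gumbel-Softmax estimator as the piece that lets the agent and the reward model be trained without reinforcement learning. As a rough illustration of the general technique only, and not the authors' implementation, the sketch below draws a relaxed (differentiable) sample over a set of dialogue actions using NumPy. The function name `gumbel_softmax_sample`, the `temperature` default, and the example logits are all assumptions made for this illustration.

```python
import numpy as np


def gumbel_softmax_sample(logits, temperature=1.0, rng=None):
    """Draw a relaxed one-hot sample per row of `logits` (hypothetical helper).

    Lower temperatures push samples toward one-hot vectors; higher
    temperatures push them toward the uniform distribution.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Gumbel(0, 1) noise via inverse transform sampling; the lower bound
    # keeps log(u) finite.
    u = rng.uniform(low=1e-10, high=1.0, size=logits.shape)
    gumbel_noise = -np.log(-np.log(u))
    # Perturb the logits, scale by temperature, then apply a stable softmax.
    y = (logits + gumbel_noise) / temperature
    y = y - y.max(axis=-1, keepdims=True)
    exp_y = np.exp(y)
    return exp_y / exp_y.sum(axis=-1, keepdims=True)


# Example: two dialogue states, three candidate actions each (made-up logits).
example_logits = np.array([[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]])
relaxed_actions = gumbel_softmax_sample(example_logits, temperature=0.5)
```

Because the output is a smooth function of the logits, a downstream model such as a reward model can in principle pass gradients back to the action predictor, which is the property that makes a policy-gradient step unnecessary in this kind of setup.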
