Graph Neural Network Policies and Imitation Learning for Multi-Domain Task-Oriented Dialogues

Paper Title

Graph Neural Network Policies and Imitation Learning for Multi-Domain Task-Oriented Dialogues

Authors

Thibault Cordier, Tanguy Urvoy, Fabrice Lefèvre, Lina M. Rojas-Barahona

Abstract

Task-oriented dialogue systems are designed to achieve specific goals while conversing with humans. In practice, they may have to handle several domains and tasks simultaneously. The dialogue manager must therefore be able to take domain changes into account and plan over different domains and tasks in order to handle multi-domain dialogues. However, reinforcement learning in such a context becomes difficult because the state-action dimension is larger while the reward signal remains scarce. Our experimental results suggest that structured policies based on graph neural networks, combined with different degrees of imitation learning, can effectively handle multi-domain dialogues. The reported experiments underline the benefit of structured policies over standard policies.
