Paper Title

Causally Correct Partial Models for Reinforcement Learning

Paper Authors

Rezende, Danilo J., Danihelka, Ivo, Papamakarios, George, Ke, Nan Rosemary, Jiang, Ray, Weber, Theophane, Gregor, Karol, Merzic, Hamza, Viola, Fabio, Wang, Jane, Mitrovic, Jovana, Besse, Frederic, Antonoglou, Ioannis, Buesing, Lars

Paper Abstract

In reinforcement learning, we can learn a model of future observations and rewards, and use it to plan the agent's next actions. However, jointly modeling future observations can be computationally expensive or even intractable if the observations are high-dimensional (e.g. images). For this reason, previous works have considered partial models, which model only part of the observation. In this paper, we show that partial models can be causally incorrect: they are confounded by the observations they don't model, and can therefore lead to incorrect planning. To address this, we introduce a general family of partial models that are provably causally correct, yet remain fast because they do not need to fully model future observations.
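
The confounding described in the abstract can be illustrated with a toy simulation (a minimal sketch, not the paper's method; the environment, policy, and function names here are invented for illustration). A hidden state `z` is part of the observation that the partial model ignores, but the behaviour policy acts on it. Estimating reward conditionally from logged data, `E[r | a]`, then differs from the interventional quantity `E[r | do(a)]` that correct planning requires:

```python
import random

random.seed(0)

def env_step(a, z):
    # Reward is 1 iff the action matches the hidden state z.
    return 1.0 if a == z else 0.0

# Behaviour policy peeks at z, the part of the observation the
# partial model does not see: it simply copies z into the action.
data = []
for _ in range(10_000):
    z = random.randint(0, 1)
    a = z
    data.append((a, env_step(a, z)))

def conditional(a):
    # Confounded partial-model estimate E[r | a] from logged data.
    rs = [r for (ai, r) in data if ai == a]
    return sum(rs) / len(rs)

def interventional(a, n=10_000):
    # True effect of forcing action a: resample z independently.
    return sum(env_step(a, random.randint(0, 1)) for _ in range(n)) / n

print(conditional(0), conditional(1))        # both 1.0: the model claims every action is perfect
print(interventional(0), interventional(1))  # both near 0.5: the actual effect of forcing either action
```

Because the logged actions always equal `z`, the conditional estimate says any action yields reward 1, so a planner using this partial model would be indifferent between actions that in fact succeed only half the time. The paper's causally correct partial models close this gap without modeling the full observation.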
