Paper Title


Offline Reinforcement Learning with Causal Structured World Models

Authors

Zheng-Mao Zhu, Xiong-Hui Chen, Hong-Long Tian, Kun Zhang, Yang Yu

Abstract


Model-based methods have recently shown promise for offline reinforcement learning (RL), which aims to learn good policies from historical data without interacting with the environment. Previous model-based offline RL methods learn fully connected networks as world models that map states and actions to next-step states. However, it is sensible that a world model should adhere to the underlying causal relationships so that it supports learning an effective policy that generalizes well to unseen states. In this paper, we first provide theoretical results showing that causal world models can outperform plain world models for offline RL, by incorporating the causal structure into the generalization error bound. We then propose a practical algorithm, oFfline mOdel-based reinforcement learning with CaUsal Structure (FOCUS), to illustrate the feasibility of learning and leveraging causal structure in offline RL. Experimental results on two benchmarks show that FOCUS reconstructs the underlying causal structure accurately and robustly. Consequently, it performs better than plain model-based offline RL algorithms and other causal model-based RL algorithms.
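The abstract contrasts fully connected world models with world models that respect a causal structure. Below is a minimal, hypothetical PyTorch sketch of that idea: each next-state dimension is predicted only from the state/action dimensions marked as its causal parents by a binary mask. The class name, layer sizes, and the hand-supplied mask are illustrative assumptions, not the paper's implementation; FOCUS learns the structure from offline data (e.g., via independence testing), which is not reproduced here.

```python
import torch
import torch.nn as nn


class CausalWorldModel(nn.Module):
    """One-step dynamics model with a per-dimension causal mask (illustrative sketch).

    Each next-state dimension s'_j is predicted only from the (state, action)
    dimensions selected by mask[j], instead of from the full (s, a) vector
    as in a fully connected world model.
    """

    def __init__(self, state_dim, action_dim, mask, hidden=128):
        super().__init__()
        in_dim = state_dim + action_dim
        # mask: (state_dim, in_dim) binary matrix; mask[j, i] = 1 means input
        # dimension i is treated as a causal parent of next-state dimension j.
        self.register_buffer("mask", torch.as_tensor(mask, dtype=torch.float32))
        # One small MLP head per next-state dimension, fed only its parents.
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(state_dim)
        )

    def forward(self, state, action):
        x = torch.cat([state, action], dim=-1)   # (batch, in_dim)
        preds = []
        for j, head in enumerate(self.heads):
            masked = x * self.mask[j]            # zero out non-parent inputs
            preds.append(head(masked))
        return torch.cat(preds, dim=-1)          # (batch, state_dim)


# Example with a hand-specified mask: 3-dim state, 1-dim action;
# s'_0 depends on (s_0, a), s'_1 on s_1, and s'_2 on (s_1, s_2, a).
mask = [[1, 0, 0, 1],
        [0, 1, 0, 0],
        [0, 1, 1, 1]]
model = CausalWorldModel(state_dim=3, action_dim=1, mask=mask)
next_state = model(torch.randn(8, 3), torch.randn(8, 1))  # -> shape (8, 3)
```

In this sketch the mask is fixed by hand purely to show how a causal structure constrains the model's inputs; in FOCUS the structure itself is estimated from the offline dataset before (or while) fitting the dynamics model.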
