Paper Title

Distributionally Robust Offline Reinforcement Learning with Linear Function Approximation

Paper Authors

Xiaoteng Ma, Zhipeng Liang, Jose Blanchet, Mingwen Liu, Li Xia, Jiheng Zhang, Qianchuan Zhao, Zhengyuan Zhou

Paper Abstract

Among the reasons hindering reinforcement learning (RL) applications to real-world problems, two factors are critical: limited data and the mismatch between the testing environment (real environment in which the policy is deployed) and the training environment (e.g., a simulator). This paper attempts to address these issues simultaneously with distributionally robust offline RL, where we learn a distributionally robust policy using historical data obtained from the source environment by optimizing against a worst-case perturbation thereof. In particular, we move beyond tabular settings and consider linear function approximation. More specifically, we consider two settings, one where the dataset is well-explored and the other where the dataset has sufficient coverage of the optimal policy. We propose two algorithms -- one for each of the two settings -- that achieve error bounds $\tilde{O}(d^{1/2}/N^{1/2})$ and $\tilde{O}(d^{3/2}/N^{1/2})$ respectively, where $d$ is the dimension in the linear function approximation and $N$ is the number of trajectories in the dataset. To the best of our knowledge, they provide the first non-asymptotic results of the sample complexity in this setting. Diverse experiments are conducted to demonstrate our theoretical findings, showing the superiority of our algorithm against the non-robust one.
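To make the setting in the abstract concrete, the sketch below is a loose illustration (not the paper's actual algorithm) of a one-step distributionally robust value backup with $d$-dimensional linear features: next-state values are shifted toward a pessimistic worst case before being regressed onto the features. The feature matrix `phi`, the robustness radius `rho`, and the crude worst-case adjustment are all assumptions introduced here for illustration only.

```python
import numpy as np

def robust_backup_weights(phi, next_values, rho, reg=1e-3):
    """Illustrative sketch: fit linear weights w so that phi @ w approximates
    a pessimistic (worst-case-adjusted) backup of the next-state values.

    phi:         (N, d) feature matrix for observed (state, action) pairs
    next_values: (N,) estimated values of the observed next states
    rho:         robustness radius; rho = 0 recovers the non-robust backup
    reg:         ridge regularization for the least-squares fit
    """
    # Crude worst-case adjustment (hypothetical, for illustration): shift each
    # sampled next-state value toward the pessimistic end by rho times its
    # distance from the minimum, mimicking optimization against a perturbed
    # transition kernel without the paper's exact dual formulation.
    pessimistic = next_values - rho * (next_values - next_values.min())
    # Ridge regression onto the d-dimensional linear features, as in standard
    # least-squares value iteration.
    d = phi.shape[1]
    gram = phi.T @ phi + reg * np.eye(d)
    return np.linalg.solve(gram, phi.T @ pessimistic)

# Toy usage: d = 4 features, N = 500 offline transitions (synthetic data).
rng = np.random.default_rng(0)
phi = rng.normal(size=(500, 4))
next_values = rng.uniform(0.0, 1.0, size=500)
w_robust = robust_backup_weights(phi, next_values, rho=0.2)
w_nonrobust = robust_backup_weights(phi, next_values, rho=0.0)
```

Setting `rho = 0` collapses the sketch to an ordinary least-squares backup, which loosely mirrors the robust-versus-non-robust comparison the abstract's experiments describe.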
