Paper Title
Multifidelity Reinforcement Learning with Control Variates
Paper Authors
Paper Abstract
In many computational science and engineering applications, the output of a system of interest corresponding to a given input can be queried at different levels of fidelity with different costs. Typically, low-fidelity data is cheap and abundant, while high-fidelity data is expensive and scarce. In this work, we study the reinforcement learning (RL) problem in the presence of multiple environments with different levels of fidelity for a given control task. We focus on improving the RL agent's performance with multifidelity data. Specifically, a multifidelity estimator that exploits the cross-correlations between the low- and high-fidelity returns is proposed to reduce the variance in the estimation of the state-action value function. The proposed estimator, which is based on the method of control variates, is used to design a multifidelity Monte Carlo RL (MFMCRL) algorithm that improves the learning of the agent in the high-fidelity environment. The impacts of variance reduction on policy evaluation and policy improvement are theoretically analyzed using probability bounds. Our theoretical analysis and numerical experiments demonstrate that for a finite budget of high-fidelity data samples, our proposed MFMCRL agent attains superior performance compared with that of a standard RL agent that uses only the high-fidelity environment data for learning the optimal policy.
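To make the control-variates idea concrete, the sketch below shows a generic multifidelity Monte Carlo estimator of an expected return: the high-fidelity sample mean is corrected by a scaled discrepancy between a large-sample and a paired-sample low-fidelity mean, with the coefficient chosen as the classical variance-minimizing value Cov(G_H, G_L)/Var(G_L). This is a minimal illustration of the general technique, not the paper's exact MFMCRL algorithm; the function name, the assumption of paired rollouts (same policy run in both environments), and the sample layout are all illustrative.

```python
import numpy as np

def mf_return_estimate(high_returns, low_returns_paired, low_returns_extra):
    """Control-variate estimate of the expected high-fidelity return.

    high_returns       : returns from n expensive high-fidelity rollouts
    low_returns_paired : returns from the n low-fidelity rollouts paired
                         with (correlated to) the high-fidelity ones
    low_returns_extra  : returns from additional cheap low-fidelity rollouts
    """
    g_h = np.asarray(high_returns, dtype=float)
    g_l = np.asarray(low_returns_paired, dtype=float)
    g_l_all = np.concatenate([g_l, np.asarray(low_returns_extra, dtype=float)])

    # Variance-minimizing control-variate coefficient,
    # alpha* = Cov(G_H, G_L) / Var(G_L), estimated from the paired samples.
    cov = np.cov(g_h, g_l, ddof=1)
    alpha = cov[0, 1] / cov[1, 1]

    # Shift the high-fidelity sample mean by the gap between the
    # large-sample and paired-sample low-fidelity means. The better the
    # low-/high-fidelity correlation, the larger the variance reduction.
    return g_h.mean() + alpha * (g_l_all.mean() - g_l.mean())
```

When the low- and high-fidelity returns are strongly correlated, this estimator has strictly lower variance than the plain high-fidelity sample mean, which is the mechanism the abstract credits for the improved policy evaluation and policy improvement under a fixed high-fidelity budget.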