Paper Title

f-IRL: Inverse Reinforcement Learning via State Marginal Matching

Paper Authors

Tianwei Ni, Harshit Sikchi, Yufei Wang, Tejus Gupta, Lisa Lee, Benjamin Eysenbach

Paper Abstract

Imitation learning is well-suited for robotic tasks where it is difficult to directly program the behavior or specify a cost for optimal control. In this work, we propose a method for learning the reward function (and the corresponding policy) to match the expert state density. Our main result is the analytic gradient of any f-divergence between the agent and expert state distribution w.r.t. reward parameters. Based on the derived gradient, we present an algorithm, f-IRL, that recovers a stationary reward function from the expert density by gradient descent. We show that f-IRL can learn behaviors from a hand-designed target state density or implicitly through expert observations. Our method outperforms adversarial imitation learning methods in terms of sample efficiency and the required number of expert trajectories on IRL benchmarks. Moreover, we show that the recovered reward function can be used to quickly solve downstream tasks, and empirically demonstrate its utility on hard-to-explore tasks and for behavior transfer across changes in dynamics.
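
As a minimal sketch of the objective the abstract describes (the notation \rho_E, \rho_\theta, r_\theta and the step size \eta are ours, not taken from this page, and the ordering of the divergence arguments is one common convention): let \rho_E denote the expert state marginal and \rho_\theta the state marginal induced by the policy trained under the parameterized reward r_\theta. The f-divergence objective and its gradient-descent update on the reward parameters can then be written as

L_f(\theta) \;=\; D_f\big(\rho_E \,\|\, \rho_\theta\big) \;=\; \mathbb{E}_{s \sim \rho_\theta}\!\left[ f\!\left( \frac{\rho_E(s)}{\rho_\theta(s)} \right) \right], \qquad \theta \;\leftarrow\; \theta - \eta \, \nabla_\theta L_f(\theta),

where an analytic expression for \nabla_\theta L_f(\theta) is the main result stated in the abstract.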
