Paper Title


X-MEN: Guaranteed XOR-Maximum Entropy Constrained Inverse Reinforcement Learning

Paper Authors

Fan Ding, Yexiang Xue

Paper Abstract


Inverse Reinforcement Learning (IRL) is a powerful way of learning from demonstrations. In this paper, we address IRL problems with the availability of prior knowledge that optimal policies will never violate certain constraints. Conventional approaches ignoring these constraints need many demonstrations to converge. We propose XOR-Maximum Entropy Constrained Inverse Reinforcement Learning (X-MEN), which is guaranteed to converge to the optimal policy at a linear rate w.r.t. the number of learning iterations. X-MEN embeds XOR-sampling -- a provable sampling approach that transforms the #P-complete sampling problem into queries to NP oracles -- into the framework of maximum entropy IRL. X-MEN also guarantees that the learned policy will never generate trajectories that violate constraints. Empirical results in navigation demonstrate that X-MEN converges faster to the optimal policies compared to baseline approaches and always generates trajectories that satisfy multi-state combinatorial constraints.
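For readers unfamiliar with the setup, the following is a minimal sketch of the constrained maximum-entropy IRL formulation the abstract alludes to. The notation (reward weights $\theta$, trajectory features $\phi(\tau)$, feasible set $\mathcal{C}$ induced by the constraints) is assumed here rather than quoted from the paper; X-MEN's contribution is sampling from this distribution via XOR-sampling, not the formulation itself.

```latex
% Sketch of constrained MaxEnt IRL (notation assumed, not taken from the paper text).
% Trajectories follow an exponential-family distribution restricted to the feasible
% set C, so trajectories that violate the constraints receive zero probability:
P_{\theta}(\tau) \;=\;
  \frac{\exp\!\left(\theta^{\top}\phi(\tau)\right)\,\mathbb{1}\!\left[\tau \in \mathcal{C}\right]}
       {\sum_{\tau' \in \mathcal{C}} \exp\!\left(\theta^{\top}\phi(\tau')\right)}

% Maximizing the likelihood of the demonstration set D gives the usual
% feature-matching gradient; the second expectation requires sampling from
% P_theta, which is the #P-complete step the abstract says X-MEN handles with
% XOR-sampling (queries to NP oracles):
\nabla_{\theta}\,\frac{1}{|D|}\sum_{\tau \in D} \log P_{\theta}(\tau)
  \;=\; \mathbb{E}_{\tau \sim D}\!\left[\phi(\tau)\right]
  \;-\; \mathbb{E}_{\tau \sim P_{\theta}}\!\left[\phi(\tau)\right]
```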
