Paper Title


X-MEN: Guaranteed XOR-Maximum Entropy Constrained Inverse Reinforcement Learning

Paper Authors

Fan Ding, Yexiang Xue

Paper Abstract


Inverse Reinforcement Learning (IRL) is a powerful way of learning from demonstrations. In this paper, we address IRL problems with the availability of prior knowledge that optimal policies will never violate certain constraints. Conventional approaches ignoring these constraints need many demonstrations to converge. We propose XOR-Maximum Entropy Constrained Inverse Reinforcement Learning (X-MEN), which is guaranteed to converge to the optimal policy at a linear rate w.r.t. the number of learning iterations. X-MEN embeds XOR-sampling -- a provable sampling approach that transforms the #P-complete sampling problem into queries to NP oracles -- into the framework of maximum entropy IRL. X-MEN also guarantees that the learned policy will never generate trajectories that violate constraints. Empirical results in navigation demonstrate that X-MEN converges faster to the optimal policies compared to baseline approaches and always generates trajectories that satisfy multi-state combinatorial constraints.
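For readers unfamiliar with the setup, the following is a minimal sketch of the constrained maximum-entropy IRL formulation the abstract alludes to. The notation (reward weights $\theta$, trajectory features $\phi(\tau)$, feasible set $\mathcal{C}$ induced by the constraints) is assumed here rather than quoted from the paper; X-MEN's contribution is sampling from this distribution via XOR-sampling, not the formulation itself.

```latex
% Sketch of constrained MaxEnt IRL (notation assumed, not taken from the paper text).
% Trajectories follow an exponential-family distribution restricted to the feasible
% set C, so trajectories that violate the constraints receive zero probability:
P_{\theta}(\tau) \;=\;
  \frac{\exp\!\left(\theta^{\top}\phi(\tau)\right)\,\mathbb{1}\!\left[\tau \in \mathcal{C}\right]}
       {\sum_{\tau' \in \mathcal{C}} \exp\!\left(\theta^{\top}\phi(\tau')\right)}

% Maximizing the likelihood of the demonstration set D gives the usual
% feature-matching gradient; the second expectation requires sampling from
% P_theta, which is the #P-complete step the abstract says X-MEN handles with
% XOR-sampling (queries to NP oracles):
\nabla_{\theta}\,\frac{1}{|D|}\sum_{\tau \in D} \log P_{\theta}(\tau)
  \;=\; \mathbb{E}_{\tau \sim D}\!\left[\phi(\tau)\right]
  \;-\; \mathbb{E}_{\tau \sim P_{\theta}}\!\left[\phi(\tau)\right]
```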
