Synthesis of Discounted-Reward Optimal Policies for Markov Decision Processes Under Linear Temporal Logic Specifications

Paper Title


Synthesis of Discounted-Reward Optimal Policies for Markov Decision Processes Under Linear Temporal Logic Specifications

Authors

Kalagarla, Krishna C.; Jain, Rahul; Nuzzo, Pierluigi

Abstract


We present a method to find an optimal policy with respect to a reward function for a discounted Markov decision process under general linear temporal logic (LTL) specifications. Previous work has either focused on maximizing a cumulative reward objective under finite-duration tasks, specified by syntactically co-safe LTL, or maximizing an average reward for persistent (e.g., surveillance) tasks. This paper extends and generalizes these results by introducing a pair of occupancy measures to express the LTL satisfaction objective and the expected discounted reward objective, respectively. These occupancy measures are then connected to a single policy via a novel reduction resulting in a mixed integer linear program whose solution provides an optimal policy. Our formulation can also be extended to include additional constraints with respect to secondary reward functions. We illustrate the effectiveness of our approach in the context of robotic motion planning for complex missions under uncertainty and performance objectives.
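The abstract's central device is the occupancy measure. As a minimal sketch of the standard textbook notion for discounted MDPs (not the paper's LTL product construction, its pair of occupancy measures, or its MILP), the code below computes the discounted state-action occupancy measure x(s,a) of a fixed randomized policy on a made-up two-state MDP. It then checks that the expected discounted reward equals the linear expression Σ x(s,a) r(s,a). The MDP, policy, rewards, and all function names are hypothetical, and x is left unnormalized (it sums to 1/(1 - γ)).

```python
"""Discounted occupancy measure of a fixed policy on a toy MDP.

Illustrative sketch only: the MDP, the policy, and every function name
here are hypothetical and not taken from the paper.
"""

GAMMA = 0.9                      # discount factor
STATES = [0, 1]
ACTIONS = [0, 1]
MU0 = [1.0, 0.0]                 # initial state distribution
# P[s][a][t] = probability of moving from s to t under action a
P = {0: {0: [0.8, 0.2], 1: [0.1, 0.9]},
     1: {0: [0.5, 0.5], 1: [0.0, 1.0]}}
R = {0: {0: 1.0, 1: 0.0}, 1: {0: 2.0, 1: 3.0}}   # reward r(s, a)
PI = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.0, 1: 1.0}}  # randomized policy pi(a|s)


def solve(a, b):
    """Solve the square system a @ z = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest entry of this column up.
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def p_pi(pi, s, t):
    """Transition probability s -> t when actions are drawn from pi."""
    return sum(pi[s][a] * P[s][a][t] for a in ACTIONS)


def occupancy_measure(pi):
    """x(s,a) = sum_k gamma^k Pr(s_k = s, a_k = a), obtained from the flow
    equations d(t) = mu0(t) + gamma * sum_s d(s) P_pi(t|s), x = d * pi."""
    a = [[(1.0 if t == s else 0.0) - GAMMA * p_pi(pi, s, t) for s in STATES]
         for t in STATES]
    d = solve(a, MU0)
    return {(s, act): d[s] * pi[s][act] for s in STATES for act in ACTIONS}


def return_from_occupancy(x):
    """Expected discounted reward as a linear function of x."""
    return sum(x[s, act] * R[s][act] for s in STATES for act in ACTIONS)


def return_from_values(pi):
    """Same quantity via policy evaluation V = r_pi + gamma * P_pi V."""
    a = [[(1.0 if s == t else 0.0) - GAMMA * p_pi(pi, s, t) for t in STATES]
         for s in STATES]
    r_pi = [sum(pi[s][act] * R[s][act] for act in ACTIONS) for s in STATES]
    v = solve(a, r_pi)
    return sum(MU0[s] * v[s] for s in STATES)


x = occupancy_measure(PI)
# Both numbers agree: the objective is linear in the occupancy measure.
print(round(return_from_occupancy(x), 6), round(return_from_values(PI), 6))
```

Because the reward objective (and the flow equations defining x) are linear in x, objectives and constraints of this kind can be written as linear programs over occupancy measures; the paper's coupling of two such measures to a single policy, which leads to a mixed integer linear program, is not reproduced here.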
