Paper Title

PG3: Policy-Guided Planning for Generalized Policy Generation

Authors

Ryan Yang, Tom Silver, Aidan Curtis, Tomas Lozano-Perez, Leslie Pack Kaelbling

Abstract

A longstanding objective in classical planning is to synthesize policies that generalize across multiple problems from the same domain. In this work, we study generalized policy search-based methods with a focus on the score function used to guide the search over policies. We demonstrate limitations of two score functions and propose a new approach that overcomes these limitations. The main idea behind our approach, Policy-Guided Planning for Generalized Policy Generation (PG3), is that a candidate policy should be used to guide planning on training problems as a mechanism for evaluating that candidate. Theoretical results in a simplified setting give conditions under which PG3 is optimal or admissible. We then study a specific instantiation of policy search where planning problems are PDDL-based and policies are lifted decision lists. Empirical results in six domains confirm that PG3 learns generalized policies more efficiently and effectively than several baselines. Code: https://github.com/ryangpeixu/pg3
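To make the central idea concrete, the sketch below illustrates one possible reading of "using a candidate policy to guide planning as a mechanism for evaluating that candidate." It is not the authors' implementation (see the linked repository for that). It assumes a toy problem encoded as a dictionary of transitions, a policy given as a plain function from state to preferred action, and a score defined as the fewest policy deviations on any path to the goal; the names `policy_guided_score`, `transitions`, and `chain` are all hypothetical.

```python
import heapq
from typing import Callable, Dict, Hashable, Optional

State = Hashable
Action = str
# Hypothetical encoding: transitions[state][action] -> next state.
Transitions = Dict[State, Dict[Action, State]]
Policy = Callable[[State], Optional[Action]]


def policy_guided_score(
    transitions: Transitions, start: State, goal: State, policy: Policy
) -> Optional[int]:
    """Return the fewest deviations from `policy` on any start-to-goal path.

    Assumption for illustration: an action that matches the policy's choice
    costs 0 and any other action costs 1. Lower is better; None means the
    goal is unreachable.
    """
    frontier = [(0, 0, start)]  # (deviations so far, tie-breaker, state)
    best = {start: 0}
    counter = 1  # tie-breaker so states themselves are never compared
    while frontier:
        cost, _, state = heapq.heappop(frontier)
        if state == goal:
            return cost
        if cost > best.get(state, float("inf")):
            continue  # stale queue entry
        preferred = policy(state)
        for action, nxt in transitions.get(state, {}).items():
            new_cost = cost + (0 if action == preferred else 1)
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, counter, nxt))
                counter += 1
    return None


# Toy training problem: walk from cell 0 to cell 3 on a line.
chain: Transitions = {
    0: {"right": 1},
    1: {"left": 0, "right": 2},
    2: {"left": 1, "right": 3},
    3: {},
}
always_right = policy_guided_score(chain, 0, 3, lambda s: "right")
always_left = policy_guided_score(chain, 0, 3, lambda s: "left")
```

In this toy setting a policy that always moves right needs no corrections, while one that always moves left must be overridden at every step, so the score separates the two candidates even though neither is executed blindly.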
