Paper Title
CAMPs: Learning Context-Specific Abstractions for Efficient Planning in Factored MDPs
Paper Authors
Paper Abstract
Meta-planning, or learning to guide planning from experience, is a promising approach to improving the computational cost of planning. A general meta-planning strategy is to learn to impose constraints on the states considered and actions taken by the agent. We observe that (1) imposing a constraint can induce context-specific independences that render some aspects of the domain irrelevant, and (2) an agent can take advantage of this fact by imposing constraints on its own behavior. These observations lead us to propose the context-specific abstract Markov decision process (CAMP), an abstraction of a factored MDP that affords efficient planning. We then describe how to learn constraints to impose so the CAMP optimizes a trade-off between rewards and computational cost. Our experiments consider five planners across four domains, including robotic navigation among movable obstacles (NAMO), robotic task and motion planning for sequential manipulation, and classical planning. We find planning with learned CAMPs to consistently outperform baselines, including Stilman's NAMO-specific algorithm. Video: https://youtu.be/wTXt6djcAd4 Code: https://git.io/JTnf6
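To make the core idea concrete, below is a minimal, hypothetical Python sketch (not the authors' implementation; the domain, variable names such as obstacle_b, and the constraint avoid_region_b are invented for illustration). It shows how imposing a constraint on behavior can change the dependency structure of a factored MDP, and how backward-chaining from the reward variables then identifies which state variables remain relevant, i.e., the variables the abstract (projected) MDP keeps for planning.

# Hypothetical sketch of the CAMP idea: a constraint induces
# context-specific independences, so some state variables become
# irrelevant and can be dropped before planning.

# State variables of an invented NAMO-like factored MDP.
STATE_VARS = ["robot", "obstacle_a", "obstacle_b", "goal_flag"]

# PARENTS[constraint][v] = variables that v's next value depends on
# when that constraint is imposed on the agent's behavior.
PARENTS = {
    "no_constraint": {
        "robot": {"robot", "obstacle_a", "obstacle_b"},
        "obstacle_a": {"obstacle_a", "robot"},
        "obstacle_b": {"obstacle_b", "robot"},
        "goal_flag": {"robot", "goal_flag"},
    },
    # E.g. "never enter the region containing obstacle_b": that obstacle
    # never moves and never influences the robot, so it drops out.
    "avoid_region_b": {
        "robot": {"robot", "obstacle_a"},
        "obstacle_a": {"obstacle_a", "robot"},
        "obstacle_b": {"obstacle_b"},
        "goal_flag": {"robot", "goal_flag"},
    },
}

REWARD_VARS = {"goal_flag"}  # reward depends only on these variables


def relevant_variables(constraint: str) -> set:
    """Backward-chain from the reward variables through the dependency
    graph that holds under the given constraint; everything reached is
    relevant, everything else can be abstracted away."""
    parents = PARENTS[constraint]
    relevant, frontier = set(REWARD_VARS), list(REWARD_VARS)
    while frontier:
        v = frontier.pop()
        for p in parents[v]:
            if p not in relevant:
                relevant.add(p)
                frontier.append(p)
    return relevant


if __name__ == "__main__":
    for c in PARENTS:
        kept = relevant_variables(c)
        dropped = set(STATE_VARS) - kept
        print(f"{c}: plan over {sorted(kept)}, dropped {sorted(dropped)}")

Running the sketch prints, for each constraint, which variables the abstract MDP retains and which it drops; under the invented avoid_region_b constraint, obstacle_b becomes irrelevant, which is the kind of context-specific independence the paper's learned constraints are chosen to exploit, trading a possible loss in reward for a smaller planning problem.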