Paper Title


Broadly-Exploring, Local-Policy Trees for Long-Horizon Task Planning

Authors

Brian Ichter, Pierre Sermanet, Corey Lynch

Abstract


Long-horizon planning in realistic environments requires the ability to reason over sequential tasks in high-dimensional state spaces with complex dynamics. Classical motion planning algorithms, such as rapidly-exploring random trees, are capable of efficiently exploring large state spaces and computing long-horizon, sequential plans. However, these algorithms are generally challenged with complex, stochastic, and high-dimensional state spaces as well as in the presence of narrow passages, which naturally emerge in tasks that interact with the environment. Machine learning offers a promising solution for its ability to learn general policies that can handle complex interactions and high-dimensional observations. However, these policies are generally limited in horizon length. Our approach, Broadly-Exploring, Local-policy Trees (BELT), merges these two approaches to leverage the strengths of both through a task-conditioned, model-based tree search. BELT uses an RRT-inspired tree search to efficiently explore the state space. Locally, the exploration is guided by a task-conditioned, learned policy capable of performing general short-horizon tasks. This task space can be quite general and abstract; its only requirements are to be sampleable and to well-cover the space of useful tasks. This search is aided by a task-conditioned model that temporally extends dynamics propagation to allow long-horizon search and sequential reasoning over tasks. BELT is demonstrated experimentally to be able to plan long-horizon, sequential trajectories with a goal-conditioned policy and generate plans that are robust.
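To make the abstract's outer loop concrete, here is a minimal sketch of an RRT-inspired tree search where each expansion is a task-conditioned, short-horizon rollout. Everything here is a toy stand-in, not the paper's implementation: the "learned policy plus model" is replaced by `rollout_model` (a bounded step toward a sampled 2D target), and all names (`belt_search`, `rollout_model`, the goal-bias rate) are hypothetical illustrative choices.

```python
import math
import random

# Hypothetical stand-in for BELT's task-conditioned policy + model:
# the "task" is a short-horizon 2D target, and the rollout moves the
# state a bounded step toward it. In the paper this would be a learned
# policy propagated through a learned, temporally extended model.
def rollout_model(state, task, step=0.5):
    dx, dy = task[0] - state[0], task[1] - state[1]
    dist = math.hypot(dx, dy)
    if dist < 1e-9:
        return state
    scale = min(step, dist) / dist
    return (state[0] + dx * scale, state[1] + dy * scale)

def belt_search(start, goal, n_iters=2000, goal_tol=0.6, seed=0):
    """RRT-style outer loop: broadly sample states, expand the nearest
    tree node with a short-horizon, task-conditioned rollout."""
    rng = random.Random(seed)
    nodes = [start]
    parent = {0: None}
    for _ in range(n_iters):
        # Broad exploration: sample a random state, with a small goal bias.
        sample = goal if rng.random() < 0.1 else (rng.uniform(0, 10), rng.uniform(0, 10))
        # Nearest node in the tree to the sampled state.
        i = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], sample))
        # Local step: treat the sample as a short-horizon task and
        # propagate it through the (toy) task-conditioned model.
        new_state = rollout_model(nodes[i], sample)
        nodes.append(new_state)
        parent[len(nodes) - 1] = i
        if math.dist(new_state, goal) < goal_tol:
            # Goal reached: reconstruct the state sequence back to the root.
            path, j = [], len(nodes) - 1
            while j is not None:
                path.append(nodes[j])
                j = parent[j]
            return path[::-1]
    return None  # no plan found within the iteration budget

path = belt_search((0.0, 0.0), (9.0, 9.0))
```

The key structural point the sketch illustrates is that each tree edge is an entire short-horizon rollout rather than a single primitive action, which is what lets the search reason sequentially over tasks at long horizons.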
