用于自主探索和多目标最短路径的近距离算法

论文标题

用于自主探索和多目标最短路径的近距离算法

Near-Optimal Algorithms for Autonomous Exploration and Multi-Goal Stochastic Shortest Path

论文作者

Cai, Haoyuan, Ma, Tengyu, Du, Simon

论文摘要

我们重新审查了Lim＆Auer（2012）提出的增量自主探索问题。在这种情况下，代理商旨在学习一组近乎最佳的目标条件政策，以达到$ L $ - 可控制状态：从初始状态$ s_0 $ s_0 $在预期的$ l $ spep中逐渐达到的状态。我们引入了一种比现有算法更强的样本复杂性界限的新算法。此外，我们还证明了自主探索问题的第一个下限。特别是，下限意味着我们提出的算法，即价值自主探索，当$ l $ - 可控制状态的数量相对于$ l $，几乎是最小的。我们算法设计中的关键是自主探索与多目标最短路径之间的联系，这是一个自然概括经典随机最短路径问题的新问题。这个新问题及其与自主探索的联系可能具有独立的兴趣。

We revisit the incremental autonomous exploration problem proposed by Lim & Auer (2012). In this setting, the agent aims to learn a set of near-optimal goal-conditioned policies to reach the $L$-controllable states: states that are incrementally reachable from an initial state $s_0$ within $L$ steps in expectation. We introduce a new algorithm with stronger sample complexity bounds than existing ones. Furthermore, we also prove the first lower bound for the autonomous exploration problem. In particular, the lower bound implies that our proposed algorithm, Value-Aware Autonomous Exploration, is nearly minimax-optimal when the number of $L$-controllable states grows polynomially with respect to $L$. Key in our algorithm design is a connection between autonomous exploration and multi-goal stochastic shortest path, a new problem that naturally generalizes the classical stochastic shortest path problem. This new problem and its connection to autonomous exploration can be of independent interest.

下载PDF全文

下载文献需遵守相关版权规定

论文标题