Paper Title
Goal Exploration Augmentation via Pre-trained Skills for Sparse-Reward Long-Horizon Goal-Conditioned Reinforcement Learning
Paper Authors
Paper Abstract
Reinforcement learning (RL) often struggles to accomplish sparse-reward long-horizon tasks in complex environments. Goal-conditioned reinforcement learning (GCRL) has been employed to tackle this difficult problem via a curriculum of easy-to-reach sub-goals. In GCRL, exploring novel sub-goals is essential for the agent to ultimately find the pathway to the desired goal. How to explore novel sub-goals efficiently is one of the most challenging issues in GCRL. Several goal exploration methods have been proposed to address this issue but still struggle to find the desired goals efficiently. In this paper, we propose a novel learning objective that optimizes the entropy of both achieved goals and the new goals to be explored, enabling more efficient goal exploration in sub-goal-selection-based GCRL. To optimize this objective, we first explore and exploit frequently occurring goal-transition patterns mined from environments similar to the current task to compose skills via skill learning. Then, the pre-trained skills are applied in goal exploration. Evaluation on a variety of sparse-reward long-horizon benchmark tasks suggests that incorporating our method into several state-of-the-art GCRL baselines significantly boosts their exploration efficiency while improving or maintaining their performance. The source code is available at: https://github.com/GEAPS/GEAPS.
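To make the entropy-based goal-exploration objective concrete, below is a minimal sketch of one way such a criterion could be estimated: candidate sub-goals are scored by how much they would increase the entropy of the set of achieved goals, using a nonparametric k-nearest-neighbor entropy estimate. This is an illustrative assumption, not the paper's implementation; the function names, the estimator, and the data are all hypothetical.

```python
import numpy as np

def knn_entropy(points, k=5):
    """Rough nonparametric entropy estimate of a set of achieved goals,
    based on the average log distance to each point's k-th nearest
    neighbor (Kozachenko-Leonenko style, additive constants omitted)."""
    n, d = points.shape
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    dists.sort(axis=1)                      # column 0 is the self-distance (zero)
    kth = np.maximum(dists[:, k], 1e-12)    # k-th nearest neighbor, excluding self
    return d * np.mean(np.log(kth))

def entropy_gain(achieved_goals, candidate_goal, k=5):
    """Score a candidate exploration goal by how much reaching it would
    increase the entropy estimate of the achieved-goal set."""
    before = knn_entropy(achieved_goals, k)
    after = knn_entropy(np.vstack([achieved_goals, candidate_goal[None, :]]), k)
    return after - before

# Illustrative usage: prefer the candidate sub-goal with the largest gain.
achieved = np.random.rand(200, 2)    # goals reached so far (toy 2-D goal space)
candidates = np.random.rand(10, 2)   # novel sub-goals proposed for exploration
best = max(candidates, key=lambda g: entropy_gain(achieved, g))
```

In the paper's setting, the selected sub-goal would then be pursued with the pre-trained skills during the exploration phase; the sketch above only covers the goal-scoring step.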