Paper Title
ACNMP: Skill Transfer and Task Extrapolation through Learning from Demonstration and Reinforcement Learning via Representation Sharing
Authors
Abstract
To equip robots with dexterous skills, an effective approach is to first transfer the desired skill via Learning from Demonstration (LfD) and then let the robot improve it through self-exploration via Reinforcement Learning (RL). In this paper, we propose a novel LfD+RL framework, Adaptive Conditional Neural Movement Primitives (ACNMP), that enables efficient policy improvement in novel environments and effective skill transfer between different agents. This is achieved by exploiting the latent representation learned by the underlying Conditional Neural Process (CNP) model, and by training the model simultaneously with supervised learning (SL) to acquire the demonstrated trajectories and with RL to discover new trajectories. Through simulation experiments, we show that (i) ACNMP enables the system to extrapolate to situations where pure LfD fails; (ii) simultaneous training of the system through SL and RL preserves the shape of the demonstrations while adapting to novel situations, owing to the representation shared by both learners; (iii) ACNMP achieves order-of-magnitude gains in RL sample efficiency when extrapolating reaching tasks, compared to existing approaches; and (iv) ACNMP can be used to implement skill transfer between robots with different morphologies, with competitive learning speed and, importantly, with fewer assumptions than state-of-the-art approaches. Finally, we demonstrate the real-world suitability of ACNMP through real-robot experiments involving obstacle avoidance, pick-and-place, and pouring actions.
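The core mechanism described above is a CNP whose latent representation is shared between a supervised loss on demonstrations and an RL loss on newly explored trajectories. The sketch below illustrates that idea under stated assumptions. It is a hypothetical pure-Python toy, not the authors' implementation. The names `TinyCNP` and `combined_objective`, the layer sizes, the mean-aggregation encoder (taken from the original CNP formulation), and the REINFORCE-style advantage-weighted likelihood used as the RL term are all assumptions. The code only evaluates the objective and performs no gradient updates.

```python
import math
import random


def linear(weights, bias, x):
    # Dense layer: weights is a list of rows (out_dim x in_dim).
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def softplus(s):
    # Numerically stable log(1 + exp(s)).
    return math.log1p(math.exp(-abs(s))) + max(s, 0.0)


def init_layer(rng, in_dim, out_dim):
    weights = [[rng.gauss(0.0, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


class TinyCNP:
    """Hypothetical toy CNP: encoder -> mean aggregation -> decoder."""

    def __init__(self, x_dim=1, y_dim=1, r_dim=8, seed=0):
        rng = random.Random(seed)
        self.y_dim = y_dim
        self.enc1 = init_layer(rng, x_dim + y_dim, r_dim)
        self.enc2 = init_layer(rng, r_dim, r_dim)
        self.dec1 = init_layer(rng, r_dim + x_dim, r_dim)
        self.dec2 = init_layer(rng, r_dim, 2 * y_dim)

    def encode(self, context):
        # Each (x, y) observation -> r_i; averaging makes r order-invariant.
        reps = [linear(*self.enc2, relu(linear(*self.enc1, x + y))) for x, y in context]
        return [sum(col) / len(reps) for col in zip(*reps)]

    def decode(self, r, x_target):
        # Predict a Gaussian (mean, std) over y at x_target, conditioned on r.
        out = linear(*self.dec2, relu(linear(*self.dec1, r + x_target)))
        mean = out[: self.y_dim]
        std = [0.01 + softplus(s) for s in out[self.y_dim :]]
        return mean, std


def gaussian_nll(mean, std, y):
    # Negative log-likelihood of y under independent Gaussians.
    return sum(
        0.5 * math.log(2 * math.pi * s * s) + (yi - m) ** 2 / (2 * s * s)
        for m, s, yi in zip(mean, std, y)
    )


def combined_objective(cnp, context, demo_points, rl_points, advantages, rl_weight=1.0):
    r = cnp.encode(context)  # one latent, shared by the SL and RL terms
    # SL term: fit the demonstrated trajectory points.
    sl = sum(gaussian_nll(*cnp.decode(r, x), y) for x, y in demo_points) / len(demo_points)
    # RL term (assumed REINFORCE-style): advantage-weighted NLL of explored points.
    rl = 0.0
    if rl_points:
        rl = sum(a * gaussian_nll(*cnp.decode(r, x), y)
                 for (x, y), a in zip(rl_points, advantages)) / len(rl_points)
    return sl + rl_weight * rl
```

Both terms decode from the same `r`. In an autodiff framework, gradients from either loss would therefore update the shared encoder. Averaging the per-observation encodings also lets the model condition on any number of observations, such as a start point or a goal point.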