Paper Title
Semantic Visual Navigation by Watching YouTube Videos
Authors
Abstract
Semantic cues and statistical regularities in real-world environment layouts can improve the efficiency of navigation in novel environments. This paper learns such semantic cues simply by watching YouTube videos, and leverages them to navigate to objects of interest in novel environments. This is challenging because YouTube videos do not come with labels for actions or goals, and may not even showcase optimal behavior. Our method tackles these challenges by applying Q-learning to pseudo-labeled transition quadruples (image, action, next image, reward). We show that such off-policy Q-learning from passive data is able to learn meaningful semantic cues for navigation. These cues, when used in a hierarchical navigation policy, lead to improved efficiency at the ObjectGoal task in visually realistic simulations. We observe a relative improvement of 15-83% over end-to-end RL, behavior cloning, and classical methods, while using minimal direct interaction.
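The idea of Q-learning on a fixed set of transition quadruples can be illustrated with a minimal tabular sketch. This is not the paper's implementation: it assumes the quadruples have already been pseudo-labeled, and it replaces images with hypothetical discrete state names ("hallway", "kitchen", and so on) so that a lookup table can stand in for a learned value function. The function name, hyperparameters, and toy data are all illustrative assumptions.

```python
from collections import defaultdict


def q_learning_from_quadruples(transitions, num_actions,
                               gamma=0.9, lr=0.5, epochs=200):
    """Tabular off-policy Q-learning over a fixed dataset.

    Each transition is (state, action, next_state, reward), mirroring the
    (image, action, next image, reward) quadruples described in the abstract.
    No environment interaction happens here; the data is purely passive.
    """
    q = defaultdict(lambda: [0.0] * num_actions)
    for _ in range(epochs):
        for state, action, next_state, reward in transitions:
            # The target takes the max over next actions, whatever the
            # demonstrator actually did next. This is why suboptimal
            # demonstrations can still yield useful value estimates.
            target = reward + gamma * max(q[next_state])
            q[state][action] += lr * (target - q[state][action])
    return q


# Hypothetical toy data: reward 1.0 when the target object comes into view.
# "fridge_view" has no outgoing transitions, so its values stay at zero
# and it effectively acts as a terminal state.
transitions = [
    ("hallway", 0, "bedroom", 0.0),
    ("hallway", 1, "kitchen", 0.0),
    ("kitchen", 0, "fridge_view", 1.0),
    ("bedroom", 0, "hallway", 0.0),
]

q = q_learning_from_quadruples(transitions, num_actions=2)
best_action_in_hallway = max(range(2), key=lambda a: q["hallway"][a])
```

In this toy setup the learned values rank the action that leads toward the kitchen above the one that leads to the bedroom, which is the kind of semantic cue (kitchens tend to contain fridges) the abstract refers to. A real system would need a function approximator over images in place of the table.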