Paper Title


CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory

Authors

Nur Muhammad Mahi Shafiullah, Chris Paxton, Lerrel Pinto, Soumith Chintala, Arthur Szlam

Abstract


We propose CLIP-Fields, an implicit scene model that can be used for a variety of tasks, such as segmentation, instance identification, semantic search over space, and view localization. CLIP-Fields learns a mapping from spatial locations to semantic embedding vectors. Importantly, we show that this mapping can be trained with supervision coming only from web-image and web-text trained models such as CLIP, Detic, and Sentence-BERT; and thus uses no direct human supervision. When compared to baselines like Mask-RCNN, our method outperforms on few-shot instance identification or semantic segmentation on the HM3D dataset with only a fraction of the examples. Finally, we show that using CLIP-Fields as a scene memory, robots can perform semantic navigation in real-world environments. Our code and demonstration videos are available here: https://mahis.life/clip-fields
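The core idea in the abstract — a learned mapping from 3D positions to semantic embedding vectors, queried by cosine similarity for semantic search over space — can be illustrated with a toy sketch. This is not the paper's architecture (CLIP-Fields uses learned hash-grid features and contrastive training against CLIP/Detic/Sentence-BERT labels); here the field is a small untrained MLP with Fourier positional encoding, and `ToySemanticField` and `semantic_search` are hypothetical names introduced only to show the interface.

```python
import numpy as np

def positional_encoding(xyz, num_freqs=4):
    """Fourier-feature encoding: (N, 3) points -> (N, 3 * 2 * num_freqs)."""
    freqs = 2.0 ** np.arange(num_freqs)              # (F,)
    angles = xyz[:, :, None] * freqs                 # (N, 3, F)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(xyz.shape[0], -1)

class ToySemanticField:
    """Tiny MLP mapping spatial positions to unit-norm semantic embeddings.

    Weights are random here (an assumption for the sketch); the real model
    would be trained so that each location's embedding matches the
    web-model-derived label embedding for that point.
    """
    def __init__(self, embed_dim=64, hidden=128, num_freqs=4, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = 3 * 2 * num_freqs
        self.num_freqs = num_freqs
        self.w1 = rng.normal(0.0, in_dim ** -0.5, (in_dim, hidden))
        self.w2 = rng.normal(0.0, hidden ** -0.5, (hidden, embed_dim))

    def __call__(self, xyz):
        h = np.maximum(positional_encoding(xyz, self.num_freqs) @ self.w1, 0.0)
        e = h @ self.w2
        return e / np.linalg.norm(e, axis=-1, keepdims=True)  # unit norm

def semantic_search(field, points, query_embedding):
    """Rank candidate 3D points by cosine similarity to a query embedding."""
    q = query_embedding / np.linalg.norm(query_embedding)
    sims = field(points) @ q                          # cosine, since both unit-norm
    return np.argsort(-sims), sims
```

In the actual system the query embedding would come from a text encoder such as CLIP's, so a robot can rank map locations against a natural-language query ("find the mug") and navigate to the best-scoring point.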
