Paper Title


CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory

Authors

Nur Muhammad Mahi Shafiullah, Chris Paxton, Lerrel Pinto, Soumith Chintala, Arthur Szlam

Abstract


We propose CLIP-Fields, an implicit scene model that can be used for a variety of tasks, such as segmentation, instance identification, semantic search over space, and view localization. CLIP-Fields learns a mapping from spatial locations to semantic embedding vectors. Importantly, we show that this mapping can be trained with supervision coming only from web-image and web-text trained models such as CLIP, Detic, and Sentence-BERT; and thus uses no direct human supervision. When compared to baselines like Mask-RCNN, our method outperforms on few-shot instance identification or semantic segmentation on the HM3D dataset with only a fraction of the examples. Finally, we show that using CLIP-Fields as a scene memory, robots can perform semantic navigation in real-world environments. Our code and demonstration videos are available here: https://mahis.life/clip-fields
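The core idea in the abstract — a learned mapping from 3D positions to semantic embedding vectors, queried by cosine similarity for semantic search over space — can be illustrated with a toy sketch. This is not the paper's architecture (CLIP-Fields uses learned hash-grid features and contrastive training against CLIP/Detic/Sentence-BERT labels); here the field is a small untrained MLP with Fourier positional encoding, and `ToySemanticField` and `semantic_search` are hypothetical names introduced only to show the interface.

```python
import numpy as np

def positional_encoding(xyz, num_freqs=4):
    """Fourier-feature encoding: (N, 3) points -> (N, 3 * 2 * num_freqs)."""
    freqs = 2.0 ** np.arange(num_freqs)              # (F,)
    angles = xyz[:, :, None] * freqs                 # (N, 3, F)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(xyz.shape[0], -1)

class ToySemanticField:
    """Tiny MLP mapping spatial positions to unit-norm semantic embeddings.

    Weights are random here (an assumption for the sketch); the real model
    would be trained so that each location's embedding matches the
    web-model-derived label embedding for that point.
    """
    def __init__(self, embed_dim=64, hidden=128, num_freqs=4, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = 3 * 2 * num_freqs
        self.num_freqs = num_freqs
        self.w1 = rng.normal(0.0, in_dim ** -0.5, (in_dim, hidden))
        self.w2 = rng.normal(0.0, hidden ** -0.5, (hidden, embed_dim))

    def __call__(self, xyz):
        h = np.maximum(positional_encoding(xyz, self.num_freqs) @ self.w1, 0.0)
        e = h @ self.w2
        return e / np.linalg.norm(e, axis=-1, keepdims=True)  # unit norm

def semantic_search(field, points, query_embedding):
    """Rank candidate 3D points by cosine similarity to a query embedding."""
    q = query_embedding / np.linalg.norm(query_embedding)
    sims = field(points) @ q                          # cosine, since both unit-norm
    return np.argsort(-sims), sims
```

In the actual system the query embedding would come from a text encoder such as CLIP's, so a robot can rank map locations against a natural-language query ("find the mug") and navigate to the best-scoring point.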
