Paper Title
ToolNet: Using Commonsense Generalization for Predicting Tool Use for Robot Plan Synthesis
Authors
Abstract
A robot working in a physical environment (such as a home or factory) needs to learn to use the various available tools for accomplishing different tasks, for instance, a mop for cleaning and a tray for carrying objects. The number of possible tools is large, and it may not be feasible to demonstrate the usage of each individual tool during training. Can a robot learn commonsense knowledge and adapt to novel settings where some known tools are missing, but alternative unseen tools are present? We present a neural model that predicts the best tool among the available objects for achieving a given declarative goal. The model is trained on user demonstrations, which we crowd-source by having humans instruct a robot in a physics simulator. The dataset contains user plans involving multi-step object interactions along with symbolic state changes. Our neural model, ToolNet, combines a graph neural network, which encodes the current environment state, with goal-conditioned spatial attention to predict the appropriate tool. We find that providing metric and semantic properties of objects, as well as pre-trained object embeddings derived from a commonsense knowledge repository such as ConceptNet, significantly improves the model's ability to generalize to unseen tools. The model makes accurate and generalizable tool predictions: compared to a graph neural network baseline, it achieves a 14-27% accuracy improvement when predicting known tools in new world scenes, and a 44-67% improvement in generalization to novel objects not encountered during training.
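The abstract outlines the architecture at a high level: per-object features (metric and semantic properties plus ConceptNet-derived embeddings) are encoded by a graph neural network over the scene, and goal-conditioned attention scores each object as the candidate tool. Below is a minimal PyTorch sketch of that idea; the layer sizes, the mean-aggregation message passing, and the bilinear goal-object scoring are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class ToolNetSketch(nn.Module):
    """Illustrative sketch: GNN scene encoder + goal-conditioned attention
    that scores every object in the scene as the candidate tool."""

    def __init__(self, obj_feat_dim, goal_dim, hidden_dim=128, gnn_layers=2):
        super().__init__()
        # Per-object input: metric/semantic properties concatenated with a
        # pre-trained (e.g. ConceptNet-derived) object embedding.
        self.obj_proj = nn.Linear(obj_feat_dim, hidden_dim)
        # Simple message-passing layers over the scene relation graph
        # (mean aggregation of neighbor states; an assumed scheme).
        self.gnn = nn.ModuleList(
            [nn.Linear(2 * hidden_dim, hidden_dim) for _ in range(gnn_layers)]
        )
        # Goal-conditioned scoring of each object encoding.
        self.goal_proj = nn.Linear(goal_dim, hidden_dim)
        self.score = nn.Bilinear(hidden_dim, hidden_dim, 1)

    def forward(self, obj_feats, adj, goal):
        # obj_feats: (N, obj_feat_dim), adj: (N, N) scene graph, goal: (goal_dim,)
        h = torch.relu(self.obj_proj(obj_feats))
        for layer in self.gnn:
            # Average messages from neighboring objects, then update each node.
            msg = (adj @ h) / adj.sum(dim=1, keepdim=True).clamp(min=1)
            h = torch.relu(layer(torch.cat([h, msg], dim=-1)))
        g = torch.relu(self.goal_proj(goal)).expand_as(h)
        logits = self.score(h, g).squeeze(-1)   # one score per object
        return torch.softmax(logits, dim=0)     # distribution over candidate tools


if __name__ == "__main__":
    # Toy usage: 5 objects with 32-d features, a 16-d goal encoding.
    objs, adj, goal = torch.randn(5, 32), torch.ones(5, 5), torch.randn(16)
    probs = ToolNetSketch(obj_feat_dim=32, goal_dim=16)(objs, adj, goal)
    print(probs)  # probability that each object is the tool to use
```

Outputting a distribution over the objects present in the scene (rather than over a fixed tool vocabulary) is what lets such a model rank tools it never saw during training, provided their property features and commonsense embeddings are available.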