Paper Title
ToolNet: Using Commonsense Generalization for Predicting Tool Use for Robot Plan Synthesis
Authors
Abstract
A robot working in a physical environment (such as a home or factory) needs to learn to use the various available tools for accomplishing different tasks, for instance, a mop for cleaning and a tray for carrying objects. The number of possible tools is large, and it may not be feasible to demonstrate the usage of each individual tool during training. Can a robot learn commonsense knowledge and adapt to novel settings where some known tools are missing, but alternative unseen tools are present? We present a neural model that predicts the best tool among the available objects for achieving a given declarative goal. The model is trained on user demonstrations, which we crowd-source by having humans instruct a robot in a physics simulator. The dataset contains user plans involving multi-step object interactions along with symbolic state changes. Our neural model, ToolNet, combines a graph neural network, which encodes the current environment state, with goal-conditioned spatial attention to predict the appropriate tool. We find that providing metric and semantic properties of objects, as well as pre-trained object embeddings derived from a commonsense knowledge repository such as ConceptNet, significantly improves the model's ability to generalize to unseen tools. The model makes accurate and generalizable tool predictions: compared to a graph neural network baseline, it achieves a 14-27% accuracy improvement when predicting known tools in new world scenes, and a 44-67% improvement in generalization to novel objects not encountered during training.
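The abstract outlines the architecture at a high level: per-object features (metric and semantic properties plus ConceptNet-derived embeddings) are encoded by a graph neural network over the scene, and goal-conditioned attention scores each object as the candidate tool. Below is a minimal PyTorch sketch of that idea; the layer sizes, the mean-aggregation message passing, and the bilinear goal-object scoring are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class ToolNetSketch(nn.Module):
    """Illustrative sketch: GNN scene encoder + goal-conditioned attention
    that scores every object in the scene as the candidate tool."""

    def __init__(self, obj_feat_dim, goal_dim, hidden_dim=128, gnn_layers=2):
        super().__init__()
        # Per-object input: metric/semantic properties concatenated with a
        # pre-trained (e.g. ConceptNet-derived) object embedding.
        self.obj_proj = nn.Linear(obj_feat_dim, hidden_dim)
        # Simple message-passing layers over the scene relation graph
        # (mean aggregation of neighbor states; an assumed scheme).
        self.gnn = nn.ModuleList(
            [nn.Linear(2 * hidden_dim, hidden_dim) for _ in range(gnn_layers)]
        )
        # Goal-conditioned scoring of each object encoding.
        self.goal_proj = nn.Linear(goal_dim, hidden_dim)
        self.score = nn.Bilinear(hidden_dim, hidden_dim, 1)

    def forward(self, obj_feats, adj, goal):
        # obj_feats: (N, obj_feat_dim), adj: (N, N) scene graph, goal: (goal_dim,)
        h = torch.relu(self.obj_proj(obj_feats))
        for layer in self.gnn:
            # Average messages from neighboring objects, then update each node.
            msg = (adj @ h) / adj.sum(dim=1, keepdim=True).clamp(min=1)
            h = torch.relu(layer(torch.cat([h, msg], dim=-1)))
        g = torch.relu(self.goal_proj(goal)).expand_as(h)
        logits = self.score(h, g).squeeze(-1)   # one score per object
        return torch.softmax(logits, dim=0)     # distribution over candidate tools


if __name__ == "__main__":
    # Toy usage: 5 objects with 32-d features, a 16-d goal encoding.
    objs, adj, goal = torch.randn(5, 32), torch.ones(5, 5), torch.randn(16)
    probs = ToolNetSketch(obj_feat_dim=32, goal_dim=16)(objs, adj, goal)
    print(probs)  # probability that each object is the tool to use
```

Outputting a distribution over the objects present in the scene (rather than over a fixed tool vocabulary) is what lets such a model rank tools it never saw during training, provided their property features and commonsense embeddings are available.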