Paper Title
Object Permanence Through Audio-Visual Representations
Paper Authors
Paper Abstract
As robots perform manipulation tasks and interact with objects, they may accidentally drop objects (e.g., due to an inadequate grasp of an unfamiliar object) that subsequently bounce out of their visual fields. To enable robots to recover from such errors, we draw upon the concept of object permanence: objects remain in existence even when they are not being sensed (e.g., seen) directly. In particular, we developed a multimodal neural network model that takes a partial, observed bounce trajectory and the audio resulting from the drop impact as its inputs to predict the full bounce trajectory and the end location of a dropped object. We empirically show that: 1) our multimodal method predicted end locations close (i.e., within the visual field of the robot's wrist camera) to the actual locations, and 2) the robot was able to retrieve dropped objects by applying minimal vision-based pick-up adjustments. Additionally, we show that our method outperformed five comparison baselines in retrieving dropped objects. Our results contribute to enabling object permanence for robots and error recovery from object drops.
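The kind of multimodal model the abstract describes, which fuses a partial bounce trajectory with audio from the drop impact to regress an end location, can be sketched as follows. This is a minimal illustration, not the authors' architecture: the encoder choices (a GRU over trajectory points, an MLP over an audio feature vector), the dimensions, and all names are assumptions made for the example.

```python
import torch
import torch.nn as nn

class BounceEndLocationPredictor(nn.Module):
    """Hypothetical multimodal sketch: fuse a partial, observed bounce
    trajectory with audio features from the drop impact and regress the
    dropped object's 2-D end location."""

    def __init__(self, traj_dim=3, audio_dim=128, hidden=64):
        super().__init__()
        # Encode the partial trajectory (a sequence of 3-D points).
        self.traj_encoder = nn.GRU(traj_dim, hidden, batch_first=True)
        # Encode a precomputed audio feature vector (e.g., a spectrogram embedding).
        self.audio_encoder = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
        # Fuse both modalities by concatenation and regress (x, y).
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 2)
        )

    def forward(self, traj, audio):
        # traj: (batch, seq_len, traj_dim); audio: (batch, audio_dim)
        _, h = self.traj_encoder(traj)                # h: (1, batch, hidden)
        fused = torch.cat([h.squeeze(0), self.audio_encoder(audio)], dim=-1)
        return self.head(fused)                       # (batch, 2)

model = BounceEndLocationPredictor()
traj = torch.randn(4, 10, 3)    # 4 partial trajectories, 10 observed points each
audio = torch.randn(4, 128)     # 4 audio feature vectors
pred = model(traj, audio)
print(pred.shape)  # torch.Size([4, 2])
```

In this sketch the audio branch supplies impact cues (e.g., material and bounce energy) that the visual trajectory alone cannot, which is the motivation for multimodal fusion given in the abstract.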