Paper Title
Real-Time Object Detection and Recognition on Low-Compute Humanoid Robots using Deep Learning
Paper Authors
Abstract
We envision that in the near future, humanoid robots will share our home space and assist us in daily and routine activities through object manipulation. One of the fundamental technologies that needs to be developed for robots is the ability to detect and recognize objects so that they can manipulate them effectively and make real-time decisions involving those objects. In this paper, we describe a novel architecture that enables multiple low-compute NAO robots to perform real-time detection, recognition, and localization of objects in their camera view and to take programmable actions based on the detected objects. The proposed algorithm for object detection and localization is an empirical modification of YOLOv3, based on indoor experiments in multiple scenarios, with a smaller weight size and lower computational requirements. Quantizing the weights and readjusting the filter sizes and layer arrangements for the convolutions improved the inference time for low-resolution images from the robot's camera feed. YOLOv3 was chosen after a comparative study of bounding-box algorithms, conducted with the objective of selecting one that strikes the best balance among information retention, low inference time, and high accuracy for real-time object detection and localization. The architecture also comprises an effective end-to-end pipeline that feeds real-time frames from the camera feed to the neural network and uses its results to guide the robot with customizable actions corresponding to the detected class labels.
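The last step of the pipeline described in the abstract, mapping detected class labels to customizable actions, might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the `Detection` record, the `dispatch` and `run_pipeline` helpers, the confidence threshold, and the `fake_detector` stand-in for the modified YOLOv3 network are all hypothetical names introduced here, and no robot API is called.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Detection:
    """One detected object (hypothetical record, not from the paper)."""
    label: str
    confidence: float
    box: Tuple[int, int, int, int]  # (x, y, width, height) in pixels


# An action takes a detection and returns a description of what was done.
Action = Callable[[Detection], str]


def dispatch(detections: Iterable[Detection],
             actions: Dict[str, Action],
             min_confidence: float = 0.5) -> List[str]:
    """Run the registered action for each sufficiently confident detection."""
    results = []
    for det in detections:
        action = actions.get(det.label)
        # Skip labels with no registered action and low-confidence detections.
        if action is None or det.confidence < min_confidence:
            continue
        results.append(action(det))
    return results


def run_pipeline(frames, detector, actions, min_confidence=0.5) -> List[str]:
    """Feed each frame to the detector and dispatch actions on its output."""
    log = []
    for frame in frames:
        log.extend(dispatch(detector(frame), actions, min_confidence))
    return log


def fake_detector(frame) -> List[Detection]:
    """Assumed stand-in for the neural network; returns fixed detections."""
    return [
        Detection("cup", 0.9, (10, 20, 30, 40)),
        Detection("ball", 0.3, (0, 0, 5, 5)),
    ]


# Hypothetical action table; a real robot would trigger motions here.
ACTIONS: Dict[str, Action] = {
    "cup": lambda d: f"point at cup near x={d.box[0]}",
    "ball": lambda d: "kick ball",
}

if __name__ == "__main__":
    for line in run_pipeline(["frame0"], fake_detector, ACTIONS):
        print(line)
```

Keeping the action table separate from the detector means new behaviors can be registered per class label without touching the detection code, which is one plausible reading of "customizable actions" in the abstract.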