Paper Title
MOR-UAV: A Benchmark Dataset and Baselines for Moving Object Recognition in UAV Videos
Paper Authors
Paper Abstract
Visual data collected from Unmanned Aerial Vehicles (UAVs) has opened a new frontier of computer vision that requires automated analysis of aerial images/videos. However, the existing UAV datasets primarily focus on object detection. An object detector does not differentiate between moving and non-moving objects. Given a real-time UAV video stream, how can we both localize and classify the moving objects, i.e., perform moving object recognition (MOR)? MOR is one of the essential tasks to support various UAV vision-based applications including aerial surveillance, search and rescue, event recognition, and urban and rural scene understanding. To the best of our knowledge, no labeled dataset is available for MOR evaluation in UAV videos. Therefore, in this paper, we introduce MOR-UAV, a large-scale video dataset for MOR in aerial videos. We achieve this by labeling axis-aligned bounding boxes for moving objects, which requires fewer computational resources than producing pixel-level estimates. We annotate 89,783 moving object instances collected from 30 UAV videos, comprising 10,948 frames captured in various scenarios such as different weather conditions, occlusion, changing flight altitudes and multiple camera views. We assign labels for two categories of vehicles (car and heavy vehicle). Furthermore, we propose a deep unified framework, MOR-UAVNet, for MOR in UAV videos. Since this is the first attempt at MOR in UAV videos, we present 16 baseline results based on the proposed framework on the MOR-UAV dataset through quantitative and qualitative experiments. We also analyze the motion-salient regions in the network through multiple layer visualizations. MOR-UAVNet works online at inference, as it requires only a few past frames. Moreover, it does not require predefined target initialization from the user. Experiments also demonstrate that the MOR-UAV dataset is quite challenging.
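To make the MOR task described in the abstract concrete, the sketch below illustrates the expected input/output contract: a short window of past frames from the UAV stream goes in, and axis-aligned boxes with class labels (car / heavy vehicle) come out for the moving objects only. All names here (`MovingObject`, `predict_moving_objects`, the window length) are illustrative assumptions for exposition, not the paper's released API or annotation format.

```python
# Minimal illustrative sketch of the MOR task interface, under the assumptions
# stated above. A real MOR-UAVNet-style model would estimate motion cues across
# the frame window and classify each moving vehicle; the placeholder below only
# fixes the shapes of inputs and outputs.
from dataclasses import dataclass
from typing import List

import numpy as np

CLASSES = ["car", "heavy_vehicle"]  # the two labeled vehicle categories

@dataclass
class MovingObject:
    x_min: float  # axis-aligned bounding box, in pixels
    y_min: float
    x_max: float
    y_max: float
    label: str    # one of CLASSES
    score: float  # detection confidence

def predict_moving_objects(past_frames: List[np.ndarray]) -> List[MovingObject]:
    """Hypothetical online MOR inference step.

    `past_frames` is a short history ending at the current frame (the abstract
    notes inference is online and needs only a few past frames). Returns boxes
    and labels for moving objects only; non-moving vehicles are ignored.
    """
    assert len(past_frames) >= 2, "motion cues need at least two frames"
    return []  # placeholder: a real model would run here

# Usage: slide a short window over the video stream.
# stream = [...]                       # frames as H x W x 3 uint8 arrays
# window = 4                           # illustrative history length
# for t in range(window, len(stream)):
#     detections = predict_moving_objects(stream[t - window:t + 1])
```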