Paper Title
Learning to Exploit Multiple Vision Modalities by Using Grafted Networks
Paper Authors
Abstract
Novel vision sensors such as thermal, hyperspectral, polarization, and event cameras provide information that is not available from conventional intensity cameras. An obstacle to using these sensors with current powerful deep neural networks is the lack of large labeled training datasets. This paper proposes a Network Grafting Algorithm (NGA), in which a new front-end network driven by unconventional visual inputs replaces the front-end network of a pretrained deep network that processes intensity frames. The self-supervised training uses only synchronously recorded intensity frames and novel sensor data to maximize feature similarity between the pretrained network and the grafted network. We show that the enhanced grafted network reaches average precision (AP50) scores competitive with the pretrained network on an object detection task using thermal and event camera datasets, with no increase in inference cost. In particular, the grafted network driven by thermal frames showed a relative improvement of 49.11% over the use of intensity frames. The grafted front end has only 5--8% of the total parameters and can be trained in a few hours on a single GPU, equivalent to about 5% of the time that would be needed to train the entire object detector from labeled data. NGA allows new vision sensors to capitalize on powerful previously pretrained deep models, saving training cost and widening the range of applications for novel sensors.
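The core idea of the abstract, training a replacement front end purely by matching features of a frozen pretrained front end on synchronized sensor pairs, can be sketched with a toy linear model. Everything below is illustrative: the dimensions, the linear "networks", and the gradient-descent loop are invented stand-ins, not the paper's actual convolutional front end or training setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the frozen pretrained front end: a fixed linear
# feature extractor over intensity frames (illustrative only).
d_in, d_feat = 16, 8
W_pre = rng.normal(size=(d_feat, d_in))

# Synchronously recorded pairs: intensity frames X and novel-sensor
# frames Y. Here Y is a fixed linear transform of X, a toy proxy for
# spatially aligned thermal/event data.
T = rng.normal(size=(d_in, d_in)) / np.sqrt(d_in)
X = rng.normal(size=(256, d_in))
Y = X @ T.T

# Grafted front end: trainable weights, fit so that its features on Y
# match the pretrained features on X. No labels are used anywhere.
W = 0.01 * rng.normal(size=(d_feat, d_in))

def feature_loss(W):
    """Mean-squared feature-similarity loss between the two front ends."""
    return np.mean((Y @ W.T - X @ W_pre.T) ** 2)

lr, losses = 0.05, []
for _ in range(500):
    residual = Y @ W.T - X @ W_pre.T        # (N, d_feat) feature mismatch
    grad = 2.0 * residual.T @ Y / len(Y)    # d(loss)/dW up to a constant
    W -= lr * grad
    losses.append(feature_loss(W))

# The loss drops sharply: the grafted front end now produces features
# that the unchanged pretrained back end can consume directly.
```

The self-supervised character of NGA is visible here: the only training signal is agreement between the two feature maps on paired recordings, which is why no labeled novel-sensor dataset is required and why the back end (and hence inference cost) is untouched.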