Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition

Paper Title

Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition

Authors

Di Hu, Xuhong Li, Lichao Mou, Pu Jin, Dong Chen, Liping Jing, Xiaoxiang Zhu, Dejing Dou

Abstract

Aerial scene recognition is a fundamental task in remote sensing and has recently received increased interest. While visual information from overhead images, combined with powerful models and efficient algorithms, yields considerable performance on scene recognition, it still suffers from variation in ground objects, lighting conditions, etc. Inspired by the multi-channel perception theory in cognitive science, in this paper we explore a novel audiovisual aerial scene recognition task that uses both images and sounds as input in order to improve aerial scene recognition performance. Based on the observation that some specific sound events are more likely to be heard at a given geographic location, we propose to exploit knowledge from sound events to improve aerial scene recognition. For this purpose, we have constructed a new dataset named AuDio Visual Aerial sceNe reCognition datasEt (ADVANCE). With the help of this dataset, we evaluate three proposed approaches for transferring sound event knowledge to the aerial scene recognition task in a multimodal learning framework, and show the benefit of exploiting audio information for aerial scene recognition. The source code is publicly available for reproducibility purposes.
