A Transformer-Based Feature Segmentation and Region Alignment Method For UAV-View Geo-Localization

Paper Title

A Transformer-Based Feature Segmentation and Region Alignment Method For UAV-View Geo-Localization

Authors

Dai, Ming; Hu, Jianhong; Zhuang, Jiedong; Zheng, Enhui

Abstract

Cross-view geo-localization is the task of matching images of the same geographic location taken from different views, e.g., unmanned aerial vehicle (UAV) and satellite. The most difficult challenges are position shift and the uncertainty of distance and scale. Existing methods mainly aim to mine more comprehensive fine-grained information. However, they underestimate the importance of extracting robust feature representations and the impact of feature alignment. CNN-based methods have achieved great success in cross-view geo-localization, but they still have limitations: they can only extract part of the information within a local neighborhood, and some scale-reduction operations lose fine-grained information. In this work, we introduce a simple and efficient transformer-based structure called Feature Segmentation and Region Alignment (FSRA) to enhance the model's ability to understand contextual information as well as the distribution of instances. Without using additional supervisory information, FSRA divides regions based on the heat distribution of the transformer's feature map, and then aligns multiple specific regions in different views one to one. Finally, FSRA integrates each region into a set of feature representations. Unlike prior approaches, FSRA does not divide regions manually but automatically, based on the heat distribution of the feature map, so that specific instances can still be divided and aligned when there are significant shifts and scale changes in the image. In addition, a multiple sampling strategy is proposed to overcome the disparity between the number of satellite images and the number of images from other sources. Experiments show that the proposed method has superior performance and achieves state-of-the-art results in both drone-view target localization and drone navigation. Code will be released at https://github.com/Dmmm1997/FSRA
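The heat-based region division described in the abstract can be illustrated with a minimal sketch. The abstract does not say how heat is computed or how regions are sized, so the following are assumptions made only for illustration: the heat of a patch is the mean of its feature values, patches are ranked by heat and cut into equally sized groups, and each group is average-pooled into one region feature. The function name segment_by_heat and the toy token values are hypothetical and are not taken from the FSRA repository.

```python
from statistics import fmean


def segment_by_heat(tokens, n_regions):
    """Group patch tokens into n_regions by heat rank and average-pool each group.

    tokens: list of per-patch feature vectors (lists of floats of equal length).
    Returns a list of n_regions pooled feature vectors, hottest region first.
    """
    if not 1 <= n_regions <= len(tokens):
        raise ValueError("n_regions must be between 1 and the number of patches")

    # Assumption: the heat of a patch is the mean of its feature values.
    heat = [fmean(token) for token in tokens]

    # Rank patch indices from hottest to coldest.
    order = sorted(range(len(tokens)), key=lambda i: heat[i], reverse=True)

    # Cut the ranking into n_regions nearly equal groups.
    base, extra = divmod(len(order), n_regions)
    dim = len(tokens[0])
    regions, start = [], 0
    for r in range(n_regions):
        size = base + (1 if r < extra else 0)
        group = order[start:start + size]
        start += size
        # Average-pool the patches of this group into one region feature.
        regions.append([fmean([tokens[i][d] for i in group]) for d in range(dim)])
    return regions


# Toy tokens for two views of the same place (hypothetical values).
uav_tokens = [[0.9, 0.7], [0.1, 0.3], [0.5, 0.5], [0.2, 0.0]]
sat_tokens = [[0.2, 0.2], [0.8, 1.0], [0.0, 0.2], [0.6, 0.4]]

uav_regions = segment_by_heat(uav_tokens, 2)
sat_regions = segment_by_heat(sat_tokens, 2)

# One-to-one alignment: the k-th hottest UAV region pairs with the
# k-th hottest satellite region, regardless of where the patches sit.
aligned_pairs = list(zip(uav_regions, sat_regions))
```

Because regions are formed by heat rank rather than by fixed image coordinates, the same instance can land in the same region index in both views even when it is shifted or rescaled, which is the property the abstract attributes to automatic division.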
