Paper title

VIGOR: Cross-View Image Geo-localization beyond One-to-one Retrieval

Authors

Sijie Zhu, Taojiannan Yang, Chen Chen

Abstract

Cross-view image geo-localization aims to determine the locations of street-view query images by matching them with GPS-tagged reference images from the aerial view. Recent works have achieved surprisingly high retrieval accuracy on city-scale datasets. However, these results rely on the assumption that a reference image exists exactly centered at the location of every query image, which does not hold in practical scenarios. In this paper, we redefine the problem with a more realistic assumption: the query image can be located anywhere in the area of interest, and the reference images are captured before the queries emerge. This assumption breaks the one-to-one retrieval setting of existing datasets, because queries and reference images are not perfectly aligned pairs and multiple reference images may cover one query location. To bridge the gap between this realistic setting and existing datasets, we propose a new large-scale benchmark, VIGOR, for cross-View Image Geo-localization beyond One-to-one Retrieval. We benchmark existing state-of-the-art methods and propose a novel end-to-end framework to localize the query in a coarse-to-fine manner. Apart from image-level retrieval accuracy, we also evaluate localization accuracy in terms of actual distance (meters) using the raw GPS data. Extensive experiments are conducted under different application scenarios to validate the effectiveness of the proposed method. The results indicate that cross-view geo-localization in this realistic setting is still challenging, fostering new research in this direction. Our dataset and code will be released at https://github.com/Jeff-Zilence/VIGOR
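
The abstract mentions evaluating localization accuracy as an actual distance in meters computed from raw GPS data. As a rough illustration only, the sketch below measures the great-circle (haversine) distance between predicted and ground-truth coordinates and reports the fraction of queries localized within a distance threshold. The function names, the choice of the haversine formula, and the Earth-radius constant are assumptions made for this example; they are not taken from the paper or its released code, which may compute distances differently.

```python
import math

# Mean Earth radius in meters; an assumption of this sketch, not a value from the paper.
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def localization_errors_m(predicted, ground_truth):
    """Per-query error in meters; both inputs are sequences of (lat, lon) pairs."""
    return [haversine_m(p[0], p[1], g[0], g[1])
            for p, g in zip(predicted, ground_truth)]


def fraction_within(errors_m, threshold_m):
    """Fraction of queries whose error is at most threshold_m meters."""
    if not errors_m:
        return 0.0
    return sum(e <= threshold_m for e in errors_m) / len(errors_m)
```

With this sketch, `localization_errors_m(predicted, ground_truth)` yields one error per query, and `fraction_within(errors, 20.0)` would give the share of queries localized to within 20 meters; the 20-meter threshold is likewise only an illustrative choice.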
