Paper Title
Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning
Paper Authors
Paper Abstract
Large, pretrained models are commonly finetuned with imagery that is heavily augmented to mimic different conditions and scales, and the resulting models are used for various tasks with imagery from a range of spatial scales. Such models overlook scale-specific information in the data for scale-dependent domains, such as remote sensing. In this paper, we present Scale-MAE, a pretraining method that explicitly learns relationships between data at different, known scales throughout the pretraining process. Scale-MAE pretrains a network by masking an input image at a known input scale, where the area of the Earth covered by the image determines the scale of the ViT positional encoding, not the image resolution. Scale-MAE encodes the masked image with a standard ViT backbone, and then decodes the masked image through a bandpass filter to reconstruct low/high-frequency images at lower/higher scales. We find that tasking the network with reconstructing both low- and high-frequency images leads to robust multiscale representations for remote sensing imagery. Scale-MAE achieves an average $2.4$-$5.6\%$ non-parametric kNN classification improvement across eight remote sensing datasets compared to the current state of the art, and obtains a $0.9$ to $1.7$ mIoU improvement on the SpaceNet building segmentation transfer task for a range of evaluation scales.
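To make the ground-coverage-dependent positional encoding concrete, here is a minimal sketch of a sin/cos encoding whose patch positions are scaled by the image's ground sample distance (GSD) relative to a reference GSD, so that two images of the same scene at different resolutions receive encodings anchored to consistent ground distances. The function name, the `reference_gsd` default, and the 1-D formulation are illustrative assumptions, not Scale-MAE's released implementation.

```python
import numpy as np

def gsd_positional_encoding(num_positions, dim, gsd, reference_gsd=1.0):
    """Sin/cos positional encoding with positions scaled by the image's
    ground sample distance (GSD, meters/pixel) relative to a reference GSD.
    Illustrative sketch; `dim` must be even."""
    # Stretch each patch index by how much ground it covers relative to
    # the reference, so coarser imagery gets proportionally larger positions.
    positions = np.arange(num_positions, dtype=np.float64) * (gsd / reference_gsd)
    # Standard transformer frequency schedule.
    freqs = 1.0 / (10000 ** (np.arange(0, dim, 2, dtype=np.float64) / dim))
    angles = positions[:, None] * freqs[None, :]  # (num_positions, dim // 2)
    pe = np.zeros((num_positions, dim))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# Same scene at 0.3 m/pixel and 1.0 m/pixel: the coarser image's positions
# are stretched by its larger GSD, aligning the two encodings on the ground.
pe_fine = gsd_positional_encoding(196, 64, gsd=0.3)
pe_coarse = gsd_positional_encoding(196, 64, gsd=1.0)
```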
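The low/high-frequency reconstruction targets can likewise be sketched with a one-level Laplacian-pyramid-style split: a heavily downsampled copy as the low-frequency target and an upsampled residual as the high-frequency target. The scale factors, the bilinear down/up resampling used as a cheap blur, and the function name are assumptions for illustration; the paper's actual bandpass decoder may differ.

```python
import torch
import torch.nn.functional as F

def low_high_frequency_targets(img, low_scale=0.25, high_scale=2.0):
    """Split an image into a low-frequency target (coarser scale) and a
    high-frequency residual target (finer scale minus its blurred copy).
    Illustrative sketch; scales and resampling choices are assumptions."""
    # Low-frequency target: the image resampled to a coarser scale.
    low = F.interpolate(img, scale_factor=low_scale,
                        mode='bilinear', align_corners=False)
    # High-frequency target: detail remaining at a finer scale after
    # subtracting a blurred copy (down-then-up resampling as the blur).
    up = F.interpolate(img, scale_factor=high_scale,
                       mode='bilinear', align_corners=False)
    blurred = F.interpolate(
        F.interpolate(up, scale_factor=0.5, mode='bilinear', align_corners=False),
        scale_factor=2.0, mode='bilinear', align_corners=False)
    high = up - blurred
    return low, high

# Example: a 224x224 RGB image yields a 56x56 low-frequency target
# and a 448x448 high-frequency residual target.
img = torch.randn(1, 3, 224, 224)
low_t, high_t = low_high_frequency_targets(img)
```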