Paper title
2-speed network ensemble for efficient classification of incremental land-use/land-cover satellite image chips
Paper authors
Paper abstract
The ever-growing volume of satellite imagery data presents a challenge for industry and governments making data-driven decisions based on the timely analysis of very large data sets. Commonly used deep learning algorithms for automatic classification of satellite images are time- and resource-intensive to train. The cost of retraining in the context of Big Data presents a practical challenge when new image data and/or classes are added to a training corpus. Recognizing the need for an adaptable, accurate, and scalable satellite image chip classification scheme, in this research we present an ensemble of: i) a slow-to-train but high-accuracy vision transformer; and ii) a fast-to-train, low-parameter convolutional neural network (CNN). The vision transformer model provides a scalable and accurate foundation model. The high-speed CNN provides an efficient means of incorporating newly labelled data into analysis, at the expense of lower accuracy. To simulate incremental data, the very large (~400,000 images) So2Sat LCZ42 satellite image chip dataset is divided into four intervals, with the high-speed CNN retrained every interval and the vision transformer trained every half interval. This experimental setup mimics an increase in data volume and diversity over time. For the task of automated land-cover/land-use classification, the ensemble models for each data increment outperform each of the component models, with a best accuracy of 65% against a holdout test partition of the So2Sat dataset. The proposed ensemble and staggered training schedule provide a scalable and cost-effective satellite image classification scheme that is optimized to process very large volumes of satellite data.
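The abstract does not state how the two models' outputs are combined. As an illustration only, the sketch below assumes soft voting: a weighted average of the class probabilities from the slow model (the vision transformer) and the fast model (the CNN). The function names `soft_vote` and `predict_class`, the `slow_weight` parameter, and the example probabilities are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def soft_vote(
    slow_probs: Sequence[float],
    fast_probs: Sequence[float],
    slow_weight: float = 0.5,
) -> List[float]:
    """Weighted average of two class-probability vectors.

    Assumption: both models score the same ordered set of classes.
    slow_weight is a hypothetical tuning knob, not a value from the paper.
    """
    if len(slow_probs) != len(fast_probs):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= slow_weight <= 1.0:
        raise ValueError("slow_weight must lie in [0, 1]")
    fast_weight = 1.0 - slow_weight
    return [
        slow_weight * s + fast_weight * f
        for s, f in zip(slow_probs, fast_probs)
    ]


def predict_class(probs: Sequence[float]) -> int:
    """Index of the highest-probability class."""
    return max(range(len(probs)), key=lambda i: probs[i])


# Made-up probabilities for one image chip over three classes.
vit_probs = [0.6, 0.3, 0.1]  # slow model, trained on older data
cnn_probs = [0.2, 0.7, 0.1]  # fast model, retrained on the newest interval

equal_vote = predict_class(soft_vote(vit_probs, cnn_probs))
slow_heavy_vote = predict_class(soft_vote(vit_probs, cnn_probs, slow_weight=0.75))
```

With equal weights the fast model's preferred class wins here, while weighting the slow model more heavily flips the decision. A knob of this kind is one plausible way to trade the transformer's accuracy against the CNN's fresher training data.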