Title
MIX-RS: A Multi-indexing System based on HDFS for Remote Sensing Data Storage
Authors
Abstract
A large volume of remote sensing (RS) data has been generated with the deployment of satellite technologies. The data facilitates research in ecological monitoring, land management, desertification and other areas. The characteristics of RS data (e.g., enormous volume, large single-file size and demanding fault-tolerance requirements) make the Hadoop Distributed File System (HDFS) an ideal choice for RS data storage, as it is efficient, scalable and equipped with a data replication mechanism for failure resilience. Geospatial indexing is one of the most important techniques for using RS data. However, the large data volume makes geospatial indices time-consuming to construct and to exploit efficiently. Since most modern geospatial data centres are already equipped with HDFS-based big data processing infrastructures, it is natural to deploy multiple geospatial indices to optimise indexing efficacy. Moreover, because high-quality hardware provides reliability and RS data is infrequently modified, multi-indexing does not incur large overhead. We therefore design a framework called Multi-IndeXing-RS (MIX-RS) that unifies a multi-indexing mechanism on top of the HDFS, with data replication enabled for both fault tolerance and geospatial indexing efficiency. Given the fault tolerance provided by the HDFS, RS data is stored inside it in a structured form to speed up geospatial indexing, and multi-indexing further enhances efficiency. The proposed technique sits naturally on top of the HDFS to form a holistic framework without incurring severe overhead or requiring sophisticated implementation effort. The MIX-RS framework is implemented and evaluated using real remote sensing data provided by the Chinese Academy of Sciences, demonstrating excellent geospatial indexing performance.
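The abstract does not say which index structures MIX-RS combines or how a query is dispatched among them. Purely as a hypothetical illustration of the general multi-indexing idea (several indices kept over the same, rarely modified RS scene metadata, with each query served by whichever index looks cheaper), the Python sketch below pairs a uniform longitude/latitude grid index with an index sorted by acquisition day. The `Scene` record, the 1-degree cell size, the candidate-count cost heuristic and every class name are assumptions made for this example, not the MIX-RS design, and the sketch runs in memory rather than on HDFS.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Scene:
    # Hypothetical metadata record for one RS scene: footprint bounding box plus acquisition day.
    scene_id: str
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float
    day: int  # acquisition date encoded as a day number (assumed encoding)


def overlaps(s, bbox):
    """True if the scene footprint intersects bbox = (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return (s.min_lon <= max_lon and s.max_lon >= min_lon
            and s.min_lat <= max_lat and s.max_lat >= min_lat)


class GridIndex:
    """Uniform lon/lat grid: a scene is listed in every cell its footprint touches."""

    def __init__(self, scenes, cell_deg=1.0):
        self.cell_deg = cell_deg
        self.cells = defaultdict(list)
        for s in scenes:
            for key in self.cells_for((s.min_lon, s.min_lat, s.max_lon, s.max_lat)):
                self.cells[key].append(s)

    def cells_for(self, bbox):
        min_lon, min_lat, max_lon, max_lat = bbox
        x0, x1 = int(min_lon // self.cell_deg), int(max_lon // self.cell_deg)
        y0, y1 = int(min_lat // self.cell_deg), int(max_lat // self.cell_deg)
        return [(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]

    def candidate_count(self, bbox):
        # Cheap cost estimate: number of cell entries a lookup would scan.
        return sum(len(self.cells.get(k, ())) for k in self.cells_for(bbox))

    def query(self, bbox):
        hits = {}
        for key in self.cells_for(bbox):
            for s in self.cells.get(key, ()):
                if overlaps(s, bbox):
                    hits[s.scene_id] = s  # de-duplicate scenes spanning several cells
        return list(hits.values())


class TimeIndex:
    """Scenes sorted by acquisition day; range lookups use binary search."""

    def __init__(self, scenes):
        self.scenes = sorted(scenes, key=lambda s: s.day)
        self.days = [s.day for s in self.scenes]

    def _span(self, day_from, day_to):
        return bisect_left(self.days, day_from), bisect_right(self.days, day_to)

    def candidate_count(self, day_from, day_to):
        lo, hi = self._span(day_from, day_to)
        return hi - lo

    def query(self, day_from, day_to):
        lo, hi = self._span(day_from, day_to)
        return self.scenes[lo:hi]


class MultiIndex:
    """Keeps both indices and routes each query to the one with fewer candidates."""

    def __init__(self, scenes, cell_deg=1.0):
        self.grid = GridIndex(scenes, cell_deg)
        self.time = TimeIndex(scenes)

    def query(self, bbox, day_from, day_to):
        if self.time.candidate_count(day_from, day_to) <= self.grid.candidate_count(bbox):
            route, cands = "time", self.time.query(day_from, day_to)
            hits = [s for s in cands if overlaps(s, bbox)]
        else:
            route, cands = "grid", self.grid.query(bbox)
            hits = [s for s in cands if day_from <= s.day <= day_to]
        return route, sorted(s.scene_id for s in hits)


# Hypothetical sample metadata used only to exercise the sketch.
SAMPLE_SCENES = [
    Scene("A", 100.0, 30.0, 100.5, 30.5, 10),
    Scene("B", 100.2, 30.2, 101.5, 31.2, 20),
    Scene("C", 120.0, 40.0, 120.5, 40.5, 15),
    Scene("D", 100.1, 30.1, 100.3, 30.3, 200),
]
```

With these sample scenes, `MultiIndex(SAMPLE_SCENES).query((100.0, 30.0, 100.4, 30.4), 0, 30)` returns `("time", ["A", "B"])`, while a small box around scene C over a long time window is routed to the grid index instead. Keeping both indices side by side is cheap here only because the scene set is static; the sketch does not model HDFS replication, block placement or index maintenance under updates.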