Memory-Disaggregated In-Memory Object Store Framework for Big Data Applications

Title

Memory-Disaggregated In-Memory Object Store Framework for Big Data Applications

Authors

Robin Abrahamse, Akos Hadnagy, Zaid Al-Ars

Abstract

The concept of memory disaggregation has recently been gaining traction in research. With memory disaggregation, data center compute nodes can directly access memory on adjacent nodes and can therefore overcome local memory restrictions, introducing a new data management paradigm for distributed computing. This paper proposes and demonstrates a memory-disaggregated in-memory object store framework for big data applications that leverages the newly introduced ThymesisFlow memory disaggregation system. The framework extends the pre-existing Apache Arrow Plasma object store to distributed systems by enabling clients to easily and efficiently produce and consume data objects across multiple compute nodes. This allows big data applications to leverage parallel processing more widely at reduced development cost. In addition, the paper includes latency and throughput measurements indicating that remote disaggregated memory access incurs only a modest performance penalty compared with local access (~6.5 GiB/s local vs ~5.75 GiB/s remote). The results can guide the design of future systems that leverage memory disaggregation, as well as the newly presented framework. This work is open-source and publicly accessible at https://doi.org/10.5281/zenodo.6368998.
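To make the "produce and consume data objects across multiple compute nodes" idea concrete, the sketch below models a Plasma-style object lifecycle (create a buffer, write into it, seal it, then get it by ID) with two clients standing in for two nodes. It is a single-process illustration under stated assumptions, not the paper's implementation and not the Plasma API: the classes `SharedPool` and `ObjectStoreClient` and their methods are hypothetical, and the shared dictionary merely stands in for a memory region that a disaggregation system such as ThymesisFlow would make reachable from both nodes. Object IDs are 20 bytes, mirroring the size of Plasma's object IDs.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SharedPool:
    """Hypothetical stand-in for memory reachable from several nodes
    (e.g. local memory plus memory disaggregated from a neighbour)."""
    buffers: Dict[bytes, bytearray] = field(default_factory=dict)
    sealed: Dict[bytes, bool] = field(default_factory=dict)


class ObjectStoreClient:
    """Hypothetical client modelling a create -> seal -> get lifecycle."""

    def __init__(self, node: str, pool: SharedPool):
        self.node = node  # label only; every client shares the same pool
        self.pool = pool

    def create(self, object_id: bytes, size: int) -> memoryview:
        # Allocate a writable buffer; IDs are unique within the pool.
        if object_id in self.pool.buffers:
            raise KeyError("object already exists")
        buf = bytearray(size)
        self.pool.buffers[object_id] = buf
        self.pool.sealed[object_id] = False
        return memoryview(buf)

    def seal(self, object_id: bytes) -> None:
        # Mark the object complete; by convention the producer stops writing.
        if object_id not in self.pool.buffers:
            raise KeyError("unknown object")
        self.pool.sealed[object_id] = True

    def get(self, object_id: bytes) -> Optional[memoryview]:
        # Unknown or unsealed objects are not visible to consumers.
        if not self.pool.sealed.get(object_id, False):
            return None
        # Read-only view onto the same buffer: no copy is made.
        return memoryview(self.pool.buffers[object_id]).toreadonly()


pool = SharedPool()
producer = ObjectStoreClient("node-a", pool)
consumer = ObjectStoreClient("node-b", pool)

oid = bytes(20)
view = producer.create(oid, 5)
view[:] = b"hello"
print(consumer.get(oid))  # None: not sealed yet
producer.seal(oid)
print(bytes(consumer.get(oid)))  # b'hello'
```

The seal step is what lets a consumer on another node trust that an object is complete before reading it; returning a read-only view rather than a copy reflects the zero-copy access that an in-memory object store aims for.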
