Paper title

A Model and Survey of Distributed Data-Intensive Systems

Authors

Alessandro Margara, Gianpaolo Cugola, Nicolò Felicioni, Stefano Cilloni

Abstract

Data is a precious resource in today's society, and is generated at an unprecedented and constantly growing pace. The need to store, analyze, and make data promptly available to a multitude of users introduces formidable challenges in modern software platforms. These challenges radically transformed all research fields that gravitate around data management and processing, with the introduction of distributed data-intensive systems that offer new programming models and implementation strategies to handle data characteristics such as its volume, the rate at which it is produced, its heterogeneity, and its distribution. Each data-intensive system brings its specific choices in terms of data model, usage assumptions, synchronization, processing strategy, deployment, and guarantees in terms of consistency, fault tolerance, and ordering. Yet, the problems data-intensive systems face and the solutions they propose frequently overlap. This paper proposes a unifying model that dissects the core functionalities of data-intensive systems, and precisely discusses alternative design and implementation strategies, pointing out their assumptions and implications. The model offers a common ground to understand and compare highly heterogeneous solutions, with the potential of fostering cross-fertilization across research communities and advancing the field. We apply our model by classifying tens of systems: an exercise that leads to interesting observations on the current trends in the domain of data-intensive systems and suggests open research directions.
