Paper Title

Colocating Real-time Storage and Processing: An Analysis of Pull-based versus Push-based Streaming

Authors

Ovidiu-Cristian Marcu, Pascal Bouvry

Abstract

Real-time Big Data architectures evolved into specialized layers for handling data streams' ingestion, storage, and processing over the past decade. Layered streaming architectures integrate pull-based read and push-based write RPC mechanisms implemented by stream ingestion/storage systems. In addition, stream processing engines expose source/sink interfaces, allowing them to decouple these systems easily. However, open-source streaming engines leverage workflow sources implemented through a pull-based approach, continuously issuing read RPCs towards the stream ingestion/storage, effectively competing with write RPCs. This paper proposes a unified streaming architecture that leverages push-based and/or pull-based source implementations for integrating ingestion/storage and processing engines that can reduce processing latency and increase system read and write throughput while making room for higher ingestion. We implement a novel push-based streaming source by replacing continuous pull-based RPCs with one single RPC and shared memory (storage and processing handle streaming data through pointers to shared objects). To this end, we conduct an experimental analysis of pull-based versus push-based design alternatives of the streaming source reader while considering a set of stream benchmarks and microbenchmarks and discuss the advantages of both approaches.
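The contrast the abstract draws between the two source designs can be illustrated with a toy model. The following is only a minimal sketch under stated assumptions, not the authors' implementation: `ToyStore`, `run_pull`, `run_push`, and the `polls_per_write` parameter are hypothetical names invented for illustration, RPCs are simulated as plain method calls that increment a counter, and "shared memory" is loosely approximated by handing the consumer a reference to the stored object instead of a copy.

```python
from typing import Callable, List, Tuple


class ToyStore:
    """Hypothetical in-memory stand-in for a stream ingestion/storage system."""

    def __init__(self) -> None:
        self.log: List[str] = []      # records appended by writers
        self.read_rpcs = 0            # simulated read-side RPCs served so far
        self.subscribers: List[Callable[[str], None]] = []

    def write(self, record: str) -> None:
        """Simulated write RPC: append, then push to every subscriber."""
        self.log.append(record)
        for callback in self.subscribers:
            callback(record)          # passes a reference, not a copy

    def read(self, offset: int, max_records: int) -> List[str]:
        """Simulated pull-based read RPC; may return an empty batch."""
        self.read_rpcs += 1
        return self.log[offset:offset + max_records]

    def subscribe(self, callback: Callable[[str], None]) -> None:
        """Simulated one-time subscription RPC used by a push-based source."""
        self.read_rpcs += 1
        self.subscribers.append(callback)


def run_pull(records: List[str], polls_per_write: int = 3) -> Tuple[List[str], int]:
    """Pull-based source: keeps polling the store, often finding nothing new."""
    store = ToyStore()
    offset = 0
    consumed: List[str] = []
    for record in records:
        store.write(record)
        for _ in range(polls_per_write):
            batch = store.read(offset, 10)
            consumed.extend(batch)
            offset += len(batch)
    return consumed, store.read_rpcs


def run_push(records: List[str]) -> Tuple[List[str], int]:
    """Push-based source: subscribes once, then receives records as they arrive."""
    store = ToyStore()
    consumed: List[str] = []
    store.subscribe(consumed.append)
    for record in records:
        store.write(record)
    return consumed, store.read_rpcs


if __name__ == "__main__":
    data = ["a", "b", "c"]
    print(run_pull(data))   # (['a', 'b', 'c'], 9)
    print(run_push(data))   # (['a', 'b', 'c'], 1)
```

Both readers consume the same records, but in this toy setting the pull-based reader issues a read call on every poll while the push-based reader issues a single subscription call. The sketch says nothing about real latency or throughput, which the paper evaluates experimentally.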
