Paper Title
Tempura: A General Cost Based Optimizer Framework for Incremental Data Processing (Extended Version)
Authors
Abstract
Incremental processing is widely adopted in many applications, ranging from incremental view maintenance and stream computing to the recently emerging progressive data warehouse and intermittent query processing. Despite the many algorithms developed on this topic, none of them can produce an incremental plan that always achieves the best performance, since the optimal plan is data dependent. In this paper, we develop a novel cost-based optimizer framework, called Tempura, for optimizing incremental data processing. We propose an incremental query planning model called TIP, based on the concept of time-varying relations, which can formally model incremental processing in its most general form. We give a full specification of Tempura, which not only unifies various existing techniques to generate an optimal incremental plan, but also allows developers to add their own rewrite rules. We study how to explore the plan space and search for an optimal incremental plan. We conduct a thorough experimental evaluation of Tempura in various incremental processing scenarios to show its effectiveness and efficiency.