Paper Title

Atrapos: Real-time Evaluation of Metapath Query Workloads

Authors

Serafeim Chatzopoulos, Thanasis Vergoulis, Dimitrios Skoutas, Theodore Dalamagas, Christos Tryfonopoulos, Panagiotis Karras

Abstract

Heterogeneous information networks (HINs) represent different types of entities and relationships between them. Exploring, analysing, and extracting knowledge from such networks relies on metapath queries that identify pairs of entities connected by relationships of diverse semantics. While the real-time evaluation of metapath query workloads on large, web-scale HINs is highly demanding in computational cost, current approaches do not exploit interrelationships among the queries. In this paper, we present ATRAPOS, a new approach for the real-time evaluation of metapath query workloads that leverages a combination of efficient sparse matrix multiplication and intermediate result caching. ATRAPOS selects intermediate results to cache and reuse by detecting frequent sub-metapaths among workload queries in real time, using a tailor-made data structure, the Overlap Tree, and an associated caching policy. Our experimental study on real data shows that ATRAPOS accelerates exploratory data analysis and mining on HINs, outperforming off-the-shelf caching approaches and state-of-the-art research prototypes in all examined scenarios. Note that this is an extended version of the work presented at TheWebConf 2023 (doi: 10.1145/3543507.3583322).
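
To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy HIN with hypothetical entity types A (author), P (paper), and V (venue), stores each relation as a sparse matrix in a plain dictionary, and evaluates a metapath as a chain of sparse matrix products. Every intermediate prefix is cached, which is a deliberately naive policy: the Overlap Tree and the caching policy that ATRAPOS uses to choose which sub-metapaths to keep are not modelled here. All names (`ADJ`, `spmm`, `evaluate`, `stats`) are illustrative.

```python
from collections import defaultdict


def spmm(a, b):
    """Multiply two sparse matrices stored as {row: {col: value}}."""
    out = defaultdict(dict)
    for i, row in a.items():
        for k, v in row.items():
            for j, w in b.get(k, {}).items():
                out[i][j] = out[i].get(j, 0) + v * w
    return dict(out)


# Hypothetical toy HIN: one sparse adjacency matrix per pair of entity types.
ADJ = {
    ("A", "P"): {"a1": {"p1": 1, "p2": 1}, "a2": {"p2": 1}},
    ("P", "A"): {"p1": {"a1": 1}, "p2": {"a1": 1, "a2": 1}},
    ("P", "V"): {"p1": {"v1": 1}, "p2": {"v1": 1}},
    ("V", "P"): {"v1": {"p1": 1, "p2": 1}},
}

cache = {}                       # metapath string -> intermediate result
stats = {"multiplications": 0}   # counts products actually computed


def evaluate(metapath):
    """Evaluate a metapath such as "APV" left to right.

    Each entry of the result counts the path instances between a pair of
    entities. Prefixes computed for earlier queries are taken from the cache.
    """
    if len(metapath) == 2:
        return ADJ[(metapath[0], metapath[1])]
    if metapath in cache:
        return cache[metapath]
    left = evaluate(metapath[:-1])
    right = ADJ[(metapath[-2], metapath[-1])]
    stats["multiplications"] += 1
    result = spmm(left, right)
    cache[metapath] = result  # naive policy: cache every intermediate result
    return result


apv = evaluate("APV")    # one multiplication: AP x PV
apvp = evaluate("APVP")  # reuses the cached APV, so only one more product
```

In this sketch the second query shares the prefix "APV" with the first, so it costs one multiplication instead of two. A realistic system would also bound the cache size and decide which sub-metapaths are worth keeping, which is the part the abstract attributes to the Overlap Tree.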
