Paper Title
TED: Towards Discovering Top-k Edge-Diversified Patterns in a Graph Database
Authors
Abstract
With an exponentially growing number of graphs from disparate repositories, there is a strong need to analyze graph databases containing extensive collections of small- or medium-sized data graphs (e.g., chemical compounds). Although subgraph enumeration and subgraph mining have been proposed to bring insights into a graph database through a set of subgraph structures, they often end up with similar or homogeneous topologies, which is undesirable in many graph applications. To address this limitation, we propose the Top-k Edge-Diversified Patterns Discovery problem, which retrieves a set of k subgraphs that together cover the maximum number of edges in a database. To process such queries efficiently, we present a generic and extensible framework called TED, which achieves a guaranteed approximation ratio to the optimal result. Two optimization strategies are further developed to improve performance. Experimental studies on real-world datasets demonstrate the superiority of TED over traditional techniques.
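The coverage objective described in the abstract can be illustrated with a small sketch. This is not the TED framework itself, whose details the abstract does not give. It is the textbook greedy heuristic for maximum coverage, applied under the assumption that each candidate pattern already comes with the set of database edges it covers. The names `greedy_top_k`, `coverage`, and the `(graph_id, u, v)` edge encoding are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical edge identifier: (graph_id, u, v) within the graph database.
Edge = Tuple[int, int, int]


def greedy_top_k(coverage: Dict[str, Set[Edge]], k: int) -> Tuple[List[str], Set[Edge]]:
    """Select up to k patterns, each round taking the one that covers
    the most edges not yet covered by earlier picks."""
    chosen: List[str] = []
    covered: Set[Edge] = set()
    for _ in range(k):
        best, best_gain = None, 0
        for name in sorted(coverage):  # sorted for deterministic tie-breaking
            if name in chosen:
                continue
            gain = len(coverage[name] - covered)  # marginal edge coverage
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # no remaining pattern adds new edges
            break
        chosen.append(best)
        covered |= coverage[best]
    return chosen, covered


# Toy input (assumed): three candidate patterns and the edges each one covers.
toy = {
    "p1": {(0, 0, 1), (0, 1, 2), (1, 0, 1)},
    "p2": {(0, 0, 1), (0, 1, 2)},
    "p3": {(1, 1, 2), (2, 0, 1)},
}
patterns, edges = greedy_top_k(toy, k=2)
print(patterns, len(edges))
```

In this toy input, `p2` covers only edges that `p1` already covers, so a selection driven by marginal coverage skips it in favor of `p3`. For the generic maximum coverage problem, this greedy rule is known to reach at least a 1 - 1/e fraction of the optimal coverage. Whether TED's guarantee takes the same form is not stated in the abstract.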