Paper Title

Discovering Representative Attribute-stars via Minimum Description Length

Authors

Liu, Jiahong; Zhou, Min; Fournier-Viger, Philippe; Yang, Menglin; Pan, Lujia; Nouioua, Mourad

Abstract

Graphs are a popular data type found in many domains. Numerous techniques have been proposed to find interesting patterns in graphs to help understand the data and support decision-making. However, two limitations generally hinder their practical use: (1) they have multiple parameters that are hard to set yet greatly influence the results, and (2) they generally focus on identifying complex subgraphs while ignoring relationships between the attributes of nodes. To address these problems, we propose a parameter-free algorithm named CSPM (Compressing Star Pattern Miner), which identifies star-shaped patterns that indicate strong correlations among attributes via the concept of conditional entropy and the minimum description length principle. Experiments performed on several benchmark datasets show that CSPM reveals insightful and interpretable patterns and is efficient in runtime. Moreover, quantitative evaluations on two real-world applications show that CSPM is broadly applicable: it boosts the accuracy of graph attribute completion models by up to 30.68% and uncovers important patterns in telecommunication alarm data.
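To make the role of conditional entropy in star patterns more concrete, the sketch below computes the empirical conditional entropy H(Leaf | Core) over all stars of a small attributed graph. It is not CSPM itself: it assumes one categorical attribute per node, treats every node as a star core whose neighbours are its leaves, and counts each (core attribute, leaf attribute) pair once per edge direction. The function names `star_pairs`, `conditional_entropy`, and `leaf_code_length` are hypothetical. A low value means the core attribute largely determines the leaf attributes, i.e. a strong correlation; in MDL terms, fewer bits are then needed to encode the leaves once the core is known.

```python
import math
from collections import Counter


def star_pairs(attrs, edges):
    """Collect (core attribute, leaf attribute) pairs for every star.

    Simplifying assumption (hypothetical, not CSPM's definition): each node
    acts as a star core and each of its neighbours as a leaf.
    """
    neighbours = {v: [] for v in attrs}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    return [(attrs[c], attrs[l]) for c in attrs for l in neighbours[c]]


def conditional_entropy(pairs):
    """Empirical H(Leaf | Core) in bits over (core_value, leaf_value) pairs."""
    joint = Counter(pairs)
    core = Counter(c for c, _ in pairs)
    n = len(pairs)
    h = 0.0
    for (c, _leaf), count in joint.items():
        p_joint = count / n          # P(core = c, leaf = _leaf)
        p_cond = count / core[c]     # P(leaf = _leaf | core = c)
        h -= p_joint * math.log2(p_cond)
    return h


def leaf_code_length(pairs):
    """Bits to encode all leaf values given their cores (data cost only).

    A full two-part MDL score would also add the cost of encoding the
    model (the pattern set); that part is omitted in this sketch.
    """
    return len(pairs) * conditional_entropy(pairs)


# Perfectly correlated: every "A" core has only "B" leaves and vice versa.
correlated = star_pairs({1: "A", 2: "B", 3: "B", 4: "A", 5: "B"},
                        [(1, 2), (1, 3), (4, 5)])
# Mixed: an "A" core sees both "A" and "B" leaves.
mixed = star_pairs({1: "A", 2: "A", 3: "B"}, [(1, 2), (1, 3)])

print(conditional_entropy(correlated), conditional_entropy(mixed))
```

In the correlated graph the entropy is 0 bits, so the leaves cost nothing to encode once the cores are known, whereas the mixed graph yields a positive value. A pattern miner guided by MDL would prefer patterns of the first kind because they compress the data.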