论文标题
模块化密度最大化的研究:柱产生加速度和计算复杂性分析
A Study on Modularity Density Maximization: Column Generation Acceleration and Computational Complexity Analysis
论文作者
论文摘要
社区检测是一种基本的网络分析原始分析,在不同领域中有各种应用。尽管Newman和Girvan(2004)引入的模块化已被广泛用作社区检测的质量功能,但它具有一些缺点。 Li等人引入的模块化密度。 (2008年)是模块化的有效替代方法,它减轻了一个称为分辨率限制的缺点之一。在没有任何计算复杂性分析的情况下,大量的工作致力于设计精确的启发式方法,以最大化模块化密度。在这项研究中,我们研究了算法和计算复杂性方面的模块化密度最大化。具体而言,我们首先加速了柱生成,以解决模块化密度最大化问题。为此,我们指出,在列生成中出现的辅助问题可以看作是一个密集的子图发现问题。然后,我们采用了一个众所周知的策略来进行密集的子图发现,称为贪婪的剥皮,以大致解决辅助问题。此外,我们将辅助问题重新制定为$ 0 $ - $ 1 $线性编程问题的顺序,使我们能够更有效地计算其最佳价值并获得更多样化的列。使用各种现实世界网络的计算实验证明了我们提出的算法的有效性。最后,我们显示了模块化密度最大化问题的轻微变体的NP硬度,其中输出分区必须具有两个或多个簇,并显示辅助问题的NP固定度。
Community detection is a fundamental network-analysis primitive with a variety of applications in diverse domains. Although the modularity introduced by Newman and Girvan (2004) has widely been used as a quality function for community detection, it has some drawbacks. The modularity density introduced by Li et al. (2008) is known to be an effective alternative to the modularity, which mitigates one of the drawbacks called the resolution limit. A large body of work has been devoted to designing exact and heuristic methods for modularity density maximization, without any computational complexity analysis. In this study, we investigate modularity density maximization from both algorithmic and computational complexity aspects. Specifically, we first accelerate column generation for the modularity density maximization problem. To this end, we point out that the auxiliary problem appearing in column generation can be viewed as a dense subgraph discovery problem. Then we employ a well-known strategy for dense subgraph discovery, called the greedy peeling, for approximately solving the auxiliary problem. Moreover, we reformulate the auxiliary problem to a sequence of $0$--$1$ linear programming problems, enabling us to compute its optimal value more efficiently and to get more diverse columns. Computational experiments using a variety of real-world networks demonstrate the effectiveness of our proposed algorithm. Finally, we show the NP-hardness of a slight variant of the modularity density maximization problem, where the output partition has to have two or more clusters, as well as showing the NP-hardness of the auxiliary problem.