论文标题
对社区数量的最佳估计
Optimal Estimation of the Number of Communities
论文作者
论文摘要
在网络分析中,如何估计$ k $的社区数量是一个基本问题。我们考虑一个广泛的环境,我们允许严重的学位异质性和广泛的稀疏度,并提出逐步拟合优点(STGOF)作为一种新方法。这是一种逐步的算法,其中$ m = 1,2,\ ldots $,我们或者另一种使用社区检测步骤和合适性(GOF)步骤。我们适应社区检测的分数\ cite {Score},并提出了一个新的GOF度量。我们表明,在步骤$ m $的情况下,GOF度量的所有$ m <k $的概率分别为$ \ infty $,如果$ m = k $,则收集到$ n(0,1)$。这产生了$ K $的一致估计。另外,我们发现定义我们问题的信噪比(SNR)的正确方法,并表明,如果$ \ mathrm {snr} \ goto 0 $,$ k $的一致估计不存在,而sTGOF均匀地一致,则对于$ k $,如果$ \ \ m athrm {snr {snr} \ goto {snr} \ goto {snr} \ goto \ infty。因此,STGOF实现了最佳相变。 已知类似的逐步方法(例如,\ cite {wang2017likelihood,ma2018否定})面临分析挑战。我们通过在STGOF中使用不同的逐步方案来克服挑战,并得出以前无法使用的尖锐结果。我们分析的关键是表明得分具有{\ it non-Splitting属性(NSP)}。主要是由于Davis-Kahan $ \ sin(θ)$定理所决定的特征向量的不可扣除,因此NSP是不宽松的,无法证明并需要我们开发的新技术。
In network analysis, how to estimate the number of communities $K$ is a fundamental problem. We consider a broad setting where we allow severe degree heterogeneity and a wide range of sparsity levels, and propose Stepwise Goodness-of-Fit (StGoF) as a new approach. This is a stepwise algorithm, where for $m = 1, 2, \ldots$, we alternately use a community detection step and a goodness-of-fit (GoF) step. We adapt SCORE \cite{SCORE} for community detection, and propose a new GoF metric. We show that at step $m$, the GoF metric diverges to $\infty$ in probability for all $m < K$ and converges to $N(0,1)$ if $m = K$. This gives rise to a consistent estimate for $K$. Also, we discover the right way to define the signal-to-noise ratio (SNR) for our problem and show that consistent estimates for $K$ do not exist if $\mathrm{SNR} \goto 0$, and StGoF is uniformly consistent for $K$ if $\mathrm{SNR} \goto \infty$. Therefore, StGoF achieves the optimal phase transition. Similar stepwise methods (e.g., \cite{wang2017likelihood, ma2018determining}) are known to face analytical challenges. We overcome the challenges by using a different stepwise scheme in StGoF and by deriving sharp results that are not available before. The key to our analysis is to show that SCORE has the {\it Non-Splitting Property (NSP)}. Primarily due to a non-tractable rotation of eigenvectors dictated by the Davis-Kahan $\sin(θ)$ theorem, the NSP is non-trivial to prove and requires new techniques we develop.