Paper title
A Kernel Measure of Dissimilarity between $M$ Distributions
Paper authors
Abstract
Given $M \geq 2$ distributions defined on a general measurable space, we introduce a nonparametric (kernel) measure of multi-sample dissimilarity (KMD) -- a parameter that quantifies the difference between the $M$ distributions. The population KMD, which takes values between 0 and 1, is 0 if and only if all the $M$ distributions are the same, and 1 if and only if all the distributions are mutually singular. Moreover, KMD possesses many properties commonly associated with $f$-divergences such as the data processing inequality and invariance under bijective transformations. The sample estimate of KMD, based on independent observations from the $M$ distributions, can be computed in near linear time (up to logarithmic factors) using $k$-nearest neighbor graphs (for $k \ge 1$ fixed). We develop an easily implementable test for the equality of $M$ distributions based on the sample KMD that is consistent against all alternatives where at least two distributions are not equal. We prove central limit theorems for the sample KMD, and provide a complete characterization of the asymptotic power of the test, as well as its detection threshold. The usefulness of our measure is demonstrated via real and synthetic data examples; our method is also implemented in an R package.
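The idea of measuring dissimilarity through a $k$-nearest neighbor graph can be illustrated with a small sketch. The abstract does not state the estimator's formula, so everything below is an assumption made for illustration: a discrete kernel on the sample labels (1 if two observations come from the same distribution, 0 otherwise), Euclidean distance, and a normalization chosen so that the statistic is close to 0 when labels are unrelated to position and equals 1 when every neighbor shares its point's label. The function name `knn_label_dissimilarity` is hypothetical, and the brute-force neighbor search runs in quadratic time, not the near linear time mentioned above.

```python
import numpy as np


def knn_label_dissimilarity(X, labels, k=1):
    """Illustrative k-NN label-agreement statistic (assumed form, not reference code).

    X      : (n, d) array of pooled observations from all M samples
    labels : (n,) array saying which sample each observation came from
    k      : number of nearest neighbors per point
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    labels = np.asarray(labels)
    n = X.shape[0]

    _, counts = np.unique(labels, return_counts=True)
    if counts.size < 2:
        raise ValueError("need observations from at least two samples")
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")

    # Brute-force pairwise squared Euclidean distances (O(n^2) time and memory).
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq_dist, np.inf)  # a point is never its own neighbor

    # Indices of the k nearest neighbors of each point.
    neighbors = np.argsort(sq_dist, axis=1)[:, :k]

    # Fraction of directed k-NN edges that join two points with the same label.
    same_on_graph = np.mean(labels[neighbors] == labels[:, None])

    # Fraction of all ordered pairs (i != j) that share a label: the baseline
    # expected when the graph carries no information about the labels.
    same_overall = np.sum(counts * (counts - 1)) / (n * (n - 1))

    # Rescale so the baseline maps to 0 and perfect agreement maps to 1.
    return (same_on_graph - same_overall) / (1.0 - same_overall)
```

Calling `knn_label_dissimilarity(X, labels, k=3)` on pooled data returns a single number; in this sketch, well-separated samples give a value of 1 and samples drawn from one common distribution give values near 0, which mirrors the range described for the population KMD. A tree-based neighbor search would be needed to approach the running time claimed in the abstract.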