Paper title
Fuzzy Jaccard Index: A robust comparison of ordered lists
Paper authors
Paper abstract
We propose the Fuzzy Jaccard Index (FUJI), a scale-invariant score for assessing the similarity between two ranked/ordered lists. FUJI improves upon the Jaccard index by incorporating a membership function that takes into account the particular ranks, thus producing both more stable and more accurate similarity estimates. We provide theoretical insights into the properties of the FUJI score and propose an efficient algorithm for computing it. We also present empirical evidence of its performance in different synthetic scenarios. Finally, we demonstrate its utility in a typical machine learning setting: comparing feature ranking lists relevant to a given machine learning task. In real-life domains, and in particular high-dimensional ones, where only a small percentage of the whole feature space might be relevant, a robust and confident feature ranking leads to interpretable findings as well as efficient computation and good predictive performance. In such cases, FUJI correctly distinguishes between existing feature ranking approaches, while being more robust and efficient than the benchmark similarity scores.
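
To make the idea in the abstract concrete, the sketch below contrasts the plain Jaccard index on the top-k prefixes of two rankings with a generic fuzzy-set Jaccard (sum of minima over sum of maxima). It is an illustration only, not the FUJI definition: the abstract does not specify the membership function, so `linear_membership` (full membership inside the top-k, decaying linearly to zero at rank 2k) and all function names are assumptions made for this example.

```python
def jaccard_at_k(a, b, k):
    """Plain Jaccard index of the top-k prefixes of two ranked lists."""
    top_a, top_b = set(a[:k]), set(b[:k])
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 1.0


def linear_membership(ranking, k):
    """Hypothetical rank-aware membership (an assumption, not from the paper):
    1.0 for positions 1..k, decaying linearly to 0.0 at position 2k."""
    return {
        item: min(1.0, max(0.0, (2 * k - pos) / k))
        for pos, item in enumerate(ranking, start=1)
    }


def fuzzy_jaccard_at_k(a, b, k):
    """Generic fuzzy-set Jaccard: sum of min memberships / sum of max memberships."""
    mu_a, mu_b = linear_membership(a, k), linear_membership(b, k)
    items = set(mu_a) | set(mu_b)
    num = sum(min(mu_a.get(x, 0.0), mu_b.get(x, 0.0)) for x in items)
    den = sum(max(mu_a.get(x, 0.0), mu_b.get(x, 0.0)) for x in items)
    return num / den if den else 1.0


# Two feature rankings that differ only by a swap right at the top-k boundary.
ranking_a = ["f1", "f2", "f3", "f4", "f5", "f6"]
ranking_b = ["f1", "f2", "f4", "f3", "f5", "f6"]

crisp = jaccard_at_k(ranking_a, ranking_b, k=3)        # 0.5
fuzzy = fuzzy_jaccard_at_k(ranking_a, ranking_b, k=3)  # higher than the crisp score
print(crisp, fuzzy)
```

In this toy case a single swap at the cut-off halves the crisp score, while the rank-aware variant penalizes it less. This is the kind of instability that a rank-dependent membership function is meant to reduce.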