Paper title
Algorithm-Agnostic Interpretations for Clustering
Paper authors
Paper abstract
A clustering outcome for high-dimensional data is typically interpreted via post-processing, involving dimension reduction and subsequent visualization. This destroys the meaning of the data and obfuscates interpretations. We propose algorithm-agnostic interpretation methods to explain clustering outcomes in reduced dimensions while preserving the integrity of the data. The permutation feature importance for clustering represents a general framework based on shuffling feature values and measuring changes in cluster assignments through custom score functions. The individual conditional expectation for clustering indicates observation-wise changes in the cluster assignment due to changes in the data. The partial dependence for clustering evaluates average changes in cluster assignments for the entire feature space. All methods can be used with any clustering algorithm able to reassign instances through soft or hard labels. In contrast to common post-processing methods such as principal component analysis, the introduced methods maintain the original structure of the features.
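The three ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a toy nearest-centroid rule (`assign`) as the clustering algorithm's hard-label reassignment step, and it uses the share of instances whose label changes as one possible score function. All function names, the toy data, and the fixed centroids are hypothetical and chosen only for illustration.

```python
import random


def assign(point, centroids):
    """Hard label: index of the nearest centroid (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, centroid))
             for centroid in centroids]
    return dists.index(min(dists))


def with_value(row, feature, value):
    """Copy of `row` with one feature replaced; other features stay untouched."""
    return row[:feature] + [value] + row[feature + 1:]


def permutation_importance(data, centroids, feature, n_repeats=20, seed=0):
    """Average share of instances whose label changes after shuffling one feature."""
    rng = random.Random(seed)
    baseline = [assign(row, centroids) for row in data]
    total = 0.0
    for _ in range(n_repeats):
        column = [row[feature] for row in data]
        rng.shuffle(column)
        labels = [assign(with_value(row, feature, v), centroids)
                  for row, v in zip(data, column)]
        total += sum(a != b for a, b in zip(baseline, labels)) / len(data)
    return total / n_repeats


def ice_curves(data, centroids, feature, grid):
    """Per instance: the label received when `feature` is set to each grid value."""
    return [[assign(with_value(row, feature, g), centroids) for g in grid]
            for row in data]


def partial_dependence(curves, n_clusters):
    """Per grid value: share of instances assigned to each cluster."""
    n = len(curves)
    return [[sum(curve[j] == k for curve in curves) / n for k in range(n_clusters)]
            for j in range(len(curves[0]))]


# Toy data (assumption): clusters differ along feature 0; feature 1 is noise.
data = [[0.0, 0.3], [1.0, -0.2], [0.5, 0.1],
        [10.0, 0.2], [9.0, -0.1], [9.5, 0.0]]
centroids = [[0.0, 0.0], [10.0, 0.0]]

pfi_0 = permutation_importance(data, centroids, feature=0)
pfi_1 = permutation_importance(data, centroids, feature=1)
curves = ice_curves(data, centroids, feature=0, grid=[0.0, 2.5, 7.5, 10.0])
pd_curve = partial_dependence(curves, n_clusters=2)

print("importance of feature 0:", pfi_0)
print("importance of feature 1:", pfi_1)
print("partial dependence on feature 0:", pd_curve)
```

In this toy setup both centroids share the same value for feature 1, so shuffling that feature never changes an assignment and its importance is zero, while shuffling feature 0 moves instances between clusters. The features are perturbed in their original space throughout, with no projection step such as principal component analysis. A soft-label variant would replace `assign` with a function that returns per-cluster membership scores and would average those scores instead of counting label changes.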