Interpretable Visualizations with Differentiating Embedding Networks

Paper Title

Interpretable Visualizations with Differentiating Embedding Networks

Authors

Robinson, Isaac

Abstract

We present a visualization algorithm based on a novel unsupervised Siamese neural network training regime and loss function, called Differentiating Embedding Networks (DEN). The Siamese neural network finds differentiating or similar features between specific pairs of samples in a dataset, and uses these features to embed the dataset in a lower dimensional space where it can be visualized. Unlike existing visualization algorithms such as UMAP or $t$-SNE, DEN is parametric, meaning it can be interpreted by techniques such as SHAP. To interpret DEN, we create an end-to-end parametric clustering algorithm on top of the visualization, and then leverage SHAP scores to determine which features in the sample space are important for understanding the structures shown in the visualization based on the clusters found. We compare DEN visualizations with existing techniques on a variety of datasets, including image and scRNA-seq data. We then show that our clustering algorithm performs similarly to the state of the art despite not having prior knowledge of the number of clusters, and sets a new state of the art on FashionMNIST. Finally, we demonstrate finding differentiating features of a dataset. Code available at https://github.com/isaacrob/DEN
