A Survey and Implementation of Performance Metrics for Self-Organized Maps

Paper title

A Survey and Implementation of Performance Metrics for Self-Organized Maps

Authors

Florent Forest, Mustapha Lebbah, Hanane Azzag, Jérôme Lacaille

Abstract

Self-Organizing Map algorithms have been used for almost 40 years across application domains such as biology, geology, healthcare, industry and the humanities as an interpretable tool to explore, cluster and visualize high-dimensional data sets. In every application, practitioners need to know whether they can trust the resulting mapping, and they need to perform model selection to tune algorithm parameters (e.g. the map size). Quantitative evaluation of self-organizing maps (SOM) is a subset of clustering validation, which is a challenging problem in itself. Clustering model selection is typically achieved by using clustering validity indices. While these indices also apply to self-organized clustering models, they ignore the topology of the map and only answer the question: do the SOM code vectors approximate the data distribution well? Evaluating SOM models brings in the additional challenge of assessing their topology: does the mapping preserve neighborhood relationships between the map and the original data? The problem of assessing the performance of SOM models has already been tackled quite thoroughly in the literature, giving birth to a family of quality indices incorporating neighborhood constraints, called topographic indices. Commonly used examples of such metrics are the topographic error, neighborhood preservation and the topographic product. However, open-source implementations are almost impossible to find. This is the issue we try to solve in this work: after a survey of existing SOM performance metrics, we implemented them in Python using widely used numerical libraries, and provide them as an open-source library, SOMperf. This paper introduces each metric available in our module along with usage examples.
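
To make the two questions in the abstract concrete, the following is a minimal NumPy sketch of one metric of each kind: the quantization error (how well the code vectors approximate the data) and the topographic error (how often a sample's two closest units are not neighbors on the map). It is not the SOMperf API; the function names, the rectangular grid, and the choice of 4-connectivity (Manhattan grid distance of 1) as the adjacency rule are all assumptions made for illustration.

```python
import numpy as np


def pairwise_distances(data, codebook):
    """Euclidean distances between samples and code vectors, shape (n_samples, n_units)."""
    return np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)


def quantization_error(data, codebook):
    """Mean distance from each sample to its best-matching unit (BMU)."""
    return pairwise_distances(data, codebook).min(axis=1).mean()


def topographic_error(data, codebook, grid_positions):
    """Fraction of samples whose first and second BMUs are not adjacent on the grid.

    Assumption: units are adjacent when their Manhattan grid distance is 1.
    """
    order = np.argsort(pairwise_distances(data, codebook), axis=1)
    bmu1, bmu2 = order[:, 0], order[:, 1]
    grid_dist = np.abs(grid_positions[bmu1] - grid_positions[bmu2]).sum(axis=1)
    return float(np.mean(grid_dist > 1))


# Hypothetical toy example: a 1x3 map embedded in 2-D data space.
grid = np.array([[0, 0], [0, 1], [0, 2]])                   # unit coordinates on the map
data = np.array([[0.1, 0.0], [1.9, 0.0]])                   # two samples
ordered = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])    # code vectors follow the grid order
folded = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 0.0]])     # same vectors, map is "folded"

qe_ordered = quantization_error(data, ordered)
qe_folded = quantization_error(data, folded)
te_ordered = topographic_error(data, ordered, grid)
te_folded = topographic_error(data, folded, grid)
```

In this toy setup both maps use the same set of code vectors, so their quantization errors are identical, yet only the folded map has a nonzero topographic error. This is the situation the abstract describes: a clustering validity index alone cannot tell the two maps apart.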
