Compactness Score: A Fast Filter Method for Unsupervised Feature Selection

Paper Title

Compactness Score: A Fast Filter Method for Unsupervised Feature Selection

Authors

Peican Zhu, Xin Hou, Keke Tang, Zhen Wang, Feiping Nie

Abstract

As the information age flourishes, massive amounts of data are generated every day. Because these data are large-scale and high-dimensional, it is often difficult to make good decisions in practical applications, so efficient big data analytics methods are urgently needed. In feature engineering, feature selection is an important research topic that aims to select "excellent" features from the candidates. Feature selection serves several purposes, such as dimensionality reduction, improved model effectiveness, and improved model performance. In many classification tasks, researchers have found that data from the same class tend to lie close to each other; local compactness is therefore of great importance when evaluating a feature. In this manuscript, we propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select the desired features. To demonstrate its efficiency and accuracy, we perform extensive experiments on several data sets. The effectiveness and superiority of the method are then shown on clustering tasks, where performance is measured by several well-known evaluation metrics and efficiency by the corresponding running time. The simulation results suggest that the proposed algorithm is more accurate and more efficient than existing algorithms.
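The abstract does not give the CSUFS formula, so the following is only a minimal sketch of what a filter-style, unsupervised "local compactness" score could look like. It assumes (and this is an assumption, not the authors' definition) that a feature is scored by the mean distance from each sample to its k nearest neighbours along that feature alone, after min-max scaling, with lower scores meaning tighter local grouping. The names `local_compactness_scores`, `select_features`, and the parameter `k` are hypothetical.

```python
import numpy as np


def local_compactness_scores(X, k=5):
    """Illustrative per-feature score (NOT the CSUFS formula from the paper).

    For each feature, min-max scale the column, then average the distance
    from every sample to its k nearest neighbours along that feature.
    Lower scores mean the values are more tightly grouped locally.
    """
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    if not 1 <= k < n_samples:
        raise ValueError("k must satisfy 1 <= k < number of samples")

    scores = np.empty(n_features)
    for j in range(n_features):
        col = X[:, j]
        span = col.max() - col.min()
        if span == 0:
            # Assumption: a constant feature is treated as uninformative
            # and pushed to the end of the ranking.
            scores[j] = np.inf
            continue
        col = (col - col.min()) / span
        # Pairwise absolute differences along this single feature.
        diff = np.abs(col[:, None] - col[None, :])
        # After sorting each row, column 0 is the zero self-distance;
        # the next k columns are the k nearest neighbours.
        nearest = np.sort(diff, axis=1)[:, 1:k + 1]
        scores[j] = nearest.mean()
    return scores


def select_features(X, m, k=5):
    """Return the indices of the m features with the lowest scores."""
    return np.argsort(local_compactness_scores(X, k))[:m]
```

Because each feature is scored independently and no labels or learned model are involved, this has the shape of a filter method: the selected columns can then be passed to any clustering algorithm for evaluation. The pairwise-difference matrix costs O(n^2) memory per feature, so this sketch is suited to small data sets only and says nothing about the running time reported for CSUFS.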
