Paper title
Deep Feature Screening: Feature Selection for Ultra High-Dimensional Data via Deep Neural Networks
Paper authors
Abstract
Traditional statistical feature selection methods often struggle when applied to high-dimensional, low-sample-size data, running into problems such as overfitting, the curse of dimensionality, computational infeasibility, and strong model assumptions. In this paper, we propose a novel two-step nonparametric approach called Deep Feature Screening (DeepFS) that can overcome these problems and identify significant features with high precision for ultra high-dimensional, low-sample-size data. This approach first extracts a low-dimensional representation of the input data and then applies feature screening based on the multivariate rank distance correlation recently developed by Deb and Sen (2021). It combines the strengths of deep neural networks and feature screening, and thereby has the following appealing features in addition to its ability to handle ultra high-dimensional data with a small number of samples: (1) it is model-free and distribution-free; (2) it can be used for both supervised and unsupervised feature selection; and (3) it is capable of recovering the original input data. The superiority of DeepFS is demonstrated via extensive simulation studies and real data analyses.
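The two-step structure described in the abstract (extract a low-dimensional representation, then screen each feature by its dependence on that representation) can be illustrated with a minimal sketch. This is not the DeepFS implementation. Two simplifying substitutions are assumed: a PCA projection stands in for the deep-neural-network representation, and the ordinary sample distance correlation stands in for the multivariate rank distance correlation of Deb and Sen (2021). All function names, the toy data, and the parameters `k` and `n_keep` are hypothetical and chosen only for illustration.

```python
import numpy as np


def low_dim_representation(X, k):
    """Step 1 stand-in (assumption): PCA scores instead of a neural-network encoder."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k]  # n x k matrix of principal component scores


def _double_centered_distances(A):
    """Pairwise Euclidean distance matrix with row, column, and grand means removed."""
    A = A.reshape(len(A), -1)  # treat a 1-D feature as an n x 1 matrix
    D = np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)
    return D - D.mean(axis=0, keepdims=True) - D.mean(axis=1, keepdims=True) + D.mean()


def distance_correlation(a, b):
    """Step 2 stand-in (assumption): ordinary sample distance correlation, not the rank-based version."""
    A = _double_centered_distances(a)
    B = _double_centered_distances(b)
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return 0.0 if denom == 0 else float(np.sqrt(max(dcov2, 0.0) / denom))


def screen_features(X, Z, n_keep):
    """Score every original feature against the representation Z and keep the top n_keep."""
    scores = np.array([distance_correlation(X[:, j], Z) for j in range(X.shape[1])])
    top = np.argsort(scores)[::-1][:n_keep]
    return top, scores


# Toy data (hypothetical): 60 samples, 300 features, only the first 6 carry signal.
rng = np.random.default_rng(0)
n, p = 60, 300
F = rng.normal(size=(n, 2))                                  # two latent factors
X = rng.normal(size=(n, p))                                  # pure-noise features
X[:, :3] = 5 * F[:, [0]] + 0.5 * rng.normal(size=(n, 3))     # features driven by factor 1
X[:, 3:6] = 5 * F[:, [1]] + 0.5 * rng.normal(size=(n, 3))    # features driven by factor 2

Z = low_dim_representation(X, k=2)
top, scores = screen_features(X, Z, n_keep=6)
print("selected feature indices:", sorted(top.tolist()))
```

Because the screening step only measures dependence between each feature and the learned representation, no response variable is needed here, which corresponds to the unsupervised use mentioned in the abstract. A supervised variant would presumably bring the response into the representation step, but the abstract does not give those details.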