Paper Title
Feature Selection using e-values
Paper Authors
Paper Abstract
In the context of supervised parametric models, we introduce the concept of e-values. An e-value is a scalar quantity that represents the proximity of the sampling distribution of parameter estimates in a model trained on a subset of features to that of the model trained on all features (i.e., the full model). Under general conditions, a rank ordering of e-values separates models that contain all essential features from those that do not. E-values are applicable to a wide range of parametric models. We use data depths and a fast resampling-based algorithm to implement a feature selection procedure using e-values, and provide consistency results. For a $p$-dimensional feature space, this procedure requires fitting only the full model and evaluating $p+1$ models, as opposed to the traditional requirement of fitting and evaluating $2^p$ models. Through experiments across several model settings and synthetic and real datasets, we establish the e-values method as a promising general alternative to existing model-specific feature selection methods.
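The abstract describes the procedure only at a high level. The sketch below illustrates one plausible reading of it for an ordinary least-squares model: a residual bootstrap stands in for the sampling distribution, Mahalanobis depth stands in for the data depth, and each of the $p$ reduced models is evaluated by zeroing out one coordinate of the bootstrap draws. These choices, the function names, and the selection rule (flag a feature if dropping it lowers the e-value) are illustrative assumptions, not the paper's exact algorithm.

```python
# Hypothetical sketch of e-value based feature selection for a linear model.
# Assumptions not taken from the abstract: OLS fitting, a residual bootstrap
# for the sampling distribution, and Mahalanobis depth as the data depth.
import numpy as np

def mahalanobis_depth(points, reference):
    """Depth of each row of `points` with respect to the cloud `reference`."""
    mu = reference.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(reference, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", points - mu, cov_inv, points - mu)
    return 1.0 / (1.0 + d2)

def e_values(X, y, n_boot=500, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta_hat

    # Bootstrap sampling distribution of the full-model estimates.
    boot = np.empty((n_boot, p))
    for b in range(n_boot):
        y_b = X @ beta_hat + rng.choice(resid, size=n, replace=True)
        boot[b] = np.linalg.lstsq(X, y_b, rcond=None)[0]

    # e-value of the full model: mean depth of its own bootstrap draws.
    e_full = mahalanobis_depth(boot, boot).mean()

    # e-value of each drop-one-feature model: zero out that coordinate
    # of the bootstrap draws and measure depth w.r.t. the full-model cloud.
    e_drop = np.empty(p)
    for j in range(p):
        reduced = boot.copy()
        reduced[:, j] = 0.0
        e_drop[j] = mahalanobis_depth(reduced, boot).mean()

    # Flag a feature as essential if dropping it lowers the e-value.
    selected = np.where(e_drop < e_full)[0]
    return e_full, e_drop, selected
```

In this sketch only the full model is ever fit (once per bootstrap replicate); the $p$ reduced models are merely evaluated on the same bootstrap draws, which is what keeps the cost at $p+1$ model evaluations rather than $2^p$ fits.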