Modelling High-Dimensional Categorical Data Using Nonconvex Fusion Penalties

Paper Title

Modelling High-Dimensional Categorical Data Using Nonconvex Fusion Penalties

Authors

Stokell, Benjamin G., Shah, Rajen D., Tibshirani, Ryan J.

Abstract

We propose a method for estimation in high-dimensional linear models with nominal categorical data. Our estimator, called SCOPE, fuses levels together by making their corresponding coefficients exactly equal. This is achieved using the minimax concave penalty on differences between the order statistics of the coefficients for a categorical variable, thereby clustering the coefficients. We provide an algorithm for exact and efficient computation of the global minimum of the resulting nonconvex objective in the case with a single variable with potentially many levels, and use this within a block coordinate descent procedure in the multivariate case. We show that an oracle least squares solution that exploits the unknown level fusions is a limit point of the coordinate descent with high probability, provided the true levels have a certain minimum separation; these conditions are known to be minimal in the univariate case. We demonstrate the favourable performance of SCOPE across a range of real and simulated datasets. An R package CatReg implementing SCOPE for linear models and also a version for logistic regression is available on CRAN.
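
The penalty described in the abstract can be illustrated with a minimal Python sketch. It assumes the standard parameterization of the minimax concave penalty with tuning parameters `lam` and `gamma`; the paper's exact scaling may differ. The function names `mcp` and `scope_style_penalty` are hypothetical and are not part of the CatReg package. The sketch only evaluates the penalty for one categorical variable; it does not implement the exact univariate solver or the block coordinate descent.

```python
import numpy as np


def mcp(x, lam, gamma):
    """Minimax concave penalty at |x| (standard form, assumed here).

    Equals lam*|x| - x^2 / (2*gamma) for |x| <= gamma*lam,
    and the constant gamma*lam^2 / 2 beyond that point.
    """
    x = np.abs(np.asarray(x, dtype=float))
    return np.where(
        x <= gamma * lam,
        lam * x - x**2 / (2.0 * gamma),
        0.5 * gamma * lam**2,
    )


def scope_style_penalty(theta, lam, gamma):
    """Sum of MCP values over gaps between consecutive sorted coefficients.

    theta holds the coefficients of the levels of one categorical variable.
    Sorting gives the order statistics; np.diff gives their differences.
    """
    gaps = np.diff(np.sort(np.asarray(theta, dtype=float)))
    return float(np.sum(mcp(gaps, lam, gamma)))


if __name__ == "__main__":
    # Fully fused levels incur no penalty.
    print(scope_style_penalty([0.5, 0.5, 0.5], lam=1.0, gamma=2.0))
    # Two fused levels plus one well-separated level: the large gap
    # is charged only the capped amount gamma*lam^2 / 2.
    print(scope_style_penalty([0.0, 0.0, 10.0], lam=1.0, gamma=2.0))
```

Because the penalty is flat for gaps larger than `gamma * lam`, well-separated groups of levels are not shrunk towards each other, while small gaps are pushed to exactly zero, which corresponds to the level fusion the abstract describes.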
