Feature Encodings for Gradient Boosting with Automunge

Paper title

Feature Encodings for Gradient Boosting with Automunge

Author

Teague, Nicholas J.

Abstract

Automunge is a tabular preprocessing library that encodes dataframes for supervised learning. When selecting a default feature encoding strategy for gradient boosted learning, one may consider metrics of training duration and achieved predictive performance associated with the feature representations. Automunge offers a default of binarization for categoric features and z-score normalization for numeric. The presented study sought to validate those defaults by way of benchmarking on a series of diverse data sets by encoding variations with tuned gradient boosted learning. We found that on average our chosen defaults were top performers both from a tuning duration and a model performance standpoint. Another key finding was that one hot encoding did not perform in a manner consistent with suitability to serve as a categoric default in comparison to categoric binarization. We present here these and further benchmarks.
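
To make the encodings named in the abstract concrete, the following is a minimal, generic sketch of z-score normalization, categoric binarization, and one hot encoding in plain Python. It does not use or reproduce the Automunge API; the function names `zscore`, `binarize`, and `one_hot` are hypothetical, and details such as the use of the population standard deviation and the alphabetical ordering of categories are assumptions made for illustration only.

```python
from statistics import mean, pstdev


def zscore(values):
    """Z-score normalize a numeric column: (x - mean) / std.

    Assumption: population standard deviation; a constant column maps to 0.0.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


def binarize(categories):
    """Encode each distinct category as a fixed-width binary code.

    n distinct categories need about log2(n) columns,
    compared with n columns for one hot encoding.
    """
    levels = sorted(set(categories))
    # Number of bits needed to represent the indices 0 .. n-1 (at least 1).
    width = max(1, (len(levels) - 1).bit_length())
    codes = {
        level: [int(bit) for bit in format(index, f"0{width}b")]
        for index, level in enumerate(levels)
    }
    return [codes[c] for c in categories]


def one_hot(categories):
    """Encode each distinct category as its own 0/1 indicator column."""
    levels = sorted(set(categories))
    return [[int(c == level) for level in levels] for c in categories]


if __name__ == "__main__":
    colors = ["red", "green", "blue", "green", "yellow", "purple"]
    print(len(binarize(colors)[0]), "binarized columns")
    print(len(one_hot(colors)[0]), "one hot columns")
    print(zscore([1.0, 2.0, 3.0]))
```

With five distinct categories, the binarized form uses three columns where one hot encoding uses five, which illustrates the difference in column count between the two categoric representations compared in the abstract.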
