PMLB v1.0: An open source dataset collection for benchmarking machine learning methods

Paper title

PMLB v1.0: An open source dataset collection for benchmarking machine learning methods

Authors

Romano, Joseph D.; Le, Trang T.; La Cava, William; Gregg, John T.; Goldberg, Daniel J.; Ray, Natasha L.; Chakraborty, Praneel; Himmelstein, Daniel; Fu, Weixuan; Moore, Jason H.

Abstract

Motivation: Novel machine learning and statistical modeling studies rely on standardized comparisons to existing methods using well-studied benchmark datasets. Few tools exist that provide rapid access to many of these datasets through a standardized, user-friendly interface that integrates well with popular data science workflows. Results: This release of PMLB provides the largest collection of diverse, public benchmark datasets for evaluating new machine learning and data science methods aggregated in one location. v1.0 introduces a number of critical improvements developed following discussions with the open-source community. Availability: PMLB is available at https://github.com/EpistasisLab/pmlb. Python and R interfaces for PMLB can be installed through the Python Package Index and Comprehensive R Archive Network, respectively.

