FastForest: Increasing Random Forest Processing Speed While Maintaining Accuracy

Title

FastForest: Increasing Random Forest Processing Speed While Maintaining Accuracy

Authors

Darren Yates, Md Zahidul Islam

Abstract

Random Forest remains one of data mining's most enduring ensemble algorithms: its accuracy and processing speed are well documented, and it regularly appears in new research. However, with data mining now reaching hardware-constrained devices such as smartphones and Internet of Things (IoT) devices, there is a continued need for research into algorithm efficiency that delivers greater processing speed without sacrificing accuracy. Our proposed FastForest algorithm delivers an average 24% increase in processing speed compared with Random Forest, while maintaining (and frequently exceeding) its classification accuracy, over tests involving 45 datasets. FastForest achieves this through a combination of three optimising components: Subsample Aggregating ('Subbagging'), Logarithmic Split-Point Sampling and Dynamic Restricted Subspacing. Moreover, detailed testing of Subbagging sizes has found an optimal scalar that delivers a good balance of processing performance and accuracy.
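Subbagging is the first of the three components named above. Standard Random Forest trains each tree on a bootstrap sample (n rows drawn with replacement). Subsample aggregating instead trains each base learner on a smaller subsample drawn without replacement and then combines their votes. The sketch below illustrates that generic idea only. It is not the authors' FastForest implementation, and it does not cover Logarithmic Split-Point Sampling or Dynamic Restricted Subspacing. Several things here are assumptions. The parameter `fraction` is a hypothetical stand-in for the Subbagging size scalar, and its value of 0.5 is arbitrary rather than the paper's optimum. A one-split decision stump stands in for a full decision tree to keep the example short. All function names are illustrative.

```python
import random
from collections import Counter
from typing import Callable, List


def subsample_indices(n_rows: int, fraction: float, rng: random.Random) -> List[int]:
    # Subbagging: draw round(fraction * n_rows) DISTINCT row indices,
    # i.e. sampling WITHOUT replacement (unlike a Random Forest bootstrap).
    # `fraction` is a hypothetical stand-in for the subsample-size scalar.
    k = max(1, round(n_rows * fraction))
    return rng.sample(range(n_rows), k)


def fit_stump(xs: List[float], ys: List[int]) -> Callable[[float], int]:
    # Toy base learner (stand-in for a decision tree): choose the single
    # threshold on one feature that minimises training errors.
    best = None
    for t in sorted(set(xs)):
        left = Counter(y for x, y in zip(xs, ys) if x <= t)
        right = Counter(y for x, y in zip(xs, ys) if x > t)
        left_label = left.most_common(1)[0][0]  # never empty: t is one of xs
        right_label = right.most_common(1)[0][0] if right else left_label
        errors = sum(
            1 for x, y in zip(xs, ys)
            if (left_label if x <= t else right_label) != y
        )
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, t, l, r = best
    return lambda x: l if x <= t else r


def fit_subbagged_ensemble(xs, ys, n_learners=25, fraction=0.5, seed=0):
    # Train each learner on its own without-replacement subsample.
    rng = random.Random(seed)
    models = []
    for _ in range(n_learners):
        idx = subsample_indices(len(xs), fraction, rng)
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict(models, x) -> int:
    # Aggregate the learners' class labels by plurality vote, as in bagging.
    return Counter(m(x) for m in models).most_common(1)[0][0]


if __name__ == "__main__":
    xs = list(range(20))
    ys = [0 if x < 10 else 1 for x in xs]
    models = fit_subbagged_ensemble(xs, ys, n_learners=25, fraction=0.5, seed=42)
    print([predict(models, x) for x in (0, 5, 14, 19)])
```

In this sketch, each learner is fitted on only `fraction * n` distinct rows instead of n bootstrap draws. Any speed-up would plausibly come from that reduction. How much accuracy is retained depends on the chosen fraction, which is the trade-off the abstract says the authors tuned.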
