Paper Title

Dynamic Ensemble Size Adjustment for Memory Constrained Mondrian Forest

Paper Authors

Martin Khannouz, Tristan Glatard

Paper Abstract

Supervised learning algorithms generally assume the availability of enough memory to store data models during the training and test phases. However, this assumption is unrealistic when data comes in the form of infinite data streams, or when learning algorithms are deployed on devices with reduced amounts of memory. Such memory constraints impact the model behavior and assumptions. In this paper, we show that under memory constraints, increasing the size of a tree-based ensemble classifier can worsen its performance. In particular, we experimentally show the existence of an optimal ensemble size for a memory-bounded Mondrian forest on data streams and we design an algorithm to guide the forest toward that optimal number by using an estimation of overfitting. We tested different variations of this algorithm on a variety of real and simulated datasets, and we conclude that our method can achieve up to 95% of the performance of an optimally-sized Mondrian forest for stable datasets, and can even outperform it for datasets with concept drifts. All our methods are implemented in the OrpailleCC open-source library and are ready to be used on embedded systems and connected objects.
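
To make the idea concrete, below is a minimal, hypothetical C++ sketch of the size-adjustment mechanism the abstract describes: an overfitting estimate (here, the gap between training accuracy and prequential, i.e. test-then-train, accuracy) drives growing or pruning the tree count within a fixed memory budget. The class name EnsembleSizeController, the threshold parameter, and the accuracy inputs are illustrative assumptions, not the authors' OrpailleCC API.

```cpp
// Hypothetical sketch (not the OrpailleCC implementation): adjust the number
// of active trees in a memory-bounded ensemble from an overfitting estimate.
#include <cstddef>
#include <iostream>

class EnsembleSizeController {
public:
    EnsembleSizeController(std::size_t min_trees, std::size_t max_trees,
                           double threshold)
        : trees_(min_trees), min_trees_(min_trees), max_trees_(max_trees),
          threshold_(threshold) {}

    // train_acc: accuracy on already-seen (training) points.
    // preq_acc : prequential (test-then-train) accuracy on the stream.
    // Their gap is a crude overfitting estimate: a large gap suggests the
    // memory budget is better spent on fewer, deeper trees; a small gap
    // suggests there is spare capacity for an additional tree.
    std::size_t update(double train_acc, double preq_acc) {
        double overfit = train_acc - preq_acc;
        if (overfit > threshold_ && trees_ > min_trees_)
            --trees_;   // shrink: free memory for the remaining trees
        else if (overfit < threshold_ && trees_ < max_trees_)
            ++trees_;   // grow: add a tree under the memory budget
        return trees_;
    }

private:
    std::size_t trees_, min_trees_, max_trees_;
    double threshold_;
};

int main() {
    EnsembleSizeController ctl(1, 50, 0.05);
    // Simulated accuracy estimates at a few stream checkpoints.
    double train[] = {0.90, 0.92, 0.97, 0.98};
    double preq[]  = {0.88, 0.89, 0.85, 0.84};
    for (int i = 0; i < 4; ++i)
        std::cout << "trees = " << ctl.update(train[i], preq[i]) << '\n';
}
```

A controller of this form only decides the ensemble size; removing a tree frees memory that the remaining memory-bounded Mondrian trees can then use for further splits, which is why shrinking can improve performance under a hard memory cap.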
