Paper Title
Classifier Transfer with Data Selection Strategies for Online Support Vector Machine Classification with Class Imbalance
Paper Authors
Paper Abstract
Objective: Classifier transfers usually come with dataset shifts. To overcome them, online strategies have to be applied. For practical applications, limitations in the computational resources available for the adaptation of batch learning algorithms, like the SVM, have to be considered.

Approach: We review and compare several strategies for online learning with SVMs. We focus on data selection strategies which limit the size of the stored training data [...]

Main Results: Different criteria are appropriate for different data shifts. For the synthetic data, adding all samples to the pool of considered samples often performs significantly worse than the other criteria. In particular, adding only misclassified samples performed astonishingly well. Here, balancing criteria were very important when the other criteria were not well chosen. For the transfer setups, the results show that the best strategy depends on the intensity of the drift during the transfer. For larger drifts, adding all samples and removing the oldest ones results in the best performance, whereas for smaller drifts it can be sufficient to only add potential new support vectors of the SVM, which reduces the required processing resources.

Significance: For EEG-based BCIs, models trained on data from a calibration session, a previous recording session, or even from recording sessions with one or several other subjects are used. This transfer of the learned model usually decreases the performance and can therefore benefit from online learning, which adapts a classifier such as the well-established SVM. We show that, by using the right combination of data selection criteria, it is possible to adapt the classifier and considerably increase the performance. Furthermore, in some cases it is possible to speed up the processing and save computational resources by updating with a subset of special samples and keeping only a small subset of samples for training the classifier.
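To make the compared strategies concrete, the following minimal sketch (our illustration, not the authors' implementation) combines a fixed-size sample pool with three of the add criteria named in the abstract (add all samples, add only misclassified samples, add only potential new support vectors, i.e. samples inside the SVM margin) and oldest-first removal with an optional class-balancing rule. It assumes scikit-learn's SVC as the batch learner that is retrained on the limited pool; the class name OnlineSVMPool and all parameter names are ours.

import numpy as np
from sklearn.svm import SVC

class OnlineSVMPool:
    def __init__(self, max_size=200, criterion="misclassified", balance=True):
        self.max_size = max_size    # limit on the stored training data
        self.criterion = criterion  # "all", "misclassified", or "margin"
        self.balance = balance      # remove from the majority class first
        self.X, self.y = [], []
        self.clf = SVC(kernel="linear", C=1.0)

    def update(self, x, label):
        """Process one new labeled sample from the data stream."""
        fitted = len(set(self.y)) >= 2  # SVC needs two classes to train
        if self.criterion == "all" or not fitted:
            add = True
        elif self.criterion == "misclassified":
            # Add only samples the current classifier gets wrong.
            add = self.clf.predict([x])[0] != label
        else:  # "margin"
            # Potential new support vector: sample lies inside the margin.
            add = abs(self.clf.decision_function([x])[0]) < 1.0
        if not add:
            return
        self.X.append(np.asarray(x))
        self.y.append(label)
        # Removal criterion: drop the oldest sample; with balancing, drop
        # the oldest sample of the majority class to counter imbalance.
        while len(self.y) > self.max_size:
            if self.balance:
                counts = {c: self.y.count(c) for c in set(self.y)}
                majority = max(counts, key=counts.get)
                idx = self.y.index(majority)  # oldest majority-class sample
            else:
                idx = 0                       # oldest sample overall
            del self.X[idx], self.y[idx]
        if len(set(self.y)) >= 2:
            self.clf.fit(np.asarray(self.X), np.asarray(self.y))

Usage is a plain loop over the incoming labeled samples, e.g. pool = OnlineSVMPool(max_size=100, criterion="margin") followed by pool.update(x, label) per sample. Retraining a batch SVM on every accepted sample is the simplest realization of such a scheme; the point of the selection criteria is that the pool stays small enough for this retraining to remain feasible online.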