Paper Title
An ensemble of Density based Geometric One-Class Classifier and Genetic Algorithm
Authors
Abstract
One-Class Classification (OCC), which considers data sets composed of a single target class and outliers, is one of the fastest-rising topics in recent machine learning research. It is better suited than traditional Multi-Class Classification to certain problematic data sets and special cases. In OCC methods, classification accuracy and interpretability for the user are generally regarded as a trade-off. A classifier based on the Hyper-Rectangle (H-RTGL) can remedy this trade-off: it uses an H-RTGL formulated as a conjunction of geometric rules called intervals. These intervals provide a basis for interpretability because a user can easily understand them. However, existing H-RTGL based OCC classifiers have three limitations: (i) most of them cannot reflect the density of the target class; (ii) those that do consider density rely on a primitive interval generation method; and (iii) there is no systematic procedure for setting the hyperparameters of an H-RTGL based OCC classifier, even though they influence its classification performance. Based on these observations, we propose the One-Class Hyper-Rectangle Descriptor based on density (1-HRD_d), with a more elaborate interval generation method that includes parametric and nonparametric approaches. In addition, we design a Genetic Algorithm (GA), consisting of a chromosome structure and genetic operators, that systematically generates 1-HRD_d by optimizing its hyperparameters. We validate our work through a numerical experiment on real data sets, comparing it with existing OCC algorithms and with other H-RTGL based classifiers.
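To illustrate the idea of a hyper-rectangle formulated as a conjunction of per-feature intervals, the following is a minimal Python sketch. It is not the 1-HRD_d method: the quantile-trimming rule, the `trim` parameter, and the function names `fit_intervals` and `is_target` are assumptions introduced here for illustration only, standing in for a simple nonparametric way to produce one interval per feature.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile of an already sorted sequence."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def fit_intervals(X: Sequence[Sequence[float]], trim: float = 0.05) -> List[Interval]:
    """Build one interval per feature from target-class samples.

    Assumption: each interval spans the [trim, 1 - trim] quantile range,
    so sparse tails of the target class are excluded. `trim` is a
    hypothetical hyperparameter, not one defined in the paper.
    """
    n_features = len(X[0])
    intervals = []
    for j in range(n_features):
        col = sorted(row[j] for row in X)
        intervals.append((quantile(col, trim), quantile(col, 1 - trim)))
    return intervals


def is_target(x: Sequence[float], intervals: Sequence[Interval]) -> bool:
    """A point is accepted only if every feature lies in its interval
    (the conjunction of interval rules defines the hyper-rectangle)."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(x, intervals))


if __name__ == "__main__":
    # Synthetic target-class data: feature 0 in [0, 100], feature 1 in [0, 200].
    train = [(float(i), float(2 * i)) for i in range(101)]
    rect = fit_intervals(train, trim=0.05)
    print(rect)
    print(is_target((50.0, 100.0), rect))   # inside every interval
    print(is_target((50.0, 500.0), rect))   # feature 1 outside its interval
```

Each fitted interval reads directly as a rule of the form "lower <= feature <= upper", which is the kind of interpretability the abstract attributes to H-RTGL based classifiers. A value such as `trim` is also an example of a hyperparameter that a search procedure like a GA could tune, although the paper's actual chromosome structure and operators are not reproduced here.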