Paper Title

Imbalanced Class Data Performance Evaluation and Improvement using Novel Generative Adversarial Network-based Approach: SSG and GBO

Authors

Md Manjurul Ahsan, Md Shahin Ali, Zahed Siddique

Abstract

Class imbalance in a dataset is one of the major challenges in machine learning: it can significantly degrade model performance and bias predictions. Numerous techniques have been proposed to address class imbalance, including, but not limited to, oversampling, undersampling, and cost-sensitive approaches. Because they can generate synthetic data, oversampling techniques such as the Synthetic Minority Oversampling Technique (SMOTE) are among the methods most widely used by researchers. However, one potential disadvantage of SMOTE is that newly created minority samples may overlap with majority samples, which increases the probability that ML models become biased toward the majority class. Recently, generative adversarial networks (GANs) have garnered much attention due to their ability to create nearly realistic samples. Despite this potential, however, GANs are hard to train. This study proposes two novel techniques, GAN-based Oversampling (GBO) and Support Vector Machine-SMOTE-GAN (SSG), to overcome the limitations of existing oversampling approaches. Preliminary computational results show that SSG and GBO performed better than the original SMOTE on eight expanded imbalanced benchmark datasets. The study also revealed that the minority samples generated by SSG follow Gaussian distributions, which is often difficult to achieve with the original SMOTE.
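To make the SMOTE overlap issue concrete, the sketch below shows the standard SMOTE interpolation step, x_new = x_i + λ(x_nn − x_i) with λ drawn uniformly from [0, 1), in pure Python. It is a minimal illustration assuming Euclidean distance; the helper name `smote_sketch` and its parameters are hypothetical. It is not the authors' SSG or GBO implementation and omits the SVM and GAN components those methods add.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Hypothetical helper: generate n_new synthetic minority samples by
    SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Majority samples are never consulted, which is why synthetic points
    can land inside majority regions.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)          # seeded for reproducibility
    k = min(k, len(minority) - 1)      # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples (Euclidean)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(x, p))
        neighbour = rng.choice(others[:k])
        lam = rng.random()             # interpolation factor in [0, 1)
        synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


if __name__ == "__main__":
    # Three clustered minority points plus one outlier at (3.0, 3.0)
    minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (3.0, 3.0)]
    for point in smote_sketch(minority, n_new=5, k=2, seed=42):
        print(point)
```

Because every synthetic point is a convex combination of two minority samples and the majority class is ignored, a minority outlier such as `(3.0, 3.0)` that sits inside a majority region yields synthetic points along the segment leading into that region, which is the overlap problem the abstract attributes to SMOTE.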