Paper Title

Generative Adversarial Networks for Bitcoin Data Augmentation

Authors

Francesco Zola, Jan Lukas Bruse, Xabier Etxeberria Barrio, Mikel Galar, Raul Orduna Urrutia

Abstract

In Bitcoin entity classification, results are strongly conditioned by the ground-truth dataset, especially when applying supervised machine learning approaches. However, these ground-truth datasets are frequently affected by significant class imbalance as generally they contain much more information regarding legal services (Exchange, Gambling), than regarding services that may be related to illicit activities (Mixer, Service). Class imbalance increases the complexity of applying machine learning techniques and reduces the quality of classification results, especially for underrepresented, but critical classes. In this paper, we propose to address this problem by using Generative Adversarial Networks (GANs) for Bitcoin data augmentation as GANs recently have shown promising results in the domain of image classification. However, there is no "one-fits-all" GAN solution that works for every scenario. In fact, setting GAN training parameters is non-trivial and heavily affects the quality of the generated synthetic data. We therefore evaluate how GAN parameters such as the optimization function, the size of the dataset and the chosen batch size affect GAN implementation for one underrepresented entity class (Mining Pool) and demonstrate how a "good" GAN configuration can be obtained that achieves high similarity between synthetically generated and real Bitcoin address data. To the best of our knowledge, this is the first study presenting GANs as a valid tool for generating synthetic address data for data augmentation in Bitcoin entity classification.
