Paper Title
Machine Learning for Encrypted Malicious Traffic Detection: Approaches, Datasets and Comparative Study
Paper Authors
Abstract
As demand for personal privacy and data security becomes a priority, encrypted traffic has become mainstream in the cyber world. However, traffic encryption also shields malicious and illegal traffic introduced by adversaries from detection. This is especially so in the post-COVID-19 environment, where malicious traffic encryption is growing rapidly. Common security solutions that rely on plain payload content analysis, such as deep packet inspection, are rendered useless. Thus, machine learning based approaches have become an important direction for encrypted malicious traffic detection. In this paper, we formulate a universal framework of machine learning based encrypted malicious traffic detection techniques and provide a systematic review. Furthermore, because well-recognized datasets and feature sets are lacking, current studies train their models on different datasets. As a result, the performance of these models cannot be compared and analyzed reliably. Therefore, in this paper, we analyse, process and combine datasets from 5 different sources to generate a comprehensive and fair dataset to aid future research in this field. On this basis, we also implement and compare 10 encrypted malicious traffic detection algorithms. We then discuss challenges and propose future directions of research.