Title
An Adversarial Attack Analysis on Malicious Advertisement URL Detection Framework
Authors
Abstract
Malicious advertisement URLs pose a security risk since they are a source of cyber-attacks, and the need to address this issue is growing in both industry and academia. Typically, the attacker delivers an attack vector to the user by means of an email, an advertisement link, or any other means of communication and directs them to a malicious website to steal sensitive information and defraud them. Existing malicious URL detection techniques are limited in their ability to handle unseen features and to generalize to test data. In this study, we extract a novel set of lexical and web-scraped features and employ machine learning techniques to build a system for fraudulent advertisement URL detection. The combined set of six different kinds of features precisely overcomes obfuscation in fraudulent URL classification. Based on different statistical properties, we use twelve datasets in different formats for the detection, prediction, and classification tasks. We extend our prediction analysis to mismatched and unlabelled datasets. For this framework, we analyze the performance of four machine learning techniques in the detection component: Random Forest, Gradient Boost, XGBoost, and AdaBoost. With our proposed method, we achieve a false negative rate as low as 0.0037 while maintaining a high accuracy of 99.63%. Moreover, we devise a novel unsupervised technique for data clustering using the K-Means algorithm for visual analysis. This paper analyzes the vulnerability of decision-tree-based models under a limited-knowledge attack scenario. We consider an exploratory attack and implement a Zeroth Order Optimization (ZOO) adversarial attack on the detection models.
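As a minimal, illustrative sketch (not the authors' released code) of the detection stage summarized above, the snippet below trains the four named tree-ensemble classifiers and reports accuracy and false negative rate. The feature matrix is synthetic: its six columns merely stand in for the paper's six feature groups, and all hyperparameters are assumptions.

import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in for the lexical/web-scraped URL features
# (e.g., URL length, digit count, special-character count, ...).
X = rng.random((1000, 6))
y = (X[:, 0] + X[:, 3] > 1.0).astype(int)  # 1 = malicious, 0 = benign (toy labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boost": GradientBoostingClassifier(random_state=0),
    "XGBoost": XGBClassifier(n_estimators=100, eval_metric="logloss"),
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=0),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    fnr = fn / (fn + tp)  # false negative rate, the metric highlighted in the abstract
    print(f"{name}: accuracy={accuracy_score(y_te, pred):.4f}  FNR={fnr:.4f}")

On the attack side, Zeroth Order Optimization (ZOO) is a black-box method that needs only the model's prediction scores: in its standard form it estimates the gradient of an attack loss f coordinate by coordinate with a symmetric finite difference, g_i ≈ (f(x + h*e_i) - f(x - h*e_i)) / (2h), where e_i is the i-th basis vector and h a small step, and then perturbs the input along the estimated gradient. This is what makes the attack feasible in the limited-knowledge scenario described above.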