Intercepting Hail Hydra: Real-Time Detection of Algorithmically Generated Domains

Paper title

Intercepting Hail Hydra: Real-Time Detection of Algorithmically Generated Domains

Authors

Fran Casino, Nikolaos Lykousas, Ivan Homoliak, Constantinos Patsakis, Julio Hernandez-Castro

Abstract

A crucial technical challenge for cybercriminals is to keep control over the potentially millions of infected devices that make up their botnets, without compromising the robustness of their attacks. A single, fixed C&C server, for example, can be trivially detected by either binary or traffic analysis and immediately sinkholed or taken down by security researchers or law enforcement. Botnets often use Domain Generation Algorithms (DGAs), primarily to evade take-down attempts. DGAs can extend the lifespan of a malware campaign, thus potentially enhancing its profitability. They can also contribute to hindering attack accountability. In this work, we introduce HYDRAS, the most comprehensive and representative dataset of Algorithmically-Generated Domains (AGDs) available to date. The dataset contains more than 100 DGA families, including both real-world and adversarially designed ones. We analyse the dataset and discuss the possibility of differentiating between benign requests (to real domains) and malicious ones (to AGDs) in real-time. The simultaneous study of so many families and variants introduces several challenges; nonetheless, it alleviates biases found in previous literature employing small datasets, which are frequently overfitted, exploiting characteristic features of particular families that do not generalise well. We thoroughly compare our approach with the current state-of-the-art and highlight some methodological shortcomings in the current state of practice. The outcomes obtained show that our proposed approach significantly outperforms the current state-of-the-art in terms of both classification performance and efficiency.
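To make the notion of a DGA concrete, the following is a minimal illustrative sketch and not the method or the data described in the abstract. It assumes a hypothetical date-seeded generator (`toy_dga`) that hashes a seed, a date, and an index into a domain label, and a hypothetical helper (`shannon_entropy`) that computes one simple lexical feature of a name. Real DGA families vary widely, and a single feature such as entropy is only a weak signal on its own.

```python
import hashlib
import math
from collections import Counter
from datetime import date


def toy_dga(seed, day, count=5, tld=".com"):
    """Hypothetical date-seeded DGA: hash (seed, day, index) into a 12-letter label."""
    domains = []
    for i in range(count):
        digest = hashlib.md5(f"{seed}-{day.isoformat()}-{i}".encode()).hexdigest()
        # Map each of the first 12 hex digits (0..15) to a letter 'a'..'p'.
        label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:12])
        domains.append(label + tld)
    return domains


def shannon_entropy(text):
    """Shannon entropy of the string, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    # Bot and botmaster share the seed, so both derive the same list for a given day.
    for name in toy_dga("demo-seed", date(2020, 1, 1)):
        label = name.split(".")[0]
        print(name, round(shannon_entropy(label), 3))
```

Because the output depends only on the shared seed and the date, both sides can compute the same candidate domains without communicating, which is why blocking a single domain does not disable such a botnet.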
