From Threat Reports to Continuous Threat Intelligence: A Comparison of Attack Technique Extraction Methods from Textual Artifacts

Paper Title

From Threat Reports to Continuous Threat Intelligence: A Comparison of Attack Technique Extraction Methods from Textual Artifacts

Authors

Md Rayhanur Rahman, Laurie Williams

Abstract

The cyberthreat landscape is continuously evolving. Hence, continuous monitoring and sharing of threat intelligence have become a priority for organizations. Threat reports, published by cybersecurity vendors, contain detailed descriptions of attack Tactics, Techniques, and Procedures (TTP) written in an unstructured text format. Extracting TTP from these reports helps cybersecurity practitioners and researchers learn about and adapt to evolving attacks and plan threat mitigation. Researchers have proposed TTP extraction methods in the literature; however, not all of these proposed methods have been compared to one another or to a baseline. The goal of this study is to help cybersecurity researchers and practitioners choose attack technique extraction methods for monitoring and sharing threat intelligence by comparing the underlying methods from the TTP extraction studies in the literature. In this work, we identify ten existing TTP extraction studies from the literature and implement five methods from the ten studies. We find that two methods, based on Term Frequency-Inverse Document Frequency (TFIDF) and Latent Semantic Indexing (LSI), outperform the other three methods, with F1 scores of 84% and 83%, respectively. We observe that the F1 score of all methods drops when the number of class labels is increased exponentially. We also implement and evaluate an oversampling strategy to mitigate class imbalance issues. Oversampling improves the classification performance of TTP extraction. We provide recommendations from our findings for future cybersecurity researchers, such as the construction of a benchmark dataset from a large corpus and the selection of textual features of TTP. Our work, along with the dataset and implementation source code, can serve as a baseline for cybersecurity researchers to test and compare the performance of future TTP extraction methods.
