Title
The Rise of GoodFATR: A Novel Accuracy Comparison Methodology for Indicator Extraction Tools
Authors
Abstract
To adapt to a constantly evolving landscape of cyber threats, organizations need to actively collect Indicators of Compromise (IOCs), i.e., forensic artifacts that signal that a host or network might have been compromised. IOCs can be collected through open-source and commercial structured IOC feeds. However, they can also be extracted from a myriad of unstructured threat reports written in natural language and distributed through a wide array of sources such as blogs and social media. Multiple indicator extraction tools exist that can identify IOCs in natural language reports, but their accuracy is hard to compare because large ground truth datasets are difficult to build. This work presents a novel majority vote methodology for comparing the accuracy of indicator extraction tools that does not require a manually built ground truth. We implement our methodology in GoodFATR, an automated platform for collecting threat reports from a wealth of sources, extracting IOCs from the collected reports using multiple tools, and comparing their accuracy. GoodFATR supports 6 threat report sources: RSS, Twitter, Telegram, Malpedia, APTnotes, and ChainSmith. GoodFATR continuously monitors the sources, downloads new threat reports, extracts 41 indicator types from the collected reports, and filters out non-malicious indicators to output the IOCs. We run GoodFATR over 15 months to collect 472,891 reports from the 6 sources, extract 978,151 indicators from the reports, and identify 618,217 IOCs. We analyze the collected data to identify the top IOC contributors and the IOC class distribution. Finally, we apply GoodFATR to compare the IOC extraction accuracy of 7 popular open-source tools with that of GoodFATR's own indicator extraction module.
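The abstract does not spell out how the majority vote is computed. The Python sketch below illustrates one plausible reading, under stated assumptions: each tool returns a set of `(type, value)` indicator pairs for a report, an indicator is treated as correct when a strict majority of the tools extract it, and each tool's precision, recall, and F1 are then measured against that consensus instead of a manually built ground truth. The function names `majority_consensus` and `score_tools`, the tuple representation, and the strict-majority quorum are hypothetical and are not taken from GoodFATR.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# Hypothetical representation: (indicator_type, value), e.g. ("ipv4", "203.0.113.7").
Indicator = Tuple[str, str]


def majority_consensus(outputs: Dict[str, Set[Indicator]]) -> Set[Indicator]:
    """Return the indicators extracted by a strict majority of the tools.

    Assumption: a strict majority quorum; the paper's actual voting rule may differ.
    Each tool's output is a set, so a tool votes at most once per indicator.
    """
    votes = Counter(ind for extracted in outputs.values() for ind in extracted)
    quorum = len(outputs) // 2 + 1
    return {ind for ind, n in votes.items() if n >= quorum}


def score_tools(
    outputs: Dict[str, Set[Indicator]],
) -> Dict[str, Tuple[float, float, float]]:
    """Score each tool against the majority-vote consensus (pseudo ground truth)."""
    consensus = majority_consensus(outputs)
    scores: Dict[str, Tuple[float, float, float]] = {}
    for tool, extracted in outputs.items():
        tp = len(extracted & consensus)  # indicators the tool shares with the majority
        precision = tp / len(extracted) if extracted else 0.0
        recall = tp / len(consensus) if consensus else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        scores[tool] = (precision, recall, f1)
    return scores


if __name__ == "__main__":
    # Hypothetical outputs of three tools on a single report.
    outputs = {
        "tool_a": {("ipv4", "203.0.113.7"), ("domain", "evil.example"),
                   ("md5", "d41d8cd98f00b204e9800998ecf8427e")},
        "tool_b": {("ipv4", "203.0.113.7"), ("domain", "evil.example")},
        "tool_c": {("ipv4", "203.0.113.7"), ("url", "http://evil.example/a")},
    }
    for tool, (p, r, f) in score_tools(outputs).items():
        print(f"{tool}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy input the consensus is the IPv4 address and the domain, so `tool_b` scores perfectly while `tool_a` loses precision for its unconfirmed hash. One consequence of this design is that a tool which correctly finds an IOC that the other tools miss is scored as if it were wrong, so consensus-based scores estimate agreement with the majority rather than true accuracy.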