Paper Title

LineVD: Statement-level Vulnerability Detection using Graph Neural Networks

Authors

David Hin, Andrey Kan, Huaming Chen, M. Ali Babar

Abstract

Current machine-learning-based software vulnerability detection methods operate primarily at the function level. However, a key limitation of these methods is that they do not indicate the specific lines of code that contribute to a vulnerability. This limits developers' ability to efficiently inspect and interpret the predictions of a learnt model, which is crucial for integrating machine-learning-based tools into the software development workflow. Graph-based models have shown promising performance in function-level vulnerability detection, but their capability for statement-level vulnerability detection has not been extensively explored. While interpreting function-level predictions through explainable AI is one promising direction, we herein approach statement-level software vulnerability detection from a fully supervised learning perspective. We propose a novel deep learning framework, LineVD, which formulates statement-level vulnerability detection as a node classification task. LineVD uses graph neural networks to leverage control and data dependencies between statements, and a transformer-based model to encode the raw source code tokens. In particular, by addressing the conflicting outputs between function-level and statement-level information, LineVD significantly improves prediction performance without requiring the vulnerability status of the function code. We have conducted extensive experiments on a large-scale collection of real-world C/C++ vulnerabilities obtained from multiple projects, and demonstrate a 105% increase in F1-score over the current state of the art.
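The abstract frames statement-level detection as node classification: each statement is a node, control and data dependencies are edges, and every node receives its own vulnerable/not-vulnerable label. The following is a minimal, dependency-free sketch of that framing under loudly stated assumptions: the node feature vectors stand in for the transformer-produced statement embeddings, a single mean-aggregation layer stands in for the graph neural network, and the weights are hand-picked rather than learned. The function names (`mean_aggregate`, `classify_statements`) and the example graph are hypothetical and do not come from LineVD.

```python
import math
from typing import Dict, List, Set, Tuple

Vector = List[float]


def mean_aggregate(features: List[Vector], edges: List[Tuple[int, int]]) -> List[Vector]:
    """One round of message passing over a (hypothetical) statement graph.

    Each statement's new vector is the mean of its own vector and those of the
    statements it shares a dependency edge with. Edges are treated as
    undirected here, which is a simplifying assumption of this sketch.
    """
    neighbours: Dict[int, Set[int]] = {i: {i} for i in range(len(features))}  # self-loops
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    dim = len(features[0])
    aggregated: List[Vector] = []
    for i in range(len(features)):
        total = [0.0] * dim
        for j in neighbours[i]:
            for k in range(dim):
                total[k] += features[j][k]
        aggregated.append([value / len(neighbours[i]) for value in total])
    return aggregated


def classify_statements(
    features: List[Vector],
    edges: List[Tuple[int, int]],
    weights: Vector,
    bias: float,
    threshold: float = 0.5,
) -> Tuple[List[float], List[bool]]:
    """Score every statement (node) with a logistic layer and flag those at or above the threshold."""
    hidden = mean_aggregate(features, edges)
    probs = [
        1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(weights, vec)) + bias)))
        for vec in hidden
    ]
    return probs, [p >= threshold for p in probs]


# Hypothetical 5-statement function: statements 1 and 2 carry "suspicious" features.
features = [[0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
# Hypothetical dependency edges, e.g. data flow 0->1, control flow 1->2, data flow 3->4.
edges = [(0, 1), (1, 2), (3, 4)]
probs, labels = classify_statements(features, edges, weights=[2.0, 2.0], bias=-2.5)
print(labels)  # [False, True, True, False, False]
```

Statements 1 and 2 are flagged, while statement 0 is pulled up by its dependency neighbour (probability about 0.38 instead of about 0.08 in isolation) but stays below the threshold; this neighbour influence is the motivation for propagating over dependency edges rather than scoring each line independently. In LineVD the node features would come from a transformer encoder and all parameters would be learned; the abstract does not specify the exact GNN layer, so the mean aggregation here is only a stand-in.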