Paper Title
Adding Context to Source Code Representations for Deep Learning
Authors
Abstract
Deep learning models have been successfully applied to a variety of software engineering tasks, such as code classification, summarisation, and bug and vulnerability detection. In order to apply deep learning to these tasks, source code needs to be represented in a format that is suitable for input into the deep learning model. Most approaches to representing source code, such as tokens, abstract syntax trees (ASTs), data flow graphs (DFGs), and control flow graphs (CFGs), only focus on the code itself and do not take into account additional context that could be useful for deep learning models. In this paper, we argue that it is beneficial for deep learning models to have access to additional contextual information about the code being analysed. We present preliminary evidence that encoding context from the call hierarchy along with information from the code itself can improve the performance of a state-of-the-art deep learning model for two software engineering tasks. We outline our research agenda for adding further contextual information to source code representations for deep learning.
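To make the idea of call-hierarchy context concrete, the following is a minimal sketch of how a function's callers and callees might be gathered alongside its own code before encoding. It is not the paper's pipeline: the helper names `build_call_graph` and `with_call_context`, the toy `SOURCE` module, and the choice of Python's `ast` module are all assumptions made for illustration, and only direct calls between top-level functions in one file are resolved.

```python
import ast

# Hypothetical toy module; any single-file Python source would work here.
SOURCE = '''
def parse(text):
    return text.split(",")

def load(path):
    with open(path) as handle:
        return parse(handle.read())

def main():
    return load("data.csv")
'''


def build_call_graph(source):
    """Map each top-level function to the top-level functions it calls directly."""
    tree = ast.parse(source)
    functions = {
        node.name: node for node in tree.body if isinstance(node, ast.FunctionDef)
    }
    callees = {name: set() for name in functions}
    for name, node in functions.items():
        for child in ast.walk(node):
            # Only plain-name calls such as parse(...) are resolved;
            # attribute calls such as handle.read() are ignored in this sketch.
            if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                if child.func.id in functions:
                    callees[name].add(child.func.id)
    return functions, callees


def with_call_context(source, target):
    """Return (role, name, code) triples: the target, then its callers, then its callees."""
    functions, callees = build_call_graph(source)
    callers = sorted(name for name, called in callees.items() if target in called)
    parts = [("target", target)]
    parts += [("caller", name) for name in callers]
    parts += [("callee", name) for name in sorted(callees[target])]
    return [
        (role, name, ast.get_source_segment(source, functions[name]))
        for role, name in parts
    ]


context = with_call_context(SOURCE, "load")
for role, name, code in context:
    print(f"# {role}: {name}")
    print(code)
```

Each triple could then be passed through whatever representation a model already uses (tokens, ASTs, and so on), so that the model sees the target function together with its immediate neighbours in the call hierarchy. How the pieces are combined is left open in this sketch.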