Paper Title

Prompt-tuned Code Language Model as a Neural Knowledge Base for Type Inference in Statically-Typed Partial Code

Authors

Qing Huang, Zhiqiang Yuan, Zhenchang Xing, Xiwei Xu, Liming Zhu, Qinghua Lu

Abstract

Partial code usually involves non-fully-qualified type names (non-FQNs) and undeclared receiving objects. Resolving the FQNs of these non-FQN types and undeclared receiving objects (referred to as type inference) is the prerequisite to effective search and reuse of partial code. Existing dictionary-lookup based methods build a symbolic knowledge base of API names and code contexts, which involve significant compilation overhead and are sensitive to unseen API names and code context variations. In this paper, we formulate type inference as a cloze-style fill-in-blank language task. Built on source code naturalness, our approach fine-tunes a code masked language model (MLM) as a neural knowledge base of code elements with a novel "pre-train, prompt and predict" paradigm from raw source code. Our approach is lightweight and has minimum requirements on code compilation. Unlike existing symbolic name and context matching for type inference, our prompt-tuned code MLM packs FQN syntax and usage in its parameters and supports fuzzy neural type inference. We systematically evaluate our approach on a large amount of source code from GitHub and Stack Overflow. Our results confirm the effectiveness of our approach design and the practicality for partial code type inference. As the first of its kind, our neural type inference method opens the door to many innovative ways of using partial code.
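
To make the cloze-style formulation concrete, here is a minimal sketch of how a fill-in-blank prompt for one simple type name might be built and then completed. The import-style template, the `<mask>` token, and the helper names `make_cloze_prompt` and `fill_prompt` are assumptions chosen for illustration; they are not taken from the paper. In a real pipeline a prompt-tuned code MLM would predict the masked segments, whereas here the predictions are supplied by hand.

```python
import re
from typing import Sequence

# Placeholder mask token (assumption); the actual token depends on the model's tokenizer.
MASK = "<mask>"


def make_cloze_prompt(partial_code: str, simple_name: str, n_masks: int = 2) -> str:
    """Prepend an import-style line whose package prefix is masked.

    The template is hypothetical: it only illustrates turning "what is the
    FQN of simple_name?" into a fill-in-blank question over source text.
    """
    if not re.search(rf"\b{re.escape(simple_name)}\b", partial_code):
        raise ValueError(f"{simple_name!r} does not occur in the partial code")
    blanks = ".".join([MASK] * n_masks)
    return f"import {blanks}.{simple_name};\n{partial_code}"


def fill_prompt(prompt: str, predicted_parts: Sequence[str]) -> str:
    """Substitute predicted package segments for the masks, left to right."""
    for part in predicted_parts:
        prompt = prompt.replace(MASK, part, 1)
    if MASK in prompt:
        raise ValueError("fewer predictions than masks")
    return prompt


# Partial code: ArrayList is used without an import, so its FQN is unknown.
snippet = "List<String> names = new ArrayList<>();"
prompt = make_cloze_prompt(snippet, "ArrayList", n_masks=2)

# Stand-in for the model's output; a code MLM would produce these segments.
filled = fill_prompt(prompt, ["java", "util"])
print(filled.splitlines()[0])
```

The number of masks is fixed by the caller in this sketch; how a real system decides the length of the masked span is outside what the abstract states.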
