Paper Title
GAP-Gen: Guided Automatic Python Code Generation
Paper Authors
Paper Abstract
Automatic code generation from natural language descriptions can be highly beneficial during software development. In this work, we propose GAP-Gen, a Guided Automatic Python Code Generation method based on Python syntactic and semantic constraints. We first introduce Python syntactic constraints in the form of Syntax-Flow, a simplified version of the Abstract Syntax Tree (AST) that reduces the AST's size and complexity while maintaining the crucial syntactic information of Python code. In addition to Syntax-Flow, we introduce Variable-Flow, which abstracts variable and function names consistently throughout the code. Rather than intervening in pretraining, we focus on modifying the fine-tuning process, which reduces computational requirements while retaining high generation performance on the automatic Python code generation task. GAP-Gen fine-tunes the transformer-based language models T5 and CodeT5 on the code-docstring datasets CodeSearchNet, CodeSearchNet AdvTest, and the Code-Docstring Corpus from EdinburghNLP. Our experiments show that GAP-Gen achieves better results on the automatic Python code generation task than previous work.
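The abstract does not spell out how Syntax-Flow and Variable-Flow are built, so the following is only a rough sketch of the two ideas using Python's standard `ast` module (Python 3.9+ for `ast.unparse`). It is not the authors' implementation. Several choices here are assumptions made for illustration: Syntax-Flow is approximated as a pre-order sequence of AST node type names with the low-information `Load`/`Store`/`Del` context nodes dropped, and Variable-Flow is approximated as consistent renaming of user-defined function and variable names to `FUNC_k`/`VAR_k` placeholders while builtins are left unchanged. The names `syntax_flow`, `variable_flow`, and `_VariableFlow` are hypothetical.

```python
import ast
import builtins
from typing import Dict, List, Tuple

# Assumption: builtins (len, print, ...) keep their names so they stay readable.
_BUILTINS = frozenset(dir(builtins))


def syntax_flow(source: str) -> List[str]:
    """Hypothetical Syntax-Flow: pre-order list of AST node type names."""
    flow: List[str] = []

    def visit(node: ast.AST) -> None:
        # Assumption: Load/Store/Del contexts carry little syntactic information.
        if isinstance(node, ast.expr_context):
            return
        flow.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    for stmt in ast.parse(source).body:  # skip the Module wrapper node
        visit(stmt)
    return flow


class _VariableFlow(ast.NodeTransformer):
    """Hypothetical Variable-Flow: consistent placeholder renaming."""

    def __init__(self) -> None:
        self.mapping: Dict[str, str] = {}
        self._counts = {"FUNC": 0, "VAR": 0}

    def _alias(self, name: str, kind: str) -> str:
        if name in _BUILTINS:
            return name
        if name not in self.mapping:  # first occurrence fixes the placeholder
            self.mapping[name] = f"{kind}_{self._counts[kind]}"
            self._counts[kind] += 1
        return self.mapping[name]

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:
        node.name = self._alias(node.name, "FUNC")
        self.generic_visit(node)  # then rename parameters and body names
        return node

    def visit_arg(self, node: ast.arg) -> ast.AST:
        node.arg = self._alias(node.arg, "VAR")
        self.generic_visit(node)
        return node

    def visit_Name(self, node: ast.Name) -> ast.AST:
        node.id = self._alias(node.id, "VAR")
        return node


def variable_flow(source: str) -> Tuple[str, Dict[str, str]]:
    """Return the renamed source and the original-to-placeholder mapping."""
    transformer = _VariableFlow()
    tree = transformer.visit(ast.parse(source))
    return ast.unparse(tree), transformer.mapping


if __name__ == "__main__":
    src = "def add(a, b):\n    return a + b\n"
    print(syntax_flow(src))
    print(variable_flow(src)[0])
```

For `def add(a, b): return a + b`, this sketch produces the flow `FunctionDef, arguments, arg, arg, Return, BinOp, Name, Add, Name` and the renamed code `def FUNC_0(VAR_0, VAR_1): return VAR_0 + VAR_1`. The sketch deliberately ignores imports, classes, attributes, and async functions, all of which a real Variable-Flow would have to handle consistently.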