An Empirical Evaluation of Competitive Programming AI: A Case Study of AlphaCode

Paper title

An Empirical Evaluation of Competitive Programming AI: A Case Study of AlphaCode

Authors

Sila Lertbanjongngam, Bodin Chinthanet, Takashi Ishio, Raula Gaikovina Kula, Pattara Leelaprute, Bundit Manaskasemsak, Arnon Rungsawang, Kenichi Matsumoto

Abstract

AlphaCode is a code generation system that assists software developers in solving competitive programming problems from natural language problem descriptions. Despite the advantages of code generation systems, the open source community has expressed concerns about their practicality and data licensing. However, no prior research has investigated generated code in terms of code clones and performance. In this paper, we conduct an empirical study of code similarity and performance differences between AlphaCode-generated code and human-written code. The results show that (i) the code generated by AlphaCode is similar to human code (i.e., the average maximum similarity score is 0.56) and (ii) the generated code performs on par with or worse than human code in terms of execution time and memory usage. Moreover, AlphaCode tends to generate code more similar to human code for low-difficulty problems (i.e., four cases have exactly the same code). For high-difficulty problems, it also employs excessive nested loops and unnecessary variable declarations, which, according to our manual investigation, cause low performance. The replication package is available at https://doi.org/10.5281/zenodo.6820681.
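
To make the "average maximum similarity score" in result (i) concrete, the following is a minimal sketch of how such a metric could be computed. It is not the authors' pipeline: the abstract does not name the clone detection tool, so Python's standard `difflib.SequenceMatcher` over whitespace-separated tokens is used here purely as a stand-in, and the function names `similarity`, `max_similarity`, and `average_max_similarity` are hypothetical.

```python
from difflib import SequenceMatcher
from statistics import mean


def similarity(a: str, b: str) -> float:
    """Token-level similarity in [0, 1].

    Assumption: a stand-in for a real clone detector, not the paper's tool.
    """
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def max_similarity(generated: str, human_solutions: list[str]) -> float:
    """Highest similarity between one generated solution and any human solution."""
    return max(similarity(generated, h) for h in human_solutions)


def average_max_similarity(
    generated_solutions: list[str], human_solutions: list[str]
) -> float:
    """Mean, over generated solutions, of each one's best match among human code."""
    return mean(max_similarity(g, human_solutions) for g in generated_solutions)
```

Under this reading, a score of 1.0 for a generated solution means that some human solution has the same token sequence, which is how the four exact-match cases mentioned in the abstract would show up.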
