Multi-lingual Evaluation of Code Generation Models

Paper Title

Multi-lingual Evaluation of Code Generation Models

Authors

Ben Athiwaratkun, Sanjay Krishna Gouda, Zijian Wang, Xiaopeng Li, Yuchen Tian, Ming Tan, Wasi Uddin Ahmad, Shiqi Wang, Qing Sun, Mingyue Shang, Sujan Kumar Gonugondla, Hantian Ding, Varun Kumar, Nathan Fulton, Arash Farahani, Siddhartha Jain, Robert Giaquinto, Haifeng Qian, Murali Krishna Ramanathan, Ramesh Nallapati, Baishakhi Ray, Parminder Bhatia, Sudipta Sengupta, Dan Roth, Bing Xiang

Abstract

We present new benchmarks for evaluating code generation models: MBXP, Multilingual HumanEval, and MathQA-X. These datasets cover over 10 programming languages and are generated using a scalable conversion framework that transpiles prompts and test cases from the original Python datasets into the corresponding data in the target language. Using these benchmarks, we are able to assess the performance of code generation models in a multi-lingual fashion, and we discover the generalization ability of language models on out-of-domain languages, the advantages of multi-lingual models over mono-lingual ones, the ability of few-shot prompting to teach the model new languages, and zero-shot translation abilities even in mono-lingual settings. Furthermore, we use our code generation model to perform large-scale bootstrapping to obtain synthetic canonical solutions in several languages, which can be used for other code-related evaluations such as code insertion, robustness, or summarization tasks. Overall, our benchmarks represent a significant step towards a deeper understanding of language models' code generation abilities. We publicly release our code and datasets at https://github.com/amazon-research/mxeval.
