Searching a Database of Source Codes Using Contextualized Code Search

Paper Title

Searching a Database of Source Codes Using Contextualized Code Search

Authors

Rohan Mukherjee, Swarat Chaudhuri, Chris Jermaine

Abstract

Consider the case where a programmer has written part of a program but has left another part (such as a method or function body) incomplete. The goal is to use the context surrounding the missing code to automatically 'figure out' which code in the database would help the programmer complete the missing code. The search is 'contextualized' in the sense that the search engine should use clues in the partially completed code to determine which database code is most useful; the user should not be required to formulate an explicit query. We cast contextualized code search as a learning problem, where the goal is to learn a distribution function that computes the likelihood that each database code completes the program, and we propose a neural model for predicting which database code is likely to be most useful. Because applying a neural model to each code in a database of millions or billions of codes at search time would be prohibitively expensive, one of our key technical concerns is ensuring fast search. We address this by learning a 'reverse encoder' that reduces the problem of evaluating each database code to computing a convolution of two normal distributions.
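The abstract does not give the model's equations, so what follows is only a minimal sketch of why the last step makes search cheap. It assumes that the encoder applied to the partial program yields a diagonal Gaussian over a latent vector, and that the reverse encoder yields a diagonal Gaussian for each database code, computed once offline. Under those assumptions, the score for a database code (the integral over the latent space of the product of the two normal densities) has a closed form. It equals a normal density whose variance is the sum of the two variances, which is a convolution of the two normals evaluated at a single point. The names `log_overlap` and `rank_database` and the diagonal-covariance parameterization are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation (assumption, not from the paper): a diagonal
# Gaussian over the latent space, given as (means, variances) per dimension.
Gaussian = Tuple[Sequence[float], Sequence[float]]


def log_overlap(query: Gaussian, candidate: Gaussian) -> float:
    """Log of the integral over z of N(z; mu_q, var_q) * N(z; mu_d, var_d).

    For Gaussians this integral equals N(mu_q; mu_d, var_q + var_d), i.e. the
    convolution of the two normals evaluated at one point. With diagonal
    covariances it factorizes into a sum over latent dimensions.
    """
    mu_q, var_q = query
    mu_d, var_d = candidate
    total = 0.0
    for mq, vq, md, vd in zip(mu_q, var_q, mu_d, var_d):
        v = vq + vd  # variances add under convolution
        total += -0.5 * (math.log(2.0 * math.pi * v) + (mq - md) ** 2 / v)
    return total


def rank_database(query: Gaussian, database: List[Gaussian]) -> List[int]:
    """Return database indices ordered from highest to lowest score.

    Each database Gaussian is assumed to be precomputed by the (hypothetical)
    reverse encoder, so no neural network is run per database code here.
    """
    scores = [log_overlap(query, cand) for cand in database]
    return sorted(range(len(database)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    query = ([0.0, 1.0], [0.5, 0.5])
    database = [
        ([3.0, -2.0], [0.5, 0.5]),  # far from the query in latent space
        ([0.1, 0.9], [0.5, 0.5]),   # close to the query in latent space
    ]
    print(rank_database(query, database))  # prints [1, 0]
```

In this sketch, scoring one query costs a few arithmetic operations per latent dimension per database code, which illustrates how such a reduction can avoid running a neural model on every code at search time.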
