Paper title

All You Need Is Logs: Improving Code Completion by Learning from Anonymous IDE Usage Logs

Authors

Vitaliy Bibaev, Alexey Kalina, Vadim Lomshakov, Yaroslav Golubev, Alexander Bezzubov, Nikita Povarov, Timofey Bryksin

Abstract

In this work, we propose an approach for collecting completion usage logs from users in an IDE and using them to train a machine-learning-based model for ranking completion candidates. We developed a set of features that describe completion candidates and their context, and deployed their anonymized collection in the Early Access Program of IntelliJ-based IDEs. We used the logs to collect a dataset of code completions from users, and employed it to train a ranking CatBoost model. Then, we evaluated it in two settings: on a held-out set of the collected completions and in a separate A/B test on two different groups of users in the IDE. Our evaluation shows that using a simple ranking model trained on past user behavior logs significantly improved the code completion experience. Compared to the default heuristics-based ranking, our model demonstrated a decrease in the number of typing actions necessary to perform a completion in the IDE from 2.073 to 1.832. The approach adheres to privacy requirements and legal constraints, since it does not require collecting personal information and performs all the necessary anonymization on the client side. Importantly, it can be improved continuously by implementing new features, collecting new data, and evaluating new models; in this way, we have been using it in production since the end of 2020.
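The held-out evaluation described in the abstract can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the `Session` schema, the feature names `prefix_match` and `recently_used`, and both scoring functions are hypothetical and are not taken from the paper. The `learned_stand_in` function only stands in for a trained ranking model such as CatBoost, and the position of the selected candidate is used as a simpler proxy for the paper's typing-actions metric.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Features = Dict[str, float]


@dataclass
class Session:
    """One logged completion popup (hypothetical schema, not the paper's)."""
    candidates: List[Features]  # anonymized numeric features, in default order
    selected: int               # index of the candidate the user picked


def selected_position(session: Session, score: Callable[[Features], float]) -> int:
    """1-based position of the picked candidate after sorting by score, descending."""
    order = sorted(
        range(len(session.candidates)),
        key=lambda i: score(session.candidates[i]),
        reverse=True,  # sorted() stays stable, so ties keep the default order
    )
    return order.index(session.selected) + 1


def mean_position(sessions: List[Session], score: Callable[[Features], float]) -> float:
    """Average position of the selected candidate over all sessions (lower is better)."""
    return sum(selected_position(s, score) for s in sessions) / len(sessions)


# Toy held-out logs; the values are invented for the example.
sessions = [
    Session([{"prefix_match": 1.0, "recently_used": 0.0},
             {"prefix_match": 1.0, "recently_used": 1.0}], selected=1),
    Session([{"prefix_match": 0.5, "recently_used": 0.0},
             {"prefix_match": 1.0, "recently_used": 0.0},
             {"prefix_match": 0.5, "recently_used": 1.0}], selected=2),
]


def heuristic(f: Features) -> float:
    """Stand-in for a default heuristic ranking (assumption)."""
    return f["prefix_match"]


def learned_stand_in(f: Features) -> float:
    """Stand-in for a trained ranker's score (assumption, not a real model)."""
    return f["prefix_match"] + 2.0 * f["recently_used"]


print(mean_position(sessions, heuristic))
print(mean_position(sessions, learned_stand_in))
```

In a setup closer to the one the abstract describes, the scoring function would be the prediction of a model fitted on the training portion of the logs, and the same comparison would be repeated online as an A/B test.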
