Cross-Domain Evaluation of a Deep Learning-Based Type Inference System

Paper title

Cross-Domain Evaluation of a Deep Learning-Based Type Inference System

Authors

Bernd Gruner, Tim Sonnekalb, Thomas S. Heinze, Clemens-Alexander Brust

Abstract

Optional type annotations allow for enriching dynamic programming languages with static typing features such as better Integrated Development Environment (IDE) support, more precise program analysis, and early detection and prevention of type-related runtime errors. Machine learning-based type inference promises interesting results for automating this task. However, the practical usage of such systems depends on their ability to generalize across different domains, as they are often applied outside their training domain. In this work, we investigate Type4Py as a representative of state-of-the-art deep learning-based type inference systems by conducting extensive cross-domain experiments. Thereby, we address the following problems: class imbalances, out-of-vocabulary words, dataset shifts, and unknown classes. To perform such experiments, we use the datasets ManyTypes4Py and CrossDomainTypes4Py, the latter of which we introduce in this paper. Our dataset enables the evaluation of type inference systems in different domains of software projects and has over 1,000,000 type annotations mined on the platforms GitHub and Libraries. It consists of data from two domains: web development and scientific calculation. Through our experiments, we find that dataset shifts and the long-tailed distribution, with many rare and unknown data types, drastically decrease the performance of the deep learning-based type inference system. In this context, we test unsupervised domain adaptation methods and fine-tuning to overcome these issues. Moreover, we investigate the impact of out-of-vocabulary words.
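To make the problems named in the abstract more concrete, the following is a minimal sketch of how unknown classes, out-of-vocabulary words, and a long-tailed type distribution could be quantified between a source and a target domain. It is not the Type4Py pipeline or the evaluation code of the paper. The function names `unseen_rate` and `rare_type_share`, the `min_count` threshold, and all toy data are assumptions made purely for illustration.

```python
from collections import Counter


def unseen_rate(source_items, target_items):
    """Fraction of target occurrences whose value never appears in the source.

    Applied to type labels, this approximates the share of unknown classes;
    applied to identifier tokens, it approximates the out-of-vocabulary rate.
    """
    if not target_items:
        return 0.0
    known = set(source_items)
    unseen = sum(1 for item in target_items if item not in known)
    return unseen / len(target_items)


def rare_type_share(type_labels, min_count=2):
    """Fraction of annotations whose type occurs fewer than min_count times.

    A high value hints at a long-tailed distribution (class imbalance).
    The threshold min_count is an illustrative assumption.
    """
    if not type_labels:
        return 0.0
    counts = Counter(type_labels)
    rare = sum(c for c in counts.values() if c < min_count)
    return rare / len(type_labels)


# Hypothetical toy data, NOT taken from ManyTypes4Py or CrossDomainTypes4Py.
web_types = ["str", "int", "str", "HttpRequest", "dict", "str"]
sci_types = ["int", "ndarray", "float", "ndarray", "str", "DataFrame"]
web_names = ["request", "user_id", "url", "response"]
sci_names = ["matrix", "user_id", "tensor", "url"]

# Train on "web", evaluate on "scientific": how much of the target is unseen?
unknown_type_rate = unseen_rate(web_types, sci_types)
oov_rate = unseen_rate(web_names, sci_names)
rare_share = rare_type_share(web_types)

print(f"unknown types in target: {unknown_type_rate:.2f}")
print(f"out-of-vocabulary names: {oov_rate:.2f}")
print(f"rare-type share in source: {rare_share:.2f}")
```

Statistics of this kind only describe the gap between domains. They do not measure how a model's predictions degrade, which the paper studies through its cross-domain experiments.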
