Title
Characterization and Automatic Update of Deprecated Machine-Learning API Usages
Authors
Abstract
Due to the rise of AI applications, machine learning libraries have become far more accessible, with Python being the most common programming language for writing them. Machine learning libraries tend to be updated periodically, and these updates may deprecate existing APIs, requiring developers to update their usages. However, updating usages of deprecated APIs is typically not a priority for developers, leading to widespread usage of deprecated APIs, which exposes library users to vulnerability issues. In this paper, we build a tool to automate these updates. We first conducted an empirical study to better understand how updates of deprecated machine-learning API usages in Python can be performed. The study involved a dataset of 112 deprecated APIs from Scikit-Learn, TensorFlow, and PyTorch. We identified dimensions of deprecated API migration related to the update operation (i.e., the operation required to perform the migration), the API mapping (i.e., the number of deprecated APIs and their corresponding updated APIs), and context dependency (i.e., whether the surrounding context must be considered when performing the migration). Guided by the findings of our empirical study, we created MLCatchUp, a tool that automates the update of deprecated Python API usages by automatically inferring the API migration transformation through a comparison of the deprecated and updated API signatures. These transformations are expressed in a Domain Specific Language (DSL). We evaluated MLCatchUp on a test dataset of 258 files containing 514 API usages that we collected from public GitHub repositories. In this evaluation, MLCatchUp achieves a precision of 86.19%. We further improved the precision of MLCatchUp by adding a feature that lets it accept additional user input specifying transformation constraints in the DSL for context-dependent API migrations. With this feature, MLCatchUp achieves a precision of 93.58%.
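The abstract describes two steps. First, a migration transformation is inferred by comparing the signatures of the deprecated API and the updated API. Second, that transformation is applied to existing usages. The sketch below illustrates only that general idea under simplifying assumptions. It is not MLCatchUp's implementation and does not reproduce its DSL. The `Op` steps, the `infer_ops` and `migrate` helpers, and the `old_fit`/`new_fit` APIs are all hypothetical. The sketch handles just two kinds of change: an API rename and a keyword parameter that no longer exists. It relies only on Python's standard `inspect` and `ast` modules and needs Python 3.9+ for `ast.unparse`.

```python
import ast
import inspect
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One transformation step; a hypothetical stand-in for a DSL command."""
    kind: str  # "rename_api" or "remove_param"
    arg: str   # the new dotted API name, or the keyword parameter to drop


def infer_ops(old_name, old_fn, new_name, new_fn):
    """Diff two API signatures and return the steps that migrate a call."""
    ops = []
    if old_name != new_name:
        ops.append(Op("rename_api", new_name))
    new_params = inspect.signature(new_fn).parameters
    for name in inspect.signature(old_fn).parameters:
        if name not in new_params:
            # The deprecated parameter has no counterpart in the updated API.
            ops.append(Op("remove_param", name))
    return ops


class Migrator(ast.NodeTransformer):
    """Rewrite every call whose callee is exactly `old_name`."""

    def __init__(self, old_name, ops):
        self.old_name = old_name
        self.ops = ops

    def visit_Call(self, node):
        self.generic_visit(node)  # migrate nested calls first
        if ast.unparse(node.func) != self.old_name:
            return node
        for op in self.ops:
            if op.kind == "rename_api":
                node.func = ast.parse(op.arg, mode="eval").body
            elif op.kind == "remove_param":
                # Only keyword usages are handled in this sketch.
                node.keywords = [k for k in node.keywords if k.arg != op.arg]
        return node


def migrate(source, old_name, ops):
    """Apply the inferred steps to a snippet of Python source code."""
    tree = Migrator(old_name, ops).visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


def old_fit(x, y, verbose=False):  # hypothetical deprecated API
    ...


def new_fit(x, y):  # hypothetical updated API
    ...


ops = infer_ops("lib.old_fit", old_fit, "lib.new_fit", new_fit)
print(migrate("model = lib.old_fit(X, y, verbose=True)", "lib.old_fit", ops))
# prints: model = lib.new_fit(X, y)
```

Several cases are deliberately left out: a removed parameter that is passed positionally, parameter renames, and migrations that depend on the surrounding context. According to the abstract, MLCatchUp handles context-dependent migrations through additional user-specified constraints in its DSL.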