Paper Title
What are the characteristics of highly-selected packages? A case study on the npm ecosystem
Authors
Abstract
With the popularity of software ecosystems, the number of open source components (known as packages) has grown rapidly. Identifying high-quality, well-maintained packages to depend on from a large pool of candidates is a basic and important problem, as it benefits various applications such as package recommendation and package search. However, apart from online discussions, informal literature, and interviews, no systematic and comprehensive work has addressed this problem. To fill this gap, in this paper we conducted a mixed qualitative and quantitative analysis to understand how developers identify and select relevant open source packages. In particular, we started by surveying 118 JavaScript developers from the npm ecosystem to qualitatively understand the factors that make a package highly selected within the npm ecosystem. The survey results showed that JavaScript developers believe that highly-selected packages are well documented, receive a high number of stars on GitHub, have a large number of downloads, and do not suffer from vulnerabilities. Then, we conducted an experiment to quantitatively validate the developers' perceptions of the factors that make a package highly selected. In this analysis, we collected and mined historical data from 2,527 packages divided into highly-selected and not highly-selected packages. For each package in the dataset, we collected quantitative data to represent the factors studied in the developer survey. Next, we used regression analysis to quantitatively investigate which of the studied factors are the most important. Our regression analysis complements our survey results about highly-selected packages. In particular, the results showed that whether a package is highly selected tends to be correlated with its number of downloads, its number of stars, and the size of its README file.
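The abstract does not state which regression model was used. Purely as an illustration, the Python sketch below assumes a logistic regression that separates highly-selected from not highly-selected packages using three of the factors named above (downloads, GitHub stars, README size). The counts are log-transformed and z-scored so that coefficient magnitudes can be compared. The `Package` class, its field names, and the eight toy packages are hypothetical and are not data from the study; the hand-written gradient-descent loop is a simple stand-in for a statistics library.

```python
import math
from dataclasses import dataclass


@dataclass
class Package:
    # Hypothetical per-package metrics; field names are illustrative only.
    downloads: int        # downloads over some fixed window (assumed)
    stars: int            # GitHub stars
    readme_bytes: int     # size of the README file in bytes (assumed unit)
    highly_selected: int  # 1 = highly selected, 0 = not


FEATURES = ["downloads", "stars", "readme_bytes"]


def raw_features(p):
    # log1p tames the heavy-tailed count distributions typical of such metrics.
    return [math.log1p(getattr(p, name)) for name in FEATURES]


def standardize(rows):
    # Z-score each column so coefficient magnitudes are comparable.
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(k)]
    z = [[(r[j] - means[j]) / stds[j] for j in range(k)] for r in rows]
    return z, means, stds


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    # Plain batch gradient descent on the mean log-loss (no regularization).
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(k):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(p, w, b, means, stds):
    # Apply the training-set standardization, then the fitted model.
    x = [(v - m) / s for v, m, s in zip(raw_features(p), means, stds)]
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy, made-up packages (NOT data from the study).
packages = [
    Package(5_000_000, 20_000, 15_000, 1),
    Package(2_000_000, 8_000, 9_000, 1),
    Package(800_000, 3_000, 12_000, 1),
    Package(1_200_000, 1_500, 6_000, 1),
    Package(300, 2, 400, 0),
    Package(1_500, 10, 1_200, 0),
    Package(50, 0, 150, 0),
    Package(9_000, 40, 2_500, 0),
]

X, means, stds = standardize([raw_features(p) for p in packages])
y = [p.highly_selected for p in packages]
w, b = fit_logistic(X, y)

# Rank factors by absolute standardized coefficient.
for name, coef in sorted(zip(FEATURES, w), key=lambda t: -abs(t[1])):
    print(f"{name:>12}: {coef:+.3f}")
```

Ranking by the absolute value of standardized coefficients is only one common heuristic for "importance"; the study may have used a different model or importance measure, and with correlated predictors such as downloads and stars the individual coefficients should be interpreted with care.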