Paper title
PR-SZZ: How pull requests can support the tracing of defects in software repositories
Paper authors
Abstract
The SZZ algorithm represents a standard way to identify bug-fixing commits as well as their inducing counterparts. It forms the basis for data sets used in numerous empirical studies. Since its creation, multiple extensions have been proposed to enhance its performance. For historical reasons, related work relies on commit messages to map bug tickets to possibly related code, with no additional data used to trace inducing commits from these fixes. Therefore, we present an updated version of SZZ utilizing pull requests, which are widely adopted today. We evaluate our approach in comparison to existing SZZ variants by conducting experiments and analyzing the usage of pull requests, inner commits, and merge strategies. We base our results on 6 open-source projects with more than 50k commits and 35k pull requests. With respect to bug-fixing commits, on average 18% of bug tickets can be additionally mapped to a fixing commit, resulting in an overall F-score of 0.75, an improvement of 40 percentage points. By selecting an inducing commit, we manage to reduce the false positives and increase precision by 16 percentage points on average in comparison to existing approaches.
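The difference between linking bug tickets through commit messages alone and additionally through pull requests can be illustrated with a minimal sketch. This is not the authors' implementation: the `Commit` and `PullRequest` records, the `#<number>` ticket reference pattern, and both function names are assumptions made purely for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical, simplified records; real repositories expose far richer data.
@dataclass
class Commit:
    sha: str
    message: str

@dataclass
class PullRequest:
    number: int
    body: str
    merge_commit: str  # sha of the commit that landed on the main branch

# Assumed ticket reference style, e.g. "closes #12".
ISSUE_REF = re.compile(r"#(\d+)")

def link_by_commit_message(bug_ids, commits):
    """Commit-message linking: a commit fixes a ticket if its message cites it."""
    links = {}
    for commit in commits:
        for ref in ISSUE_REF.findall(commit.message):
            if int(ref) in bug_ids:
                links.setdefault(int(ref), set()).add(commit.sha)
    return links

def link_with_pull_requests(bug_ids, commits, pull_requests):
    """Also links tickets that are cited only in a pull request description."""
    links = link_by_commit_message(bug_ids, commits)
    for pr in pull_requests:
        for ref in ISSUE_REF.findall(pr.body):
            if int(ref) in bug_ids:
                links.setdefault(int(ref), set()).add(pr.merge_commit)
    return links

# Toy data: ticket 15 is mentioned only in a pull request, never in a commit message.
bug_ids = {12, 15}
commits = [
    Commit("a1", "Fix crash on empty input, closes #12"),
    Commit("b2", "Refactor parser"),
    Commit("c3", "Merge branch 'fix-parser'"),
]
pull_requests = [PullRequest(number=40, body="Resolves #15", merge_commit="c3")]

message_only = link_by_commit_message(bug_ids, commits)
with_prs = link_with_pull_requests(bug_ids, commits, pull_requests)
```

In this toy data, commit-message linking finds a fix only for ticket 12, while the pull request description supplies the missing link for ticket 15. Which commit of a pull request should count as the fix (the merge commit or an inner commit) depends on the merge strategy, which this sketch does not model.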