Paper title

Prevalence, Contents and Automatic Detection of KL-SATD

Authors

Leevi Rantala, Mika Mäntylä, David Lo

Abstract

When developers use keywords such as TODO and FIXME in source code comments to describe self-admitted technical debt (SATD), we refer to it as Keyword-Labeled SATD (KL-SATD). We study KL-SATD in 33 software repositories containing 13,588 KL-SATD comments. We find that the median percentage of KL-SATD comments among all comments is only 1.52%. KL-SATD comments contain words expressing code changes and uncertainty, such as remove, fix, maybe, and probably, which distinguishes them from other comments. Their contents are similar to the manually labeled SATD comments of prior work. Our machine learning classifier, based on logistic Lasso regression, performs well in detecting KL-SATD comments (AUC-ROC 0.88). Finally, we demonstrate that machine learning can identify comments that currently lack a SATD keyword but should have one. Automating SATD identification for such comments can save time and effort by replacing manual identification. KL-SATD thus offers the potential to bootstrap a complete SATD detector.
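To make the KL-SATD definition and the per-repository percentage concrete, here is a minimal sketch in Python. It is an illustration only, not the authors' pipeline: the keyword set is limited to the two keywords the abstract names (TODO and FIXME), case-insensitive whole-word matching is an assumption, and the repository names and comments are hypothetical toy data.

```python
import re
from statistics import median

# Only TODO and FIXME are named in the abstract; the study's full
# keyword list is not given there, so this set is an assumption.
KEYWORDS = ("TODO", "FIXME")

# Assumption: match keywords as whole words, ignoring case.
KEYWORD_RE = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)


def is_kl_satd(comment: str) -> bool:
    """Return True if the comment contains one of the SATD keywords."""
    return KEYWORD_RE.search(comment) is not None


def kl_satd_percentage(comments) -> float:
    """Percentage of comments in one repository that are KL-SATD."""
    if not comments:
        return 0.0
    hits = sum(is_kl_satd(c) for c in comments)
    return 100.0 * hits / len(comments)


# Hypothetical toy data: repository name -> extracted comments.
repos = {
    "repo_a": [
        "TODO: remove this hack",
        "Returns the user id",
        "Parse the header",
        "Open the file",
    ],
    "repo_b": ["fixme maybe wrong for empty input", "Loop over rows"],
    "repo_c": ["Initialise the cache", "Close the socket"],
}

# Per-repository percentage, then the median across repositories.
per_repo = {name: kl_satd_percentage(cs) for name, cs in repos.items()}
median_pct = median(per_repo.values())
```

Comments labeled this way could then serve as positive training examples for a text classifier such as the logistic Lasso regression the abstract mentions; that step is omitted here because the abstract does not describe the features or the training setup.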
