Prevalence, Contents and Automatic Detection of KL-SATD

Paper title

Prevalence, Contents and Automatic Detection of KL-SATD

Authors

Leevi Rantala, Mika Mäntylä, David Lo

Abstract

When developers use keywords such as TODO and FIXME in source code comments to describe self-admitted technical debt (SATD), we refer to it as Keyword-Labeled SATD (KL-SATD). We study KL-SATD in 33 software repositories containing 13,588 KL-SATD comments. We find that the median percentage of KL-SATD comments among all comments is only 1.52%. We find that KL-SATD comments contain words expressing code changes and uncertainty, such as remove, fix, maybe and probably, which distinguishes them from other comments. The contents of KL-SATD comments are similar to those of the manually labeled SATD comments of prior work. Our machine learning classifier, which uses logistic Lasso regression, performs well in detecting KL-SATD comments (AUC-ROC 0.88). Finally, we demonstrate that machine learning can identify comments that currently lack an SATD keyword but should have one. Automating SATD identification for comments that lack SATD keywords can save time and effort by replacing manual identification. Using KL-SATD offers the potential to bootstrap a complete SATD detector.
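
The keyword-based labeling described in the abstract can be illustrated with a minimal sketch. It assumes a case-sensitive match on whole words and uses only the two keywords the abstract names (TODO and FIXME); the full keyword list used in the study is not given here, and the function names `is_kl_satd` and `kl_satd_percentage` are hypothetical, not taken from the paper.

```python
import re

# Assumption: only the two keywords named in the abstract are used here.
KEYWORDS = ("TODO", "FIXME")

# Match a keyword as a whole word, case-sensitively (an assumption of this sketch).
KEYWORD_RE = re.compile(r"\b(?:" + "|".join(KEYWORDS) + r")\b")


def is_kl_satd(comment: str) -> bool:
    """Return True if the comment contains one of the SATD keywords."""
    return KEYWORD_RE.search(comment) is not None


def kl_satd_percentage(comments: list[str]) -> float:
    """Return the percentage of comments that are keyword-labeled SATD."""
    if not comments:
        return 0.0
    labeled = sum(is_kl_satd(c) for c in comments)
    return 100.0 * labeled / len(comments)
```

Labels produced this way could then serve as training targets for a text classifier such as the logistic Lasso regression mentioned in the abstract, with the keywords themselves removed from the input so the model learns from the remaining words.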
