Can Transformer Models Effectively Detect Software Aspects in StackOverflow Discussion?

Paper title

Can Transformer Models Effectively Detect Software Aspects in StackOverflow Discussion?

Authors

Mandal, Nibir Chandra; Muhammad, Tashreef; Shahariar, G. M.

Abstract

Dozens of new tools and technologies are being introduced to help developers, and this abundance has become a source of consternation as developers struggle to choose one over the others. For example, at least ten frameworks are available for developing web applications, which makes it hard to select the one that best meets a given need. As a result, developers continually search for the benefits and drawbacks of each API, framework, tool, and so on. A typical approach is to examine all of the features through official documentation and discussion. This approach is time-consuming, and it often makes it difficult to determine which aspects matter most to a particular developer and whether a particular aspect matters to the community at large. In this paper, we use a benchmark API aspects dataset (Opiner) collected from StackOverflow posts and observe how Transformer models (BERT, RoBERTa, DistilBERT, and XLNet) perform in detecting software aspects in textual developer discussion relative to a baseline Support Vector Machine (SVM) model. Through extensive experimentation, we find that the Transformer models improve on the performance of the baseline SVM for most of the aspects, i.e., 'Performance', 'Security', 'Usability', 'Documentation', 'Bug', 'Legal', 'OnlySentiment', and 'Others'. However, the models fail to capture some of the aspects (e.g., 'Community' and 'Portability'), and their performance varies by aspect. Also, larger architectures like XLNet are ineffective in interpreting software aspects compared to smaller architectures like DistilBERT.
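
The per-aspect comparison described in the abstract can be sketched as follows. This is only an illustration, not the authors' evaluation code: it assumes each aspect is scored as a separate binary label with F1 (the abstract does not name the metric), the helper names `per_aspect_f1` and `aspects_improved` are hypothetical, and the labels and predictions are invented toy data rather than results from the paper.

```python
from typing import Dict, List, Set


def per_aspect_f1(
    gold: List[Set[str]], pred: List[Set[str]], aspects: List[str]
) -> Dict[str, float]:
    """Score each aspect as its own binary task and return F1 per aspect."""
    scores: Dict[str, float] = {}
    for aspect in aspects:
        tp = sum(1 for g, p in zip(gold, pred) if aspect in g and aspect in p)
        fp = sum(1 for g, p in zip(gold, pred) if aspect not in g and aspect in p)
        fn = sum(1 for g, p in zip(gold, pred) if aspect in g and aspect not in p)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there is nothing to count.
        scores[aspect] = 2 * tp / denom if denom else 0.0
    return scores


def aspects_improved(
    baseline: Dict[str, float], candidate: Dict[str, float]
) -> List[str]:
    """List the aspects where the candidate model beats the baseline."""
    return [a for a in baseline if candidate[a] > baseline[a]]


# Invented toy data: one set of aspect labels per sentence.
ASPECTS = ["Performance", "Security", "Community"]
gold = [{"Performance"}, {"Security"}, {"Community"}, {"Performance", "Security"}]
svm_pred = [{"Performance"}, set(), {"Community"}, {"Performance"}]
transformer_pred = [{"Performance"}, {"Security"}, set(), {"Performance", "Security"}]

svm_f1 = per_aspect_f1(gold, svm_pred, ASPECTS)
transformer_f1 = per_aspect_f1(gold, transformer_pred, ASPECTS)
improved = aspects_improved(svm_f1, transformer_f1)
print(improved)
```

Scoring each aspect separately is what makes it possible to say that one model helps on most aspects while still failing on a few, which an aggregate score over all aspects would hide.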
