Advaita: Bug Duplicity Detection System

Paper Title

Advaita: Bug Duplicity Detection System

Authors

Kumar, Amit; Madanu, Manohar; Prakash, Hari; Jonnavithula, Lalitha; Aravilli, Srinivasa Rao

Abstract

Bugs are prevalent in software development. To improve software quality, bugs are filed using a bug tracking system. A reported bug typically has a headline, a description, a project, a product, the component affected by the bug, and a severity. The duplicate bug rate (the percentage of bugs that are duplicates) ranges from single digits (1 to 9%) to double digits (40%), depending on product maturity, code size, and the number of engineers working on the project. In some open source projects such as Eclipse and Firefox, the duplicate bug rate is between 9% and 39%. Duplicate detection means identifying whether any two bugs convey the same meaning, which in turn supports de-duplication. Detecting duplicate bugs helps reduce triaging effort and saves developers time in fixing issues. Traditional natural language processing techniques are less accurate at identifying similarity between sentences. Using the bug data present in a bug tracking system, various approaches were explored, including several machine learning algorithms, to obtain a viable approach that can identify duplicate bugs given a pair of sentences (i.e., the respective bug descriptions). This approach considers multiple sets of features, namely basic text statistical features, semantic features, and contextual features. These features are extracted from the headline, description, and component, and are subsequently used to train a classification algorithm.
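The abstract does not list the individual features, so the following is only a minimal sketch of what "basic text statistical features" for a pair of bug reports could look like. The function names, the dictionary keys (`headline`, `description`, `component`), and the specific features (Jaccard overlap, term-frequency cosine similarity, length difference, component match) are assumptions chosen for illustration, not the authors' feature set. Semantic and contextual features are not covered here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a, b):
    """Jaccard overlap between two token lists (0.0 when both are empty)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity between raw term-frequency vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pair_features(bug_a, bug_b):
    """Build a feature dict for one pair of bug reports.

    Each bug is a dict with the hypothetical keys
    'headline', 'description', and 'component'.
    """
    head_a, head_b = tokenize(bug_a["headline"]), tokenize(bug_b["headline"])
    desc_a, desc_b = tokenize(bug_a["description"]), tokenize(bug_b["description"])
    return {
        "headline_jaccard": jaccard(head_a, head_b),
        "description_cosine": cosine(desc_a, desc_b),
        "description_len_diff": abs(len(desc_a) - len(desc_b)),
        "same_component": 1.0 if bug_a["component"] == bug_b["component"] else 0.0,
    }
```

Under these assumptions, each labeled pair of bugs (duplicate or not) would yield one such feature dictionary, and the resulting rows could be passed to any standard classifier for training.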
