Paper Title
SOTitle: A Transformer-based Post Title Generation Approach for Stack Overflow
Paper Authors
Paper Abstract
On Stack Overflow, developers can not only browse question posts to solve their programming problems but also gain expertise from the question posts to help improve their programming skills. Therefore, improving the quality of question posts on Stack Overflow has attracted wide attention from researchers. A concise and precise title can play an important role in helping developers understand the key information of a question post, which can improve the post quality. However, the titles written by developers are often of low quality, due to a lack of professional knowledge related to their questions or to developers' poor presentation ability. A previous study aimed to automatically generate the title by analyzing the code snippets in the question post. However, that study ignored the useful information in the corresponding problem description. Therefore, we propose an approach, SOTitle, for automatic post title generation that leverages both the code snippets and the problem description in the question post (i.e., multi-modal input). SOTitle follows the Transformer architecture, which can effectively capture long-term dependencies through a multi-head attention mechanism. To verify the effectiveness of SOTitle, we construct a large-scale, high-quality corpus from Stack Overflow, which includes 1,168,257 high-quality question posts covering four popular programming languages. Experimental results show that SOTitle can significantly outperform six state-of-the-art baselines in both automatic evaluation and human evaluation. To encourage follow-up studies, we make our corpus and approach publicly available.
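The abstract credits the Transformer's multi-head attention mechanism with capturing long-term dependencies. As background, the following is a minimal, illustrative sketch of the scaled dot-product attention that each head computes, softmax(QK^T / sqrt(d_k)) V; all names are illustrative and this is not the authors' implementation (a full multi-head layer would run several such heads in parallel over learned projections and concatenate their outputs):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention for one head.

    Q, K, V are lists of vectors (lists of floats). Each output row is a
    weighted average of the rows of V, with weights given by the softmax
    of the scaled query-key dot products. Illustrative sketch only.
    """
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# When all query-key scores are equal, the weights are uniform and the
# output is the mean of the value vectors.
result = attention([[1.0, 0.0]],
                   [[0.0, 1.0], [0.0, 1.0]],
                   [[2.0, 0.0], [0.0, 2.0]])
```

Because every query attends to every key position directly, the path length between any two tokens is constant, which is why such attention handles long-range dependencies better than recurrent encoders.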