Paper Title

Analyzing the State of Computer Science Research with the DBLP Discovery Dataset

Author

Küll, Lennart

Abstract

The number of scientific publications continues to rise exponentially, especially in Computer Science (CS). However, current solutions to analyze those publications restrict access behind a paywall, offer no features for visual analysis, limit access to their data, only focus on niches or sub-fields, and/or are not flexible and modular enough to be transferred to other datasets. In this thesis, we conduct a scientometric analysis to uncover the implicit patterns hidden in CS metadata and to determine the state of CS research. Specifically, we investigate trends of the quantity, impact, and topics for authors, venues, document types (conferences vs. journals), and fields of study (compared to, e.g., medicine). To achieve this, we introduce the CS-Insights system, an interactive web application to analyze CS publications with various dashboards, filters, and visualizations. The data underlying this system is the DBLP Discovery Dataset (D3), which contains metadata from 5 million CS publications. Both D3 and CS-Insights are open-access, and CS-Insights can be easily adapted to other datasets in the future. The most interesting findings of our scientometric analysis include that i) there has been a stark increase in publications, authors, and venues in the last two decades, ii) many authors only recently joined the field, iii) the most cited authors and venues focus on computer vision and pattern recognition, while the most productive prefer engineering-related topics, iv) the preference of researchers to publish in conferences over journals dwindles, v) on average, journal articles receive twice as many citations as conference papers, but the contrast is much smaller for the most cited conferences and journals, and vi) journals also get more citations in all other investigated fields of study, while only CS and engineering publish more in conferences than journals.
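The quantity and impact trends mentioned above (publications per year, average citations per document type, and the conference-versus-journal share over time) boil down to simple group-by aggregations over publication metadata. The sketch below illustrates that idea in plain Python. It is not the CS-Insights implementation: the `Publication` record, its field names (`year`, `doc_type`, `citations`), and the `"conference"`/`"journal"` labels are hypothetical stand-ins, and the real D3 schema may differ. The sample records are made up for illustration and are not D3 data.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Publication:
    # Hypothetical minimal record; the actual D3 fields may be named differently.
    year: int
    doc_type: str  # assumed to be "conference" or "journal"
    citations: int


def publications_per_year(pubs):
    """Count publications per year (a quantity trend), sorted by year."""
    return dict(sorted(Counter(p.year for p in pubs).items()))


def mean_citations_by_type(pubs):
    """Average citation count per document type (an impact comparison)."""
    by_type = defaultdict(list)
    for p in pubs:
        by_type[p.doc_type].append(p.citations)
    return {t: mean(cites) for t, cites in sorted(by_type.items())}


def conference_share_by_year(pubs):
    """Per year, the fraction of conference+journal output published at conferences."""
    totals, confs = Counter(), Counter()
    for p in pubs:
        if p.doc_type in ("conference", "journal"):
            totals[p.year] += 1
            if p.doc_type == "conference":
                confs[p.year] += 1
    return {y: confs[y] / totals[y] for y in sorted(totals)}


if __name__ == "__main__":
    # Toy records for illustration only; NOT data from D3.
    sample = [
        Publication(2000, "journal", 40),
        Publication(2000, "conference", 10),
        Publication(2020, "conference", 6),
        Publication(2020, "conference", 14),
        Publication(2020, "journal", 20),
    ]
    print(publications_per_year(sample))
    print(mean_citations_by_type(sample))
    print(conference_share_by_year(sample))
```

A declining value from `conference_share_by_year` over the years would correspond to the "dwindling preference for conferences" described in finding iv), and a ratio of about 2 between the journal and conference averages would mirror finding v). On a dataset the size of D3, the same aggregations would more likely run in a database or dataframe library than in pure Python.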