Title
Studying Retrievability of Publications and Datasets in an Integrated Retrieval System
Authors
Abstract
In this paper, we investigate the retrievability of datasets and publications in a real-life Digital Library (DL). The retrievability measure was originally developed to quantify the influence that a retrieval system has on access to information. It can also help DL engineers evaluate their search engine by determining how easily the content in the collection can be accessed. Following this methodology, we propose a system-oriented approach for studying dataset and publication retrieval. A distinctive feature of this paper is its focus on measuring the accessibility biases of different types of DL items and its inclusion of a usefulness metric. Among other metrics, we use Lorenz curves and Gini coefficients to visualize the differences between the two retrievable document types, datasets and publications. The empirical results reported in this paper show a clear difference in retrievability scores between documents of different types.
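The abstract names retrievability, Lorenz curves and Gini coefficients but does not give formulas. The following is a minimal sketch, not the paper's implementation, assuming the cumulative form of retrievability that is common in the literature: r(d) sums a query weight o_q over every query q for which document d is ranked within a cutoff c (uniform weights by default). Inequality is then summarized with the usual Gini formula over sorted scores, G = sum_i (2i - N - 1) r_i / (N * sum_i r_i). The function names (`retrievability`, `gini`, `lorenz_curve`, `gini_by_type`) and the per-type grouping are hypothetical illustrations. The paper's usefulness metric is not modeled because the abstract does not define it.

```python
from itertools import accumulate
from typing import Dict, Iterable, List, Mapping, Optional, Sequence, Tuple


def retrievability(
    ranked_lists: Mapping[str, Sequence[str]],
    collection: Iterable[str],
    cutoff: int = 10,
    query_weights: Optional[Mapping[str, float]] = None,
) -> Dict[str, float]:
    """Cumulative retrievability (assumed form): r(d) = sum_q o_q * f(k_dq, c),
    where f is 1 if d ranks within the top `cutoff` results for q, else 0."""
    scores = {doc: 0.0 for doc in collection}
    for query, ranking in ranked_lists.items():
        # o_q defaults to 1.0, i.e. every query is treated as equally likely.
        weight = 1.0 if query_weights is None else query_weights.get(query, 1.0)
        for doc in ranking[:cutoff]:
            if doc in scores:  # ignore results that are not in the collection
                scores[doc] += weight
    return scores


def gini(values: Iterable[float]) -> float:
    """Gini coefficient of non-negative scores: 0 means equal access,
    values near 1 mean access is concentrated on a few documents."""
    ordered = sorted(values)
    n, total = len(ordered), sum(ordered)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * r for i, r in enumerate(ordered, start=1))
    return weighted / (n * total)


def lorenz_curve(values: Iterable[float]) -> List[Tuple[float, float]]:
    """Lorenz curve points (share of documents, share of total retrievability),
    with documents sorted from least to most retrievable."""
    ordered = sorted(values)
    n, total = len(ordered), sum(ordered)
    points = [(0.0, 0.0)]
    for i, cumulative in enumerate(accumulate(ordered), start=1):
        points.append((i / n, cumulative / total if total else 0.0))
    return points


def gini_by_type(
    scores: Mapping[str, float], doc_types: Mapping[str, str]
) -> Dict[str, float]:
    """Gini coefficient computed separately per document type,
    e.g. 'dataset' vs 'publication' (hypothetical labels)."""
    grouped: Dict[str, List[float]] = {}
    for doc, score in scores.items():
        grouped.setdefault(doc_types[doc], []).append(score)
    return {doc_type: gini(vals) for doc_type, vals in grouped.items()}
```

As a toy usage example with invented data, the runs `{"q1": ["d1", "d2", "d3"], "q2": ["d1", "d3"], "q3": ["d1"]}` over the collection `d1`..`d4` with `cutoff=2` give r = {d1: 3, d2: 1, d3: 1, d4: 0} and an overall Gini coefficient of 0.45. Computing the coefficient per document type, rather than over the whole collection, is one way to make bias differences between datasets and publications directly comparable.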