Paper Title

Studying Retrievability of Publications and Datasets in an Integrated Retrieval System

Authors

Dwaipayan Roy, Zeljko Carevic, Philipp Mayr

Abstract

In this paper, we investigate the retrievability of datasets and publications in a real-life Digital Library (DL). The measure of retrievability was originally developed to quantify the influence that a retrieval system has on the access to information. Retrievability can also enable DL engineers to evaluate their search engine to determine the ease with which the content in the collection can be accessed. Following this methodology, in our study, we propose a system-oriented approach for studying dataset and publication retrieval. A speciality of this paper is the focus on measuring the accessibility biases of various types of DL items and including a metric of usefulness. Among other metrics, we use Lorenz curves and Gini coefficients to visualize the differences of the two retrievable document types (specifically datasets and publications). Empirical results reported in the paper show a distinguishable diversity in the retrievability scores among the documents of different types.
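
To make the metrics named in the abstract concrete, here is a minimal Python sketch of how retrievability scores, Lorenz curve points, and a Gini coefficient could be computed. It is not the authors' implementation. It assumes the standard cumulative retrievability measure (a document scores one point each time it appears within a rank cutoff for some query), and the function names, the `cutoff` parameter, and the toy data are all hypothetical.

```python
from collections import Counter


def retrievability(ranked_lists, all_doc_ids, cutoff):
    """Cumulative retrievability: how often each document appears in the
    top `cutoff` results over all queries. Documents that are never
    retrieved are kept with a score of 0 (assumed convention)."""
    counts = Counter()
    for ranking in ranked_lists:
        for doc_id in ranking[:cutoff]:
            counts[doc_id] += 1
    return {doc_id: counts[doc_id] for doc_id in all_doc_ids}


def lorenz_points(values):
    """Points (share of documents, share of total score), with the
    scores sorted in ascending order."""
    xs = sorted(values)
    total = sum(xs)
    points = [(0.0, 0.0)]
    if not xs or total == 0:
        return points
    running = 0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / len(xs), running / total))
    return points


def gini(values):
    """Gini coefficient: 0 means every document is equally retrievable;
    values near 1 mean a few documents receive almost all retrievals."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


# Toy data (hypothetical): two queries over a four-document collection.
rankings = [["pub1", "data1", "pub2"], ["pub1", "pub2", "data1"]]
scores = retrievability(rankings, ["pub1", "pub2", "data1", "data2"], cutoff=2)
print(scores, gini(scores.values()))
```

Comparing document types, as the abstract describes, would amount to computing these quantities separately over the scores of datasets and of publications.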
