Paper Title
Kratt: Developing an Automatic Subject Indexing Tool for The National Library of Estonia
Authors
Abstract
Manual subject indexing in libraries is a time-consuming and costly process, and the quality of the assigned subjects depends on the cataloguer's knowledge of the specific topics covered in the book. To address these issues, we used artificial intelligence to develop Kratt, a prototype of an automatic subject indexing tool. Kratt can subject index a book regardless of its length and genre, using a set of keywords drawn from the Estonian Subject Thesaurus. Kratt takes approximately 1 minute to subject index a book, which is 10-15 times faster than a human cataloguer. Although the cataloguers did not consider the resulting keywords satisfactory, the ratings from a small sample of regular library users were more promising. We also argue that the results can be improved by training the model on a larger corpus and applying more careful preprocessing techniques.