Title
Transcribing Medieval Manuscripts for Machine Learning
Authors
Abstract
This article focuses on the transcription of medieval manuscripts. Although problems of transcription have long interested medievalists, in the era of printed editions few workable options besides normalisation were available. The automation of this process, known as handwritten text recognition (HTR), has made new kinds of digital text creation possible, but it has also foregrounded the need to theorise transcription in our scholarly practices. Here we reflect on different notions of transcription against the backdrop of changing text technologies. Moreover, drawing on our own research on medieval Latin Bibles, we present general guidelines for customising transcription schemes, arguing that they must be designed with specific research questions and scholarly end uses in mind. Since we are particularly interested in the scribal contribution to the production of codices, our transcription guidelines aim to capture abbreviations and orthographic variation between different textual witnesses for downstream machine learning tasks. In the final section of the article, we discuss a few examples of how HTR-created transcriptions allow us to address new questions at scale in medieval manuscripts, such as textual variance across witnesses, the prediction of changes of scribal hand within a single manuscript, and the profiling of individual and regional scribal characteristics.
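The argument that a transcription scheme must fit its end use can be illustrated with a toy comparison. The following minimal sketch is not the authors' pipeline. The two witness readings, the normalisation table and the helper names (`tokens`, `variant_positions`) are all hypothetical, and an abbreviation mark is written as "~" for simplicity. It shows how normalisation, as practised in many printed editions, can erase exactly the abbreviation and spelling differences that a scribe-oriented scheme aims to keep.

```python
# Hypothetical diplomatic transcriptions of the same passage in two witnesses.
# "d~s" stands in for an abbreviated "dominus"; "michi" is a variant of "mihi".
witness_a = "et dixit d~s michi"
witness_b = "et dixit dominus mihi"

# Hypothetical normalisation table: expands the abbreviation and
# standardises the spelling, as a normalised edition might.
NORMALISE = {"d~s": "dominus", "michi": "mihi"}


def tokens(text, normalise=False):
    """Split on whitespace, optionally mapping tokens to normalised forms."""
    toks = text.split()
    if normalise:
        toks = [NORMALISE.get(t, t) for t in toks]
    return toks


def variant_positions(a, b):
    """Return token positions where two aligned, equally long lists differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


diplomatic = variant_positions(tokens(witness_a), tokens(witness_b))
normalised = variant_positions(tokens(witness_a, True), tokens(witness_b, True))
print(diplomatic)  # [2, 3]: abbreviation and spelling variation is visible
print(normalised)  # []: normalisation makes the witnesses look identical
```

A real comparison would need proper alignment of witnesses of unequal length; the simple positional `zip` here only works because the toy readings have the same number of tokens.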