Paper Title

The Effects of Character-Level Data Augmentation on Style-Based Dating of Historical Manuscripts

Authors

Lisa Koopmans, Maruf A. Dhali, Lambert Schomaker

Abstract

Identifying the production dates of historical manuscripts is one of the main goals for paleographers when studying ancient documents. Automated methods can provide paleographers with objective tools to estimate dates more accurately. Previously, statistical features have been used to date digitized historical manuscripts based on the hypothesis that handwriting styles change across periods. However, the sparse availability of such documents poses a challenge in obtaining robust systems. Hence, this article explores the influence of data augmentation on the dating of historical manuscripts. Linear Support Vector Machines were trained with k-fold cross-validation on textural and grapheme-based features extracted from historical manuscripts of different collections, including the Medieval Paleographical Scale, early Aramaic manuscripts, and the Dead Sea Scrolls. Results show that training models with augmented data improves the performance of historical manuscript dating by 1% to 3% in cumulative scores. Additionally, the results indicate further enhancement possibilities by considering models specific to the features and the documents' scripts.
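The abstract reports its gains in cumulative scores without defining the metric. In style-based dating work, a cumulative score is commonly taken to be the percentage of manuscripts whose absolute dating error falls within a tolerance of α years. The sketch below assumes that definition; the function name `cumulative_score`, the parameter `alpha`, and the sample years are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def cumulative_score(true_years: Sequence[float],
                     predicted_years: Sequence[float],
                     alpha: float) -> float:
    """Percentage of manuscripts whose absolute dating error is at most alpha years.

    Assumed definition: CS(alpha) = 100 * (number of samples with |error| <= alpha) / N.
    """
    if len(true_years) != len(predicted_years):
        raise ValueError("true_years and predicted_years must have the same length")
    if not true_years:
        raise ValueError("at least one manuscript is required")
    # Count predictions that land within the tolerance window.
    within = sum(
        1 for t, p in zip(true_years, predicted_years) if abs(t - p) <= alpha
    )
    return 100.0 * within / len(true_years)


# Hypothetical key years for illustration only (not data from the paper).
true_years = [1300, 1325, 1350, 1375, 1400]
predicted_years = [1300, 1350, 1350, 1325, 1425]

print(cumulative_score(true_years, predicted_years, alpha=0))   # exact matches only
print(cumulative_score(true_years, predicted_years, alpha=25))  # within one 25-year step
```

Under this reading, an improvement of 1% to 3% means that a slightly larger share of manuscripts is dated within the chosen tolerance when the model is trained on augmented data.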
