Paper Title
TableFormer: Table Structure Understanding with Transformers
Paper Authors
Paper Abstract
Tables organize valuable content in a concise and compact representation. This content is extremely valuable for systems such as search engines and Knowledge Graphs, since it enhances their predictive capabilities. Unfortunately, tables come in a large variety of shapes and sizes. Furthermore, they can have complex column/row-header configurations, multi-line rows, different varieties of separation lines, missing entries, etc. As such, the correct identification of the table structure from an image is a non-trivial task. In this paper, we present a new table-structure identification model. It improves on the latest end-to-end deep learning model (i.e., the encoder-dual-decoder from PubTabNet) in two significant ways. First, we introduce a new object-detection decoder for table cells. In this way, we can obtain the content of the table cells of programmatic PDFs directly from the PDF source and avoid training custom OCR decoders. This architectural change leads to more accurate table-content extraction and allows us to tackle non-English tables. Second, we replace the LSTM decoders with transformer-based decoders. This upgrade significantly improves the previous state-of-the-art tree-editing-distance score (TEDS), from 91% to 98.5% on simple tables and from 88.7% to 95% on complex tables.
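The architecture sketched in the abstract (a visual encoder feeding two decoding heads: one that emits table-structure tags with a transformer decoder in place of an LSTM, and one that regresses table-cell bounding boxes so content can be read from the PDF source instead of an OCR decoder) can be illustrated with a minimal PyTorch sketch. This is not the authors' TableFormer implementation; the backbone, hyperparameters, box normalization, and all module names below are illustrative assumptions.

import torch
import torch.nn as nn

class TableStructureModel(nn.Module):
    def __init__(self, vocab_size=32, d_model=256, nhead=8, num_layers=4, max_len=512):
        super().__init__()
        # CNN encoder: turns the table image into a sequence of visual features.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, d_model, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.tag_embed = nn.Embedding(vocab_size, d_model)
        self.pos_embed = nn.Embedding(max_len, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        # Structure decoder: autoregressively predicts table-structure tags.
        self.structure_decoder = nn.TransformerDecoder(layer, num_layers)
        self.tag_head = nn.Linear(d_model, vocab_size)
        # Cell-box head: regresses a bounding box per decoding step so cell text
        # can be retrieved from the PDF source rather than via an OCR decoder.
        self.bbox_head = nn.Linear(d_model, 4)

    def forward(self, image, tag_tokens):
        feats = self.backbone(image)                 # (B, d_model, H', W')
        memory = feats.flatten(2).transpose(1, 2)    # (B, H'*W', d_model)
        seq_len = tag_tokens.size(1)
        pos = torch.arange(seq_len, device=image.device)
        tgt = self.tag_embed(tag_tokens) + self.pos_embed(pos)
        # Causal mask: each tag attends only to previously generated tags.
        causal = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=image.device),
            diagonal=1)
        hidden = self.structure_decoder(tgt, memory, tgt_mask=causal)
        return self.tag_head(hidden), self.bbox_head(hidden).sigmoid()

# Dummy forward pass: one 256x256 table image and a 10-token structure prefix.
model = TableStructureModel()
tag_logits, cell_boxes = model(torch.randn(1, 3, 256, 256),
                               torch.zeros(1, 10, dtype=torch.long))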