Paper Title
TableFormer: Table Structure Understanding with Transformers
Paper Authors
Paper Abstract
Tables organize valuable content in a concise and compact representation. This content is extremely valuable for systems such as search engines and Knowledge Graphs, since it enhances their predictive capabilities. Unfortunately, tables come in a large variety of shapes and sizes. Furthermore, they can have complex column/row-header configurations, multi-line rows, different varieties of separation lines, missing entries, etc. As such, the correct identification of the table structure from an image is a non-trivial task. In this paper, we present a new table-structure identification model. It improves on the latest end-to-end deep learning model (i.e., the encoder-dual-decoder from PubTabNet) in two significant ways. First, we introduce a new object-detection decoder for table cells. In this way, we can obtain the content of the table cells of programmatic PDFs directly from the PDF source and avoid training custom OCR decoders. This architectural change leads to more accurate table-content extraction and allows us to tackle non-English tables. Second, we replace the LSTM decoders with transformer-based decoders. This upgrade significantly improves the previous state-of-the-art tree-editing-distance score (TEDS), from 91% to 98.5% on simple tables and from 88.7% to 95% on complex tables.
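The architecture sketched in the abstract (a visual encoder feeding two decoding heads: one that emits table-structure tags with a transformer decoder in place of an LSTM, and one that regresses table-cell bounding boxes so content can be read from the PDF source instead of an OCR decoder) can be illustrated with a minimal PyTorch sketch. This is not the authors' TableFormer implementation; the backbone, hyperparameters, box normalization, and all module names below are illustrative assumptions.

import torch
import torch.nn as nn

class TableStructureModel(nn.Module):
    def __init__(self, vocab_size=32, d_model=256, nhead=8, num_layers=4, max_len=512):
        super().__init__()
        # CNN encoder: turns the table image into a sequence of visual features.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, d_model, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.tag_embed = nn.Embedding(vocab_size, d_model)
        self.pos_embed = nn.Embedding(max_len, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        # Structure decoder: autoregressively predicts table-structure tags.
        self.structure_decoder = nn.TransformerDecoder(layer, num_layers)
        self.tag_head = nn.Linear(d_model, vocab_size)
        # Cell-box head: regresses a bounding box per decoding step so cell text
        # can be retrieved from the PDF source rather than via an OCR decoder.
        self.bbox_head = nn.Linear(d_model, 4)

    def forward(self, image, tag_tokens):
        feats = self.backbone(image)                 # (B, d_model, H', W')
        memory = feats.flatten(2).transpose(1, 2)    # (B, H'*W', d_model)
        seq_len = tag_tokens.size(1)
        pos = torch.arange(seq_len, device=image.device)
        tgt = self.tag_embed(tag_tokens) + self.pos_embed(pos)
        # Causal mask: each tag attends only to previously generated tags.
        causal = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=image.device),
            diagonal=1)
        hidden = self.structure_decoder(tgt, memory, tgt_mask=causal)
        return self.tag_head(hidden), self.bbox_head(hidden).sigmoid()

# Dummy forward pass: one 256x256 table image and a 10-token structure prefix.
model = TableStructureModel()
tag_logits, cell_boxes = model(torch.randn(1, 3, 256, 256),
                               torch.zeros(1, 10, dtype=torch.long))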