RDU: A Region-based Approach to Form-style Document Understanding

Paper Title

RDU: A Region-based Approach to Form-style Document Understanding

Authors

Zhu, Fengbin; Wang, Chao; Lei, Wenqiang; Liu, Ziyang; Chua, Tat Seng

Abstract

Key Information Extraction (KIE) aims to extract structured information (e.g., key-value pairs) from form-style documents (e.g., invoices), an important step towards intelligent document understanding. Previous approaches generally tackle KIE by sequence tagging, which has difficulty processing non-flattened sequences, especially in table-text mixed documents. These approaches also require pre-defining a fixed set of labels for each document type, and they suffer from label imbalance. In this work, we assume Optical Character Recognition (OCR) has already been applied to the input documents, and we reformulate KIE as a region prediction problem in two-dimensional (2D) space given a target field. Following this new setup, we develop a new KIE model named Region-based Document Understanding (RDU), which takes as input the text content of a document and the corresponding coordinates, and predicts the result by localizing a bounding-box-like region. RDU first applies a layout-aware BERT equipped with a soft layout attention masking and bias mechanism to incorporate layout information into the representations. Then, a Region Proposal Module, inspired by computer vision models widely used for object detection, generates a list of candidate regions from the representations. Finally, a Region Categorization Module judges whether each proposed region is valid, and a Region Selection Module selects the region with the largest probability among all proposals. Experiments on four types of form-style documents show that the proposed method achieves impressive results. In addition, RDU can be trained seamlessly on different document types, which is especially helpful for low-resource document types.
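As a rough illustration of the region-prediction reformulation, the Python sketch below enumerates candidate boxes over OCR tokens, rejects invalid candidates with a probability threshold (standing in for region categorization), and keeps the highest-scoring valid candidate (standing in for region selection). It is not the authors' implementation: the pair-of-tokens proposal scheme, the 0.5 threshold, and every name (`Token`, `propose_regions`, `tokens_in`, `extract`, `toy_score`) are hypothetical, and `toy_score` is a hand-written placeholder for the learned layout-aware BERT and classification modules.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed OCR box format


@dataclass(frozen=True)
class Token:
    """One OCR token with its bounding box (assumed input format)."""
    text: str
    box: Box


def propose_regions(tokens: List[Token]) -> List[Box]:
    """Hypothetical proposal step: each token pair (a, b) defines the box
    from a's top-left corner to b's bottom-right corner."""
    regions = set()
    for a in tokens:
        for b in tokens:
            x0, y0, x1, y1 = a.box[0], a.box[1], b.box[2], b.box[3]
            if x1 > x0 and y1 > y0:  # keep only well-formed boxes
                regions.add((x0, y0, x1, y1))
    return sorted(regions)


def tokens_in(region: Box, tokens: List[Token]) -> List[Token]:
    """Tokens whose centre lies inside the region, in reading order."""
    x0, y0, x1, y1 = region
    inside = [
        t for t in tokens
        if x0 <= (t.box[0] + t.box[2]) / 2 <= x1
        and y0 <= (t.box[1] + t.box[3]) / 2 <= y1
    ]
    return sorted(inside, key=lambda t: (t.box[1], t.box[0]))


Scorer = Callable[[str, List[Token]], float]


def extract(field: str, tokens: List[Token], score: Scorer,
            threshold: float = 0.5) -> Optional[str]:
    """Score every proposal for `field`, drop those below `threshold`,
    and return the text of the highest-scoring remaining region."""
    best: Optional[Tuple[float, str]] = None
    for region in propose_regions(tokens):
        members = tokens_in(region, tokens)
        if not members:
            continue
        p = score(field, members)
        if p < threshold:  # categorization stand-in: region judged invalid
            continue
        if best is None or p > best[0]:  # selection stand-in: keep the max
            best = (p, " ".join(t.text for t in members))
    return None if best is None else best[1]


def toy_score(field: str, members: List[Token]) -> float:
    """Placeholder for a learned model: for field 'total', accept a region
    containing exactly one amount-like token."""
    if field == "total" and len(members) == 1 and re.fullmatch(r"\d+\.\d{2}", members[0].text):
        return 0.9
    return 0.0


doc = [
    Token("Invoice", (0, 0, 50, 10)),
    Token("Total:", (0, 20, 40, 30)),
    Token("42.00", (50, 20, 90, 30)),
]
print(extract("total", doc, toy_score))  # 42.00
print(extract("date", doc, toy_score))   # None
```

The exhaustive pairwise proposal is quadratic in the number of tokens and is only meant to make the "predict a box, not a tag sequence" idea concrete; a learned proposal module would score and prune candidates instead of enumerating them all.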
