Paper Title

Clustering-based Automatic Construction of Legal Entity Knowledge Base from Contracts

Authors

Song, Fuqi; de la Clergerie, Éric

Abstract

In contract analysis and contract automation, a knowledge base (KB) of legal entities is fundamental for tasks such as contract verification, contract generation, and contract analytics. However, such a KB does not always exist, nor can one be produced in a short time. In this paper, we propose a clustering-based approach that automatically generates a reliable knowledge base of legal entities from given contracts without any supplemental references. The proposed method is robust to different types of errors introduced by pre-processing, such as Optical Character Recognition (OCR) and Named Entity Recognition (NER), as well as editing errors such as typos. We evaluate our method on a dataset of 800 real contracts of varying quality from 15 clients. Measured against the collected ground-truth data, our method recalls 84% of the knowledge.
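The abstract does not say which clustering algorithm or similarity measure the authors use. As an illustration of the general idea only, the sketch below groups noisy entity mentions (OCR errors, typos, casing variants) by character-level similarity and picks the most frequent surface form as the KB entry. The function names, the 0.85 threshold, the greedy single-pass strategy, and the sample company names are all assumptions made for this example, not details from the paper.

```python
from collections import Counter
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] using the standard library."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster_mentions(mentions, threshold=0.85):
    """Greedy single-pass clustering (an assumed strategy, not the paper's).

    Each mention joins the first cluster whose first member is similar
    enough; otherwise it starts a new cluster.
    """
    clusters = []
    for mention in mentions:
        for cluster in clusters:
            if similarity(mention, cluster[0]) >= threshold:
                cluster.append(mention)
                break
        else:
            clusters.append([mention])
    return clusters


def canonical_name(cluster):
    """Choose the most frequent surface form as the hypothetical KB entry."""
    return Counter(cluster).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical mentions as they might come out of OCR + NER.
    mentions = [
        "Acme Corporation",
        "Acme Corporation",
        "Acme Corporatlon",   # OCR confusion: i -> l
        "ACME  Corporation",  # casing and spacing variant
        "Globex SAS",
        "Globex SA5",         # OCR confusion: S -> 5
    ]
    for cluster in cluster_mentions(mentions):
        print(canonical_name(cluster), "<-", cluster)
```

Using the first member as the cluster representative keeps the sketch short, but it makes the result depend on input order. A more careful version could compare against every member or use hierarchical clustering over a full distance matrix.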
