Paper Title
An Automated Framework for the Extraction of Semantic Legal Metadata from Legal Texts
Authors
Abstract
Semantic legal metadata provides information that helps with understanding and interpreting legal provisions. Such metadata is therefore important for the systematic analysis of legal requirements. However, manually enhancing a large legal corpus with semantic metadata is prohibitively expensive. Our work is motivated by two observations: (1) the existing requirements engineering (RE) literature does not provide a harmonized view on the semantic metadata types that are useful for legal requirements analysis; (2) automated support for the extraction of semantic legal metadata is scarce, and it does not exploit the full potential of artificial intelligence technologies, notably natural language processing (NLP) and machine learning (ML). Our objective is to take steps toward overcoming these limitations. To do so, we review and reconcile the semantic legal metadata types proposed in the RE literature. Subsequently, we devise an automated extraction approach for the identified metadata types using NLP and ML. We evaluate our approach through two case studies over the Luxembourgish legislation. Our results indicate a high accuracy in the generation of metadata annotations. In particular, in the two case studies, we were able to obtain precision scores of 97.2% and 82.4% and recall scores of 94.9% and 92.4%.