Data-efficient End-to-end Information Extraction for Statistical Legal Analysis

Paper title

Data-efficient End-to-end Information Extraction for Statistical Legal Analysis

Authors

Wonseok Hwang, Saehee Eom, Hanuhl Lee, Hai Jin Park, Minjoon Seo

Abstract

Legal practitioners often face a vast number of documents. Lawyers, for instance, search for precedents favorable to their clients, while the number of legal precedents keeps growing. Although legal search engines can assist in finding individual target documents and narrowing down the number of candidates, the retrieved information is often presented as unstructured text, and users have to examine each document thoroughly, which can lead to information overload. This also makes statistical analysis of the documents challenging. Here, we present an end-to-end information extraction (IE) system for legal documents. By formulating IE as a generation task, our system can be easily applied to various tasks without domain-specific engineering effort. The experimental results of four IE tasks on Korean precedents show that, compared to the rule-based baseline, our IE system can achieve competent scores (-2.3 on average) with as few as 50 training examples per task and higher scores (+5.4 on average) with 200 examples. Finally, our statistical analysis of two case categories (drunk driving and fraud) with 35k precedents reveals that the structured information produced by our IE system faithfully reflects the macroscopic features of the Korean legal system.
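
To make the phrase "formulating IE as a generation task" concrete, the following is a minimal sketch of one possible way to do it, not the authors' actual format: structured fields are flattened into a single target string that a sequence-to-sequence model could be trained to generate, and the generated string is parsed back into fields. The separators, the function names `serialize` and `parse`, and the example field names and values are all hypothetical and chosen only for illustration.

```python
from typing import Dict

# Hypothetical separators; the abstract does not specify the real target format.
FIELD_SEP = " | "
KV_SEP = ": "


def serialize(fields: Dict[str, str]) -> str:
    """Flatten a structured record into one target string for a text generator."""
    return FIELD_SEP.join(f"{key}{KV_SEP}{value}" for key, value in fields.items())


def parse(generated: str) -> Dict[str, str]:
    """Recover fields from generated text, skipping pieces without a key-value separator."""
    record: Dict[str, str] = {}
    for piece in generated.split(FIELD_SEP):
        key, sep, value = piece.partition(KV_SEP)
        if sep and key.strip():
            record[key.strip()] = value.strip()
    return record


# Hypothetical fields for a drunk-driving precedent (illustrative values only).
example = {"blood_alcohol": "0.12%", "fine": "3,000,000 KRW"}
target = serialize(example)
assert parse(target) == example
```

Because the task-specific part is only the list of field names in the target string, a setup like this could in principle be pointed at a new extraction task by changing the training examples rather than writing new rules, which is the kind of property the abstract describes.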
