Indian Legal Text Summarization: A Text Normalisation-based Approach

Paper Title

Indian Legal Text Summarization: A Text Normalisation-based Approach

Authors

Satyajit Ghosh, Mousumi Dutta, Tanaya Das

Abstract

In the Indian court system, pending cases have long been a problem, with more than 4 crore (40 million) cases outstanding. Manually summarising hundreds of documents is a time-consuming and tedious task for legal stakeholders. As machine learning has progressed, many state-of-the-art models for text summarization have emerged. Domain-independent models perform poorly on legal texts, and fine-tuning them for the Indian legal system is difficult because publicly available datasets are lacking. To improve the performance of domain-independent models, the authors propose a methodology for normalising legal texts in the Indian context. They experiment with two state-of-the-art domain-independent models for legal text summarization, BART and PEGASUS, evaluating both on extractive and abstractive summarization to gauge the effectiveness of the text normalisation approach. The summaries are evaluated by domain experts on multiple parameters and with ROUGE metrics. The results show that the proposed text normalisation approach is effective for legal texts with domain-independent models.
