A New Evaluation Method: Evaluation Data and Metrics for Chinese Grammar Error Correction

Paper Title

A New Evaluation Method: Evaluation Data and Metrics for Chinese Grammar Error Correction

Authors

Lin, Nankai; Lin, Xiaotian; Yang, Ziyu; Jiang, Shengyi

Abstract

As a fundamental task in natural language processing, Chinese Grammatical Error Correction (CGEC) has gradually received widespread attention and become a research hotspot. However, one obvious deficiency of the existing CGEC evaluation system is that evaluation values are significantly influenced by Chinese word segmentation results or by the choice of language model: the scores of the same error correction model can vary considerably under different word segmentation systems or different language models. Metrics should instead be independent of word segmentation results and language models, since such dependence can lead to a lack of uniqueness and comparability when evaluating different methods. To this end, we propose three novel evaluation metrics for CGEC in two dimensions: reference-based and reference-less. For the reference-based dimension, we introduce sentence-level accuracy and char-level BLEU to evaluate the corrected sentences. For the reference-less dimension, we adopt char-level meaning preservation to measure the degree of semantic preservation in the corrected sentences. We evaluate and analyze the reasonableness and validity of the three proposed metrics in depth, and we expect them to become a new standard for CGEC.
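
To make the two reference-based metrics concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not give the exact formulas, so the sketch assumes that sentence-level accuracy means exact string match against a single reference and that char-level BLEU is standard unsmoothed BLEU with individual characters as tokens. The function names `sentence_accuracy`, `char_ngrams`, and `char_bleu` are hypothetical. Because both functions operate directly on characters, neither needs a word segmenter or a language model.

```python
import math
from collections import Counter


def sentence_accuracy(hypotheses, references):
    """Fraction of hypotheses that exactly match their reference string.

    Assumption: exact match against one reference per sentence.
    """
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must have the same length")
    if not hypotheses:
        return 0.0
    matches = sum(h == r for h, r in zip(hypotheses, references))
    return matches / len(hypotheses)


def char_ngrams(text, n):
    """Count character n-grams; no word segmentation is involved."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def char_bleu(hypothesis, reference, max_n=4):
    """Single-reference, sentence-level BLEU over characters.

    Assumption: uniform n-gram weights and no smoothing, so the score is 0.0
    whenever any n-gram order has no match (including hypotheses shorter
    than max_n characters).
    """
    if not hypothesis:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = char_ngrams(hypothesis, n)
        ref_counts = char_ngrams(reference, n)
        total = sum(hyp_counts.values())
        # Clipped matches: an n-gram counts at most as often as in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or overlap == 0:
            return 0.0
        log_precision_sum += math.log(overlap / total)
    # Brevity penalty for hypotheses shorter than the reference.
    if len(hypothesis) >= len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(hypothesis))
    return brevity_penalty * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    hyps = ["我喜欢苹果", "他去学校了吗"]
    refs = ["我喜欢苹果", "他去了学校了吗"]
    print(sentence_accuracy(hyps, refs))
    print(char_bleu(hyps[1], refs[1]))
```

In this sketch a correction that misses a single character gets no credit under sentence-level accuracy but still earns partial credit under char-level BLEU, which is one reason the two metrics complement each other.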

Paper Title