Paper Title

Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction

Paper Authors

Maksym Tarnavskyi, Artem Chernodub, Kostiantyn Omelianchuk

Paper Abstract

In this paper, we investigate improvements to the GEC sequence tagging architecture with a focus on ensembling recent cutting-edge Transformer-based encoders in their Large configurations. We encourage ensembling models by majority votes on span-level edits because this approach is tolerant to differences in model architecture and vocabulary size. Our best ensemble achieves a new SOTA result with an $F_{0.5}$ score of 76.05 on BEA-2019 (test), even without pre-training on synthetic datasets. In addition, we perform knowledge distillation with a trained ensemble to generate new synthetic training datasets, "Troy-Blogs" and "Troy-1BW". Our best single sequence tagging model, pre-trained on the generated Troy datasets in combination with the publicly available synthetic PIE dataset, achieves a near-SOTA result with an $F_{0.5}$ score of 73.21 on BEA-2019 (test). To the best of our knowledge, our best single model gives way only to the much heavier T5 model. The code, datasets, and trained models are publicly available.
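The abstract's core idea of "majority votes on span-level edits" can be illustrated with a minimal sketch: each member model proposes edits as spans over the source tokens, and only edits proposed by enough models are kept. The `Edit` representation, the `min_votes` threshold, and the helper names below are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of span-level majority-vote ensembling (assumed edit format:
# (start token index, end token index, replacement text); not the paper's exact code).
from collections import Counter
from typing import List, Tuple

Edit = Tuple[int, int, str]  # (start, end, replacement)

def majority_vote(edit_sets: List[List[Edit]], min_votes: int) -> List[Edit]:
    """Keep only edits proposed by at least `min_votes` of the member models."""
    counts = Counter(edit for edits in edit_sets for edit in set(edits))
    return sorted(e for e, c in counts.items() if c >= min_votes)

def apply_edits(tokens: List[str], edits: List[Edit]) -> List[str]:
    """Apply non-overlapping span edits right-to-left so earlier indices stay valid."""
    out = list(tokens)
    for start, end, repl in sorted(edits, reverse=True):
        out[start:end] = repl.split() if repl else []
    return out

# Toy example: three models vote on edits for "He go to school yesterday ."
tokens = "He go to school yesterday .".split()
model_edits = [
    [(1, 2, "went")],                        # model A
    [(1, 2, "went"), (3, 4, "the school")],  # model B
    [(1, 2, "went")],                        # model C
]
kept = majority_vote(model_edits, min_votes=2)   # only the edit with 3 votes survives
print(" ".join(apply_edits(tokens, kept)))       # -> "He went to school yesterday ."
```

Because the vote happens over edits rather than over tag distributions, members with different tag vocabularies or encoders can still be combined, which matches the abstract's claim of tolerance to model architecture and vocabulary size.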

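The distillation step described in the abstract can likewise be sketched: unannotated text is corrected by the trained ensemble acting as a teacher, and (original, corrected) sentence pairs become synthetic training data for a single student model. The `teacher_correct` wrapper and the choice to drop sentences the teacher leaves unchanged are assumptions for illustration, not the documented recipe behind "Troy-Blogs" and "Troy-1BW".

```python
# Minimal sketch of ensemble-to-student knowledge distillation for GEC data generation.
# `teacher_correct` is a hypothetical wrapper around the trained ensemble.
from typing import Callable, Iterable, List, Tuple

def distill_dataset(
    sentences: Iterable[str],
    teacher_correct: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each source sentence with the teacher's correction.

    Sentences the teacher leaves unchanged are skipped here; this filtering
    choice is an assumption, not necessarily the authors' setting.
    """
    pairs = []
    for src in sentences:
        tgt = teacher_correct(src)
        if tgt != src:
            pairs.append((src, tgt))
    return pairs

# Usage with a toy teacher standing in for the ensemble:
toy_teacher = lambda s: s.replace("go to", "went to")
print(distill_dataset(["He go to school yesterday ."], toy_teacher))
```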