Paper Title

Realistic Zero-Shot Cross-Lingual Transfer in Legal Topic Classification

Authors

Stratos Xenouleas, Alexia Tsoukara, Giannis Panagiotakis, Ilias Chalkidis, Ion Androutsopoulos

Abstract

We consider zero-shot cross-lingual transfer in legal topic classification using the recent MultiEURLEX dataset. Since the original dataset contains parallel documents, which is unrealistic for zero-shot cross-lingual transfer, we develop a new version of the dataset without parallel documents. We use it to show that translation-based methods vastly outperform cross-lingual fine-tuning of multilingually pre-trained models, the best previous zero-shot transfer method for MultiEURLEX. We also develop a bilingual teacher-student zero-shot transfer approach, which exploits additional unlabeled documents of the target language and performs better than a model fine-tuned directly on labeled target language documents.
