Paper title
TransPOS: Transformers for Consolidating Different POS Tagset Datasets
Paper authors
Abstract
In the hope of expanding training data, researchers often want to merge two or more datasets that were created using different labeling schemes. This paper considers two datasets that label part-of-speech (POS) tags under different tagging schemes and leverages the supervised labels of one dataset to help generate labels for the other. It further discusses the theoretical difficulties of this approach and proposes a novel supervised architecture employing Transformers to tackle the problem of consolidating two completely disjoint datasets. The results diverge from initial expectations and discourage further exploration of using disjoint labels to consolidate datasets with different labels.