Paper title
Approaching Neural Chinese Word Segmentation as a Low-Resource Machine Translation Task
Authors
Abstract
Chinese word segmentation has entered the deep learning era, which greatly reduces the hassle of feature engineering. Recently, some researchers have attempted to treat it as character-level translation, which further simplifies model design, but a performance gap remains between the translation-based approach and other methods. This motivates our work, in which we apply best practices from low-resource neural machine translation to supervised Chinese word segmentation. We examine a series of techniques, including regularization, data augmentation, objective weighting, transfer learning, and ensembling. Compared to previous work, our low-resource translation-based method retains the effortless model design, yet matches the state of the art in the constrained evaluation without using additional data.
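To make the "segmentation as character-level translation" framing concrete, the following is a minimal sketch of how a gold-segmented sentence might be turned into a source/target pair and how a predicted target sequence might be converted back into words. The delimiter token `</w>` and the helper names `to_translation_pair` and `from_translation_output` are assumptions made for illustration; the abstract does not specify the data format the authors actually use.

```python
from typing import List, Tuple

# Hypothetical word-boundary token; the symbol used in the paper may differ.
DELIM = "</w>"


def to_translation_pair(words: List[str]) -> Tuple[List[str], List[str]]:
    """Turn a gold-segmented sentence into a (source, target) token pair.

    Source: the unsegmented character sequence.
    Target: the same characters with DELIM inserted between words.
    """
    source = [ch for word in words for ch in word]
    target: List[str] = []
    for i, word in enumerate(words):
        if i > 0:
            target.append(DELIM)
        target.extend(word)
    return source, target


def from_translation_output(target: List[str]) -> List[str]:
    """Recover words from a target sequence by splitting on DELIM."""
    words: List[str] = []
    current: List[str] = []
    for tok in target:
        if tok == DELIM:
            if current:
                words.append("".join(current))
            current = []
        else:
            current.append(tok)
    if current:
        words.append("".join(current))
    return words


if __name__ == "__main__":
    src, tgt = to_translation_pair(["我", "喜欢", "北京"])
    print(src)
    print(tgt)
    print(from_translation_output(tgt))
```

Under this framing any sequence-to-sequence translation model can be trained on such pairs without segmentation-specific architecture, which is the sense in which the model design stays simple.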