Paper Title
Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages
Paper Authors
Paper Abstract
Most combinations of NLP tasks and language varieties lack in-domain examples for supervised training because of the paucity of annotated data. How can neural models make sample-efficient generalizations from task-language combinations with available data to low-resource ones? In this work, we propose a Bayesian generative model for the space of neural parameters. We assume that this space can be factorized into latent variables for each language and each task. We infer the posteriors over such latent variables based on data from seen task-language combinations through variational inference. This enables zero-shot classification on unseen combinations at prediction time. For instance, given training data for named entity recognition (NER) in Vietnamese and for part-of-speech (POS) tagging in Wolof, our model can perform accurate predictions for NER in Wolof. In particular, we experiment with a typologically diverse sample of 33 languages from 4 continents and 11 families, and show that our model yields comparable or better results than state-of-the-art zero-shot cross-lingual transfer methods. Moreover, we demonstrate that approximate Bayesian model averaging results in smoother predictive distributions, whose entropy inversely correlates with accuracy. Hence, the proposed framework also offers robust estimates of prediction uncertainty. Our code is available at github.com/cambridgeltl/parameter-factorization
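The abstract compresses several moving parts: per-task and per-language latent variables, a shared mapping from latents to network weights, variational inference over the latents, and Monte Carlo model averaging at prediction time. The following is a minimal, hypothetical PyTorch sketch of that core idea, not the authors' released implementation; all names here (FactorizedHead, latent_dim, the task and language indices, and so on) are invented for illustration, and the actual model lives in the repository linked above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FactorizedHead(nn.Module):
    """Hypothetical sketch: a diagonal-Gaussian variational posterior is kept
    for each task latent and each language latent; a hypernetwork maps a
    sampled (task, language) pair of latents to the weights of a linear
    classifier, so any task can be paired with any language."""

    def __init__(self, n_tasks, n_langs, latent_dim, hidden_dim, n_classes):
        super().__init__()
        # variational parameters (mean, log-variance) per task and per language
        self.task_mu = nn.Embedding(n_tasks, latent_dim)
        self.task_logvar = nn.Embedding(n_tasks, latent_dim)
        self.lang_mu = nn.Embedding(n_langs, latent_dim)
        self.lang_logvar = nn.Embedding(n_langs, latent_dim)
        # hypernetwork: concatenated latents -> classifier weights and bias
        self.to_weight = nn.Linear(2 * latent_dim, hidden_dim * n_classes)
        self.to_bias = nn.Linear(2 * latent_dim, n_classes)
        self.hidden_dim, self.n_classes = hidden_dim, n_classes

    @staticmethod
    def sample(mu, logvar):
        # reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)
        return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

    def forward(self, feats, task_id, lang_id):
        # feats: (batch, hidden_dim) encodings, e.g. from a multilingual encoder
        z_task = self.sample(self.task_mu(task_id), self.task_logvar(task_id))
        z_lang = self.sample(self.lang_mu(lang_id), self.lang_logvar(lang_id))
        z = torch.cat([z_task, z_lang], dim=-1)
        W = self.to_weight(z).view(-1, self.hidden_dim, self.n_classes)
        b = self.to_bias(z)
        return torch.bmm(feats.unsqueeze(1), W).squeeze(1) + b  # (batch, n_classes)

if __name__ == "__main__":
    # zero-shot: pair the NER task latent (learned from Vietnamese data) with
    # the Wolof language latent (learned from POS data) for the unseen combo
    head = FactorizedHead(n_tasks=2, n_langs=33, latent_dim=16,
                          hidden_dim=32, n_classes=9)
    feats = torch.randn(4, 32)                     # stand-in encoder features
    ner = torch.zeros(4, dtype=torch.long)         # hypothetical task index
    wolof = torch.full((4,), 7, dtype=torch.long)  # hypothetical language index
    # approximate Bayesian model averaging: mean softmax over posterior samples
    probs = torch.stack([F.softmax(head(feats, ner, wolof), dim=-1)
                         for _ in range(20)]).mean(dim=0)
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1)
    print(probs.shape, entropy)  # higher entropy signals lower confidence
```

The demo pairs a task latent with a language latent never observed together in training and averages the softmax outputs over posterior samples; as the abstract notes, the entropy of this averaged predictive distribution can then serve as an estimate of prediction uncertainty.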