Paper Title

Localizing Open-Ontology QA Semantic Parsers in a Day Using Machine Translation

Paper Authors

Moradshahi, Mehrad, Campagna, Giovanni, Semnani, Sina J., Xu, Silei, Lam, Monica S.

Paper Abstract

We propose Semantic Parser Localizer (SPL), a toolkit that leverages Neural Machine Translation (NMT) systems to localize a semantic parser for a new language. Our methodology is to (1) generate training data automatically in the target language by augmenting machine-translated datasets with local entities scraped from public websites, (2) add a few-shot boost of human-translated sentences and train a novel XLMR-LSTM semantic parser, and (3) test the model on natural utterances curated using human translators. We assess the effectiveness of our approach by extending the current capabilities of Schema2QA, a system for English Question Answering (QA) on the open web, to 10 new languages for the restaurants and hotels domains. Our models achieve an overall test accuracy ranging between 61% and 69% for the hotels domain and between 64% and 78% for the restaurants domain, which compares favorably to the 69% and 80% obtained for an English parser trained on gold English data and a few examples from the validation set. We show our approach outperforms the previous state-of-the-art methodology by more than 30% for hotels and 40% for restaurants with localized ontologies for the subset of languages tested. Our methodology enables any software developer to add a new language capability to a QA system for a new domain, leveraging machine translation, in less than 24 hours.
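Step (1) of the methodology above can be illustrated with a minimal sketch: filling entity placeholders in machine-translated sentence templates with locally scraped entities to produce localized training utterances. The function name, template format, and sample data below are illustrative assumptions, not SPL's actual API.

```python
# Hedged sketch of entity-based augmentation for machine-translated data.
# All names and example data are assumptions for illustration only.
import random

def localize_utterances(translated_templates, local_entities,
                        samples_per_template=2, seed=0):
    """Fill {slot} placeholders in machine-translated templates with
    entities scraped from local websites, yielding localized utterances."""
    rng = random.Random(seed)
    dataset = []
    for template in translated_templates:
        for _ in range(samples_per_template):
            utterance = template
            for slot, values in local_entities.items():
                placeholder = "{" + slot + "}"
                if placeholder in utterance:
                    # Substitute a randomly chosen local entity value.
                    utterance = utterance.replace(placeholder, rng.choice(values))
            dataset.append(utterance)
    return dataset

# Illustrative Italian templates for the restaurants domain.
templates = ["trova un ristorante vicino a {location}",
             "mostrami le recensioni di {restaurant}"]
entities = {"location": ["Piazza Navona", "Trastevere"],
            "restaurant": ["Da Enzo", "Roscioli"]}

for u in localize_utterances(templates, entities):
    print(u)
```

In the paper's actual pipeline the translated data is further combined with a few-shot boost of human-translated sentences before training; this sketch covers only the placeholder-substitution idea.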
