Paper Title
Syntactic Question Abstraction and Retrieval for Data-Scarce Semantic Parsing
Paper Authors
Paper Abstract
Deep learning approaches to semantic parsing require large amounts of labeled data, but annotating complex logical forms is costly. Here, we propose Syntactic Question Abstraction and Retrieval (SQAR), a method for building a neural semantic parser that translates a natural language (NL) query into a SQL logical form (LF) with fewer than 1,000 annotated examples. SQAR first retrieves a logical pattern from the training data by computing the similarity between NL queries, and then grounds lexical information on the retrieved pattern to generate the final LF. We validate SQAR by training models on various small subsets of the WikiSQL training data, achieving up to 4.9% higher LF accuracy than previous state-of-the-art models on the WikiSQL test set. We also show that, by using query similarity to retrieve logical patterns, SQAR can leverage a paraphrasing dataset, achieving up to 5.9% higher LF accuracy than when SQAR is trained on WikiSQL data alone. In contrast to a simple pattern classification approach, SQAR can generate unseen logical patterns when new examples are added, without re-training the model. We also discuss an ideal way to create cost-efficient and robust training datasets when the data distribution can be approximated under a data-hungry setting.
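The two-step retrieve-then-ground idea in the abstract can be illustrated with a minimal sketch. This is a toy illustration only, with hypothetical data and a simple token-overlap similarity in place of the paper's neural query encoder; the pattern placeholders (`{col}`, `{table}`, `{val}`) and the grounding step are likewise simplified assumptions.

```python
# Toy sketch of SQAR's two steps: (1) retrieve a logical pattern from
# training data by NL query similarity, (2) ground lexical information
# onto the retrieved pattern. Hypothetical data and similarity measure.

def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two NL queries (stand-in for a
    learned similarity function)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

# Toy "training data": NL queries paired with abstracted SQL logical
# patterns (a single {col} slot here; real patterns are richer).
train = [
    ("what is the name of the player from france",
     "SELECT {col} FROM {table} WHERE {col} = {val}"),
    ("how many games were played in 2010",
     "SELECT COUNT({col}) FROM {table} WHERE {col} = {val}"),
]

def retrieve_pattern(query: str) -> str:
    """Step 1: return the logical pattern of the most similar training query."""
    return max(train, key=lambda ex: jaccard(query, ex[0]))[1]

def ground(pattern: str, table: str, col: str, val: str) -> str:
    """Step 2: fill the retrieved pattern with lexical information."""
    return pattern.format(table=table, col=col, val=val)

pattern = retrieve_pattern("how many matches were played in 1999")
sql = ground(pattern, table="games", col="year", val="1999")
print(sql)  # SELECT COUNT(year) FROM games WHERE year = 1999
```

Because generation reduces to nearest-neighbor retrieval plus grounding, adding a new (query, pattern) pair to `train` immediately makes that pattern available, mirroring the abstract's point that unseen logical patterns can be produced without re-training.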