Paper Title

Diverse Parallel Data Synthesis for Cross-Database Adaptation of Text-to-SQL Parsers

Authors

Abhijeet Awasthi, Ashutosh Sathe, Sunita Sarawagi

Abstract

Text-to-SQL parsers typically struggle with databases unseen at training time. Adapting a parser to a new database is challenging because natural language queries are lacking for the new schema. We present ReFill, a framework for synthesizing high-quality and textually diverse parallel datasets for adapting a Text-to-SQL parser to a target schema. ReFill learns to retrieve and edit text queries from existing schemas and transfer them to the target schema. We show that retrieving diverse existing text, masking its schema-specific tokens, and refilling with tokens relevant to the target schema leads to significantly more diverse text queries than standard SQL-to-Text generation methods achieve. Through experiments spanning multiple databases, we demonstrate that fine-tuning parsers on datasets synthesized using ReFill consistently outperforms prior data-augmentation methods.
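
The mask-and-refill idea in the abstract can be illustrated with a toy sketch. This is not the ReFill implementation: the abstract describes a learned retrieve-and-edit model, whereas the code below uses plain string matching and a hand-supplied list of replacement terms. The function names (`mask_schema_tokens`, `refill`), the `[MASK]` token, and the example schemas are all assumptions made for illustration.

```python
import re

MASK = "[MASK]"  # hypothetical mask token, chosen for this sketch


def mask_schema_tokens(text, schema_terms):
    """Replace each whole-word occurrence of a source-schema term with MASK."""
    # Longest terms first, so multi-word names are masked before their parts.
    for term in sorted(schema_terms, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        text = re.sub(pattern, MASK, text, flags=re.IGNORECASE)
    return text


def refill(masked_text, fillers):
    """Fill masks left to right with target-schema terms.

    A toy stand-in for a learned refilling model: the caller supplies
    the replacement terms explicitly.
    """
    parts = masked_text.split(MASK)
    if len(parts) - 1 != len(fillers):
        raise ValueError("number of fillers must match number of masks")
    out = parts[0]
    for filler, rest in zip(fillers, parts[1:]):
        out += filler + rest
    return out


# A question written for a hypothetical source schema (singers, age).
source_question = "What is the average age of all singers?"
masked = mask_schema_tokens(source_question, ["singers", "age"])

# Refill with terms from a hypothetical target schema (employees, salary).
target_question = refill(masked, ["salary", "employees"])

print(masked)
print(target_question)
```

The sentence template of the retrieved question survives the transfer while its schema-specific words are swapped, which is the intuition behind reusing diverse existing questions rather than generating text from SQL alone.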
