Paper Title
Teaching Broad Reasoning Skills for Multi-Step QA by Generating Hard Contexts
Paper Authors
Paper Abstract
Question-answering datasets require a broad set of reasoning skills. We show how to use question decompositions to teach language models these broad reasoning skills in a robust fashion. Specifically, we use widely available QDMR representations to programmatically create hard-to-cheat synthetic contexts for real questions in six multi-step reasoning datasets. These contexts are carefully designed to avoid reasoning shortcuts prevalent in real contexts that prevent models from learning the right skills. This results in a pretraining dataset, named TeaBReaC, containing 525K multi-step questions (with associated formal programs) covering about 900 reasoning patterns. We show that pretraining standard language models (LMs) on TeaBReaC before fine-tuning them on target datasets improves their performance by up to 13 F1 points across 4 multi-step QA datasets, with up to 21 point gain on more complex questions. The resulting models also demonstrate higher robustness, with a 5-8 F1 point improvement on two contrast sets. Furthermore, TeaBReaC pretraining substantially improves model performance and robustness even when starting with numerate LMs pretrained using recent methods (e.g., PReasM, POET). Our work thus shows how to effectively use decomposition-guided contexts to robustly teach multi-step reasoning.
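The abstract describes programmatically building "hard-to-cheat" synthetic contexts from QDMR decompositions and pairing them with formal programs. The sketch below is a minimal, hypothetical illustration of that idea only; it is not the authors' TeaBReaC pipeline, and all names (`make_synthetic_example`, the toy fact templates, the `program` callable) are invented for illustration.

```python
import random

# Minimal illustrative sketch (assumption, not the authors' actual pipeline):
# given a QDMR-style decomposition of a multi-step question, emit one
# supporting fact per reasoning step plus distractor facts, so the answer
# cannot be recovered by shallow pattern matching over the context.

def make_synthetic_example(qdmr_steps, program, seed=0):
    rng = random.Random(seed)
    # Assign a toy numeric value to each decomposition step.
    step_values = [rng.randint(1, 100) for _ in qdmr_steps]
    distractor_values = [rng.randint(1, 100) for _ in range(3)]

    facts = [
        f"Fact for step {i + 1} ('{step}'): value {v}."
        for i, (step, v) in enumerate(zip(qdmr_steps, step_values))
    ]
    # Distractor facts act as "hard-to-cheat" filler.
    facts += [f"Unrelated fact: value {v}." for v in distractor_values]
    rng.shuffle(facts)

    # The associated formal program computes the gold answer from step values.
    answer = program(step_values)
    return {"context": " ".join(facts), "answer": answer}


# Usage example with a hypothetical two-hop sum question.
example = make_synthetic_example(
    ["return points of team A", "return points of team B", "return sum of #1 and #2"],
    program=lambda vals: vals[0] + vals[1],
)
print(example["context"])
print(example["answer"])
```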