Paper title
Open Question Answering over Tables and Text
Paper authors
Paper abstract
In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question. Most open QA systems have considered only retrieving information from unstructured text. Here we consider for the first time open QA over both tabular and textual data and present a new large-scale dataset, Open Table-and-Text Question Answering (OTT-QA), to evaluate performance on this task. Most questions in OTT-QA require multi-hop inference across tabular data and unstructured text, and the evidence required to answer a question can be distributed in different ways over these two types of input, making evidence retrieval challenging. A baseline model that uses an iterative retriever and a BERT-based reader achieves an exact match score below 10%. We then propose two novel techniques to address the challenge of retrieving and aggregating evidence for OTT-QA. The first technique is "early fusion", which groups multiple highly relevant tabular and textual units into a fused block and thereby gives the retriever more context to search over. The second technique is a cross-block reader that uses global-local sparse attention to model the cross-dependencies among multiple retrieved pieces of evidence. Combining these two techniques improves the exact match score significantly, to above 27%.
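To make the "early fusion" idea concrete, the following is a minimal sketch of grouping one table row with the passages linked to its cells into a single retrievable block. It is not the paper's implementation: the `FusedBlock` class, the `fuse_rows` function, the `[SEP]` separator, and the exact-title matching used in place of a real entity linker are all assumptions made for illustration, and the table and passages are toy data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FusedBlock:
    """One table row plus the passages linked to its cells (hypothetical structure)."""
    table_title: str
    row: Dict[str, str]
    passages: List[str] = field(default_factory=list)

    def to_text(self) -> str:
        # Flatten the row into "column is value" pairs, then append linked passages.
        cells = " ; ".join(f"{col} is {val}" for col, val in self.row.items())
        return " [SEP] ".join([f"{self.table_title} : {cells}", *self.passages])


def fuse_rows(table_title: str,
              rows: List[Dict[str, str]],
              passages_by_title: Dict[str, str]) -> List[FusedBlock]:
    """Build one fused block per table row.

    Assumption: a cell is linked to a passage when the cell value exactly
    equals the passage title. A real system would use a learned or
    retrieval-based entity linker instead.
    """
    blocks = []
    for row in rows:
        linked = [passages_by_title[v] for v in row.values() if v in passages_by_title]
        blocks.append(FusedBlock(table_title, row, linked))
    return blocks


# Toy data, invented purely for this example.
toy_rows = [
    {"Player": "Ann Lee", "Team": "Red Hawks"},
    {"Player": "Bo Chen", "Team": "Blue Owls"},
]
toy_passages = {"Red Hawks": "Red Hawks is a toy club used only for this example."}

for block in fuse_rows("Toy roster", toy_rows, toy_passages):
    print(block.to_text())
```

Each fused block is indexed as one retrieval unit, so a single retrieval step can surface both the table row and its linked text rather than requiring separate hops over each modality.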