CLEVR-Math: A Dataset for Compositional Language, Visual and Mathematical Reasoning

Paper title

CLEVR-Math: A Dataset for Compositional Language, Visual and Mathematical Reasoning

Authors

Adam Dahlgren Lindström, Savitha Sam Abraham

Abstract

We introduce CLEVR-Math, a multi-modal math word problem dataset consisting of simple math word problems involving addition and subtraction, represented partly by a textual description and partly by an image illustrating the scenario. The text describes actions performed on the scene depicted in the image. Since the question posed may not be about the scene in the image, but about the state of the scene before or after the actions are applied, the solver must envision or imagine the state changes caused by these actions. Solving these word problems requires a combination of language, visual and mathematical reasoning. We apply state-of-the-art neural and neuro-symbolic models for visual question answering to CLEVR-Math and empirically evaluate their performance. Our results show that neither method generalises to chains of operations. We discuss the limitations of both approaches in addressing the task of multi-modal word problem solving.
