Paper Title
GAMR: A Guided Attention Model for (visual) Reasoning
Paper Authors
Abstract
Humans continue to outperform modern AI systems in their ability to flexibly parse and understand complex visual scenes. Here, we present a novel module for visual reasoning, the Guided Attention Model for (visual) Reasoning (GAMR), which instantiates an active vision theory positing that the brain solves complex visual reasoning problems dynamically, via sequences of attention shifts that select and route task-relevant visual information into memory. Experiments on an array of visual reasoning tasks and datasets demonstrate GAMR's ability to learn visual routines in a robust and sample-efficient manner. In addition, GAMR is shown to be capable of zero-shot generalization on completely novel reasoning tasks. Overall, our work provides computational support for cognitive theories that postulate the need for a critical interplay between attention and memory to dynamically maintain and manipulate task-relevant visual information in order to solve complex visual reasoning tasks.
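The abstract describes GAMR's core idea only at a high level: a sequence of attention shifts selects task-relevant information and routes it into memory, where it can then be related. The following minimal Python sketch illustrates that select, store, and compare loop on a toy same/different scene. It is an illustration under stated assumptions, not GAMR's architecture: the function names `guided_glimpses` and `same_or_different`, the inhibition-of-return rule used to choose the next attention shift, the fixed query, and the cosine-similarity read-out are all hypothetical stand-ins for the learned components of the actual model.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def guided_glimpses(features: List[Vector], query: Vector, steps: int) -> List[Vector]:
    """Take `steps` soft-attention glimpses over a flattened feature map and
    append each glimpse to a memory list (hypothetical helper).

    Hypothetical guidance rule: after each glimpse, the most-attended location
    is suppressed (inhibition of return), so the next shift lands elsewhere.
    A learned model would instead produce a new query at every step.
    """
    dim = len(features[0])
    bias = [0.0] * len(features)  # per-location additive attention bias
    memory: List[Vector] = []
    for _ in range(steps):
        scores = [dot(f, query) + b for f, b in zip(features, bias)]
        weights = softmax(scores)
        # Route the attention-weighted feature vector into memory.
        glimpse = [sum(w * f[c] for w, f in zip(weights, features)) for c in range(dim)]
        memory.append(glimpse)
        peak = max(range(len(weights)), key=weights.__getitem__)
        bias[peak] = -1e9  # suppress the location just attended
    return memory


def same_or_different(memory: List[Vector], threshold: float = 0.9) -> str:
    """Toy relational read-out: cosine similarity of the first two glimpses."""
    a, b = memory[0], memory[1]
    cos = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
    return "same" if cos > threshold else "different"


if __name__ == "__main__":
    # Four spatial locations with 2-channel features; background locations are zero.
    same_scene = [[0.0, 0.0], [20.0, 0.0], [0.0, 0.0], [15.0, 0.0]]
    diff_scene = [[0.0, 0.0], [20.0, 0.0], [0.0, 0.0], [0.0, 15.0]]
    query = [1.0, 1.0]  # hypothetical fixed "find any object" query
    for scene in (same_scene, diff_scene):
        print(same_or_different(guided_glimpses(scene, query, steps=2)))
    # prints: same, then different
```

In this toy setup the first glimpse lands mostly on the more salient object, and suppressing that location forces the second shift onto the other object, so memory ends up holding one entry per object for the read-out to compare. In a trained model, both where to look next and how to compare memory entries would be learned rather than hand-coded.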