GAMR: A Guided Attention Model for (visual) Reasoning

Title

GAMR: A Guided Attention Model for (visual) Reasoning

Authors

Mohit Vaishnav, Thomas Serre

Abstract

Humans continue to outperform modern AI systems in their ability to flexibly parse and understand complex visual scenes. Here, we present a novel module for visual reasoning, the Guided Attention Model for (visual) Reasoning (GAMR), which instantiates an active vision theory -- positing that the brain solves complex visual reasoning problems dynamically -- via sequences of attention shifts to select and route task-relevant visual information into memory. Experiments on an array of visual reasoning tasks and datasets demonstrate GAMR's ability to learn visual routines in a robust and sample-efficient manner. In addition, GAMR is shown to be capable of zero-shot generalization on completely novel reasoning tasks. Overall, our work provides computational support for cognitive theories that postulate the need for a critical interplay between attention and memory to dynamically maintain and manipulate task-relevant visual information to solve complex visual reasoning tasks.
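
The abstract describes the mechanism only at a high level: a sequence of attention shifts that selects task-relevant visual information and routes it into memory. Below is a minimal, hypothetical Python sketch of that general select-and-route loop. It assumes a set of precomputed feature vectors (one per image location), a controller state used directly as the attention query, and a simple tanh state update. None of these choices, nor the name `guided_attention_rollout`, come from the paper; the sketch illustrates the idea and is not GAMR's actual architecture.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def guided_attention_rollout(
    features: List[Vector],  # one feature vector per location (hypothetical encoder output)
    init_state: Vector,      # initial controller state (hypothetical)
    num_steps: int,
) -> Tuple[List[Vector], List[Vector]]:
    """Sequentially shift attention and route each attended glimpse into memory.

    Hypothetical sketch: the query is the controller state itself and the
    state update is tanh(state + glimpse); neither is taken from the paper.
    """
    dim = len(init_state)
    state = list(init_state)
    memory: List[Vector] = []     # one stored glimpse per attention shift
    attn_maps: List[Vector] = []  # attention weights over locations, per step
    for _ in range(num_steps):
        # Score every location against the current query (scaled dot product).
        scores = [dot(f, state) / math.sqrt(dim) for f in features]
        weights = softmax(scores)
        # Glimpse: attention-weighted sum of the location features.
        glimpse = [sum(w * f[k] for w, f in zip(weights, features)) for k in range(dim)]
        memory.append(glimpse)
        attn_maps.append(weights)
        # Update the controller from the glimpse, conditioning the next shift
        # on what was just read.
        state = [math.tanh(s + g) for s, g in zip(state, glimpse)]
    return memory, attn_maps


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
memory, maps = guided_attention_rollout(features, init_state=[0.5, -0.5], num_steps=3)
```

A downstream reasoning module would then operate on `memory` (for example, comparing stored glimpses to judge whether two objects match); that step is omitted here.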
