Paper Title
Token Turing Machines
Paper Authors
Abstract
We propose Token Turing Machines (TTM), a sequential, autoregressive Transformer model with memory for real-world sequential visual understanding. Our model is inspired by the seminal Neural Turing Machine, and has an external memory consisting of a set of tokens which summarise the previous history (i.e., frames). This memory is efficiently addressed, read and written using a Transformer as the processing unit/controller at each step. The model's memory module ensures that a new observation will only be processed with the contents of the memory (and not the entire history), meaning that it can efficiently process long sequences with a bounded computational cost at each step. We show that TTM outperforms other alternatives, such as other Transformer models designed for long sequences and recurrent neural networks, on two real-world sequential visual understanding tasks: online temporal activity detection from videos and vision-based robot action policy learning. Code is publicly available at: https://github.com/google-research/scenic/tree/main/scenic/projects/token_turing
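The read-process-write loop described above can be sketched in a few lines. This is a minimal NumPy illustration, not the released implementation: the learned token-summarization module (TokenLearner in the paper) is replaced by a toy soft-mixture stand-in, and the Transformer controller by an identity function, so only the data flow and the bounded per-step cost are shown.

```python
import numpy as np

def summarize(tokens, k, rng):
    # Toy stand-in for a learned token summarization module (the paper
    # uses TokenLearner): produce k soft mixtures of the input tokens.
    n, d = tokens.shape
    w = rng.random((k, n))
    w = w / w.sum(axis=1, keepdims=True)  # normalize mixture weights
    return w @ tokens  # (k, d)

def ttm_step(memory, inputs, rng, r=4, process=lambda x: x):
    """One read-process-write step of a Token Turing Machine (sketch).

    memory: (m, d) memory tokens; inputs: (n, d) tokens from the new
    observation. `process` stands in for the Transformer controller.
    """
    # Read: summarize [memory; inputs] down to r tokens.
    read = summarize(np.concatenate([memory, inputs]), r, rng)
    # Process: run the controller on the read tokens only, so the cost
    # per step does not depend on the length of the history.
    out = process(read)
    # Write: re-summarize [memory; output; inputs] back to m tokens.
    m = memory.shape[0]
    new_memory = summarize(np.concatenate([memory, out, inputs]), m, rng)
    return out, new_memory

rng = np.random.default_rng(0)
memory = np.zeros((8, 16))  # m = 8 memory tokens of dimension d = 16
for _ in range(5):          # a short sequence of 5 "frames"
    frame_tokens = rng.standard_normal((32, 16))  # n = 32 tokens/frame
    out, memory = ttm_step(memory, frame_tokens, rng)
print(out.shape, memory.shape)
```

Because each step touches only the m memory tokens and the n current input tokens, the compute per step stays constant regardless of how many frames have been seen, which is the property the abstract highlights.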