Paper Title
Multimodal Memorability: Modeling Effects of Semantics and Decay on Video Memorability
Paper Authors
Paper Abstract
A key capability of an intelligent system is deciding when events from past experience must be remembered and when they can be forgotten. Towards this goal, we develop a predictive model of human visual event memory and how those memories decay over time. We introduce Memento10k, a new, dynamic video memorability dataset containing human annotations at different viewing delays. Based on our findings, we propose a new mathematical formulation of memorability decay, resulting in a model that is able to produce the first quantitative estimation of how a video decays in memory over time. In contrast with previous work, our model can predict the probability that a video will be remembered at an arbitrary delay. Importantly, our approach combines visual and semantic information (in the form of textual captions) to fully represent the meaning of events. Our experiments on two video memorability benchmarks, including Memento10k, show that our model significantly improves upon the best prior approach (by 12% on average).
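To make the idea of predicting memorability at an arbitrary delay concrete, here is a minimal illustrative sketch. It assumes a log-linear decay anchored at a reference delay; the function name, the anchor delay, and the specific functional form are assumptions for illustration, not the paper's actual formulation.

```python
import math

def memorability_at_delay(base_score: float, decay_rate: float,
                          delay_s: float, anchor_delay_s: float = 80.0) -> float:
    """Toy estimate of the probability a video is remembered after delay_s seconds.

    Assumes memorability falls off linearly in log-time (an illustrative
    assumption, not the paper's exact model): base_score is the memorability
    measured at anchor_delay_s, and decay_rate (typically negative) controls
    how quickly it declines as the delay grows.
    """
    if delay_s <= 0 or anchor_delay_s <= 0:
        raise ValueError("delays must be positive")
    score = base_score + decay_rate * (math.log(delay_s) - math.log(anchor_delay_s))
    # Clamp to a valid probability.
    return min(1.0, max(0.0, score))
```

Under this sketch, a per-video pair (base score, decay rate) is enough to draw a full decay curve: evaluating the function at many delays yields a prediction for any viewing gap, which is the capability the abstract describes.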