A Psychophysically Oriented Saliency Map Prediction Model

Paper Title

A Psychophysically Oriented Saliency Map Prediction Model

Author

Li, Qiang

Abstract

Visual attention is one of the most important mechanisms for selecting and understanding information from a redundant external world. Because of the visual information bottleneck, the human visual system cannot process all information simultaneously. To reduce the redundant visual input, it focuses mainly on the dominant parts of a scene; predicting these regions is commonly known as visual saliency map prediction. This paper proposes a new psychophysical saliency prediction architecture, WECSF, inspired by the multi-channel model of human visual cortex function. The model consists of opponent color channels, a wavelet transform, a wavelet energy map, and a contrast sensitivity function, which together extract low-level image features and approximate the human visual system as closely as possible. The proposed model is evaluated on several datasets, including MIT1003, MIT300, TORONTO, SID4VAM, and UCF Sports. We also compare its saliency prediction performance quantitatively and qualitatively with that of other state-of-the-art models. Across different metrics, our model achieves stable and better performance on natural images, psychophysical synthetic images, and dynamic videos. Additionally, we find that Fourier- and spectral-inspired saliency prediction models outperform other state-of-the-art non-neural-network models, and even deep neural network models, on psychophysical synthetic images. This can be explained and supported by the Fourier Vision Hypothesis. We also suggest that deep neural networks need specific architectures and goals to predict saliency on psychophysical synthetic images better and more reliably. Finally, the proposed model could serve as a computational model of the primate visual system and help us understand its mechanisms.
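
The abstract names the pipeline stages (opponent color channels, wavelet transform, wavelet energy map, contrast sensitivity function) without giving their details. The following is only a minimal sketch of how such stages could be chained, not the WECSF implementation. Everything specific in it is an assumption made for illustration: the opponent-channel formulas, the use of a Haar wavelet, the function names, and the per-scale `level_weights`, which stand in for a real contrast sensitivity function.

```python
import numpy as np


def opponent_channels(rgb):
    """Split an RGB image (H, W, 3) into achromatic, red-green and
    blue-yellow channels. The formulas are a simple assumed choice."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return (r + g + b) / 3.0, r - g, b - (r + g) / 2.0


def haar_level(x):
    """One level of a 2-D Haar transform on an array with even dimensions.
    Returns the approximation band and three detail bands."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]   # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]   # bottom-left, bottom-right
    approx = (a + b + c + d) / 2.0
    top_vs_bottom = (a + b - c - d) / 2.0
    left_vs_right = (a - b + c - d) / 2.0
    diagonal = (a - b - c + d) / 2.0
    return approx, top_vs_bottom, left_vs_right, diagonal


def wavelet_energy_saliency(rgb, level_weights=(0.5, 1.0, 0.5)):
    """Sum weighted wavelet detail energy over scales and opponent channels.

    level_weights are hypothetical per-scale gains (finest scale first) that
    play the role of a band-pass contrast sensitivity function.
    """
    rgb = np.asarray(rgb, dtype=float)
    block = 2 ** len(level_weights)
    # Crop so that every decomposition level sees even dimensions.
    h = rgb.shape[0] // block * block
    w = rgb.shape[1] // block * block
    if h == 0 or w == 0:
        raise ValueError("image is too small for the requested number of levels")
    rgb = rgb[:h, :w]

    saliency = np.zeros((h, w))
    for channel in opponent_channels(rgb):
        approx = channel
        for level, weight in enumerate(level_weights, start=1):
            approx, d1, d2, d3 = haar_level(approx)
            energy = d1 ** 2 + d2 ** 2 + d3 ** 2
            # Replicate each coefficient over the pixel block it covers.
            scale = 2 ** level
            saliency += weight * np.kron(energy, np.ones((scale, scale)))

    # Rescale to [0, 1]; a flat image yields an all-zero map.
    saliency -= saliency.min()
    peak = saliency.max()
    return saliency / peak if peak > 0 else saliency
```

Calling `wavelet_energy_saliency(image)` on a float RGB array returns a map of the (cropped) image size in which regions with strong local contrast in any opponent channel receive higher values. Block replication is used for upsampling only to keep the sketch dependency-free; a smoother interpolation would normally be preferable.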
