Paper Title
Self-Supervised Learning of a Biologically-Inspired Visual Texture Model
Paper Authors
Paper Abstract
We develop a model for representing visual texture in a low-dimensional feature space, along with a novel self-supervised learning objective that is used to train it on an unlabeled database of texture images. Inspired by the architecture of primate visual cortex, the model uses a first stage of oriented linear filters (corresponding to cortical area V1), consisting of both rectified units (simple cells) and pooled phase-invariant units (complex cells). These responses are processed by a second stage (analogous to cortical area V2) consisting of convolutional filters followed by half-wave rectification and pooling to generate V2 'complex cell' responses. The second stage filters are trained on a set of unlabeled homogeneous texture images, using a novel contrastive objective that maximizes the distance between the distribution of V2 responses to individual images and the distribution of responses across all images. When evaluated on texture classification, the trained model achieves substantially greater data-efficiency than a variety of deep hierarchical model architectures. Moreover, we show that the learned model exhibits stronger representational similarity to texture responses of neural populations recorded in primate V2 than pre-trained deep CNNs.
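To make the two-stage architecture described above concrete, the sketch below illustrates one plausible reading of it in NumPy/SciPy: a fixed V1 stage of oriented Gabor filters producing rectified "simple cell" and phase-invariant "complex cell" maps, followed by a V2 stage of convolutional filters, half-wave rectification, and spatial pooling. All filter sizes, orientations, and channel counts here are illustrative assumptions, not the authors' parameters, and the V2 weights are random placeholders for what the paper learns with its contrastive objective.

```python
# Minimal sketch (an assumption, not the authors' code) of the two-stage texture model:
# V1 = fixed oriented filters -> rectified simple cells + phase-pooled complex cells;
# V2 = learned convolutional filters -> half-wave rectification -> spatial pooling.
import numpy as np
from scipy.signal import convolve2d

def gabor(size, sigma, freq, theta, phase):
    """Oriented Gabor filter, used here as a stand-in for the V1 linear filters."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    env = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return env * np.cos(2 * np.pi * freq * xr + phase)

def relu(x):
    return np.maximum(x, 0.0)  # half-wave rectification

def v1_stage(img, n_orient=4, size=9, sigma=2.0, freq=0.25):
    """Return V1 'simple' (rectified) and 'complex' (phase-invariant) response maps."""
    simple, complex_ = [], []
    for k in range(n_orient):
        theta = k * np.pi / n_orient
        even = convolve2d(img, gabor(size, sigma, freq, theta, 0.0), mode="same")
        odd = convolve2d(img, gabor(size, sigma, freq, theta, np.pi / 2), mode="same")
        simple += [relu(even), relu(-even), relu(odd), relu(-odd)]  # simple cells
        complex_.append(np.sqrt(even**2 + odd**2))                  # complex cells (local energy)
    return np.stack(simple + complex_)  # (channels, H, W)

def v2_stage(v1_maps, weights):
    """V2 stage: cross-channel convolution, half-wave rectification, spatial pooling.
    `weights` has shape (n_v2, n_v1_channels, k, k); in the paper these are learned
    with the contrastive objective, which is not implemented in this sketch."""
    responses = []
    for w in weights:  # one pooled V2 'complex cell' response per filter
        drive = sum(convolve2d(c, wk, mode="same") for c, wk in zip(v1_maps, w))
        responses.append(relu(drive).mean())  # spatial pooling -> scalar
    return np.array(responses)

# Toy usage on a random image with random (untrained) V2 filters.
rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))
v1 = v1_stage(img)
w = rng.standard_normal((8, v1.shape[0], 5, 5)) * 0.01
print(v2_stage(v1, w).shape)  # (8,) low-dimensional texture feature vector
```

Training would then adjust the V2 weights so that, per the abstract, the distribution of pooled V2 responses within a single homogeneous texture image is pushed away from the distribution of responses aggregated across all images; the specific distance measure is defined in the paper and is not assumed here.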