Paper Title
Understanding the Covariance Structure of Convolutional Filters
Paper Authors
Paper Abstract
Neural network weights are typically initialized at random from univariate distributions, controlling just the variance of individual weights even in highly structured operations like convolutions. Recent ViT-inspired convolutional networks such as ConvMixer and ConvNeXt use large-kernel depthwise convolutions whose learned filters have notable structure; this presents an opportunity to study their empirical covariances. In this work, we first observe that such learned filters have highly structured covariance matrices, and moreover, we find that covariances calculated from small networks may be used to effectively initialize a variety of larger networks of different depths, widths, patch sizes, and kernel sizes, indicating a degree of model-independence in the covariance structure. Motivated by these findings, we then propose a learning-free multivariate initialization scheme for convolutional filters using a simple, closed-form construction of their covariance. Models using our initialization outperform those using traditional univariate initializations, and typically meet or exceed the performance of those initialized from the covariances of learned filters; in some cases, this improvement can be achieved without training the depthwise convolutional filters at all.
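The abstract does not spell out the paper's closed-form covariance construction, but the general idea of a multivariate filter initialization can be sketched. The snippet below, a minimal sketch assuming PyTorch, samples each depthwise filter jointly from N(0, Σ) rather than sampling taps i.i.d.; the RBF-style covariance over filter-tap coordinates and the function names are illustrative assumptions, not the paper's actual construction.

```python
import torch

def structured_covariance(kernel_size: int, length_scale: float = 1.0) -> torch.Tensor:
    # Illustrative closed-form covariance over the k*k filter taps: an RBF
    # kernel on tap coordinates, so nearby taps are positively correlated.
    # This particular kernel is an assumption for the sketch, not necessarily
    # the construction used in the paper.
    coords = torch.stack(torch.meshgrid(
        torch.arange(kernel_size, dtype=torch.float32),
        torch.arange(kernel_size, dtype=torch.float32),
        indexing="ij",
    ), dim=-1).reshape(-1, 2)                        # (k*k, 2) tap positions
    sq_dists = torch.cdist(coords, coords) ** 2      # pairwise squared distances
    cov = torch.exp(-sq_dists / (2 * length_scale ** 2))
    return cov + 1e-6 * torch.eye(kernel_size ** 2)  # jitter for numerical stability

def init_depthwise_filters(num_channels: int, kernel_size: int) -> torch.Tensor:
    # Draw each channel's k*k filter jointly from N(0, Sigma), instead of
    # drawing each tap independently as in univariate initialization.
    dist = torch.distributions.MultivariateNormal(
        torch.zeros(kernel_size ** 2),
        covariance_matrix=structured_covariance(kernel_size))
    filters = dist.sample((num_channels,))           # (C, k*k)
    return filters.reshape(num_channels, 1, kernel_size, kernel_size)

# Usage: initialize a ConvMixer-style 9x9 depthwise convolution.
conv = torch.nn.Conv2d(256, 256, kernel_size=9, groups=256, padding="same")
with torch.no_grad():
    conv.weight.copy_(init_depthwise_filters(256, 9))
```

Sampling all k*k taps of a filter from one multivariate Gaussian is what lets the initialization encode correlations between neighboring taps, which independent univariate draws cannot express.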