An Embedding-Dynamic Approach to Self-supervised Learning

Paper title

An Embedding-Dynamic Approach to Self-supervised Learning

Authors

Suhong Moon, Domas Buracas, Seunghyun Park, Jinkyu Kim, John Canny

Abstract

A number of recent self-supervised learning methods have shown impressive performance on image classification and other tasks. A somewhat bewildering variety of techniques has been used, not always with a clear understanding of the reasons for their benefits, especially when they are used in combination. Here we treat the embeddings of images as point particles and consider model optimization as a dynamic process on this system of particles. Our dynamic model combines an attractive force for similar images, a locally dispersive force to avoid local collapse, and a global dispersive force to achieve a globally homogeneous distribution of particles. The dynamic perspective highlights the advantage of using a delayed-parameter image embedding (à la BYOL) together with multiple views of the same image. It also uses a purely dynamic local dispersive force (Brownian motion) that shows improved performance over other methods and does not require knowledge of other particle coordinates. The method is called MSBReg, which stands for (i) a Multiview centroid loss, which applies an attractive force to pull different image view embeddings toward their centroid, (ii) a Singular value loss, which pushes the particle system toward spatially homogeneous density, and (iii) a Brownian diffusive loss. We evaluate the downstream classification performance of MSBReg on ImageNet as well as on transfer learning tasks including fine-grained classification, multi-class object classification, object detection, and instance segmentation. In addition, we show that applying our regularization term to other methods further improves their performance and stabilizes training by preventing mode collapse.
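
The three forces named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract gives no formulas, so the exact form of each term below is an assumption chosen only to show the idea. The function names, the entropy-based singular value penalty, the noise scale `sigma`, and the toy array shapes are all hypothetical.

```python
import numpy as np


def multiview_centroid_loss(views):
    """Attractive term: mean squared distance of each view to its image centroid.

    views has shape (n_images, n_views, dim).
    """
    centroid = views.mean(axis=1, keepdims=True)
    return float(np.mean(np.sum((views - centroid) ** 2, axis=-1)))


def singular_value_loss(embeddings, eps=1e-12):
    """Global dispersive term (assumed form, not taken from the paper).

    Measures how far the singular values of the centered embedding matrix
    are from being equal: log(k) minus the entropy of the normalized
    spectrum. It is near 0 for an evenly spread cloud and log(k) when the
    cloud collapses to a point or a line.
    """
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    p = s / (s.sum() + eps)
    entropy = -np.sum(p * np.log(p + eps))
    return float(np.log(len(s)) - entropy)


def brownian_step(embeddings, sigma, rng):
    """Local dispersive term (assumed form): independent Gaussian jitter.

    Each particle is perturbed on its own, so no other particle's
    coordinates are needed.
    """
    return embeddings + sigma * rng.standard_normal(embeddings.shape)


# Toy data: 8 images, 4 views each, 16-dimensional embeddings.
rng = np.random.default_rng(0)
views = rng.standard_normal((8, 4, 16))

attract = multiview_centroid_loss(views)
spread = singular_value_loss(views.reshape(-1, 16))
noisy_targets = brownian_step(views.mean(axis=1), sigma=0.1, rng=rng)
print(attract, spread, noisy_targets.shape)
```

In an actual training loop these terms would be computed on network outputs inside an autodiff framework and combined with weights; the weighting, and the way the noisy targets enter the loss, are left out here because the abstract does not specify them.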
