Paper Title

Implicit variance regularization in non-contrastive SSL

Paper Authors

Manu Srinath Halvagal, Axel Laborieux, Friedemann Zenke

Paper Abstract

Non-contrastive SSL methods like BYOL and SimSiam rely on asymmetric predictor networks to avoid representational collapse without negative samples. Yet, how predictor networks facilitate stable learning is not fully understood. While previous theoretical analyses assumed Euclidean losses, most practical implementations rely on cosine similarity. To gain further theoretical insight into non-contrastive SSL, we analytically study learning dynamics in conjunction with Euclidean and cosine similarity in the eigenspace of closed-form linear predictor networks. We show that both avoid collapse through implicit variance regularization, albeit through different dynamical mechanisms. Moreover, we find that the eigenvalues act as effective learning rate multipliers and propose a family of isotropic loss functions (IsoLoss) that equalize convergence rates across eigenmodes. Empirically, IsoLoss speeds up the initial learning dynamics and increases robustness, thereby allowing us to dispense with the EMA target network typically used with non-contrastive methods. Our analysis sheds light on the variance regularization mechanisms of non-contrastive SSL and lays the theoretical grounds for crafting novel loss functions that shape the learning dynamics of the predictor's spectrum.
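For readers unfamiliar with the objective the abstract analyzes, below is a minimal sketch of the standard SimSiam-style non-contrastive loss: a negative cosine similarity between the predictor output and a stop-gradient target, which together with the asymmetric predictor is the setup whose dynamics the paper studies. This is written in PyTorch; all shapes and variable names are illustrative, and the paper's IsoLoss variant itself is not reproduced here since its exact form is not given in the abstract.

```python
import torch
import torch.nn.functional as F

def negative_cosine(p, z):
    # Standard non-contrastive objective: negative cosine similarity
    # between the predictor output p and the target z. The stop-gradient
    # (detach) on the target branch, combined with the asymmetric
    # predictor network producing p, is what the paper's eigenspace
    # analysis examines.
    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

# Hypothetical shapes for illustration: a batch of 256 embeddings of dim 128.
p1 = torch.randn(256, 128, requires_grad=True)  # predictor(encoder(view1))
z2 = torch.randn(256, 128)                      # encoder(view2), target branch

loss = negative_cosine(p1, z2)
loss.backward()
```

In BYOL the detached target branch would instead come from an EMA copy of the encoder; the abstract reports that IsoLoss is robust enough to dispense with that EMA target network.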
