Paper title
Understanding the Behaviour of Contrastive Loss
Paper authors
Abstract
Unsupervised contrastive learning has achieved outstanding success, while the mechanism of the contrastive loss has been less studied. In this paper, we concentrate on understanding the behaviour of the unsupervised contrastive loss. We show that the contrastive loss is a hardness-aware loss function, and that the temperature τ controls the strength of the penalties on hard negative samples. Previous work has shown that uniformity is a key property of contrastive learning. We build relations between uniformity and the temperature τ. We show that uniformity helps contrastive learning to learn separable features; however, excessive pursuit of uniformity makes the contrastive loss intolerant of semantically similar samples, which may break the underlying semantic structure and harm the formation of features useful for downstream tasks. This is caused by an inherent defect of the instance discrimination objective. Specifically, the instance discrimination objective tries to push all different instances apart, ignoring the underlying relations between samples. Pushing semantically consistent samples apart has no positive effect on acquiring a prior that is informative for general downstream tasks. A well-designed contrastive loss should have some degree of tolerance to the closeness of semantically similar samples. We therefore find that the contrastive loss faces a uniformity-tolerance dilemma, and that a good choice of temperature can properly balance these two properties, both learning separable features and remaining tolerant of semantically similar samples, which improves feature quality and downstream performance.
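The role of the temperature can be illustrated with a small numerical sketch. The abstract does not state the loss formula, so the code below assumes the commonly used softmax (InfoNCE-style) contrastive loss over cosine similarities; the function names `info_nce` and `negative_weights` and the similarity values are hypothetical and chosen only for illustration. Under that assumed form, the gradient of the loss with respect to each negative similarity s is proportional to exp(s/τ), so the relative penalty on each negative is a softmax over the negative similarities.

```python
import math

def info_nce(pos_sim, neg_sims, tau):
    """Assumed softmax contrastive loss for one anchor (not taken from the abstract)."""
    logits = [pos_sim / tau] + [s / tau for s in neg_sims]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]

def negative_weights(neg_sims, tau):
    """Relative share of the penalty each negative receives at temperature tau."""
    # Each share is proportional to exp(s / tau); subtracting the max
    # before exponentiating leaves the normalized shares unchanged.
    m = max(neg_sims)
    exps = [math.exp((s - m) / tau) for s in neg_sims]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical similarities between an anchor and four negatives,
# ordered from the hardest (most similar) to the easiest.
neg_sims = [0.9, 0.5, 0.0, -0.5]

for tau in (0.07, 0.5, 5.0):
    weights = negative_weights(neg_sims, tau)
    loss = info_nce(1.0, neg_sims, tau)
    print(f"tau={tau}: loss={loss:.3f}, weights={[round(w, 3) for w in weights]}")
```

With a small τ nearly all of the penalty falls on the hardest negative, while a large τ spreads it almost evenly across the negatives. This is one way to read the hardness-aware property and the uniformity-tolerance trade-off described in the abstract.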