Paper Title

Dual Temperature Helps Contrastive Learning Without Many Negative Samples: Towards Understanding and Simplifying MoCo

Paper Authors

Chaoning Zhang, Kang Zhang, Trung X. Pham, Axi Niu, Zhinan Qiao, Chang D. Yoo, In So Kweon

Paper Abstract

Contrastive learning (CL) is widely known to require many negative samples, 65536 in MoCo for instance, and the performance of a dictionary-free framework is often inferior because its negative sample size (NSS) is limited by its mini-batch size (MBS). To decouple the NSS from the MBS, a dynamic dictionary has been adopted in a large volume of CL frameworks, among which arguably the most popular is the MoCo family. In essence, MoCo adopts a momentum-based queue dictionary, for which we perform a fine-grained analysis of its size and consistency. We point out that the InfoNCE loss used in MoCo implicitly attracts anchors to their corresponding positive samples with varying strengths of penalty, and we identify this inter-anchor hardness-awareness property as a major reason for the necessity of a large dictionary. Our findings motivate us to simplify MoCo v2 by removing its dictionary as well as its momentum. Based on InfoNCE with the proposed dual temperature, our simplified frameworks, SimMoCo and SimCo, outperform MoCo v2 by a visible margin. Moreover, our work bridges the gap between CL and non-CL frameworks, contributing to a more unified understanding of these two mainstream frameworks in SSL. Code is available at: https://bit.ly/3LkQbaT.
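To make the abstract's central idea concrete, below is a minimal PyTorch sketch of one plausible dual-temperature InfoNCE, in which the per-anchor loss is computed at one temperature and reweighted by a stop-gradient ratio computed at a second temperature, so that intra-anchor and inter-anchor hardness-awareness can be controlled separately. The function name, the parameter names `tau_intra`/`tau_inter`, and the exact form of the reweighting are assumptions for illustration, not the authors' released implementation (see the code link above for that).

```python
import torch
import torch.nn.functional as F

def dual_temperature_infonce(anchors, positives, tau_intra=0.1, tau_inter=1.0):
    """Sketch of a dictionary-free (SimCo-style) InfoNCE loss with two
    temperatures, under the assumptions stated in the lead-in.

    anchors, positives: (N, D) L2-normalized embeddings of two views;
    for each anchor, the other in-batch positives act as negatives.
    tau_intra shapes the softmax that drives the gradient direction;
    tau_inter shapes the per-anchor reweighting (inter-anchor
    hardness-awareness). Both names are illustrative.
    """
    logits = anchors @ positives.t()  # (N, N) cosine similarities
    labels = torch.arange(logits.size(0), device=logits.device)

    # Per-anchor InfoNCE loss at the intra-anchor temperature.
    loss_intra = F.cross_entropy(logits / tau_intra, labels, reduction="none")

    # Softmax probability assigned to the positive pair at each temperature.
    p_intra = F.softmax(logits / tau_intra, dim=1).gather(1, labels[:, None]).squeeze(1)
    p_inter = F.softmax(logits / tau_inter, dim=1).gather(1, labels[:, None]).squeeze(1)

    # Stop-gradient reweighting: each anchor's gradient magnitude follows
    # tau_inter while its gradient direction is still set by tau_intra.
    weight = ((1.0 - p_inter) / (1.0 - p_intra + 1e-8)).detach()
    return (weight * loss_intra).mean()
```

Note that when `tau_inter` equals `tau_intra` the weight collapses to 1 and the sketch reduces to standard single-temperature InfoNCE, which makes the contribution of the second temperature straightforward to ablate.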
