Leveraging Ensembles and Self-Supervised Learning for Fully-Unsupervised Person Re-Identification and Text Authorship Attribution

Paper Title

Leveraging Ensembles and Self-Supervised Learning for Fully-Unsupervised Person Re-Identification and Text Authorship Attribution

Authors

Gabriel Bertocco, Antônio Theophilo, Fernanda Andaló, Anderson Rocha

Abstract

Learning from fully-unlabeled data is challenging in Multimedia Forensics problems, such as Person Re-Identification and Text Authorship Attribution. Recent self-supervised learning methods have been shown to be effective on fully-unlabeled data when the underlying classes have significant semantic differences, because intra-class distances are then substantially lower than inter-class distances. However, this is not the case for forensic applications, in which classes have similar semantics and the training and test sets have disjoint identities. General self-supervised learning methods might fail to learn discriminative features in this scenario, so more robust strategies are required. We propose a strategy to tackle Person Re-Identification and Text Authorship Attribution by enabling learning from unlabeled data even when samples from different classes are not prominently diverse. Specifically, we introduce a novel ensemble-based clustering strategy in which clusters derived from different configurations are combined to generate a better grouping of the data samples in a fully-unsupervised way. This strategy allows clusters with different densities and higher variability to emerge, reducing intra-class discrepancies without the burden of finding an optimal configuration per dataset. We also consider different Convolutional Neural Networks for feature extraction and for the subsequent distance computations between samples. We refine these distances by incorporating context and grouping them to capture complementary information. Our method is robust across both tasks, which involve different data modalities, and outperforms state-of-the-art methods with a fully-unsupervised solution that needs no labeling or human intervention.
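The idea of combining clusters obtained under different configurations can be illustrated with a small sketch. The abstract does not specify the clustering algorithm or the combination rule, so everything below is an assumption made for illustration: a simple distance-threshold clustering stands in for the base clusterer, and a majority-agreement (co-association) rule stands in for the combination step. The names `threshold_clusters`, `ensemble_clusters`, `eps_values`, and `min_agreement` are hypothetical and do not come from the paper.

```python
from itertools import combinations
from math import dist


def connected_components(n, linked):
    """Label items 0..n-1 by connected component, where linked(i, j) says whether i and j are joined."""
    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if linked(i, j):
            parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


def threshold_clusters(points, eps):
    """Base clusterer (assumed): join any two points closer than eps."""
    return connected_components(
        len(points), lambda i, j: dist(points[i], points[j]) < eps
    )


def ensemble_clusters(points, eps_values, min_agreement=0.5):
    """Combination rule (assumed): join two points if they share a cluster
    in at least min_agreement of the runs, one run per eps value."""
    runs = [threshold_clusters(points, eps) for eps in eps_values]

    def agree(i, j):
        votes = sum(run[i] == run[j] for run in runs)
        return votes / len(runs) >= min_agreement

    return connected_components(len(points), agree)


# Two well-separated groups; the last eps value is deliberately far too large.
points = [(0, 0), (0.1, 0), (0.2, 0.1), (5, 5), (5.1, 5), (5.2, 5.1)]
print(threshold_clusters(points, 10.0))            # [0, 0, 0, 0, 0, 0]
print(ensemble_clusters(points, [0.2, 0.5, 10.0])) # [0, 0, 0, 1, 1, 1]
```

In this toy example the oversized threshold merges everything into one cluster on its own, but the consensus over three configurations still recovers the two groups, which is the kind of reduced sensitivity to a single configuration that the abstract describes.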
