Paper Title
Robust Character Labeling in Movie Videos: Data Resources and Self-supervised Feature Adaptation
Paper Authors
Paper Abstract
Robust face clustering is a vital step in enabling computational understanding of visual character portrayal in media. Face clustering for long-form content is challenging because of variations in appearance and the lack of large-scale labeled data. Our work in this paper focuses on two key aspects of this problem: the lack of domain-specific training or benchmark datasets, and the adaptation of face embeddings learned on web images to long-form content, specifically movies. First, we present a dataset of over 169,000 face tracks curated from 240 Hollywood movies with weak labels on whether a pair of face tracks belongs to the same or to different characters. We propose an offline algorithm based on nearest-neighbor search in the embedding space to mine hard examples from these tracks. We then investigate triplet-loss and multiview correlation-based methods for adapting face embeddings using these hard examples. Our experimental results highlight the usefulness of weakly labeled data for domain-specific feature adaptation. Overall, we find that multiview correlation-based adaptation yields more discriminative and robust face embeddings, and its performance on downstream face verification and clustering tasks is comparable to state-of-the-art results in this domain. We also present the SAIL-Movie Character Benchmark corpus, developed to augment existing benchmarks; it consists of racially diverse actors and provides face-quality labels for subsequent error analysis. We hope that the large-scale datasets developed in this work will further advance automatic character labeling in videos. All resources are freely available at https://sail.usc.edu/~ccmi/multiface.
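The abstract describes the nearest-neighbor hard-example mining and triplet-loss adaptation only at a high level. Below is a minimal, illustrative sketch of what such a pipeline could look like, assuming per-track face embeddings in a NumPy array, weak character labels per track, scikit-learn for the nearest-neighbor search, and PyTorch for the triplet loss; the function names and parameters are hypothetical and are not taken from the paper's released code, and the multiview correlation-based adaptation favored in the paper is not shown.

```python
# Illustrative sketch only -- not the paper's released implementation.
# Assumes: `embeddings` is an (N, D) NumPy array of per-track face embeddings
# and `labels` is an (N,) NumPy array of weak character labels for those tracks.
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.neighbors import NearestNeighbors

def mine_hard_examples(embeddings, labels, k=5):
    """Nearest-neighbor hard-example mining in the embedding space:
    hard negatives = nearest neighbors carrying a different weak label,
    hard positives = the farthest track carrying the same weak label."""
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    nbrs = NearestNeighbors(n_neighbors=k + 1, metric="cosine").fit(emb)
    _, idx = nbrs.kneighbors(emb)
    hard_neg = [(i, j) for i, nb in enumerate(idx)
                for j in nb[1:] if labels[i] != labels[j]]
    sims = emb @ emb.T
    hard_pos = []
    for i in range(len(emb)):
        same = np.where(labels == labels[i])[0]
        same = same[same != i]
        if same.size:
            hard_pos.append((i, same[np.argmin(sims[i, same])]))
    return hard_pos, hard_neg

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Margin-based triplet loss on L2-normalized embeddings (cosine distance)."""
    anchor, positive, negative = (F.normalize(x, dim=-1)
                                  for x in (anchor, positive, negative))
    d_ap = 1.0 - (anchor * positive).sum(dim=-1)  # distance to hard positive
    d_an = 1.0 - (anchor * negative).sum(dim=-1)  # distance to hard negative
    return F.relu(d_ap - d_an + margin).mean()
```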