WebFace260M: A Benchmark for Million-Scale Deep Face Recognition

Paper Title

WebFace260M: A Benchmark for Million-Scale Deep Face Recognition

Authors

Zheng Zhu, Guan Huang, Jiankang Deng, Yun Ye, Junjie Huang, Xinze Chen, Jiagang Zhu, Tian Yang, Dalong Du, Jiwen Lu, Jie Zhou

Abstract

Face benchmarks empower the research community to train and evaluate high-performance face recognition systems. In this paper, we contribute a new million-scale recognition benchmark containing uncurated training data with 4M identities and 260M faces (WebFace260M), cleaned training data with 2M identities and 42M faces (WebFace42M), and an elaborately designed time-constrained evaluation protocol. First, we collect a list of 4M names and download 260M faces from the Internet. Then, a Cleaning Automatically utilizing Self-Training (CAST) pipeline, which is efficient and scalable, is devised to purify the enormous WebFace260M. To the best of our knowledge, the cleaned WebFace42M is the largest public face recognition training set, and we expect it to close the data gap between academia and industry. With practical deployments in mind, we construct the Face Recognition Under Inference Time conStraint (FRUITS) protocol and a new test set with rich attributes. In addition, we gather a large-scale masked face subset for biometrics assessment under COVID-19. For a comprehensive evaluation of face matchers, three recognition tasks are performed under standard, masked, and unbiased settings, respectively. Equipped with this benchmark, we delve into million-scale face recognition problems. A distributed framework is developed to train face recognition models efficiently without compromising performance. Enabled by WebFace42M, we reduce the failure rate on the challenging IJB-C set by 40% and rank 3rd among 430 entries on NIST-FRVT. Even 10% of the data (WebFace4M) shows superior performance compared with public training sets. Furthermore, comprehensive baselines are established under the FRUITS-100/500/1000 milliseconds protocols. The proposed benchmark shows enormous potential in standard, masked, and unbiased face recognition scenarios. Our WebFace260M website is https://www.face-benchmark.org.
