Pseudo strong labels for large scale weakly supervised audio tagging

Paper title

Pseudo strong labels for large scale weakly supervised audio tagging

Authors

Heinrich Dinkel, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Yujun Wang

Abstract

Large-scale audio tagging datasets inevitably contain imperfect labels, such as clip-wise annotated (temporally weak) tags with no exact on- and offsets, due to the high cost of manual labeling. This work proposes pseudo strong labels (PSL), a simple label augmentation framework that enhances the supervision quality for large-scale weakly supervised audio tagging. A machine annotator is first trained on a large weakly supervised dataset and then provides finer supervision for a student model. Using PSL, we achieve an mAP of 35.95 using the balanced train subset of Audioset and a MobileNetV2 back-end, significantly outperforming approaches without PSL. An analysis is provided which reveals that PSL mitigates missing labels. Lastly, we show that models trained with PSL also generalize better to the Freesound datasets (FSD) than their weakly trained counterparts.
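The idea of a machine annotator supplying finer supervision can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the helper names `split_into_chunks` and `pseudo_strong_labels`, the non-overlapping chunking, and the `toy_teacher` function, which merely stands in for a model trained on weak clip-level labels.

```python
from typing import Callable, List, Sequence


def split_into_chunks(clip: Sequence[float], chunk_len: int) -> List[Sequence[float]]:
    """Split a clip into consecutive, non-overlapping chunks; a short tail is dropped."""
    return [clip[i:i + chunk_len]
            for i in range(0, len(clip) - chunk_len + 1, chunk_len)]


def pseudo_strong_labels(
    clip: Sequence[float],
    chunk_len: int,
    teacher: Callable[[Sequence[float]], List[float]],
) -> List[List[float]]:
    """Ask the teacher for one soft label vector per chunk instead of one per clip."""
    return [teacher(chunk) for chunk in split_into_chunks(clip, chunk_len)]


def toy_teacher(chunk: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a trained annotator: scores [event, silence] by energy."""
    energy = sum(abs(x) for x in chunk) / len(chunk)
    p_event = min(1.0, energy)
    return [p_event, 1.0 - p_event]


# A clip that is silent in its first half and active in its second half.
clip = [0.0] * 4 + [1.0] * 4
labels = pseudo_strong_labels(clip, chunk_len=4, teacher=toy_teacher)
print(labels)
```

A single clip-level tag would mark the whole clip as containing the event, whereas the chunk-level labels in this sketch localize it to the second half. A student model would then be trained on the (chunk, label) pairs rather than on the (clip, tag) pair.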
