Paper Title

A Hybrid System of Sound Event Detection Transformer and Frame-wise Model for DCASE 2022 Task 4

Authors

Li, Yiming; Guo, Zhifang; Ye, Zhirong; Wang, Xiangdong; Liu, Hong; Qian, Yueliang; Tao, Rui; Yan, Long; Ouchi, Kazushige

Abstract

In this paper, we describe in detail our system for DCASE 2022 Task 4. The system combines two considerably different models: an end-to-end Sound Event Detection Transformer (SEDT) and a frame-wise model, the Metric Learning and Focal Loss CNN (MLFL-CNN). The former is an event-wise model that learns event-level representations and predicts sound event categories and boundaries directly. The latter follows the widely adopted frame-classification scheme, in which each frame is classified into event categories and event boundaries are then obtained by post-processing such as thresholding and smoothing. For SEDT, we apply self-supervised pre-training on unlabeled data and adopt semi-supervised learning with an online teacher, which is updated from the student model using the Exponential Moving Average (EMA) strategy and generates reliable pseudo labels for weakly labeled and unlabeled data. For the frame-wise model, we use the ICT-TOSHIBA system of DCASE 2021 Task 4. Experimental results show that the hybrid system considerably outperforms either individual model and achieves a PSDS1 of 0.420 and a PSDS2 of 0.783 on the validation set without external data. The code is available at https://github.com/965694547/Hybrid-system-of-frame-wise-model-and-SEDT.
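
The two mechanisms named in the abstract, the EMA teacher update and the frame-wise post-processing, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `ema_update` and `frames_to_events`, the decay of 0.999, the threshold of 0.5, the median window of 3 frames, and the hop size are all assumptions chosen for illustration, since the abstract does not state these values.

```python
def ema_update(teacher, student, decay=0.999):
    """Return new teacher weights as decay * teacher + (1 - decay) * student.

    `teacher` and `student` are flat lists of floats standing in for model
    parameters; the decay value is an assumed hyperparameter.
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def frames_to_events(probs, threshold=0.5, win=3, hop_s=0.02):
    """Turn per-frame probabilities for one class into (onset, offset) pairs.

    Steps: binarize with a threshold, smooth with a median filter of odd
    length `win` (zero-padded at both ends), then merge runs of active
    frames into events. Times are in seconds, assuming `hop_s` per frame.
    """
    if win % 2 == 0:
        raise ValueError("win must be odd")
    active = [1 if p >= threshold else 0 for p in probs]

    # Median filter: the middle element of each sorted window.
    half = win // 2
    padded = [0] * half + active + [0] * half
    smoothed = [sorted(padded[i:i + win])[half] for i in range(len(active))]

    events, onset = [], None
    # The trailing 0 is a sentinel that closes an event still open at the end.
    for i, a in enumerate(smoothed + [0]):
        if a and onset is None:
            onset = i
        elif not a and onset is not None:
            events.append((onset * hop_s, i * hop_s))
            onset = None
    return events
```

In this sketch the median filter removes an isolated one-frame detection and fills a one-frame gap, which is the kind of smoothing the frame-classification scheme relies on to produce event boundaries. SEDT, by contrast, predicts boundaries directly and does not need this step.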
