Paper Title

Sound event localization and detection based on CRNN using rectangular filters and channel rotation data augmentation

Authors

Francesca Ronchini, Daniel Arteaga, Andrés Pérez-López

Abstract

Sound Event Localization and Detection refers to the problem of identifying the presence of independent or temporally overlapped sound sources, correctly identifying the sound class to which each belongs, and estimating their spatial directions while they are active. In recent years, neural networks have become the prevailing method for the Sound Event Localization and Detection task, with convolutional recurrent neural networks being among the most used systems. This paper presents a system submitted to the Detection and Classification of Acoustic Scenes and Events 2020 Challenge Task 3. The algorithm consists of a convolutional recurrent neural network using rectangular filters, specialized in recognizing significant spectral features related to the task. To further improve the score and to generalize the system performance to unseen data, the training dataset size has been increased using data augmentation. The augmentation technique is based on channel rotations and reflection on the xy plane in the First Order Ambisonic domain, which allows improving Direction of Arrival labels while keeping the physical relationships between channels. Evaluation results on the development dataset show that the proposed system outperforms the baseline results, considerably improving Error Rate and F-score for location-aware detection.
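The channel rotation and xy-plane reflection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `encode_foa` and `augment_foa` are hypothetical, the channel order is assumed to be ACN (W, Y, Z, X), and the rotation is restricted to multiples of 90 degrees about the z axis so that it reduces to channel swaps and sign flips. The idea is that the same transformation is applied to the audio channels and to the Direction of Arrival label, so the two stay physically consistent.

```python
import numpy as np


def encode_foa(signal, azimuth_deg, elevation_deg):
    """Encode a mono signal as an ideal FOA plane wave.

    Assumption: ACN channel order (W, Y, Z, X) with unit gains.
    """
    az = np.deg2rad(azimuth_deg)
    el = np.deg2rad(elevation_deg)
    w = signal
    y = signal * np.sin(az) * np.cos(el)
    z = signal * np.sin(el)
    x = signal * np.cos(az) * np.cos(el)
    return np.stack([w, y, z, x])


def augment_foa(audio, azimuth_deg, elevation_deg, quarter_turns=1, reflect_xy=False):
    """Rotate an FOA recording about the z axis and optionally mirror it on the xy plane.

    audio: array of shape (4, n_samples), assumed ACN order (W, Y, Z, X).
    Returns the transformed audio and the matching azimuth/elevation label.
    """
    w, y, z, x = audio
    k = quarter_turns % 4
    # Exact cosine and sine of k * 90 degrees, so only swaps and sign flips occur.
    c = (1, 0, -1, 0)[k]
    s = (0, 1, 0, -1)[k]
    x_new = c * x - s * y
    y_new = s * x + c * y
    # Reflection on the xy plane negates the height channel Z.
    z_new = -z if reflect_xy else z
    # Apply the same transformation to the label; wrap azimuth to [-180, 180).
    new_az = (azimuth_deg + 90 * k + 180) % 360 - 180
    new_el = -elevation_deg if reflect_xy else elevation_deg
    return np.stack([w, y_new, z_new, x_new]), new_az, new_el


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source = rng.standard_normal(1000)
    recording = encode_foa(source, 30.0, 20.0)
    augmented, az, el = augment_foa(recording, 30.0, 20.0, quarter_turns=1, reflect_xy=True)
    # The augmented audio should equal a direct encoding at the new direction.
    print(az, el, np.allclose(augmented, encode_foa(source, az, el)))
```

Because the omnidirectional channel W is left untouched and the directional channels are only permuted or negated, the physical relationships between channels are preserved while the label moves to a new direction.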
