Paper Title

Capturing scattered discriminative information using a deep architecture in acoustic scene classification

Paper Authors

Hye-jin Shim, Jee-weon Jung, Ju-ho Kim, Ha-jin Yu

Abstract


Frequently misclassified pairs of classes that share many common acoustic properties exist in acoustic scene classification (ASC). To distinguish such pairs of classes, trivial details scattered throughout the data could be vital clues. However, these details are less noticeable and are easily removed by conventional non-linear activations (e.g., ReLU). Furthermore, making design choices that emphasize trivial details can easily lead to overfitting if the system is not sufficiently generalized. In this study, based on an analysis of the ASC task's characteristics, we investigate various methods to capture discriminative information while simultaneously mitigating the overfitting problem. We adopt a max feature map method to replace conventional non-linear activations in a deep neural network; it applies an element-wise comparison between different filters of a convolution layer's output. Two data augmentation methods and two deep architecture modules are further explored to reduce overfitting and sustain the system's discriminative power. Various experiments are conducted on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Task 1-A dataset to validate the proposed methods. Our results show that the proposed system consistently outperforms the baseline, where the single best-performing system achieves an accuracy of 70.4% compared to 65.1% for the baseline.
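The max feature map (MFM) idea mentioned above can be illustrated with a minimal NumPy sketch: instead of zeroing negative activations as ReLU does, the channel axis is split in half and the element-wise maximum of the two halves is kept, so every output value survives a competition between two filters rather than a fixed threshold. The function name and tensor shapes below are illustrative, not taken from the paper's implementation.

```python
import numpy as np

def max_feature_map(x: np.ndarray) -> np.ndarray:
    """MFM activation sketch: split the channel axis into two halves
    and return their element-wise maximum.

    x: feature maps of shape (channels, height, width); channels must be even.
    Returns an array of shape (channels // 2, height, width).
    """
    c = x.shape[0]
    assert c % 2 == 0, "channel count must be even to split in half"
    first_half, second_half = x[: c // 2], x[c // 2 :]
    # Element-wise competition between paired filters; unlike ReLU,
    # small-magnitude details are not clipped to zero outright.
    return np.maximum(first_half, second_half)

# Toy example: 4 feature maps of size 2x2 reduce to 2 output maps.
x = np.arange(16, dtype=float).reshape(4, 2, 2)
y = max_feature_map(x)
print(y.shape)  # (2, 2, 2)
```

Note that MFM halves the channel dimension, so a convolution layer feeding it would typically be given twice the desired number of output filters.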
