A Comparison of Pooling Methods on LSTM Models for Rare Acoustic Event Classification

Paper Title

A Comparison of Pooling Methods on LSTM Models for Rare Acoustic Event Classification

Authors

Chieh-Chi Kao, Ming Sun, Weiran Wang, Chao Wang

Abstract

Acoustic event classification (AEC) and acoustic event detection (AED) refer to the task of detecting whether specific target events occur in an audio recording. Because long short-term memory (LSTM) networks achieve state-of-the-art results in various speech-related tasks, they are also a popular solution for AEC. This paper investigates the dynamics of LSTM models on AEC tasks. It includes a detailed analysis of LSTM memory retention and a benchmark of nine different pooling methods on LSTM models, using 1.7M generated mixture clips of multiple events at different signal-to-noise ratios. The paper focuses on understanding: 1) utterance-level classification accuracy; 2) sensitivity to event position within an utterance. The analysis is done on the dataset for the detection of rare sound events from the DCASE 2017 Challenge. We find that max pooling on the prediction level performs best among the nine pooling approaches in terms of both classification accuracy and insensitivity to event position within an utterance. To the best of the authors' knowledge, this is the first work of its kind focused on LSTM dynamics for AEC tasks.
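To make the idea of pooling on the prediction level concrete, the following is a minimal sketch, not the authors' implementation. It assumes a model has already produced one event probability per frame, and it uses synthetic probabilities in place of real LSTM outputs. The helper names (`max_pool_predictions`, `mean_pool_predictions`, `last_frame_prediction`, `synthetic_frame_probs`) are hypothetical, and the three strategies shown are illustrative choices, not necessarily among the nine methods the paper benchmarks.

```python
import numpy as np


def max_pool_predictions(frame_probs):
    """Utterance-level score = highest frame-level event probability."""
    return float(np.max(frame_probs))


def mean_pool_predictions(frame_probs):
    """Utterance-level score = average of the frame-level probabilities."""
    return float(np.mean(frame_probs))


def last_frame_prediction(frame_probs):
    """Utterance-level score = probability at the final frame only."""
    return float(frame_probs[-1])


def synthetic_frame_probs(num_frames, event_start, event_len, high=0.9, low=0.05):
    """Hypothetical stand-in for per-frame model outputs (an assumption):
    `high` inside the event window, `low` everywhere else."""
    probs = np.full(num_frames, low)
    probs[event_start:event_start + event_len] = high
    return probs


# The same short event placed early and late in a 100-frame utterance.
early = synthetic_frame_probs(100, event_start=5, event_len=10)
late = synthetic_frame_probs(100, event_start=90, event_len=10)

for name, pool in [("max", max_pool_predictions),
                   ("mean", mean_pool_predictions),
                   ("last", last_frame_prediction)]:
    print(f"{name:>4}: early={pool(early):.3f}  late={pool(late):.3f}")
```

In this toy setting, max pooling gives the same score wherever the event occurs. Reading only the last frame depends on the event position, and mean pooling dilutes a short event across the whole utterance. The sketch shows only why position sensitivity can differ between pooling strategies; it does not reproduce the paper's experiments.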
