Robust Acoustic Domain Identification with its Application to Speaker Diarization

Paper Title


Robust Acoustic Domain Identification with its Application to Speaker Diarization

Authors

Kumar, A Kishore; Waldekar, Shefali; Sahidullah, Md; Saha, Goutam

Abstract


With the rise in multimedia content over the years, audio is being recorded in an increasingly wide variety of environments. An audio processing system may therefore benefit from a front-end module that identifies the acoustic domain of its input. In this paper, we demonstrate the idea of acoustic domain identification (ADI) for speaker diarization. To this end, we first present a detailed study of the various domains of the third DIHARD challenge, highlighting the factors that differentiate them from one another. Our main contribution is a simple and efficient solution for ADI; in the present work, we explore speaker embeddings for this task. Next, we integrate the ADI module with the speaker diarization framework of the DIHARD III challenge. Performance improved substantially over the baseline when the thresholds for agglomerative hierarchical clustering were optimized for each domain. We achieved relative improvements in diarization error rate (DER) of more than 5% and 8% for the core and full conditions, respectively, on Track 1 of the DIHARD III evaluation set.
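The abstract does not specify the ADI classifier or the clustering back-end, so what follows is only a minimal, dependency-free Python sketch of the overall pipeline under stated assumptions: segment-level speaker embeddings are averaged into one recording-level vector, a domain is picked by nearest centroid, and agglomerative hierarchical clustering (AHC) then stops at a threshold looked up for that domain. The function names (`identify_domain`, `ahc`, `diarize`), the domain centroids, and the threshold values are all hypothetical. The nearest-centroid classifier stands in for whatever ADI model the authors trained, and cosine similarity is swapped in for the PLDA scoring used in x-vector AHC back-ends such as the DIHARD III baseline.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_domain(segment_embs: Sequence[Vector], centroids: Dict[str, Vector]) -> str:
    """Hypothetical nearest-centroid ADI: average the segment-level speaker
    embeddings into one recording-level vector and return the domain whose
    pre-computed centroid is most cosine-similar to it."""
    n = len(segment_embs)
    recording = [sum(dim) / n for dim in zip(*segment_embs)]
    return max(centroids, key=lambda d: cosine(recording, centroids[d]))


def ahc(segment_embs: Sequence[Vector], threshold: float) -> List[int]:
    """Average-linkage AHC on cosine similarity: repeatedly merge the most
    similar pair of clusters until the best linkage score is <= threshold.
    Returns one cluster (speaker) label per segment."""
    sim = [[cosine(a, b) for b in segment_embs] for a in segment_embs]
    clusters = [[i] for i in range(len(segment_embs))]

    def linkage(c1: List[int], c2: List[int]) -> float:
        # Average pairwise similarity between the members of two clusters.
        return sum(sim[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, bi, bj = -math.inf, -1, -1
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = linkage(clusters[i], clusters[j])
                if score > best:
                    best, bi, bj = score, i, j
        if best <= threshold:
            break  # stopping criterion: the (domain-dependent) threshold
        clusters[bi].extend(clusters[bj])
        del clusters[bj]  # bj > bi, so index bi stays valid

    labels = [0] * len(segment_embs)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels


def diarize(
    segment_embs: Sequence[Vector],
    centroids: Dict[str, Vector],
    thresholds: Dict[str, float],
    default_threshold: float,
) -> Tuple[str, List[int]]:
    """ADI front-end, then AHC with the threshold tuned for the detected domain
    (falling back to a single global threshold for unseen domains)."""
    domain = identify_domain(segment_embs, centroids)
    return domain, ahc(segment_embs, thresholds.get(domain, default_threshold))


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings" and made-up centroids/thresholds.
    centroids = {"clinical": [1.0, 0.0, 0.0], "broadcast_interview": [0.0, 0.0, 1.0]}
    segs = [[1.0, 0.2, 0.3], [0.9, 0.0, 0.2], [0.2, 1.0, 0.3], [0.0, 0.9, 0.2]]
    print(diarize(segs, centroids, {"clinical": 0.5}, 0.99))  # → ('clinical', [0, 0, 1, 1])
```

The design point the sketch illustrates is that a single global AHC threshold under- or over-clusters some domains, whereas looking up the threshold after ADI lets each domain stop merging at its own tuned value. Relative DER improvement, as reported above, is computed as (DER_baseline - DER_system) / DER_baseline.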
