Identification of Indian Languages using Ghost-VLAD pooling

Paper title

Identification of Indian Languages using Ghost-VLAD pooling

Authors

Krishna D N, Ankita Patil, M. S. P Raj, Sai Prasad H S, Prabhu Aashish Garapati

Abstract

In this work, we propose a new pooling strategy for language identification, focusing on Indian languages. The idea is to obtain utterance-level features from audio of any length for robust language recognition. We use the GhostVLAD approach to generate an utterance-level feature vector for any variable-length input audio by aggregating the local frame-level features across time. The generated feature vector is shown to be highly language-discriminative and helps achieve state-of-the-art results on the language identification task. We conduct our experiments on 635 hours of audio data covering 7 Indian languages. Our method outperforms the previous state-of-the-art x-vector [11] method by an absolute improvement of 1.88% in F1-score and achieves a 98.43% F1-score on the held-out test data. We compare our system with various pooling approaches and show that GhostVLAD is the best pooling approach for this task. We also provide a visualization of the utterance-level embeddings generated using Ghost-VLAD pooling and show that this method creates embeddings that have very good language-discriminative features.
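
The pooling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `ghostvlad_pool`, the randomly initialised centers and assignment weights, and the toy dimensions are all assumptions made for illustration (in a trained system these parameters would be learned). The sketch follows the general GhostVLAD recipe: each frame is softly assigned to real and "ghost" clusters, residuals to the real cluster centers are summed over time, and the ghost clusters are dropped before normalisation, so the output length does not depend on the number of frames.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]


def ghostvlad_pool(frames, centers, weights, biases, num_ghost):
    """Pool T frame-level vectors into one fixed-length utterance vector.

    frames:  T x D list of frame-level features
    centers: (K + G) x D cluster centers, ghost clusters listed last
    weights, biases: soft-assignment parameters, one row/entry per cluster
    num_ghost: G, the number of ghost clusters excluded from the output
    """
    num_real = len(centers) - num_ghost
    dim = len(frames[0])
    residual_sums = [[0.0] * dim for _ in range(num_real)]

    for x in frames:
        # Soft-assign the frame over ALL clusters, ghost clusters included.
        scores = [
            sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
            for w, b in zip(weights, biases)
        ]
        assign = softmax(scores)
        # Accumulate weighted residuals for the real clusters only.
        for k in range(num_real):
            for d in range(dim):
                residual_sums[k][d] += assign[k] * (x[d] - centers[k][d])

    # Normalise each cluster's residual, concatenate, normalise again.
    pooled = []
    for row in residual_sums:
        pooled.extend(l2_normalize(row))
    return l2_normalize(pooled)


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned parameters or extracted features."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
DIM, NUM_REAL, NUM_GHOST = 4, 3, 2
centers = random_matrix(NUM_REAL + NUM_GHOST, DIM, rng)
weights = random_matrix(NUM_REAL + NUM_GHOST, DIM, rng)
biases = [0.0] * (NUM_REAL + NUM_GHOST)

# Two "utterances" of different lengths yield vectors of the same size.
short_vec = ghostvlad_pool(random_matrix(10, DIM, rng), centers, weights, biases, NUM_GHOST)
long_vec = ghostvlad_pool(random_matrix(50, DIM, rng), centers, weights, biases, NUM_GHOST)
print(len(short_vec), len(long_vec))
```

Both calls return a vector of length `NUM_REAL * DIM`, regardless of how many frames went in, which is what lets a classifier operate on audio of any duration.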
