Triplet Entropy Loss: Improving The Generalisation of Short Speech Language Identification Systems

Paper title

Triplet Entropy Loss: Improving The Generalisation of Short Speech Language Identification Systems

Authors

van der Merwe, Ruan

Abstract

We present several methods to improve the generalisation of language identification (LID) systems to new speakers and to new domains. These methods involve spectral augmentation, where spectrograms are masked in the frequency or time bands during training, and CNN architectures that are pre-trained on the ImageNet dataset. The paper also introduces the novel Triplet Entropy Loss training method, which involves training a network simultaneously using cross entropy and triplet loss. It was found that all three methods improved the generalisation of the models, though not significantly. Even though the models trained using Triplet Entropy Loss showed a better understanding of the languages and higher accuracies, it appears that the models still memorise word patterns present in the spectrograms rather than learning the finer nuances of a language. The research shows that Triplet Entropy Loss has great potential and should be investigated further, not only in language identification tasks but in any classification task.
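The abstract describes Triplet Entropy Loss only as training a network with cross entropy and triplet loss at the same time. The following is a minimal sketch of one way such a combined objective could be computed, not the author's implementation. The unweighted sum, the Euclidean distance, the margin of 1.0, and all function and parameter names (`cross_entropy`, `triplet_loss`, `triplet_entropy_loss`, `weight`) are assumptions made for illustration.

```python
import numpy as np


def cross_entropy(logits, labels):
    """Mean cross entropy over a batch, from raw class scores."""
    # Subtract the row maximum for a numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Mean hinge-style triplet loss on embedding vectors.

    Assumption: Euclidean distance and margin=1.0; the abstract does not
    specify either.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()


def triplet_entropy_loss(logits, labels, anchor, positive, negative,
                         margin=1.0, weight=1.0):
    """Combined objective: cross entropy plus weighted triplet loss.

    Assumption: the two terms are simply added; `weight` is a
    hypothetical knob, not something stated in the abstract.
    """
    ce = cross_entropy(logits, labels)
    tl = triplet_loss(anchor, positive, negative, margin)
    return ce + weight * tl


# Usage: two anchor samples, two languages.
logits = np.array([[2.0, 0.0], [0.0, 2.0]])  # classifier scores for anchors
labels = np.array([0, 1])                    # true language of each anchor
anchor = np.zeros((2, 2))                    # anchor embeddings
positive = np.zeros((2, 2))                  # same-language embeddings
negative = np.array([[3.0, 4.0], [3.0, 4.0]])  # other-language embeddings
loss = triplet_entropy_loss(logits, labels, anchor, positive, negative)
```

In this sketch the classification head and the embedding space are both penalised in a single scalar, so one backward pass would update the network with respect to both terms. In the usage example the negatives already lie beyond the margin, so only the cross entropy term contributes.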
