Paper Title

Online Speaker Diarization with Relation Network

Authors

Li, Xiang; Zhao, Yucheng; Luo, Chong; Zeng, Wenjun

Abstract

In this paper, we propose an online speaker diarization system based on a relation network, named RenoSD. Unlike conventional diarization systems, which consist of several independently optimized modules, RenoSD implements voice activity detection (VAD), embedding extraction, and speaker identity association using a single deep neural network. The most striking feature of RenoSD is that it adopts a meta-learning strategy for speaker identity association. In particular, the relation network learns a deep distance metric in a data-driven way and can determine, through a simple forward pass, whether two given segments belong to the same speaker. As such, RenoSD can run online with low latency. Experimental results on the AMI and CALLHOME datasets show that the proposed RenoSD system achieves consistent improvements over the state-of-the-art x-vector baseline. Compared with UIS-RNN, an existing online diarization system, RenoSD achieves better performance with much less training data and lower time complexity.
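
The online association step described in the abstract can be illustrated with a minimal sketch. This is not the RenoSD implementation: the learned relation network is replaced here by a hypothetical stand-in, `cosine_relation`, and the greedy rule, the `threshold` value, and the function name `assign_online` are all assumptions made for illustration. The sketch only shows how a pairwise "same speaker?" score lets each incoming segment be labeled as it arrives, without a batch clustering pass.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_relation(a: Vector, b: Vector) -> float:
    """Hypothetical stand-in for a learned relation network.

    Returns a score in [0, 1]; higher means "more likely the same speaker".
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0
    return 0.5 * (1.0 + dot / norm)


def assign_online(
    segments: Sequence[Vector],
    relation: Callable[[Vector, Vector], float] = cosine_relation,
    threshold: float = 0.8,  # assumed value, not taken from the paper
) -> List[int]:
    """Label segments one at a time using only pairwise relation scores."""
    exemplars: List[Vector] = []  # one stored segment embedding per known speaker
    labels: List[int] = []
    for seg in segments:
        # Score the new segment against every known speaker.
        scores = [relation(seg, ex) for ex in exemplars]
        if scores and max(scores) >= threshold:
            # Close enough to a known speaker: reuse that label.
            labels.append(scores.index(max(scores)))
        else:
            # No match: register a new speaker.
            exemplars.append(seg)
            labels.append(len(exemplars) - 1)
    return labels


if __name__ == "__main__":
    stream = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]]
    print(assign_online(stream))
```

Because each segment is compared only against the speakers seen so far, the cost per segment grows with the number of speakers rather than with the length of the recording, which is the property that makes this style of association suitable for low-latency use.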
