Paper title
End-to-End Speaker Diarization as Post-Processing
Paper authors
Paper abstract
This paper investigates the use of an end-to-end diarization model as post-processing for conventional clustering-based diarization. Clustering-based diarization methods partition frames into as many clusters as there are speakers; because each frame is assigned to exactly one speaker, they typically cannot handle overlapping speech. On the other hand, some end-to-end diarization methods can handle overlapping speech by treating the problem as multi-label classification. Although some of these methods can handle a flexible number of speakers, they do not perform well when the number of speakers is large. To compensate for each other's weaknesses, we propose to use a two-speaker end-to-end diarization method as post-processing of the results obtained by a clustering-based method. We iteratively select two speakers from the results and update the results for those two speakers to improve the overlapped regions. Experimental results show that the proposed algorithm consistently improves the performance of state-of-the-art methods across the CALLHOME, AMI, and DIHARD II datasets.
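The iterative pairwise update described in the abstract can be illustrated with a minimal sketch. This is a simplification under stated assumptions, not the paper's exact procedure: `refine_pairwise` and `two_speaker_model` are hypothetical names, the model is assumed to be any callable that maps frame indices to an (n, 2) binary activity array, and frame selection is reduced to "frames where either selected speaker is active".

```python
from itertools import combinations

import numpy as np


def refine_pairwise(labels, two_speaker_model):
    """Refine clustering-based diarization labels one speaker pair at a time.

    labels: (T, S) binary array, labels[t, s] = 1 if speaker s is active at frame t.
    two_speaker_model: hypothetical callable taking frame indices of shape (n,)
        and returning an (n, 2) binary activity array (overlap allowed).
    """
    refined = np.asarray(labels, dtype=bool).copy()
    n_speakers = refined.shape[1]

    for i, j in combinations(range(n_speakers), 2):
        # Assumption: process the frames where at least one of the two speakers is active.
        frames = np.flatnonzero(refined[:, i] | refined[:, j])
        if frames.size == 0:
            continue

        pred = np.asarray(two_speaker_model(frames)).astype(bool)
        current = refined[frames][:, [i, j]]

        # The model's output order is arbitrary, so keep whichever column order
        # agrees with the current labels on more entries.
        keep = (pred == current).sum()
        swap = (pred[:, ::-1] == current).sum()
        if swap > keep:
            pred = pred[:, ::-1]

        # Overwrite the two speakers' activities; both may now be active in a frame.
        refined[frames, i] = pred[:, 0]
        refined[frames, j] = pred[:, 1]

    return refined
```

In this sketch the clustering output can only mark one speaker per frame, while the refined output may mark two, which is how overlapped regions would be recovered. A real system would pass acoustic features for the selected frames to the two-speaker model rather than frame indices.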