Paper Title

Augmenting Transformer-Transducer Based Speaker Change Detection With Token-Level Training Loss

Paper Authors

Guanlong Zhao, Quan Wang, Han Lu, Yiling Huang, Ignacio Lopez Moreno

Paper Abstract

In this work we propose a novel token-based training strategy that improves Transformer-Transducer (T-T) based speaker change detection (SCD) performance. The conventional T-T based SCD model loss optimizes all output tokens equally. Due to the sparsity of the speaker changes in the training data, the conventional T-T based SCD model loss leads to sub-optimal detection accuracy. To mitigate this issue, we use a customized edit-distance algorithm to estimate the token-level SCD false accept (FA) and false reject (FR) rates during training and optimize model parameters to minimize a weighted combination of the FA and FR, focusing the model on accurately predicting speaker changes. We also propose a set of evaluation metrics that align better with commercial use cases. Experiments on a group of challenging real-world datasets show that the proposed training method can significantly improve the overall performance of the SCD model with the same number of parameters.
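
The abstract describes estimating token-level speaker-change false accept (FA) and false reject (FR) counts from an edit-distance alignment between hypothesis and reference token sequences, then minimizing a weighted combination of the two. The sketch below only illustrates that idea in plain Python; the `<sc>` speaker-change token, the `align` and `sc_fa_fr` helpers, and the `alpha` weight are assumptions made for this example, not the paper's implementation, and the actual loss in the paper is computed over T-T outputs during training rather than on hard token sequences.

```python
# Illustrative sketch only: count speaker-change FA/FR from a Levenshtein
# alignment of a hypothesis against a reference token sequence, then form a
# weighted combination of the two counts.

SC = "<sc>"  # hypothetical token marking a speaker change

def align(ref, hyp):
    """Levenshtein alignment; returns (ref_tok, hyp_tok) pairs,
    with None marking an insertion or deletion."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]  # edit-cost DP table
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    # Backtrace to recover the aligned token pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            pairs.append((ref[i - 1], hyp[j - 1])); i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((ref[i - 1], None)); i -= 1
        else:
            pairs.append((None, hyp[j - 1])); j -= 1
    return list(reversed(pairs))

def sc_fa_fr(ref, hyp):
    """Count speaker-change false accepts and false rejects from the alignment."""
    fa = fr = 0
    for r, h in align(ref, hyp):
        if h == SC and r != SC:
            fa += 1  # predicted a change where the reference has none
        elif r == SC and h != SC:
            fr += 1  # missed a reference speaker change
    return fa, fr

# Toy usage with assumed token sequences.
ref = ["hi", "there", SC, "hello", "how", "are", "you"]
hyp = ["hi", "there", "hello", SC, "how", "are", "you"]
fa, fr = sc_fa_fr(ref, hyp)
alpha = 0.5  # assumed hyper-parameter trading off FA against FR
loss = alpha * fa + (1 - alpha) * fr
print(fa, fr, loss)  # -> 1 1 1.0
```

In this toy example the model fires the speaker-change token one word late, so the alignment yields one FA and one FR; only errors on the sparse speaker-change token contribute to the weighted objective, which reflects the abstract's point that ordinary tokens should not dominate the loss.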
