Paper Title

Augmenting Transformer-Transducer Based Speaker Change Detection With Token-Level Training Loss

Paper Authors

Guanlong Zhao, Quan Wang, Han Lu, Yiling Huang, Ignacio Lopez Moreno

Paper Abstract

In this work we propose a novel token-based training strategy that improves Transformer-Transducer (T-T) based speaker change detection (SCD) performance. The conventional T-T based SCD model loss optimizes all output tokens equally. Due to the sparsity of the speaker changes in the training data, the conventional T-T based SCD model loss leads to sub-optimal detection accuracy. To mitigate this issue, we use a customized edit-distance algorithm to estimate the token-level SCD false accept (FA) and false reject (FR) rates during training and optimize model parameters to minimize a weighted combination of the FA and FR, focusing the model on accurately predicting speaker changes. We also propose a set of evaluation metrics that align better with commercial use cases. Experiments on a group of challenging real-world datasets show that the proposed training method can significantly improve the overall performance of the SCD model with the same number of parameters.
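
The abstract describes estimating token-level speaker-change false accept (FA) and false reject (FR) counts from an edit-distance alignment between hypothesis and reference token sequences, then minimizing a weighted combination of the two. The sketch below only illustrates that idea in plain Python; the `<sc>` speaker-change token, the `align` and `sc_fa_fr` helpers, and the `alpha` weight are assumptions made for this example, not the paper's implementation, and the actual loss in the paper is computed over T-T outputs during training rather than on hard token sequences.

```python
# Illustrative sketch only: count speaker-change FA/FR from a Levenshtein
# alignment of a hypothesis against a reference token sequence, then form a
# weighted combination of the two counts.

SC = "<sc>"  # hypothetical token marking a speaker change

def align(ref, hyp):
    """Levenshtein alignment; returns (ref_tok, hyp_tok) pairs,
    with None marking an insertion or deletion."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]  # edit-cost DP table
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    # Backtrace to recover the aligned token pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            pairs.append((ref[i - 1], hyp[j - 1])); i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((ref[i - 1], None)); i -= 1
        else:
            pairs.append((None, hyp[j - 1])); j -= 1
    return list(reversed(pairs))

def sc_fa_fr(ref, hyp):
    """Count speaker-change false accepts and false rejects from the alignment."""
    fa = fr = 0
    for r, h in align(ref, hyp):
        if h == SC and r != SC:
            fa += 1  # predicted a change where the reference has none
        elif r == SC and h != SC:
            fr += 1  # missed a reference speaker change
    return fa, fr

# Toy usage with assumed token sequences.
ref = ["hi", "there", SC, "hello", "how", "are", "you"]
hyp = ["hi", "there", "hello", SC, "how", "are", "you"]
fa, fr = sc_fa_fr(ref, hyp)
alpha = 0.5  # assumed hyper-parameter trading off FA against FR
loss = alpha * fa + (1 - alpha) * fr
print(fa, fr, loss)  # -> 1 1 1.0
```

In this toy example the model fires the speaker-change token one word late, so the alignment yields one FA and one FR; only errors on the sparse speaker-change token contribute to the weighted objective, which reflects the abstract's point that ordinary tokens should not dominate the loss.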
