Paper title
MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids
Authors
Abstract
Improving hearing aid (HA) users' ability to understand speech in noisy environments is critical to the development of HA devices. To this end, it is important to derive a metric that can fairly predict speech intelligibility for HA users. A straightforward approach is to conduct a subjective listening test and use the results as the evaluation metric. However, large-scale listening tests are time-consuming and expensive, so several evaluation metrics have been derived as surrogates for subjective listening test results. In this study, we propose a multi-branched speech intelligibility prediction model (MBI-Net) for predicting the subjective intelligibility scores of HA users. MBI-Net consists of two branches, each of which processes the speech signal from one channel and comprises a hearing loss model, a cross-domain feature extraction module, and a speech intelligibility prediction model. The outputs of the two branches are fused through a linear layer to obtain the predicted speech intelligibility score. Experimental results confirm the effectiveness of MBI-Net, which produces higher prediction scores than the baseline system in both Track 1 and Track 2 on the Clarity Prediction Challenge 2022 dataset.
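The two-branch structure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. Every function, class, and parameter name below is hypothetical: the hearing loss model is reduced to a simple attenuation factor, the cross-domain features to three hand-picked waveform statistics, and each prediction model to a single randomly initialized linear unit. The final sigmoid after the linear fusion layer is also an assumption, added only to keep the toy score in (0, 1). The sketch shows the data flow only: one branch per channel, then linear fusion.

```python
import math
import random


def hearing_loss_model(signal, attenuation):
    # Hypothetical stand-in: scale the waveform by a per-ear attenuation factor.
    return [attenuation * x for x in signal]


def cross_domain_features(signal):
    # Hypothetical stand-in features: mean energy, mean absolute amplitude,
    # and zero-crossing rate of the waveform.
    n = len(signal)
    energy = sum(x * x for x in signal) / n
    mean_abs = sum(abs(x) for x in signal) / n
    zcr = sum(1 for a, b in zip(signal, signal[1:]) if a * b < 0) / (n - 1)
    return [energy, mean_abs, zcr]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class Branch:
    """One branch: hearing loss model -> feature extraction -> predictor."""

    def __init__(self, rng, n_features=3):
        # Untrained, randomly initialized weights (illustration only).
        self.weights = [rng.uniform(-1, 1) for _ in range(n_features)]
        self.bias = rng.uniform(-1, 1)

    def forward(self, signal, attenuation):
        degraded = hearing_loss_model(signal, attenuation)
        feats = cross_domain_features(degraded)
        z = sum(w * f for w, f in zip(self.weights, feats)) + self.bias
        return sigmoid(z)


class TwoBranchPredictor:
    """Two branches (one per channel) fused by a linear layer."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.left = Branch(rng)
        self.right = Branch(rng)
        self.fusion_weights = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        self.fusion_bias = rng.uniform(-1, 1)

    def forward(self, left_signal, right_signal, left_att=1.0, right_att=1.0):
        outputs = [
            self.left.forward(left_signal, left_att),
            self.right.forward(right_signal, right_att),
        ]
        # Linear fusion of the two branch outputs; the sigmoid is an
        # assumption that keeps the toy score between 0 and 1.
        z = sum(w * o for w, o in zip(self.fusion_weights, outputs))
        return sigmoid(z + self.fusion_bias)


# Synthetic two-channel input: two phase-shifted sine waves.
left = [math.sin(0.1 * t) for t in range(200)]
right = [math.sin(0.1 * t + 0.5) for t in range(200)]

model = TwoBranchPredictor(seed=0)
score = model.forward(left, right, left_att=0.8, right_att=0.6)
print(f"toy intelligibility score: {score:.3f}")
```

Because the weights are random and untrained, the printed score carries no meaning. In a real system each placeholder would be replaced by a learned or audiogram-driven component.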