Paper Title

Building an ASR Error Robust Spoken Virtual Patient System in a Highly Class-Imbalanced Scenario Without Speech Data

Paper Authors

Vishal Sunder, Prashant Serai, Eric Fosler-Lussier

Paper Abstract

A Virtual Patient (VP) is a powerful tool for training medical students to take patient histories, where responding to a diverse set of spoken questions is essential to simulate natural conversations with a student. The performance of such a Spoken Language Understanding system (SLU) can be adversely affected by both the presence of Automatic Speech Recognition (ASR) errors in the test data and a high degree of class imbalance in the SLU training data. While these two issues have been addressed separately in prior work, we develop a novel two-step training methodology that tackles both these issues effectively in a single dialog agent. As it is difficult to collect spoken data from users without a functioning SLU system, our method does not rely on spoken data for training, rather we use an ASR error predictor to "speechify" the text data. Our method shows significant improvements over strong baselines on the VP intent classification task at various word error rate settings.
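The "speechify" idea from the abstract can be illustrated with a toy sketch. Note that the paper uses a learned ASR error predictor; the confusion table, function name, and error types below are purely hypothetical stand-ins meant only to show how clean training text could be perturbed with ASR-like substitutions, deletions, and insertions at a target word error rate.

```python
import random

# Hypothetical confusion table mimicking common ASR substitutions;
# the paper's actual error predictor is a learned model, not a lookup.
CONFUSIONS = {
    "pain": ["pane", "paying"],
    "chest": ["chess", "test"],
    "fever": ["feather", "fear"],
}

def speechify(text, error_rate=0.2, seed=0):
    """Inject ASR-like errors into clean text at roughly `error_rate` WER."""
    rng = random.Random(seed)
    out = []
    for word in text.split():
        if rng.random() < error_rate:
            kind = rng.random()
            if kind < 0.5 and word in CONFUSIONS:
                out.append(rng.choice(CONFUSIONS[word]))  # substitution
            elif kind < 0.75:
                continue  # deletion: drop the word entirely
            else:
                out.append(word)
                out.append(word)  # crude insertion: repeat the word
        else:
            out.append(word)  # word survives unchanged
    return " ".join(out)

clean = "do you have chest pain or fever"
noisy = speechify(clean, error_rate=0.3)
```

Training the intent classifier on such perturbed text (instead of clean text alone) is what makes the downstream system tolerant of real ASR errors at test time, without ever collecting spoken data.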
