Paper Title

TRScore: A Novel GPT-based Readability Scorer for ASR Segmentation and Punctuation Model Evaluation and Selection

Paper Authors

Piyush Behre, Sharman Tan, Amy Shah, Harini Kesavamoorthy, Shuangyu Chang, Fei Zuo, Chris Basoglu, Sayan Pathak

Paper Abstract

Punctuation and Segmentation are key to readability in Automatic Speech Recognition (ASR), often evaluated using F1 scores that require high-quality human transcripts and do not reflect readability well. Human evaluation is expensive, time-consuming, and suffers from large inter-observer variability, especially in conversational speech devoid of strict grammatical structures. Large pre-trained models capture a notion of grammatical structure. We present TRScore, a novel readability measure using the GPT model to evaluate different segmentation and punctuation systems. We validate our approach with human experts. Additionally, our approach enables quantitative assessment of text post-processing techniques such as capitalization, inverse text normalization (ITN), and disfluency on overall readability, which traditional word error rate (WER) and slot error rate (SER) metrics fail to capture. TRScore is strongly correlated to traditional F1 and human readability scores, with Pearson's correlation coefficients of 0.67 and 0.98, respectively. It also eliminates the need for human transcriptions for model selection.
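
As a rough illustration of the evaluation pipeline the abstract describes, the sketch below uses a hypothetical `query_gpt` helper and an assumed 1-to-5 rating prompt (the paper's exact prompt, rating scale, and model API are not given in the abstract) to collect GPT readability ratings per transcript and correlate them with reference scores such as F1 or human ratings via Pearson's coefficient.

```python
from scipy.stats import pearsonr


def query_gpt(prompt: str) -> str:
    """Hypothetical helper: send a prompt to a GPT-style model and return its
    text reply. Plug in your own model client here; the abstract does not
    specify which GPT variant or API the authors used."""
    raise NotImplementedError("connect this to your GPT model client")


def readability_rating(transcript: str) -> float:
    """Ask the model to rate the readability of an ASR transcript
    (punctuation and segmentation quality) on an assumed 1-5 scale
    and parse the numeric answer."""
    prompt = (
        "Rate the readability of the following ASR transcript on a scale of "
        "1 (poor) to 5 (excellent), considering punctuation and sentence "
        "segmentation. Answer with a single number.\n\n" + transcript
    )
    reply = query_gpt(prompt)
    return float(reply.strip().split()[0])


def correlate_with_reference(model_scores, reference_scores):
    """Pearson correlation between automatic readability scores and reference
    scores (e.g. F1 or human ratings), as reported in the abstract."""
    r, p_value = pearsonr(model_scores, reference_scores)
    return r, p_value
```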
