Using Large Pre-Trained Language Model to Assist FDA in Premarket Medical Device

Paper Title

Using Large Pre-Trained Language Model to Assist FDA in Premarket Medical Device

Paper Author

Xu, Zongzhe

Abstract

This paper proposes a possible natural language processing method that might assist the FDA medical device marketing process. Actual device descriptions are matched against the device descriptions in FDA Title 21 of the CFR to determine their corresponding device types. Both pre-trained word embeddings such as FastText and large pre-trained sentence embedding models such as sentence transformers are evaluated on how accurately they characterize a device description. An experiment is also conducted to test whether these models can identify devices that are wrongly classified in the FDA database. The results show that sentence transformers with T5 and MPNet, as well as GPT-3 semantic search embeddings, identify the correct classification with high accuracy: the correct label is narrowed down to the 15 most likely results, compared with the 2585 types of device descriptions that would otherwise have to be searched manually. On the other hand, all methods demonstrate high accuracy in identifying completely incorrectly labeled devices, but all fail to identify device classifications that are wrong but closely related to the true label.
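The matching step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the `embed` function below is a toy bag-of-words stand-in for the FastText, sentence transformer, or GPT-3 embeddings that the paper evaluates, and the labels, regulation texts, and function names are all hypothetical. The sketch ranks candidate device types by cosine similarity, checks whether the true label falls in the top k results (the paper uses k = 15), and flags a recorded label as suspicious when it falls outside the top k.

```python
import math
import re
from collections import Counter

# Hypothetical, paraphrased regulation texts; real ones would come from Title 21 of the CFR.
REGULATION_TEXTS = {
    "thermometer": "a device used to measure the body temperature of a patient",
    "scalpel": "a hand held surgical blade used to cut tissue",
    "wheelchair": "a battery powered device with wheels that provides mobility "
                  "to persons restricted to a sitting position",
}


def embed(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_device_types(description, regulation_texts):
    """Return device-type labels sorted from most to least similar."""
    query = embed(description)
    scored = [(cosine(query, embed(text)), label)
              for label, text in regulation_texts.items()]
    return [label for _, label in sorted(scored, key=lambda pair: -pair[0])]


def in_top_k(description, label, regulation_texts, k=15):
    """True if `label` is among the k most similar device types."""
    return label in rank_device_types(description, regulation_texts)[:k]


def looks_mislabeled(description, recorded_label, regulation_texts, k=15):
    """Flag a recorded label that does not appear in the top-k candidates."""
    return not in_top_k(description, recorded_label, regulation_texts, k)


description = "digital device that measures patient body temperature"
ranking = rank_device_types(description, REGULATION_TEXTS)
```

In this toy setup, a recorded label of `"scalpel"` is flagged with `k=2`, while `"wheelchair"` is not, because its text shares a few words with the query. This loosely mirrors the limitation reported in the abstract: a wrong label that sits close to the true one in embedding space is hard to detect with a top-k check.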
