Paper Title
ferret: a Framework for Benchmarking Explainers on Transformers
Paper Authors
Paper Abstract
As Transformers are increasingly relied upon to solve complex NLP problems, there is an increasing need for their decisions to be humanly interpretable. While several explainable AI (XAI) techniques for interpreting the outputs of transformer-based models have been proposed, easy access to using and comparing them is still lacking. We introduce ferret, a Python library that simplifies the use and comparison of XAI methods on transformer-based classifiers. With ferret, users can visualize and compare explanations of transformer-based model outputs produced by state-of-the-art XAI methods, on any free text or on existing XAI corpora. Moreover, users can evaluate explanations with ad-hoc XAI metrics to select the most faithful and plausible ones. To align with the now-consolidated practice of sharing and using transformer-based models from Hugging Face, ferret interfaces directly with its Python library. In this paper, we showcase ferret to benchmark XAI methods on transformers for sentiment analysis and hate speech detection. We show how specific methods provide consistently better explanations and are preferable in the context of transformer models.
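To illustrate the workflow the abstract describes, the following is a minimal sketch of how ferret wraps a Hugging Face classifier, produces explanations, and scores them. The checkpoint name is only an example, and the method names (Benchmark, explain, evaluate_explanations, show_evaluation_table) follow the library's documented usage at the time of writing; they may differ across versions.

```python
# Minimal sketch of the ferret workflow (API names follow the library's
# documented examples and may vary across versions).
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from ferret import Benchmark

# Any Hugging Face sequence-classification checkpoint can be used;
# this sentiment model is only an illustrative choice.
name = "cardiffnlp/twitter-xlm-roberta-base-sentiment"
model = AutoModelForSequenceClassification.from_pretrained(name)
tokenizer = AutoTokenizer.from_pretrained(name)

bench = Benchmark(model, tokenizer)

# Run the built-in explainers on a free-text input for the chosen target
# class, then score each explanation with faithfulness/plausibility metrics.
explanations = bench.explain("You look stunning!", target=1)
evaluations = bench.evaluate_explanations(explanations, target=1)

# Render comparison tables (e.g., in a Jupyter notebook).
bench.show_table(explanations)
bench.show_evaluation_table(evaluations)
```

Because ferret takes the model and tokenizer objects directly, the same few lines work for any transformer-based classifier shared on the Hugging Face Hub, which is the integration the abstract refers to.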