ConceptX: A Framework for Latent Concept Analysis

Paper title

ConceptX: A Framework for Latent Concept Analysis

Authors

Firoj Alam, Fahim Dalvi, Nadir Durrani, Hassan Sajjad, Abdul Rafae Khan, Jia Xu

Abstract

The opacity of deep neural networks remains a challenge in deploying solutions where explanation is as important as precision. We present ConceptX, a human-in-the-loop framework for interpreting and annotating the latent representational space in pre-trained Language Models (pLMs). We use an unsupervised method to discover concepts learned in these models and provide a graphical interface for humans to generate explanations for the concepts. To facilitate the process, we provide auto-annotations of the concepts (based on traditional linguistic ontologies). Such annotations enable the development of a linguistic resource that directly represents latent concepts learned within deep NLP models. These include not just traditional linguistic concepts, but also task-specific or sensitive concepts (words grouped based on gender or religious connotation) that help annotators mark bias in the model. The framework consists of two parts: (i) concept discovery and (ii) an annotation platform.
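
The abstract names the two parts of the framework but not the algorithms behind them. The following is only a rough sketch of how the two parts could fit together, under stated assumptions: average-linkage agglomerative clustering stands in for the unnamed unsupervised method, toy 2-D vectors stand in for contextual embeddings from a pLM, and auto-annotation is approximated by a majority vote over part-of-speech tags with a hypothetical 0.9 threshold. The function names `agglomerative_clusters` and `auto_annotate` are illustrative and do not come from the paper.

```python
from collections import Counter
from math import dist


def agglomerative_clusters(vectors, k):
    """Merge the two closest clusters (average linkage) until k remain."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between members of the two clusters.
                total = sum(dist(vectors[i], vectors[j])
                            for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def auto_annotate(cluster, tags, threshold=0.9):
    """Return the majority tag if it covers at least `threshold` of the
    cluster (assumed rule), otherwise None so a human can label it."""
    counts = Counter(tags[i] for i in cluster)
    tag, n = counts.most_common(1)[0]
    return tag if n / len(cluster) >= threshold else None


# Toy data (assumed): words, stand-in embeddings, and ontology tags.
words = ["she", "her", "he", "ran", "walked", "jumped"]
vectors = [(0.0, 1.0), (0.1, 0.9), (0.2, 1.1),
           (5.0, 0.0), (5.1, 0.2), (4.9, -0.1)]
tags = ["PRON", "PRON", "PRON", "VERB", "VERB", "VERB"]

# (i) concept discovery, (ii) auto-annotation of each discovered concept.
concepts = agglomerative_clusters(vectors, k=2)
labels = {tuple(words[i] for i in sorted(c)): auto_annotate(c, tags)
          for c in concepts}
```

In this sketch, a cluster whose members do not agree on a tag gets `None`, which corresponds to a concept left for a human annotator to explain through the graphical interface.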
