Concept Evolution in Deep Learning Training: A Unified Interpretation Framework and Discoveries

Paper Title

Concept Evolution in Deep Learning Training: A Unified Interpretation Framework and Discoveries

Authors

Haekyu Park, Seongmin Lee, Benjamin Hoover, Austin P. Wright, Omar Shaikh, Rahul Duggal, Nilaksh Das, Kevin Li, Judy Hoffman, Duen Horng Chau

Abstract

We present ConceptEvo, a unified interpretation framework for deep neural networks (DNNs) that reveals the inception and evolution of learned concepts during training. Our work addresses a critical gap in DNN interpretation research, as existing methods primarily focus on post-training interpretation. ConceptEvo introduces two novel technical contributions: (1) an algorithm that generates a unified semantic space, enabling side-by-side comparison of different models during training, and (2) an algorithm that discovers and quantifies important concept evolutions for class predictions. Through a large-scale human evaluation and quantitative experiments, we demonstrate that ConceptEvo successfully identifies concept evolutions across different models, which are not only comprehensible to humans but also crucial for class predictions. ConceptEvo is applicable to both modern DNN architectures, such as ConvNeXt, and classic DNNs, such as VGGs and InceptionV3.
