Paper Title


Vision Checklist: Towards Testable Error Analysis of Image Models to Help System Designers Interrogate Model Capabilities

Paper Authors

Xin Du, Benedicte Legastelois, Bhargavi Ganesh, Ajitha Rajan, Hana Chockler, Vaishak Belle, Stuart Anderson, Subramanian Ramamoorthy

Abstract


Using large pre-trained models for image recognition tasks is becoming increasingly common, owing to the well-acknowledged success of recent models like vision transformers and other CNN-based models like VGG and ResNet. The high accuracy of these models on benchmark tasks has translated into their practical use across many domains, including safety-critical applications like autonomous driving and medical diagnostics. Despite their widespread use, image models have been shown to be fragile to changes in the operating environment, bringing their robustness into question. There is an urgent need for methods that systematically characterise and quantify the capabilities of these models to help designers understand and provide guarantees about their safety and robustness. In this paper, we propose Vision Checklist, a framework aimed at interrogating the capabilities of a model in order to produce a report that can be used by a system designer for robustness evaluations. This framework proposes a set of perturbation operations that can be applied to the underlying data to generate test samples of different types. The perturbations reflect potential changes in operating environments, and interrogate various properties ranging from the strictly quantitative to the more qualitative. Our framework is evaluated on multiple datasets such as TinyImageNet, CIFAR10, CIFAR100, and Camelyon17, and on models such as ViT and ResNet. Our Vision Checklist proposes a specific set of evaluations that can be integrated into the previously proposed concept of a model card. Robustness evaluations like our checklist will be crucial in future safety evaluations of visual perception modules, and will be useful to a wide range of stakeholders including designers, deployers, and regulators involved in the certification of these systems. The source code of Vision Checklist will be open for public use.
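The abstract's core idea of applying perturbation operations to base data to generate test samples can be sketched as below. This is an illustrative approximation only, not the paper's actual implementation: the two perturbations shown (a 90-degree rotation and a square occlusion patch) and the function names are assumptions chosen for the sketch, standing in for the broader set of operations the framework describes.

```python
import numpy as np

def rotate90(image: np.ndarray) -> np.ndarray:
    """A simple global perturbation: rotate the image by 90 degrees."""
    return np.rot90(image)

def occlude(image: np.ndarray, top: int, left: int, size: int) -> np.ndarray:
    """A local perturbation: zero out a square patch to simulate occlusion."""
    out = image.copy()
    out[top:top + size, left:left + size] = 0
    return out

def generate_test_samples(image: np.ndarray) -> list:
    """Apply each perturbation to one base image, yielding a suite of
    test samples on which a model's predictions can then be compared
    against its prediction on the unperturbed original."""
    return [rotate90(image), occlude(image, top=8, left=8, size=16)]

# Example: a CIFAR-sized 32x32 grayscale image of all-white pixels.
img = np.full((32, 32), 255, dtype=np.uint8)
samples = generate_test_samples(img)
print(len(samples))  # one test sample per perturbation
```

A robustness report in the spirit of the checklist would then aggregate, per perturbation type, how often the model's output on the perturbed sample disagrees with its output on the original.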
