Ensuring Dataset Quality for Machine Learning Certification

Paper title

Ensuring Dataset Quality for Machine Learning Certification

Authors

Sylvaine Picard, Camille Chapdelaine, Cyril Cappi, Laurent Gardes, Eric Jenn, Baptiste Lefèvre, Thomas Soumarmon

Abstract

In this paper, we address the problem of dataset quality in the context of Machine Learning (ML)-based critical systems. We briefly analyse the applicability of some existing standards dealing with data and show that the specificities of the ML context are neither properly captured nor taken into account. As a first answer to this concerning situation, we propose a dataset specification and verification process, and apply it to a signal recognition system from the railway domain. In addition, we also give a list of recommendations for the collection and management of datasets. This work is one step towards the dataset engineering process that will be required for ML to be used in safety-critical systems.

Paper title