Paper Title
Seeing the Unseen: Errors and Bias in Visual Datasets
Paper Authors
Abstract
From face recognition on smartphones to automatic routing in self-driving cars, machine vision algorithms lie at the core of these features. These systems solve image-based tasks by identifying and understanding objects, then making decisions from this information. However, errors in datasets are often carried over into, or even magnified by, the algorithms trained on them, at times resulting in issues such as recognising black people as gorillas and misrepresenting ethnicities in search results. This paper traces the errors in datasets and their impacts, revealing that a flawed dataset can result from limited categories, insufficiently comprehensive sourcing, and poor classification.