Paper Title

Revealing Unfair Models by Mining Interpretable Evidence

Paper Authors

Mohit Bajaj, Lingyang Chu, Vittorio Romaniello, Gursimran Singh, Jian Pei, Zirui Zhou, Lanjun Wang, Yong Zhang

Paper Abstract

The popularity of machine learning has increased the risk of unfair models being deployed in high-stakes applications, such as justice systems, drug/vaccination design, and medical diagnosis. Although there are effective methods for training fair models from scratch, how to automatically reveal and explain the unfairness of a trained model remains a challenging task. Revealing the unfairness of machine learning models in an interpretable fashion is a critical step towards fair and trustworthy AI. In this paper, we systematically tackle the novel task of revealing unfair models by mining interpretable evidence (RUMIE). The key idea is to find solid evidence in the form of a group of data instances that are discriminated against most by the model. To make the evidence interpretable, we also find a set of human-understandable key attributes and decision rules that characterize the discriminated data instances and distinguish them from the other non-discriminated data. As demonstrated by extensive experiments on many real-world data sets, our method finds highly interpretable and solid evidence that effectively reveals the unfairness of trained models. Moreover, it is much more scalable than all of the baseline methods.
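To make the abstract's two-step idea concrete (flag a group of instances most affected by the model, then extract human-readable rules that characterize them), here is a minimal illustrative Python sketch. It is not the paper's RUMIE algorithm: the synthetic data, the logistic-regression model, the counterfactual-flip discrimination score, and the shallow decision tree used for rules are all assumptions made for illustration only.

```python
# Illustrative sketch only: NOT the RUMIE algorithm from the paper.
# Step 1: score each instance by how much the trained model's prediction
#         changes when a binary sensitive attribute is flipped (a simple
#         proxy for "discriminated most by the model").
# Step 2: fit a shallow decision tree that separates the flagged group
#         from the rest, yielding human-readable attributes and rules.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# Synthetic tabular data: column 0 is a binary sensitive attribute.
n = 2000
sensitive = rng.integers(0, 2, size=n)
features = rng.normal(size=(n, 3))
X = np.column_stack([sensitive, features])
# Labels deliberately correlated with the sensitive attribute,
# so the trained model can pick up an unfair dependence.
y = (features[:, 0] + 0.8 * sensitive + rng.normal(scale=0.5, size=n) > 0.5).astype(int)

model = LogisticRegression().fit(X, y)

# Step 1: per-instance discrimination score via a counterfactual flip
# of the sensitive attribute.
X_flipped = X.copy()
X_flipped[:, 0] = 1 - X_flipped[:, 0]
score = np.abs(model.predict_proba(X)[:, 1] - model.predict_proba(X_flipped)[:, 1])

# Flag the top 10% most affected instances as the "evidence" group.
flagged = score >= np.quantile(score, 0.9)

# Step 2: a shallow tree over the non-sensitive attributes gives
# interpretable rules that characterize the flagged group.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X[:, 1:], flagged)
print(export_text(tree, feature_names=["attr_1", "attr_2", "attr_3"]))
```

The printed tree is the kind of output the abstract alludes to: a few threshold rules over ordinary attributes that describe which instances the model treats differently depending on the sensitive attribute.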
