Title

Explainable Data Imputation using Constraints

Authors

Sandeep Hans, Diptikalyan Saha, Aniya Aggarwal

Abstract

Data values in a dataset can be missing or anomalous due to mishandling or human error. Analysing data with missing values can introduce bias and affect the inferences drawn from it. Several analysis methods, such as principal component analysis and singular value decomposition, require complete data. Many approaches impute only numeric data; some do not consider the dependency of an attribute on other attributes, while others require human intervention and domain knowledge. We present a new data imputation algorithm based on values of different data types and the association constraints among them in the data, which no current system handles. We report experimental results, using several metrics, that compare our algorithm with state-of-the-art imputation techniques. Our algorithm not only imputes the missing values but also generates human-readable explanations describing the significance of the attributes used for each imputation.
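The abstract does not spell out the algorithm, so the following is only a hypothetical illustration of the general idea, not the authors' method. It learns a pairwise association between two categorical attributes from rows where both are present. It imputes a missing value only when that association is strong enough to act as a constraint, and records a human-readable explanation for each imputed cell. The function names `learn_association` and `impute_with_explanation`, the `min_confidence` threshold, and the restriction to a single determining attribute are all assumptions made for this sketch.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional

# Hypothetical row representation: attribute name -> value, None = missing.
Row = Dict[str, Optional[str]]


def learn_association(rows: List[Row], source: str, target: str) -> Dict[str, Counter]:
    """Count co-occurrences of each (source value, target value) pair
    over the rows in which both attributes are present."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        s, t = row.get(source), row.get(target)
        if s is not None and t is not None:
            table[s][t] += 1
    return table


def impute_with_explanation(
    rows: List[Row], source: str, target: str, min_confidence: float = 0.8
) -> List[str]:
    """Fill missing `target` values in place using the `source` attribute and
    return one human-readable explanation per imputed cell (sketch only)."""
    # Associations are learned once, before any cell is modified.
    table = learn_association(rows, source, target)
    explanations: List[str] = []
    for i, row in enumerate(rows):
        if row.get(target) is not None:
            continue  # value present; nothing to impute
        s = row.get(source)
        if s is None or s not in table:
            continue  # no evidence for this source value; leave it missing
        value, count = table[s].most_common(1)[0]
        total = sum(table[s].values())
        confidence = count / total
        if confidence < min_confidence:
            continue  # association too weak to treat as a constraint
        row[target] = value
        explanations.append(
            f"row {i}: {target} = {value!r} because {source} = {s!r} "
            f"(co-occurred in {count} of {total} complete rows, "
            f"confidence {confidence:.2f})"
        )
    return explanations


if __name__ == "__main__":
    # Hypothetical toy data: "city" determines "state".
    data: List[Row] = [
        {"city": "Pune", "state": "Maharashtra"},
        {"city": "Pune", "state": "Maharashtra"},
        {"city": "Delhi", "state": "Delhi"},
        {"city": "Pune", "state": None},
        {"city": "Mysore", "state": None},
    ]
    for note in impute_with_explanation(data, "city", "state"):
        print(note)
```

In the toy data, the row with `city` "Pune" and a missing `state` is filled with "Maharashtra" and gets an explanation citing the attribute and evidence used. The "Mysore" row stays missing because no complete row supports any value. A system handling several data types and attributes, as the abstract describes, would need to combine evidence across attributes. This sketch keeps a single determinant so each explanation is easy to trace.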