论文标题
数据验证
Data Validation
论文作者
论文摘要
数据验证是人们决定特定数据集是否适合给定目的的活动。正式化推动这一决策过程的要求允许明确交流要求,决策过程的自动化,并打开维护和调查决策过程本身的方法。本文的目的是形式化数据验证的定义,并演示可以从该定义得出的一些属性。特别是,它显示了该概念的形式观点如何允许数据质量需求的分类,从而使它们以越来越多的复杂程度订购。还指出了可能将许多此类要求结合起来引起的一些微妙之处。
Data validation is the activity where one decides whether or not a particular data set is fit for a given purpose. Formalizing the requirements that drive this decision process allows for unambiguous communication of the requirements, automation of the decision process, and opens up ways to maintain and investigate the decision process itself. The purpose of this article is to formalize the definition of data validation and to demonstrate some of the properties that can be derived from this definition. In particular, it is shown how a formal view of the concept permits a classification of data quality requirements, allowing them to be ordered in increasing levels of complexity. Some subtleties arising from combining possibly many such requirements are pointed out as well.