Paper Title
Privacy-Aware Data Cleaning-as-a-Service (Extended Version)
Paper Authors
Paper Abstract
Data cleaning is a pervasive problem for organizations as they try to reap value from their data. Recent advances in networking and cloud computing technology have fueled a new computing paradigm called Database-as-a-Service, where data management tasks are outsourced to large service providers. In this paper, we consider a Data Cleaning-as-a-Service model that allows a client to interact with a data cleaning provider who hosts curated and sensitive data. We present PACAS: a Privacy-Aware data Cleaning-As-a-Service model that facilitates interaction between the two parties, in which the client issues query requests for data and the service provider applies a data pricing scheme that computes prices according to data sensitivity. We propose new extensions to the model to define generalized data repairs that obfuscate sensitive data to allow data sharing between the client and service provider. We present a new semantic distance measure to quantify the utility of such repairs, and we redefine the notion of consistency in the presence of generalized values. The PACAS model uses (X,Y,L)-anonymity, which extends existing data publishing techniques to consider the semantics in the data while protecting sensitive values. Our evaluation over real data shows that PACAS safeguards semantically related sensitive values and provides lower repair errors compared to existing privacy-aware cleaning techniques.