Paper title

IPProtect: protecting the intellectual property of visual datasets during data valuation

Authors

Singh, Gursimran; Wang, Chendi; Tazwar, Ahnaf; Wang, Lanjun; Zhang, Yong

Abstract

Data trading is essential to accelerate the development of data-driven machine learning pipelines. The central problem in data trading is to estimate the utility of a seller's dataset with respect to a given buyer's machine learning task, also known as data valuation. Typically, data valuation requires one or more participants to share their raw dataset with others, leading to potential risks of intellectual property (IP) violations. In this paper, we tackle the novel task of preemptively protecting the IP of datasets that need to be shared during data valuation. First, we identify and formalize two kinds of novel IP risks in visual datasets: data-item (image) IP and statistical (dataset) IP. Then, we propose a novel algorithm to convert the raw dataset into a sanitized version that resists IP violations while still allowing accurate data valuation. The key idea is to limit the transfer of information from the raw dataset to the sanitized dataset, thereby protecting against potential IP violations. Next, we analyze our method for the likely existence of a solution and for immunity against reconstruction attacks. Finally, we conduct extensive experiments on three computer vision datasets, demonstrating the advantages of our method over other baselines.
