Paper Title
AI-based Re-identification of Behavioral Clickstream Data
Paper Authors
Paper Abstract
AI-based face recognition, i.e., the re-identification of individuals within images, is a well-established technology for video surveillance, user authentication, tagging photos of friends, etc. This paper demonstrates that similar techniques can be applied to successfully re-identify individuals purely based on their behavioral patterns. In contrast to de-anonymization attacks based on record linkage, these methods do not require any overlap in data points between a released dataset and an identified auxiliary dataset. The mere resemblance of behavioral patterns between records is sufficient to correctly attribute behavioral data to identified individuals. Further, we demonstrate that data perturbation does not provide protection unless a significant share of data utility is destroyed. These findings call for serious caution when sharing actual behavioral data with third parties, as modern-day privacy regulations, such as the GDPR, define their scope based on the ability to re-identify. This also has strong implications for the marketing domain when dealing with potentially re-identifiable data sources such as shopping behavior, clickstream data, or cookies. We also demonstrate how synthetic data can offer a viable alternative that is shown to be resilient against our introduced AI-based re-identification attacks.