Semantics-Preserved Distortion for Personal Privacy Protection in Information Management

Title

Semantics-Preserved Distortion for Personal Privacy Protection in Information Management

Authors

Li, Jiajia; Yang, Lu; Peng, Letian; Zhang, Shitou; Wang, Ping; Li, Zuchao; Zhao, Hai

Abstract

In recent years, machine learning, particularly deep learning, has significantly impacted the field of information management. While several strategies have been proposed to restrict models from learning and memorizing sensitive information in raw texts, this paper proposes a more linguistically grounded approach: distorting texts while maintaining their semantic integrity. To this end, we leverage Neighboring Distribution Divergence, a novel metric for assessing how well semantic meaning is preserved during distortion. Building on this metric, we present two distinct frameworks for semantics-preserving distortion: a generative approach and a substitutive approach. Our evaluations across various tasks, including named entity recognition, constituency parsing, and machine reading comprehension, affirm the plausibility and efficacy of our distortion technique for personal privacy protection. We also test our method against attribute attacks in three privacy-focused tasks within the NLP domain; the findings underscore the simplicity and efficacy of our data-based improvement approach compared with structural improvement approaches. Moreover, we explore privacy protection in a specific medical information management scenario, showing that our method effectively limits the memorization of sensitive data and underscoring its practicality.
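The abstract names Neighboring Distribution Divergence but does not define it. Purely as a rough illustration, the sketch below assumes one simplified reading of the idea. It compares a model's predicted token distributions at positions neighboring a distorted token, before and after distortion, and sums their KL divergences; a small total would suggest the surrounding context is interpreted the same way. The names `kl_divergence` and `neighboring_divergence`, the `window` parameter, the choice to skip the distorted position itself, and the assumption that per-position distributions are supplied by some masked language model are all hypothetical. None of them are taken from the paper.

```python
import math
from typing import Sequence

# Hypothetical representation: one probability distribution per token position,
# all over the same vocabulary (e.g. produced by some masked language model).
Distribution = Sequence[float]


def kl_divergence(p: Distribution, q: Distribution, eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            # eps guards against log(0) when q assigns zero mass to a token
            total += pi * math.log(pi / max(qi, eps))
    return total


def neighboring_divergence(
    original: Sequence[Distribution],
    distorted: Sequence[Distribution],
    center: int,
    window: int = 2,
) -> float:
    """Sum KL divergences at positions within `window` of `center`.

    The distorted position itself is skipped (a simplifying assumption of this
    sketch), so only the effect on the neighboring context is measured.
    """
    if len(original) != len(distorted):
        raise ValueError("both texts must have the same number of positions")
    lo = max(0, center - window)
    hi = min(len(original), center + window + 1)
    return sum(
        kl_divergence(original[i], distorted[i])
        for i in range(lo, hi)
        if i != center
    )


if __name__ == "__main__":
    # Toy distributions over a 3-token vocabulary for a 4-token text.
    orig = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.6, 0.3, 0.1]]
    same = [list(d) for d in orig]
    print(neighboring_divergence(orig, same, center=1))  # 0.0
```

Skipping the center position reflects the intuition that a distortion is allowed to change the token itself; what this sketch treats as "semantic preservation" is whether the neighbors' predicted distributions stay stable.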
