Paper title

A Note on the Misinterpretation of the US Census Re-identification Attack

Paper author

Francis, Paul

Abstract

In 2018, the US Census Bureau designed a new data reconstruction and re-identification attack and tested it against its 2010 data release. The specific attack executed by the Bureau allows an attacker to infer the race and ethnicity of respondents with an average precision of 75% for 85% of the respondents, assuming that the attacker knows the correct age, sex, and address of the respondents. The Bureau interpreted the attack as exceeding its privacy standards, and so introduced stronger privacy protections for the 2020 Census in the form of the TopDown Algorithm (TDA). This paper demonstrates that race and ethnicity can be inferred from the TDA-protected census data with substantially better precision and recall, using less prior knowledge: only the respondent's address. Race and ethnicity can be inferred with an average precision of 75% for 98% of the respondents, and with 100% precision for 11% of the respondents. The inference is done by simply assuming that the respondent's race/ethnicity is the majority race/ethnicity of the respondent's census block. The conclusion to draw from this simple demonstration is NOT that the Bureau's data releases lack adequate privacy protections. Indeed, it is the purpose of the data releases to allow this kind of inference. The problem, rather, is that the Bureau's criteria for measuring privacy are flawed and overly pessimistic.
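
The majority-block inference described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the block names, group labels, and counts are invented toy values rather than census data, `majority_inference` and its `min_share` threshold are hypothetical names, and the same counts serve as both the published table and the ground truth, which is a simplification. The paper's actual evaluation may differ.

```python
# Toy illustration of majority-block inference (not the paper's code).
# Hypothetical data: block id -> number of respondents per race/ethnicity group.
BLOCKS = {
    "block_a": {"group_x": 8, "group_y": 2},
    "block_b": {"group_y": 5},
    "block_c": {"group_x": 3, "group_y": 3, "group_z": 4},
}


def majority_inference(blocks, min_share=0.0):
    """Guess each respondent's group as the largest group in their block.

    A guess is made only for blocks whose largest group holds at least
    `min_share` of the block (hypothetical parameter). Returns
    (precision, coverage): the fraction of guesses that are correct, and
    the fraction of all respondents for whom a guess is made.
    """
    total = sum(sum(counts.values()) for counts in blocks.values())
    predicted = 0  # respondents for whom a guess is made
    correct = 0    # respondents whose guess matches their group
    for counts in blocks.values():
        size = sum(counts.values())
        if size == 0:
            continue
        top = max(counts.values())
        if top / size >= min_share:
            predicted += size
            correct += top  # only members of the largest group are guessed correctly
    precision = correct / predicted if predicted else 0.0
    coverage = predicted / total if total else 0.0
    return precision, coverage


if __name__ == "__main__":
    for share in (0.0, 0.5, 1.0):
        precision, coverage = majority_inference(BLOCKS, min_share=share)
        print(f"min_share={share:.1f} precision={precision:.2f} coverage={coverage:.2f}")
```

Raising `min_share` trades coverage for precision. At `min_share=1.0` only fully homogeneous blocks are used, which corresponds to the 100%-precision case the abstract mentions, and the only input the attacker needs is the respondent's block, that is, the address.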
