Paper title
Analysing and strengthening OpenWPM's reliability
Authors
Abstract
Automated browsers are widely used to study the web at scale. Their premise is that they measure what regular browsers would encounter on the web. In practice, deviations due to detection of automation have been found. To what extent automated browsers can be improved to reduce such deviations has so far not been investigated in detail. In this paper, we investigate this for a specific web automation framework: OpenWPM, a popular research framework specifically designed to study web privacy. We analyse (1) detectability of OpenWPM, (2) prevalence of OpenWPM detection, and (3) integrity of OpenWPM's data recording. Our analysis reveals that OpenWPM is easily detectable. We measure to what extent fingerprint-based detection is already leveraged against OpenWPM clients on 100,000 sites and observe that OpenWPM clients are commonly detected (on ~14% of front pages). Moreover, we discover integrated routines in scripts that specifically detect OpenWPM clients. Our investigation of OpenWPM's data recording integrity identifies novel evasion techniques and previously unknown attacks against OpenWPM's instrumentation. We investigate and develop mitigations to address the identified issues. In conclusion, we find that the reliability of automation frameworks should not be taken for granted. The identifiability of such frameworks should be studied, and mitigations deployed, to improve reliability.
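As a rough illustration of what fingerprint-based detection of an automated client can look like, the sketch below checks two commonly cited signals. It is not taken from the paper or from OpenWPM: the `FingerprintSnapshot` type and the `automationSignals` function are hypothetical. The two heuristics are assumptions chosen for illustration. First, a browser under WebDriver control normally reports `navigator.webdriver` as `true`. Second, a JavaScript API that has been wrapped by instrumentation code usually no longer stringifies to `[native code]`.

```typescript
// Hypothetical snapshot of values a page script might collect.
// In a real page these would come from navigator.webdriver and
// String(someApiFunction); here they are passed in for testability.
interface FingerprintSnapshot {
  webdriver: boolean | undefined; // value of navigator.webdriver
  apiFunctionSource: string;      // result of String(someApiFunction)
}

// Returns the heuristic signals (if any) suggesting an automated or
// instrumented client. An empty array means no signal was found.
function automationSignals(fp: FingerprintSnapshot): string[] {
  const signals: string[] = [];
  // WebDriver-controlled browsers expose navigator.webdriver === true.
  if (fp.webdriver === true) {
    signals.push("navigator.webdriver is true");
  }
  // Native browser functions stringify to "... [native code] ...";
  // a JavaScript wrapper (e.g. added by instrumentation) shows its own source.
  if (!fp.apiFunctionSource.includes("[native code]")) {
    signals.push("API function is not native (possibly wrapped)");
  }
  return signals;
}

// Usage example with made-up values:
const example = automationSignals({
  webdriver: true,
  apiFunctionSource: "function () { return original.apply(this, arguments); }",
});
console.log(example);
```

In this sketch both signals fire for the example input. Real detection scripts may combine many more attributes, and a client can mask either signal, which is why the checks are framed as heuristics rather than proof of automation.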