Paper Title

Targeted Honeyword Generation with Language Models

Authors

Fangyi Yu, Miguel Vargas Martin

Abstract

Honeywords are fictitious passwords inserted into databases in order to identify password breaches. The major difficulty is how to produce honeywords that are difficult to distinguish from real passwords. Although honeyword generation has been widely investigated in the past, the majority of existing research assumes that attackers have no knowledge of the users. These honeyword generation techniques (HGTs) may utterly fail if attackers exploit users' personally identifiable information (PII) and the real passwords include users' PII. In this paper, we propose to build a more secure and trustworthy authentication system that employs off-the-shelf pre-trained language models, which require no further training on real passwords to produce honeywords while retaining the PII of the associated real password, thereby significantly raising the bar for attackers. We conducted a pilot experiment in which individuals were asked to distinguish between authentic passwords and honeywords generated by GPT-3 and by a tweaking technique, with the username provided in both cases. Results show that it is extremely difficult to distinguish the real passwords from the artificial ones for both techniques. We speculate that a larger sample size could reveal a significant difference between the two HGT techniques, favouring our proposed approach.
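The abstract does not spell out the tweaking baseline it compares against; a minimal sketch of the classic "chaffing-by-tweaking" approach (tail-tweaking variant, in the style of Juels and Rivest's original honeywords proposal) is given below, assuming the common convention of replacing each of the last few characters with a random character of the same class. The function names and the flat list of k + 1 "sweetwords" (the real password hidden among honeywords) are illustrative, not the paper's implementation.

```python
import random
import string

def tweak_tail(password, t=3, rng=None):
    """Replace the last `t` characters with random characters of the
    same class: digit -> digit, letter -> letter (case kept), else symbol."""
    rng = rng or random.Random()
    chars = list(password)
    for i in range(max(len(chars) - t, 0), len(chars)):
        c = chars[i]
        if c.isdigit():
            chars[i] = rng.choice(string.digits)
        elif c.isalpha():
            pool = string.ascii_lowercase if c.islower() else string.ascii_uppercase
            chars[i] = rng.choice(pool)
        else:
            chars[i] = rng.choice("!@#$%^&*")
    return "".join(chars)

def make_sweetwords(password, k=19, rng=None):
    """Mix the real password with k distinct tweaked honeywords and shuffle,
    so an attacker holding the hashed file cannot tell which entry is real."""
    rng = rng or random.Random()
    sweet = {password}
    while len(sweet) < k + 1:
        sweet.add(tweak_tail(password, rng=rng))
    sweet = list(sweet)
    rng.shuffle(sweet)
    return sweet
```

A PII-aware attacker defeats this baseline easily: for a password like `alice1987`, every tweaked variant still starts with `alice1`, but only the real one ends in a plausible birth year, which is the gap the paper's LM-based generation aims to close by producing honeywords that carry equally plausible PII.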
