Paper Title

Detecting Phishing Sites Without Visiting Them

Author

Pagadala, Kalaharsha

Abstract

Cyberattacks are increasing at an unprecedented rate. Phishing is a social engineering attack with a massive global impact, destroying the financial and economic value of corporations, government sectors, and individuals. In phishing, attackers steal users' personal information, such as usernames, passwords, and debit card details. Various anti-phishing techniques have been developed to detect zero-hour attacks and protect end users, but they require end users to visit a website to learn whether it is safe, which may infect their systems. In this paper, we propose a method by which end users can determine the genuineness of a site without visiting it. The proposed method collects legitimate and phishing URLs and extracts features from them. The extracted features are given as input to six different classifiers for training and constructing the model: Naive Bayes, Logistic Regression, Random Forest, CatBoost, XGBoost, and Multilayer Perceptron. The method is tested by packaging it as a browser extension so that end users can use it while browsing. When the user hovers the cursor over any link, a pop-up shows the nature of the website (safe or deceptive), and a confirmation box then asks whether the user wants to visit it. The performance of the approach is tested using a dataset of 2000 phishing and legitimate website URLs, and the method detects the sites correctly in very little time. Random Forest is chosen for constructing the model because it gives the highest accuracy, 95%.
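
The abstract does not list the features extracted from each URL, so the following is only a minimal sketch of what URL-only feature extraction might look like. The function name `extract_url_features` and every feature in it (URL length, HTTPS use, IP-address host, dot and hyphen counts, presence of `@`) are illustrative assumptions, not the paper's actual feature set. The point is that all of them can be computed from the URL string alone, without fetching the page.

```python
import re
from urllib.parse import urlparse

# Matches a dotted-quad host such as "192.168.0.1" (format check only).
IPV4_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def extract_url_features(url: str) -> dict:
    """Compute simple lexical features from a URL string.

    Hypothetical feature set for illustration; the paper's real
    features are not given in the abstract.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "uses_https": int(parsed.scheme == "https"),
        "host_is_ip": int(bool(IPV4_RE.match(host))),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "has_at_symbol": int("@" in url),
    }
```

A feature dictionary like this could then be converted to a numeric vector and passed to any of the classifiers named in the abstract, such as Random Forest.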
