A graph-theoretic approach for the detection of phishing webpages

Bibliographic Details
Main Authors: Tan, Choon Lin; Chiew, Kang Leng; Yong, Kelvin S.C.; Sze, San Nah; Abdullah, Johari; Sebastian, Yakub
Format: Article
Language: English
Published: Elsevier 2020
Online Access: http://ir.unimas.my/id/eprint/31278/1/A%20Graph-Theoretic%20-%20Copy.pdf
http://ir.unimas.my/id/eprint/31278/
https://www.sciencedirect.com/science/article/pii/S016740482030078X
https://doi.org/10.1016/j.cose.2020.101793
Institution: Universiti Malaysia Sarawak
Description
Summary: Over the years, various technical means have been developed to protect Internet users from phishing attacks. To strengthen these anti-phishing efforts, we draw on concepts from graph theory and propose a set of novel graph features to improve phishing detection accuracy. The initial phase of the proposed technique extracts the hyperlinks in the webpage under scrutiny and fetches the corresponding neighbourhood webpages. During this process, the page-linking data are collected and used to construct a web graph that models the overall hyperlink and network structure of the webpage. Graph measures are then computed from the web graph and used as graph features to train a classifier for detecting phishing webpages. Experimental results show that the proposed graph features achieve an improved overall accuracy of 97.8% when C4.5 is used as the classifier, outperforming existing conventional features derived from the same data samples. Unlike conventional features, the proposed graph features leverage inherent phishing patterns that are only visible at a higher level of abstraction, making them robust and difficult to evade through direct manipulation of the webpage contents. The proposed graph-based technique also shows promising results when benchmarked against a prominent phishing detection technique. Hence, the proposed technique is an important contribution to existing anti-phishing research on improving detection performance.
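
The pipeline outlined in the summary (extract hyperlinks, fetch neighbourhood pages, build a web graph, compute graph measures) can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not list the actual graph features, so the four measures below (node count, edge count, density, share of off-host nodes) are assumptions chosen for illustration. The function names, the `fetch` callback, and the in-memory `PAGES` sample are likewise hypothetical, and the classification step (C4.5 in the article) is omitted.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href values of all <a> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url, html):
    """Return the set of absolute http(s) URLs linked from the page."""
    parser = LinkExtractor()
    parser.feed(html)
    links = set()
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)
        if urlparse(absolute).scheme in ("http", "https"):
            links.add(absolute)
    return links


def build_web_graph(start_url, fetch):
    """Build a directed graph (dict: URL -> set of linked URLs).

    `fetch` is a hypothetical callback returning the HTML of a URL,
    or None when the page cannot be retrieved.
    """
    graph = {start_url: set()}
    start_html = fetch(start_url)
    if start_html is None:
        return graph
    graph[start_url] = extract_links(start_url, start_html) - {start_url}
    # Fetch each neighbourhood page and record its outgoing links.
    for neighbour in sorted(graph[start_url]):
        html = fetch(neighbour)
        links = extract_links(neighbour, html) if html else set()
        graph[neighbour] = links - {neighbour}
    # Every link target becomes a node, even if it was never fetched.
    for targets in list(graph.values()):
        for target in targets:
            graph.setdefault(target, set())
    return graph


def graph_features(graph, start_url):
    """Compute illustrative graph measures (assumed, not from the article)."""
    n = len(graph)
    edges = sum(len(targets) for targets in graph.values())
    start_host = urlparse(start_url).netloc
    external = sum(1 for url in graph if urlparse(url).netloc != start_host)
    return {
        "num_nodes": n,
        "num_edges": edges,
        "density": edges / (n * (n - 1)) if n > 1 else 0.0,
        "external_ratio": external / n,
    }


# Hypothetical in-memory pages standing in for real HTTP fetches.
PAGES = {
    "http://shop.example/login":
        '<a href="/help">Help</a><a href="http://bank.example/">Bank</a>',
    "http://shop.example/help": '<a href="/login">Back</a>',
    "http://bank.example/": '<a href="http://bank.example/about">About</a>',
}

START = "http://shop.example/login"
features = graph_features(build_web_graph(START, PAGES.get), START)
print(features)
```

In a fuller setting, one such feature vector per webpage would be fed to a decision-tree learner; the sketch passes `PAGES.get` as the fetch callback so that it runs without network access.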