A Kullback-Liebler divergence-based representation algorithm for malware detection

Background. Malware, malicious software, is the major security concern of the digital realm. Conventional cyber-security solutions are challenged by sophisticated malicious behaviors. Currently, an overlap between malicious and legitimate behaviors causes more difficulties in characterizing those be...

Bibliographic Details
Main Authors:	Aboaoja, Faitouri A., Zainal, Anazida, Ghaleb, Fuad A., Saleh Alghamdi, Norah, Saeed, Faisal, Alhuwayji, Husayn
Format:	Article
Language:	English
Published:	PeerJ Inc. 2023
Subjects:	QA Mathematics
Online Access:	http://eprints.utm.my/107630/1/FaitouriAAboaoja2023_AKullbackLieblerDivergencebasedRepresentation.pdf http://eprints.utm.my/107630/ http://dx.doi.org/10.7717/peerj-cs.1492
Institution:	Universiti Teknologi Malaysia

id	my.utm.107630
record_format	eprints
spelling	my.utm.107630 2024-09-25T06:57:32Z http://eprints.utm.my/107630/ A Kullback-Liebler divergence-based representation algorithm for malware detection Aboaoja, Faitouri A. Zainal, Anazida Ghaleb, Fuad A. Saleh Alghamdi, Norah Saeed, Faisal Alhuwayji, Husayn QA Mathematics Background. Malware, malicious software, is the major security concern of the digital realm. Conventional cyber-security solutions are challenged by sophisticated malicious behaviors. Currently, an overlap between malicious and legitimate behaviors causes more difficulties in characterizing those behaviors as malicious or legitimate activities. For instance, evasive malware often mimics legitimate behaviors, and evasion techniques are utilized by both legitimate and malicious software. Problem. Most of the existing solutions use the traditional term frequency-inverse document frequency (TF-IDF) technique or its concept to represent malware behaviors. However, the traditional TF-IDF and the techniques developed from it represent the features, especially the shared ones, inaccurately because those techniques calculate a weight for each feature without considering its distribution in each class; instead, the weight is generated based on the distribution of the feature among all the documents. Such a presumption can reduce the meaning of those features, and when those features are used to classify malware, they lead to a high rate of false alarms. Method. This study proposes a Kullback-Liebler Divergence-based Term Frequency Probability Class Distribution (KLD-based TF-PCD) algorithm to represent the extracted features based on the differences between the probability distributions of the terms in malware and benign classes. Unlike existing solutions, the proposed algorithm increases the weights of the important features by using the Kullback-Liebler Divergence tool to measure the differences between their probability distributions in malware and benign classes. Results. The experimental results show that the proposed KLD-based TF-PCD algorithm achieved an accuracy of 0.972, a false positive rate of 0.037, and an F-measure of 0.978. These results were significant compared to related studies. Thus, the proposed KLD-based TF-PCD algorithm contributes to improving the security of cyberspace. Conclusion. New meaningful characteristics have been added by the proposed algorithm to promote the learned knowledge of the classifiers, and thus increase their ability to classify malicious behaviors accurately. PeerJ Inc. 2023-09-22 Article PeerReviewed application/pdf en http://eprints.utm.my/107630/1/FaitouriAAboaoja2023_AKullbackLieblerDivergencebasedRepresentation.pdf Aboaoja, Faitouri A. and Zainal, Anazida and Ghaleb, Fuad A. and Saleh Alghamdi, Norah and Saeed, Faisal and Alhuwayji, Husayn (2023) A Kullback-Liebler divergence-based representation algorithm for malware detection. PeerJ Computer Science, 9 (NA). pp. 1-29. ISSN 2376-5992 http://dx.doi.org/10.7717/peerj-cs.1492 DOI:10.7717/peerj-cs.1492
institution	Universiti Teknologi Malaysia
building	UTM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Teknologi Malaysia
content_source	UTM Institutional Repository
url_provider	http://eprints.utm.my/
language	English
topic	QA Mathematics
spellingShingle	QA Mathematics Aboaoja, Faitouri A. Zainal, Anazida Ghaleb, Fuad A. Saleh Alghamdi, Norah Saeed, Faisal Alhuwayji, Husayn A Kullback-Liebler divergence-based representation algorithm for malware detection
description	Background. Malware, malicious software, is the major security concern of the digital realm. Conventional cyber-security solutions are challenged by sophisticated malicious behaviors. Currently, an overlap between malicious and legitimate behaviors causes more difficulties in characterizing those behaviors as malicious or legitimate activities. For instance, evasive malware often mimics legitimate behaviors, and evasion techniques are utilized by both legitimate and malicious software. Problem. Most of the existing solutions use the traditional term frequency-inverse document frequency (TF-IDF) technique or its concept to represent malware behaviors. However, the traditional TF-IDF and the techniques developed from it represent the features, especially the shared ones, inaccurately because those techniques calculate a weight for each feature without considering its distribution in each class; instead, the weight is generated based on the distribution of the feature among all the documents. Such a presumption can reduce the meaning of those features, and when those features are used to classify malware, they lead to a high rate of false alarms. Method. This study proposes a Kullback-Liebler Divergence-based Term Frequency Probability Class Distribution (KLD-based TF-PCD) algorithm to represent the extracted features based on the differences between the probability distributions of the terms in malware and benign classes. Unlike existing solutions, the proposed algorithm increases the weights of the important features by using the Kullback-Liebler Divergence tool to measure the differences between their probability distributions in malware and benign classes. Results. The experimental results show that the proposed KLD-based TF-PCD algorithm achieved an accuracy of 0.972, a false positive rate of 0.037, and an F-measure of 0.978. These results were significant compared to related studies. Thus, the proposed KLD-based TF-PCD algorithm contributes to improving the security of cyberspace. Conclusion. New meaningful characteristics have been added by the proposed algorithm to promote the learned knowledge of the classifiers, and thus increase their ability to classify malicious behaviors accurately.
format	Article
author	Aboaoja, Faitouri A. Zainal, Anazida Ghaleb, Fuad A. Saleh Alghamdi, Norah Saeed, Faisal Alhuwayji, Husayn
author_facet	Aboaoja, Faitouri A. Zainal, Anazida Ghaleb, Fuad A. Saleh Alghamdi, Norah Saeed, Faisal Alhuwayji, Husayn
author_sort	Aboaoja, Faitouri A.
title	A Kullback-Liebler divergence-based representation algorithm for malware detection
title_sort	kullback-liebler divergence-based representation algorithm for malware detection
publisher	PeerJ Inc.
publishDate	2023
url	http://eprints.utm.my/107630/1/FaitouriAAboaoja2023_AKullbackLieblerDivergencebasedRepresentation.pdf http://eprints.utm.my/107630/ http://dx.doi.org/10.7717/peerj-cs.1492
_version_	1811681236337295360
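
Illustrative note. The description above contrasts TF-IDF, which weights a term by its spread over all documents, with weighting a term by how differently it is distributed in the malware and benign classes. The sketch below is one possible reading of that idea and is not the authors' TF-PCD formula, which this record does not give. The function names, the Laplace smoothing parameter alpha, the symmetric per-term divergence, and the API-call names in the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def class_term_probabilities(docs, vocab, alpha=1.0):
    """Smoothed probability of each vocabulary term within one class.

    alpha is an assumed Laplace smoothing constant; it keeps every
    probability strictly positive so the logarithm below is defined.
    """
    counts = Counter(term for doc in docs for term in doc)
    total = sum(counts[t] for t in vocab) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}


def kld_term_weights(malware_docs, benign_docs, alpha=1.0):
    """Per-term divergence between the malware and benign distributions.

    Uses the symmetric per-term contribution
    p*log(p/q) + q*log(q/p) = (p - q) * log(p / q), which is never negative.
    This choice is an assumption, not the formula from the paper.
    """
    vocab = sorted({t for doc in malware_docs + benign_docs for t in doc})
    p = class_term_probabilities(malware_docs, vocab, alpha)
    q = class_term_probabilities(benign_docs, vocab, alpha)
    return {t: (p[t] - q[t]) * math.log(p[t] / q[t]) for t in vocab}


def weighted_vector(doc, weights):
    """Represent one document as term frequency times class-divergence weight."""
    tf = Counter(doc)
    n = len(doc) or 1  # avoid division by zero for an empty document
    return {t: (tf[t] / n) * w for t, w in weights.items()}


# Toy data: each document is a list of (hypothetical) observed API calls.
malware = [
    ["CreateRemoteThread", "WriteProcessMemory", "ReadFile"],
    ["WriteProcessMemory", "ReadFile"],
]
benign = [
    ["ReadFile", "CreateFile"],
    ["ReadFile", "CreateFile", "ReadFile"],
]

weights = kld_term_weights(malware, benign)
vector = weighted_vector(["ReadFile", "WriteProcessMemory"], weights)
```

In this toy data, ReadFile is common in both classes and receives the smallest weight, while WriteProcessMemory, seen only in the malware documents, receives a larger one. That matches the behavior the description attributes to class-aware weighting of shared features.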

Similar Items