An improved vulnerability exploitation prediction model with novel cost function and custom trained word vector embedding

Bibliographic Details
Main Authors:	Mohammad Shamsul Hoque, Norziana Jamil, Nowshad Amin, Lam, Kwok-Yan
Other Authors:	School of Computer Science and Engineering
Format:	Article
Language:	English
Published:	2022
Subjects:	Engineering::Computer science and engineering Cloud Security Management Supervised Machine Learning
Online Access:	https://hdl.handle.net/10356/153900
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-153900
record_format	dspace
spelling	sg-ntu-dr.10356-153900 (record timestamp 2022-06-04T20:11:15Z). Affiliations: School of Computer Science and Engineering; Nanyang Technopreneurship Center. Published version. Funding: This research was supported by the BOLD Publication Fund 2021 and the Yayasan Canselor Uniten (YCU) Grant (project code RJ010517844/06), and partly by the TNB Seed Fund 2019-2020 (project code U-TC-RD-19-09). The authors also thank the ICT Ministry of Bangladesh for its support. Added to repository: 2022-06-03T05:01:57Z. Issued: 2021. Type: Journal Article. Citation: Mohammad Shamsul Hoque, Norziana Jamil, Nowshad Amin & Lam, K. (2021). An improved vulnerability exploitation prediction model with novel cost function and custom trained word vector embedding. Sensors, 21(12), 4220. https://dx.doi.org/10.3390/s21124220. ISSN: 1424-8220. Handle: https://hdl.handle.net/10356/153900. DOI: 10.3390/s21124220. PMID: 34202977. Scopus EID: 2-s2.0-85108114510. Journal: Sensors, volume 21, issue 12, article 4220. Language: en. Rights: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Format: application/pdf.
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
description	Successful cyber-attacks are caused by the exploitation of vulnerabilities in the software and/or hardware of systems deployed on premises or in the cloud. Although hundreds of vulnerabilities are discovered every year, only a small fraction of them are actually exploited, so there is a severe class imbalance between exploited and non-exploited vulnerabilities. The open-source National Vulnerability Database (NVD), the largest repository indexing and maintaining all known vulnerabilities, assigns a unique identifier to each vulnerability. Each registered vulnerability also receives a severity score, the Common Vulnerability Scoring System (CVSS) score, based on the impact it could have if exploited. Recent research has shown that the CVSS score is not the only factor in selecting a vulnerability for exploitation, and that other attributes in the NVD can be used effectively as predictive features to identify the most exploitable vulnerabilities. Because cybersecurity management is highly resource-intensive, organizations such as cloud system operators benefit when the vulnerabilities most likely to be exploited in their software or hardware can be predicted as accurately and reliably as possible, so that the available resources go to fixing those first. Existing research has developed vulnerability exploitation prediction models that address the class imbalance with algorithmic and artificial data-resampling techniques, but these models still suffer greatly from overfitting to the majority class, which makes them unreliable in practice. In this research, we designed a novel cost function feature to address the class imbalance. We also used the large text corpus in the extracted dataset to develop a custom-trained word vector that better captures the context of the local text data, for use as an embedding layer in neural networks.
Our vulnerability exploitation prediction models, powered by the novel cost function and the custom-trained word vector, achieved accuracy, precision, recall, F1-score and AUC of 0.92, 0.89, 0.98, 0.94 and 0.97, respectively, outperforming existing models while overcoming the overfitting problem caused by class imbalance.
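The abstract does not specify the form of the authors' novel cost function. As a generic illustration of how a cost function can counter class imbalance, the Python sketch below assumes standard inverse-frequency class weighting (the "balanced" heuristic n / (k * n_c), also used by scikit-learn's compute_class_weight) applied to binary cross-entropy. The function names class_weights and weighted_bce and the toy data are hypothetical. This is not the method from the paper.

```python
from __future__ import annotations

import math
from typing import Sequence


def class_weights(labels: Sequence[int]) -> tuple[float, float]:
    """Return hypothetical (w_neg, w_pos) inverse-frequency weights for 0/1 labels.

    Assumed 'balanced' heuristic n / (2 * n_c): the rarer class gets the
    larger weight, and the weights summed over all samples equal n.
    """
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return n / (2 * n_neg), n / (2 * n_pos)


def weighted_bce(
    labels: Sequence[int],
    probs: Sequence[float],
    w_neg: float,
    w_pos: float,
    eps: float = 1e-12,
) -> float:
    """Mean class-weighted binary cross-entropy over predicted P(exploited)."""
    if len(labels) != len(probs) or not labels:
        raise ValueError("labels and probs must be non-empty and equal length")
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        if y == 1:
            # penalty when an exploited vulnerability gets a low probability
            total -= w_pos * math.log(p)
        else:
            # penalty when a non-exploited vulnerability gets a high probability
            total -= w_neg * math.log(1.0 - p)
    return total / len(labels)


if __name__ == "__main__":
    labels = [1] + [0] * 9  # hypothetical: 1 exploited among 10
    probs = [0.05] * 10     # a model that always predicts "unlikely"
    w_neg, w_pos = class_weights(labels)
    print(w_neg, w_pos)
    print(weighted_bce(labels, probs, 1.0, 1.0))      # unweighted loss
    print(weighted_bce(labels, probs, w_neg, w_pos))  # class-weighted loss
```

With one exploited vulnerability among ten, the positive class gets weight 5.0 and each negative about 0.56. A model that always predicts a low exploitation probability is therefore penalized far more heavily than under unweighted cross-entropy, which discourages collapsing onto the majority class.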
