DESIGN AND IMPLEMENTING A MALWARE DETECTION SYSTEM USING NAÏVE BAYES VARIANTS

Along with the development of information technology, various threats in cyberspace are getting bigger, and these threats ignore time and conventional state boundaries. Based on monitoring of 93 BSSN Honeynet Partners covering the government sector, national critical infrastructure, and universit...

Bibliographic Details
Main Author:	Cahyo Nugroho, Firdaus
Format:	Theses
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/80321
Institution:	Institut Teknologi Bandung

id	id-itb.:80321
timestamp	2024-01-22T10:09:33Z
keywords	malware, machine learning, naïve Bayes, ensemble technique
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	As information technology develops, threats in cyberspace are growing, and they disregard time and conventional state boundaries. Monitoring of 93 BSSN Honeynet partners covering the government sector, national critical infrastructure, and universities recorded 818,192 malware attacks throughout 2022. Meanwhile, statistics released by the AV-TEST Institute show a large increase in the number of malware infections over the last 5 years; from January to August 2023, there were more than 1,068 million malware infections globally. These figures show that the threat of malware cannot be underestimated, given its potential to compromise sensitive data, destroy information, and disrupt operations on a substantial scale. It is therefore essential to continue research on the detection and classification of malware. Malware detection is the process of identifying and classifying malicious software (malware) that can harm computer systems or other devices. Machine learning is often an effective approach to detecting malware, but applying it still presents many challenges. One crucial challenge is selecting machine learning algorithms that perform well at detecting malicious software. Naïve Bayes is one of the machine learning algorithms that can be used for malware detection and classification. The term "naïve" refers to the assumption that, given the class, every predictor variable (feature) is independent of the values of the other features. Because each feature is treated as independent, the predictive performance of the Naïve Bayes classifier can be degraded by excessive attributes and by dependencies among features in the training data.
There are various ways to improve the performance of the Naïve Bayes classifier, such as removing correlated features using conditional independence, using Weighted Principal Component Analysis, combining feature weighting with Laplace calibration, using Chi-Square as a feature selection method with Laplace smoothing, and combining the Naïve Bayes algorithm with other algorithms using the ensemble bagging, voting, and stacking methods. In this research, the author proposes a detection system that employs different variants of the Naïve Bayes algorithm, improved specifically to classify malware. The experiment used the EMBER dataset for training and data from Honeynet BSSN for testing. The experimental results demonstrate an improvement in accuracy: the base Naïve Bayes model achieved an accuracy of 50%, while the highest accuracy, 95.1%, was achieved by the ensemble stacking method, which combines Naïve Bayes, KNN, and Random Forest.
format	Theses
author	Cahyo Nugroho, Firdaus
title	DESIGN AND IMPLEMENTING A MALWARE DETECTION SYSTEM USING NAÏVE BAYES VARIANTS
url	https://digilib.itb.ac.id/gdl/view/80321
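
The description above refers to the "naïve" independence assumption and to Laplace smoothing. As a rough illustration only, the following is a minimal sketch of a Bernoulli Naïve Bayes classifier with Laplace smoothing in plain Python. It is not the thesis's implementation: the three binary features, the toy samples, and the function names `train_bernoulli_nb` and `predict` are all hypothetical and chosen purely for demonstration.

```python
import math
from collections import defaultdict


def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model with Laplace (add-alpha) smoothing.

    X is a list of binary feature vectors and y the matching class labels.
    Returns {label: (log_prior, [P(x_j = 1 | label) for each feature j])}.
    """
    n_features = len(X[0])
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: [0] * n_features)
    for row, label in zip(X, y):
        class_counts[label] += 1
        for j, value in enumerate(row):
            feature_counts[label][j] += value

    model = {}
    for label, n_c in class_counts.items():
        log_prior = math.log(n_c / len(y))
        # Smoothing keeps every estimate strictly between 0 and 1, so a
        # feature value never seen with a class cannot zero out the product.
        probs = [(feature_counts[label][j] + alpha) / (n_c + 2 * alpha)
                 for j in range(n_features)]
        model[label] = (log_prior, probs)
    return model


def predict(model, row):
    """Return the label with the highest log-posterior score.

    The "naive" assumption appears here: per-feature log-likelihoods are
    simply summed, as if features were independent given the class.
    """
    best_label, best_score = None, -math.inf
    for label, (log_prior, probs) in model.items():
        score = log_prior
        for value, p in zip(row, probs):
            score += math.log(p if value else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical binary features (assumptions, not taken from the thesis):
# [imports_crypto_api, has_packed_section, has_valid_signature]
X = [
    [1, 1, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0],  # labelled malware
    [0, 0, 1], [0, 0, 1], [1, 0, 1], [0, 0, 0],  # labelled benign
]
y = ["malware"] * 4 + ["benign"] * 4

model = train_bernoulli_nb(X, y)
print(predict(model, [1, 1, 0]))
print(predict(model, [0, 0, 1]))
```

In this toy data no malware sample has a valid signature, so without smoothing the estimate of that feature for the malware class would be exactly zero and would dominate the product. With `alpha=1.0` it becomes a small non-zero value instead.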

Similar Items

DESIGN AND IMPLEMENTING A MALWARE DETECTION SYSTEM USING NAÏVE BAYES VARIANTS