DESIGN AND IMPLEMENTING A MALWARE DETECTION SYSTEM USING NAÏVE BAYES VARIANTS
As information technology develops, threats in cyberspace continue to grow, and they are not bound by time or conventional state boundaries. Based on monitoring of 93 BSSN Honeynet Partners covering the government sector, national critical infrastructure, and universit...
Main Author: | Cahyo Nugroho, Firdaus |
---|---|
Format: | Theses |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/80321 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:80321 |
---|---|
spelling |
id-itb.:80321 2024-01-22T10:09:33Z DESIGN AND IMPLEMENTING A MALWARE DETECTION SYSTEM USING NAÏVE BAYES VARIANTS Cahyo Nugroho, Firdaus Indonesia Theses malware, machine learning, naïve bayes, ensemble technique. INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/80321 text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesian |
description |
As information technology develops, threats in cyberspace continue to grow, and
they are not bound by time or conventional state boundaries. Monitoring of 93
BSSN Honeynet Partners covering the government sector, national critical
infrastructure, and universities found 818,192 malware attacks throughout 2022.
Meanwhile, statistics released by the AV-TEST Institute show a large increase in
the number of malware infections over the last 5 years; from January to August
2023, there were more than 1,068 million malware infections globally. These
figures show that the threat of malware cannot be underestimated, given its
potential to compromise sensitive data, destroy information, and disrupt
operations on a substantial scale. Continued research on the detection and
classification of malware is therefore essential to combat this threat.
Malware detection is the process of identifying and classifying malicious software
(malware) that can harm computer systems or other devices. Machine learning is
often an effective approach to detecting malware, but applying it still presents
many challenges. One crucial challenge is selecting machine learning algorithms
that detect malicious software with high performance. Naïve Bayes is one of the
machine learning algorithms that can be used for malware detection and
classification. In Naïve Bayes, the term "naïve" refers to the assumption that all
predictor variables (features) are independent of one another given the class
variable. Because each feature is treated as independent, the predictive
performance of the Naïve Bayes classifier can be negatively affected by
excessive attributes and dependencies in the training data. There are
various ways to improve the performance of the Naïve Bayes classifier, such as
removing correlated features using conditional independence, using Weighted
Principal Component Analysis, combining feature weighting with Laplace
calibration, using Chi-Square as a feature selection method with Laplace
Smoothing, and combining the Naïve Bayes algorithm with other algorithms using
ensemble bagging, voting, and stacking methods.
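The independence assumption and Laplace smoothing mentioned above can be illustrated with a minimal sketch. It is not the thesis implementation: the three binary features (imports a crypto API, has a packed section, has a valid signature), the toy samples, and the function names train_nb and predict_nb are all hypothetical, chosen only to show how per-feature probabilities are multiplied and how add-alpha smoothing avoids zero probabilities.

```python
import math
from collections import Counter

def train_nb(samples, labels, alpha=1.0):
    """Fit a categorical Naive Bayes model with Laplace (add-alpha) smoothing."""
    class_counts = Counter(labels)
    n_features = len(samples[0])
    # value_counts[c][j][v] = number of class-c samples whose feature j equals v
    value_counts = {c: [Counter() for _ in range(n_features)] for c in class_counts}
    feature_values = [set() for _ in range(n_features)]
    for x, c in zip(samples, labels):
        for j, v in enumerate(x):
            value_counts[c][j][v] += 1
            feature_values[j].add(v)
    return class_counts, value_counts, feature_values, alpha

def predict_nb(model, x):
    """Return the class with the highest log-posterior under the naive assumption."""
    class_counts, value_counts, feature_values, alpha = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)  # log prior
        for j, v in enumerate(x):
            k = len(feature_values[j])  # distinct values seen for feature j
            # Features are treated as independent given the class, so their
            # smoothed log-likelihoods are simply added.
            score += math.log((value_counts[c][j][v] + alpha) / (n_c + alpha * k))
        if score > best_score:
            best, best_score = c, score
    return best

# Hypothetical toy data: (imports_crypto_api, has_packed_section, has_valid_signature)
samples = [(1, 1, 0), (1, 1, 0), (1, 0, 0), (0, 1, 0),
           (0, 0, 1), (0, 0, 1), (1, 0, 1), (0, 0, 0)]
labels = ["malware"] * 4 + ["benign"] * 4

model = train_nb(samples, labels)
print(predict_nb(model, (1, 1, 0)))
print(predict_nb(model, (0, 0, 1)))
```

In this toy set no benign sample has a packed section, so without smoothing that single feature would force the benign likelihood to zero; with alpha = 1 it only lowers it.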
In this research, the author proposes a detection system that employs different
variations of Naïve Bayes algorithms, which have been improved specifically to
classify malware. The experiment used the EMBER dataset for training and data
from Honeynet BSSN for testing. The experimental results demonstrate an
improvement in accuracy: the base Naïve Bayes model achieved an accuracy of 50%,
while the highest accuracy was observed with the Ensemble Stacking method, which
combines Naïve Bayes, KNN, and Random Forest and achieved an accuracy of 95.1%. |
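The stacking idea named in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the thesis pipeline: the base learners here are simple stand-ins (a nearest-centroid classifier and a 1-nearest-neighbour classifier, rather than Naïve Bayes, KNN, and Random Forest), the meta-learner is a lookup table over base-prediction patterns, and the two-dimensional toy data and all function names are hypothetical.

```python
from collections import Counter, defaultdict

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def fit_centroid(X, y):
    """Stand-in base learner: predict the class with the nearest mean vector."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
    return lambda x: min(centroids, key=lambda c: sq_dist(x, centroids[c]))

def fit_1nn(X, y):
    """Stand-in base learner: k-nearest neighbours with k = 1."""
    data = list(zip(X, y))
    return lambda x: min(data, key=lambda p: sq_dist(x, p[0]))[1]

def fit_stacking(X, y, base_fitters, k=3):
    """Train base learners, then a meta-learner on their out-of-fold predictions."""
    n = len(X)
    meta_rows = [None] * n
    for fold in range(k):
        train_idx = [i for i in range(n) if i % k != fold]
        test_idx = [i for i in range(n) if i % k == fold]
        models = [fit([X[i] for i in train_idx], [y[i] for i in train_idx])
                  for fit in base_fitters]
        for i in test_idx:
            # Each sample is scored only by models that never saw it.
            meta_rows[i] = tuple(m(X[i]) for m in models)
    # Meta-learner: majority true label for each pattern of base predictions.
    votes = defaultdict(Counter)
    for row, label in zip(meta_rows, y):
        votes[row][label] += 1
    table = {row: c.most_common(1)[0][0] for row, c in votes.items()}
    default = Counter(y).most_common(1)[0][0]
    # Refit the base learners on all data for use at prediction time.
    final_models = [fit(X, y) for fit in base_fitters]
    return lambda x: table.get(tuple(m(x) for m in final_models), default)

# Hypothetical toy data: two well-separated clusters.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
y = ["benign"] * 4 + ["malware"] * 4

stacked = fit_stacking(X, y, [fit_centroid, fit_1nn])
print(stacked((0.5, 0.5)))
print(stacked((5.5, 5.5)))
```

Training the meta-learner on out-of-fold predictions rather than on predictions over the base learners' own training data is what distinguishes stacking from simple voting, since it lets the meta-learner see how each base learner behaves on unseen samples.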
format |
Theses |
author |
Cahyo Nugroho, Firdaus |
title |
DESIGN AND IMPLEMENTING A MALWARE DETECTION SYSTEM USING NAÏVE BAYES VARIANTS |
url |
https://digilib.itb.ac.id/gdl/view/80321 |