DESIGN AND IMPLEMENTING A MALWARE DETECTION SYSTEM USING NAÏVE BAYES VARIANTS
Main Author: | |
---|---|
Format: | Theses |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/80321 |
Institution: | Institut Teknologi Bandung |
Summary: | As information technology develops, threats in cyberspace are
growing, and these threats are not bound by time or conventional state
borders. Monitoring of 93 BSSN Honeynet partners across the government
sector, national critical infrastructure, and universities recorded
818,192 malware attacks throughout 2022. Meanwhile, based on statistical
data released by the AV-TEST Institute, in the last 5 years, there has been a large
increase in the number of malware infections; from January to August 2023, there
were more than 1,068 million malware infections globally. This statistical data
shows that the threat of malware cannot be underestimated, given its potential to
compromise sensitive data, obliterate information, and disrupt operations on a
substantial scale. Therefore, it is essential to keep conducting research on the
detection and classification of malware to combat this threat.
Malware detection is the process of identifying and classifying malicious software
(malware) that can harm computer systems or other devices. Machine learning
often serves as an effective solution for detecting malware. However, the
implementation of machine learning in malware detection still presents many
challenges. One crucial challenge is the selection of machine learning algorithms
with high performance in detecting malicious software. Naïve Bayes is one of the
machine learning algorithms that can be used for malware detection and
classification. In Naïve Bayes, the term "naïve" refers to the assumption that every
predictor variable (feature) is independent of the other features given the class
variable. Because each feature is treated as independent, the predictive
performance of the Naïve Bayes classifier can be negatively affected by the
presence of excessive attributes and dependencies in the training data. There are
various ways to improve the performance of the Naïve Bayes classifier, such as
removing correlated features using conditional independence, using Weighted
Principal Component Analysis, combining feature weighting with Laplace
calibration, using Chi-Square as a feature selection method with Laplace
Smoothing, and combining the Naïve Bayes algorithm with other algorithms using
ensemble bagging, voting, and stacking methods.
In this research, the author proposes a detection system that employs different
variations of the Naïve Bayes algorithm, which have been improved specifically to
classify malware. The experiment was conducted using training data from
the EMBER dataset and test data from the BSSN Honeynet. The experimental
results demonstrate an improvement in accuracy, with the baseline Naïve
Bayes model achieving an accuracy of 50%, while the highest accuracy was
observed with the Ensemble Stacking method, which combines Naïve Bayes, KNN,
and Random Forest and achieved an accuracy of 95.1%. |
---|
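
The independence assumption and Laplace smoothing mentioned in the summary can be illustrated with a minimal sketch. This is not the thesis's implementation: the binary feature names, the toy samples, and the function names below are all hypothetical, and the model is a plain categorical Naïve Bayes written with the Python standard library only.

```python
import math
from collections import Counter


def train_naive_bayes(samples, labels, alpha=1.0):
    """Fit a categorical Naive Bayes model with Laplace (add-alpha) smoothing."""
    if alpha <= 0:
        raise ValueError("alpha must be positive so no probability is zero")
    class_counts = Counter(labels)
    n_features = len(samples[0])
    # value_counts[c][j][v]: how often feature j took value v within class c
    value_counts = {c: [Counter() for _ in range(n_features)] for c in class_counts}
    # distinct values seen per feature, used in the smoothing denominator
    feature_values = [set() for _ in range(n_features)]
    for x, y in zip(samples, labels):
        for j, v in enumerate(x):
            value_counts[y][j][v] += 1
            feature_values[j].add(v)
    return {
        "alpha": alpha,
        "class_counts": class_counts,
        "value_counts": value_counts,
        "feature_values": feature_values,
        "total": len(labels),
    }


def log_posteriors(model, x):
    """Return the unnormalised score log P(c) + sum_j log P(x_j | c) per class.

    Summing per-feature log-likelihoods is exactly the "naive" step: each
    feature is treated as independent of the others given the class.
    """
    alpha = model["alpha"]
    scores = {}
    for c, n_c in model["class_counts"].items():
        score = math.log(n_c / model["total"])
        for j, v in enumerate(x):
            count = model["value_counts"][c][j][v]
            k = len(model["feature_values"][j])
            # Laplace smoothing keeps unseen (value, class) pairs above zero
            score += math.log((count + alpha) / (n_c + alpha * k))
        scores[c] = score
    return scores


def predict(model, x):
    """Pick the class with the highest unnormalised log-posterior."""
    scores = log_posteriors(model, x)
    return max(scores, key=scores.get)


# Hypothetical binary features: (imports_crypto_api, is_packed, has_signature)
TOY_SAMPLES = [(1, 1, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, 1), (1, 0, 1)]
TOY_LABELS = ["malware", "malware", "malware", "benign", "benign", "benign"]

model = train_naive_bayes(TOY_SAMPLES, TOY_LABELS, alpha=1.0)
```

With this toy data, `predict(model, (1, 1, 0))` returns `"malware"` and `predict(model, (0, 0, 1))` returns `"benign"`. A packed sample was never seen in the benign class, yet smoothing still gives it a small non-zero likelihood there instead of ruling the class out entirely.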