DESIGN AND IMPLEMENTING A MALWARE DETECTION SYSTEM USING NAÏVE BAYES VARIANTS

Bibliographic Details
Main Author: Cahyo Nugroho, Firdaus
Format: Theses
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/80321
Institution: Institut Teknologi Bandung
Description
Summary: Along with the development of information technology, threats in cyberspace are growing, and they disregard time and conventional state boundaries. Monitoring of 93 BSSN Honeynet Partners covering the government sector, national critical infrastructure, and universities recorded 818,192 malware attacks throughout 2022. Meanwhile, statistics released by the AV-TEST Institute show a large increase in malware infections over the last five years; from January to August 2023, there were more than 1,068 million malware infections globally. These figures show that the threat of malware cannot be underestimated, given its potential to compromise sensitive data, destroy information, and disrupt operations on a substantial scale. Continued research on malware detection and classification is therefore essential. Malware detection is the process of identifying and classifying malicious software (malware) that can harm computer systems or other devices. Machine learning is often an effective solution for detecting malware, but applying it still presents many challenges. One crucial challenge is selecting machine learning algorithms that perform well in detecting malicious software. Naïve Bayes is one machine learning algorithm that can be used for malware detection and classification. The term "naïve" refers to its assumption that, given the class, every predictor variable (feature) is independent of the values of the other features. Because each feature is treated as independent, the predictive performance of the Naïve Bayes classifier can suffer when the training data contain excessive attributes and dependencies among them.
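To make the conditional-independence assumption concrete, the following is a minimal sketch of a categorical Naïve Bayes classifier with Laplace (add-one) smoothing. It is an illustration only, not the thesis's implementation: the function names `train_nb` and `predict_nb`, the three binary features, and the toy samples are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(samples, labels, alpha=1.0):
    """Count class and per-feature value frequencies (hypothetical helper)."""
    class_counts = Counter(labels)
    # feature_counts[c][j][v] = number of class-c samples whose feature j equals v
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values observed for each feature index
    for x, c in zip(samples, labels):
        for j, v in enumerate(x):
            feature_counts[c][j][v] += 1
            values[j].add(v)
    return class_counts, feature_counts, values, alpha


def predict_nb(model, x):
    """Return the class maximizing log P(c) + sum_j log P(x_j | c)."""
    class_counts, feature_counts, values, alpha = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)  # log prior
        for j, v in enumerate(x):
            # The "naive" step: each feature contributes independently given c.
            count = feature_counts[c][j][v]
            # Laplace smoothing keeps unseen (class, value) pairs above zero.
            score += math.log((count + alpha) / (n_c + alpha * len(values[j])))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Toy data (made up): (imports_crypto_api, is_packed, has_valid_signature)
X = [(1, 1, 0), (1, 1, 0), (0, 1, 0), (1, 0, 0),
     (0, 0, 1), (0, 0, 1), (1, 0, 1), (0, 1, 1)]
y = ["malware"] * 4 + ["benign"] * 4

model = train_nb(X, y)
print(predict_nb(model, (1, 1, 0)))
print(predict_nb(model, (0, 0, 1)))
```

Because the per-feature terms are simply summed in log space, two strongly correlated features count the same evidence twice, which is one way the dependencies mentioned above can distort the prediction.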
There are various ways to improve the performance of the Naïve Bayes classifier, such as removing correlated features using conditional independence, using Weighted Principal Component Analysis, combining feature weighting with Laplace calibration, using Chi-Square as a feature selection method with Laplace smoothing, and combining the Naïve Bayes algorithm with other algorithms using ensemble bagging, voting, and stacking methods. In this research, the author proposes a detection system that employs several variants of the Naïve Bayes algorithm that have been improved specifically to classify malware. The experiment used training data from the EMBER dataset and testing data from Honeynet BSSN. The experimental results demonstrate an improvement in accuracy: the base Naïve Bayes model achieved an accuracy of 50%, while the highest accuracy, 95.1%, was observed for the ensemble stacking method, which combines Naïve Bayes, KNN, and Random Forest.
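The stacking combination described above could be sketched with scikit-learn roughly as follows. This is an assumption-laden illustration, not the author's pipeline: the data are synthetic (neither EMBER nor Honeynet BSSN), the logistic-regression meta-learner and all hyperparameters are choices made here because the abstract does not specify them, and the printed scores say nothing about the reported 50% and 95.1% figures.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for extracted malware features (assumption: the real
# experiment trains on EMBER and tests on Honeynet BSSN samples).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8, n_redundant=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners named in the abstract; hyperparameters are illustrative.
base_learners = [
    ("nb", GaussianNB()),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
]

# The meta-learner is trained on cross-validated predictions of the base
# learners. Logistic regression is an assumed choice.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

# Plain Naive Bayes as a baseline for comparison on the same split.
baseline = GaussianNB().fit(X_train, y_train)

print("Naive Bayes accuracy:", baseline.score(X_test, y_test))
print("Stacking accuracy:   ", stack.score(X_test, y_test))
```

Stacking lets the meta-learner weigh each base model's output, so errors that Naïve Bayes makes because of correlated features can be compensated by KNN and Random Forest, which do not rely on the independence assumption.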