A COMBINATION OF K-MEANS AND NAIVE BAYES ALGORITHMS TO IMPROVE THE ACCURACY AND SPEED OF MALWARE DETECTION TIME

Bibliographic Details
Main Author: Shelviani, Hanasa
Format: Theses
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/75313
Institution: Institut Teknologi Bandung
Description
Summary: CrowdStrike, an American information security company, reports that 2023 will see an increase in the security holes that enable e-crime (cybercrime). The ninth edition of its Global Threat Report examines the evolving behavior, cyber trends, and strategies of the countries most affected by cybercrime. The report covers more than 200 e-crime threats, 33 of which were newly identified. Threats observed in 2022 include identity-based attacks, exploitation of network systems by Chinese reconnaissance groups (China-nexus), and attacks on protected devices. In 2021, cyberattacks increased from 62% to 71%, and confidentiality-breaching activity carried out hands-on at the keyboard increased by 50% in early 2022. These figures point to a growing threat from human-operated cybercrime that circumvents vulnerability protections and device hardening. The report is produced by CrowdStrike's world-leading information security intelligence team, which draws on millions of data points from daily events fed into the CrowdStrike Falcon platform and on the deep knowledge of CrowdStrike Falcon OverWatch. In Indonesia, BSSN (Badan Siber dan Sandi Negara, the National Cyber and Crypto Agency) handled a total of 1,433 cyber incidents from the end of 2022 to February 2023. Of these incidents, 26% were data breaches, 26% were web defacements (changes to a website's appearance), 24% were ransomware attacks, and the remaining 24% fell into other categories. BSSN does not specify how many cyber incidents or cyber threats it successfully detected in total. The prediction for malware activity in Indonesia in 2023 is that 26% of cyber attacks will take the form of malware and ransomware and the other 74% will take the form of data breaches. This raises the question of how to deal with such widespread cyber attacks.
Users can close the security holes that malware most easily exploits through good habits, for example backing up data in advance, identifying attacks, not clicking on dangerous links while browsing the internet, and recognizing the schemes cybercriminals use to take advantage of victims. A further response is needed for the 26% of attacks caused by malware, because malware is the core software used by a wide range of hackers. This requires analyzing how quickly malware can be detected, here through static analysis of aggregated malware datasets. Static analysis yields patterns that algorithms can learn, and machine learning algorithms keep learning as the dataset grows. Each pattern is formed from a collection of attributes that describe the extent of a malware attack. Attributes such as IP, port, source, flow duration, user, and others provide concrete information about when malware last infected a device. Each algorithm has its own advantages and disadvantages in detection. This study proposes a new approach that combines two different kinds of machine learning algorithm, one that learns from labeled data and one that learns from unlabeled data, in order to draw on the advantages of both. The experiments cover four combinations across five different datasets and measure accuracy and detection time. The combinations tested are random forest with k-means clustering, k-means clustering with random forest, naive Bayes with k-means clustering, and k-means clustering with naive Bayes. The best results come from the combination of k-means clustering with naive Bayes, with an average accuracy of 96.12% and a detection time of 5.36 milliseconds.
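
The idea of pairing k-means clustering with naive Bayes can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the summary does not say how the two algorithms are chained, so the sketch assumes one common pattern in which k-means first finds centroids on unlabeled flows and the distance to each centroid is then appended as extra features for a Gaussian naive Bayes classifier. The two numeric features, the synthetic data, and all function names (nearest, kmeans, augment, fit_gaussian_nb, predict, make_flows) are hypothetical and chosen only for illustration, so the printed accuracy and timing are unrelated to the 96.12% and 5.36 ms reported above.

```python
import math
import random
import time


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on unlabeled points; returns k centroids."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # New centroid = mean of its group; keep the old one if the group is empty.
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


def augment(point, centroids):
    """Assumed combination step: append the distance to every centroid."""
    return list(point) + [math.dist(point, c) for c in centroids]


def fit_gaussian_nb(rows, labels, eps=1e-6):
    """Per class: log prior plus per-feature mean and (smoothed) variance."""
    model = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        cols = list(zip(*members))
        means = [sum(col) / len(col) for col in cols]
        variances = [
            sum((v - m) ** 2 for v in col) / len(col) + eps
            for col, m in zip(cols, means)
        ]
        model[cls] = (math.log(len(members) / len(rows)), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest Gaussian log posterior."""
    def log_posterior(cls):
        log_prior, means, variances = model[cls]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
    return max(model, key=log_posterior)


def make_flows(center, n, rng):
    """Synthetic two-feature records scattered around `center` (illustrative only)."""
    return [[rng.gauss(center[0], 0.5), rng.gauss(center[1], 0.5)] for _ in range(n)]


rng = random.Random(42)
train_x = make_flows((1.0, 1.0), 50, rng) + make_flows((5.0, 5.0), 50, rng)
train_y = ["benign"] * 50 + ["malware"] * 50
test_x = make_flows((1.0, 1.0), 20, rng) + make_flows((5.0, 5.0), 20, rng)
test_y = ["benign"] * 20 + ["malware"] * 20

# Step 1 (unlabeled): cluster the training flows. Step 2 (labeled): train naive Bayes.
centroids = kmeans(train_x, k=2)
model = fit_gaussian_nb([augment(p, centroids) for p in train_x], train_y)

# Measure accuracy and the time needed to classify the held-out flows.
start = time.perf_counter()
predictions = [predict(model, augment(p, centroids)) for p in test_x]
elapsed_ms = (time.perf_counter() - start) * 1000

accuracy = sum(p == y for p, y in zip(predictions, test_y)) / len(test_y)
print(f"accuracy={accuracy:.2%}, detection time={elapsed_ms:.2f} ms")
```

Swapping the order of the two steps, or replacing naive Bayes with a random forest, would give sketches of the other three combinations listed in the summary. Measuring only the prediction loop, as done here, is one possible reading of "detection time"; the thesis may time a different span.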