A COMBINATION OF K-MEANS AND NAIVE BAYES ALGORITHMS TO IMPROVE THE ACCURACY AND SPEED OF MALWARE DETECTION TIME
Main Author: | |
---|---|
Format: | Theses |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/75313 |
Institution: | Institut Teknologi Bandung |
Summary: | CrowdStrike, an American information security company, reports that 2023
will bring an increase in the security holes that give rise to e-crime (cybercrime).
The ninth edition of the company's Global Threat Report examines the evolving
behavior, cyber trends, and strategies of the countries most affected by cybercrime.
The report covers more than 200 e-crime threats, 33 of which were newly identified.
In 2022 these included identity-based threats, exploitation of network systems by
China-linked (China-nexus) reconnaissance groups, and attacks on protected devices.
In 2021, cyberattacks increased from 62% to 71%, and confidentiality-breaching
activity carried out from live keyboards increased by 50% in early 2022. These
figures show a growing threat from human attackers who circumvent vulnerability
protections and device hardening. The report is produced by CrowdStrike's
information security intelligence team, which draws on millions of data points from
the daily events fed into the CrowdStrike Falcon platform and on the expertise of
CrowdStrike Falcon OverWatch.
In Indonesia, the BSSN (Badan Siber dan Sandi Negara, the National Cyber and Crypto
Agency) handled a total of 1,433 cyber incidents from the end of 2022 to February
2023. Of these incidents, 26% were data breaches, 26% were web defacements (changes
to a website's appearance), 24% were ransomware attacks, and the remaining 24% fell
into other categories. BSSN does not specify how many cyber incidents or cyber
threats it detected in total. For 2023, the prediction for malware activity in
Indonesia is that 26% of cyber attacks will take the form of malware and ransomware
and the other 74% will take the form of data breaches. Users can guard against such
widespread attacks, and against the security holes that malware easily exploits,
through good habits: backing up data in advance, identifying attacks, not clicking
on dangerous links while browsing the internet, and recognizing the methods that
cybercriminals use to take advantage of victims. The 26% of attacks attributed to
malware call for a further response, because malware is the base software used by
many different hackers. This requires analyzing how quickly malware can be
detected, which can be done through static analysis of aggregated malware datasets.
Static analysis yields patterns that algorithms can learn. Machine learning
algorithms learn continuously as the dataset is built up. Each pattern is formed
from a collection of attributes that describe the extent of a malware attack.
Attributes such as IP, port, source, flow duration, and user provide concrete
information about when malware last infected a device. Each algorithm has its own
advantages and disadvantages in detection.
This study proposes a new approach that combines two different kinds of machine
learning algorithm: one that learns from labeled data (supervised) and one that
learns from unlabeled data (unsupervised). Each approach has its own advantages.
Four combinations were tested on five different datasets to measure accuracy and
detection time. The combinations tested are random forest with k-means clustering,
k-means clustering with random forest, naive Bayes with k-means clustering, and
k-means clustering with naive Bayes. The best results came from the combination of
k-means clustering with naive Bayes, which achieved an average accuracy of 96.12%
and a detection time of 5.36 milliseconds. |
---|
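
As a rough illustration of how a k-means stage might feed a naive Bayes classifier, the sketch below chains the two on toy data. The summary does not say how the thesis couples the algorithms, so the coupling used here is an assumption: each sample's distance to every k-means centroid is appended as an extra feature before a Gaussian naive Bayes model is fitted. The two-feature synthetic data, the function names, and all parameter values are likewise hypothetical, and the code does not reproduce the thesis's datasets or results. It uses only the Python standard library.

```python
import math
import random
import time

def nearest(row, centroids):
    """Index of the centroid closest to row (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(row, centroids[i]))

def kmeans(rows, k, iters=20, seed=0):
    """Plain Lloyd's k-means on unlabeled rows; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for r in rows:
            groups[nearest(r, centroids)].append(r)
        for i, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

def augment(row, centroids):
    """Assumed coupling: raw features plus distance to each centroid."""
    return list(row) + [math.dist(row, c) for c in centroids]

def fit_gnb(rows, labels):
    """Gaussian naive Bayes: per-class log prior and per-feature (mean, variance)."""
    model = {}
    for c in set(labels):
        members = [r for r, y in zip(rows, labels) if y == c]
        stats = []
        for col in zip(*members):
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-6  # smoothing
            stats.append((mean, var))
        model[c] = (math.log(len(members) / len(rows)), stats)
    return model

def predict_gnb(model, row):
    """Return the class with the highest log posterior for row."""
    def log_post(c):
        prior, stats = model[c]
        return prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
            for v, (mean, var) in zip(row, stats)
        )
    return max(model, key=log_post)

# Synthetic stand-in for two numeric traffic attributes (hypothetical values).
rng = random.Random(42)
benign = [[rng.gauss(1.0, 0.3), rng.gauss(1.0, 0.3)] for _ in range(50)]
malware = [[rng.gauss(5.0, 0.3), rng.gauss(5.0, 0.3)] for _ in range(50)]
X = benign + malware
y = ["benign"] * 50 + ["malware"] * 50

centroids = kmeans(X, k=2)                                 # unsupervised stage
model = fit_gnb([augment(r, centroids) for r in X], y)     # supervised stage

def classify(row):
    return predict_gnb(model, augment(row, centroids))

# Measure the two quantities the study reports: accuracy and prediction time.
start = time.perf_counter()
predictions = [classify(r) for r in X]
elapsed_ms = (time.perf_counter() - start) * 1000
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
print(f"training accuracy: {accuracy:.2%}, prediction time: {elapsed_ms:.2f} ms")
```

Swapping the order of the stages, or replacing naive Bayes with a random forest, would give analogues of the other combinations listed in the summary. The accuracy printed here is measured on the training data itself, so it says nothing about performance on real malware datasets.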