MALWARE DETECTION SYSTEM WITH SELFSUPERVISED LEARNING AND MULTIMODAL
Malware has become a serious threat to the internet. According to the antivirus company McAfee, an average of 588 malware attacks occur every minute. The LockBit ransomware infected Indonesia's National Data Center and caused the downtime of 282 Indonesian government institutions’ services for more th...
Main Author: | Juli Irzal Ismail, Setia |
---|---|
Format: | Dissertations |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/87121 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:87121 |
---|---|
spelling |
id-itb.:87121 2025-01-13T13:37:20Z MALWARE DETECTION SYSTEM WITH SELFSUPERVISED LEARNING AND MULTIMODAL Juli Irzal Ismail, Setia Indonesia Dissertations malware detection, machine learning, self-supervised learning, image representation, multimodal. INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/87121 text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
Malware has become a serious threat to the internet. According to the antivirus
company McAfee, an average of 588 malware attacks occur every minute. The
LockBit ransomware infected Indonesia's National Data Center and disrupted the
services of 282 Indonesian government institutions for more than a
week. Malware also causes economic losses: during 2023, a reported $1.1
trillion in ransomware payments were made.
To detect malware, antivirus software still relies on signature-based and
heuristic-based detection techniques. These techniques are effective, but
malware signatures and heuristic rules are compiled manually by malware analysts,
and detecting malware and creating a signature takes time and special skills.
With the growing volume of malware, an automatic detection process is required.
For this reason, machine learning is applied to malware
detection so that the detection process can be carried out
automatically. However, applying machine learning to malware
detection still faces several problems. First, the dataset labeling process takes a
significant amount of time. Second, machine learning is not yet capable of detecting
new malware. This highlights the need for a new malware detection method.
In this study, a new malware detection method using machine learning is proposed
to address these problems. The approach involves developing new detection
techniques based on self-supervised learning methods and a multimodal
architecture. Self-supervised learning techniques, which have been successfully
applied in computer vision, achieve results competitive with supervised learning
techniques but do not require an extensive labeling process.
A novel malware detection method based on self-supervised learning has been
developed, eliminating the need for a large labeling process. New malware was
detected using multimodal methods. Malware files were converted into images, and
their patterns were analyzed. Assuming that new malware reuses code from known
malware, the multimodal model identifies new malware by recognizing patterns from
previously identified malware. The multimodal architecture combines two malware
detection methods: one using image representations and the other using audio
representations.
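As a rough illustration of what converting a file into image and audio representations can look like, the sketch below maps raw bytes to a grayscale pixel grid and to a normalized sample sequence. The function names, the fixed row width, and the 8-bit PCM interpretation are assumptions made for this example; the record does not state how the dissertation performs either conversion.

```python
import math


def bytes_to_image(data: bytes, width: int = 16) -> list[list[int]]:
    """Reshape raw bytes into rows of `width` grayscale pixels (0-255).

    The last row is zero-padded. The row width is an illustrative choice,
    not a value taken from the dissertation.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = math.ceil(len(data) / width)
    padded = data + bytes(rows * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(rows)]


def bytes_to_audio(data: bytes) -> list[float]:
    """Treat each byte as an unsigned 8-bit PCM sample scaled to [-1.0, 1.0)."""
    return [(b - 128) / 128.0 for b in data]


if __name__ == "__main__":
    sample = bytes([0, 1, 2, 3, 4])
    print(bytes_to_image(sample, width=2))
    print(bytes_to_audio(sample))
```

Either output could then be fed to an image or audio classifier; the choice of model is outside the scope of this sketch.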
The development of the proposed malware detection method is divided into three
stages. The first stage involves developing a malware detection method using an
image-based representation with self-supervised learning (SSL). The
second stage focuses on developing a malware detection method that uses an
audio representation with a convolutional neural network (CNN). Finally, the third
stage involves developing the multimodal architecture. All
three stages are carried out using an experimental approach.
The novelty of this research lies in the development of MalSSL, a malware
detection method that does not require an extensive labeling process, and of a
multimodal approach to recognizing new malware. MalSSL, the proposed method
based on self-supervised learning and image representation, achieves a malware
classification accuracy of 98.4% without the need for labeling. The multimodal
architecture, which combines image and audio representations using a late fusion
approach, can detect new malware variants with an accuracy of 95.1%.
Additionally, it achieves an accuracy of 99.7% in classifying known malware. |
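The late fusion approach mentioned in the description combines the outputs of separately trained models rather than their input features. The sketch below shows one common variant, a weighted average of per-class probabilities. The record does not specify the fusion rule used in the dissertation, so the averaging scheme, the `image_weight` parameter, and the function names are assumptions for illustration only.

```python
def late_fusion(image_probs: list[float],
                audio_probs: list[float],
                image_weight: float = 0.5) -> list[float]:
    """Fuse two per-class probability vectors by weighted averaging.

    Weighted averaging is an assumed fusion rule, not necessarily the one
    used in the dissertation.
    """
    if len(image_probs) != len(audio_probs):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= image_weight <= 1.0:
        raise ValueError("image_weight must be between 0 and 1")
    audio_weight = 1.0 - image_weight
    return [image_weight * p + audio_weight * q
            for p, q in zip(image_probs, audio_probs)]


def predict(fused_probs: list[float]) -> int:
    """Return the index of the class with the highest fused score."""
    return max(range(len(fused_probs)), key=fused_probs.__getitem__)


if __name__ == "__main__":
    fused = late_fusion([1.0, 0.0], [0.5, 0.5])
    print(fused, predict(fused))
```

Because each model is trained and scored independently, either branch can be retrained or replaced without touching the other, which is the usual motivation for fusing at the decision level.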
format |
Dissertations |
author |
Juli Irzal Ismail, Setia |
title |
MALWARE DETECTION SYSTEM WITH SELFSUPERVISED LEARNING AND MULTIMODAL |
url |
https://digilib.itb.ac.id/gdl/view/87121 |